Content Pipeline

Oct 31, 2004 (10:10 AM EST)

In plumbing, a pipe that's not connected to anything else isn't much use. But a main line that's connected to a network of subsidiary pipes is powerful because it can distribute or gather water throughout a building, town or city.

Similarly, in large organizations with multiple and often disparate content and document management repositories, it's increasingly important to have interconnections to varied content so researchers, product developers, administrators, marketers and customers can gain access to all unstructured information.

Mergers, acquisitions and legacy departmental initiatives have left many organizations with a mishmash of management systems from suppliers including IBM, EMC, FileNet, OpenText, Vignette, Interwoven and dozens of smaller companies, many of which have been acquired by these larger enterprise content management (ECM) vendors. In fact, 78 percent of companies have more than one content repository and 43 percent have six or more, according to Forrester Research.

Here are six common scenarios that call for integration of content from previously isolated sources:

  • You want to standardize on a new content management platform but still make use of legacy systems.
  • New regulations or laws such as the Sarbanes-Oxley Act require your company to provide access control and audit trails for certain classes of documents stored in incompatible systems.
  • Your customer service or help desk department needs to handle customer queries more efficiently. Thus, customer service reps need unified access to customer correspondence, e-mails, technical product documents, transcripts of voice conversations and other information in order to address inquiries in a single call.
  • A manager preparing for a product launch needs to access technical publications, marketing materials from previous campaigns and regulatory submissions housed in various repositories.
  • A merger or acquisition brings together two companies that need to provide access to the multiple content stores of the combined organization.
  • Your company needs to build new business processes that require access to information managed by multiple systems.

Executive Summary

Four Paths to Content

In the aftermath of mergers, acquisitions and departmental document and content management initiatives, many companies are bogged down with isolated and incompatible repositories. To avoid the cost and complexity of ripping out legacy systems and migrating content, organizations have turned to content integration software, federated search and portals to create connections between sources. The most recent trend is a movement toward an enterprise information integration (EII) approach that addresses both data and content integration issues.

Which approach is right for you? Content integration software enables content editing, updating and workflow. Federated search provides simpler content and document access. Portals aggregate content with basic on-screen integration. EII is the choice when processes require access to structured and unstructured data. Most large enterprises will require a combination of approaches to foster free-flowing information.

Companies used to address these needs by building custom integrations or migrating content to the preferred management solution. These projects often turned out to be expensive and time-consuming. In a merger situation, for example, "It may not be practical to migrate a terabyte of content from one application to another," says Gartner analyst Ken Chin. "Content integration makes a lot of sense."

Content integration is both a specific breed of middleware software and, to some, a catch-all term for a range of approaches to providing unified access to content. The alternatives include federated search, portals and a new hybrid approach called enterprise information integration (EII). As this article explains, each of these approaches has a best fit with different scenarios and user needs. You may find that the best solution lies in combining techniques and technologies to your best advantage.

Content Integration: Middleware for Content

Content integration software acts as a middle layer between incompatible content management systems and external applications. It provides a single application programming interface that adapts to each content repository as well as schema mapping to handle variations in tagging among them.
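As a rough illustration of that middle layer, the sketch below puts one API in front of two connectors and uses a mapping table to reconcile their different field names. Everything in it is hypothetical: the repository names, the field names and the `ContentIntegrationLayer` class are invented for the example and do not reflect any vendor's product.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Document:
    """Common shape every repository's records are mapped into."""
    doc_id: str
    title: str
    author: str


class Connector(Protocol):
    """What the middle layer expects from each repository connector."""
    def fetch(self, doc_id: str) -> dict: ...


class RepoAConnector:
    """Hypothetical repository that tags records with 'name' and 'creator'."""
    def __init__(self):
        self._store = {"a1": {"name": "Loan policy", "creator": "Lee"}}

    def fetch(self, doc_id: str) -> dict:
        return self._store[doc_id]


class RepoBConnector:
    """Hypothetical repository that tags records with 'dc_title' and 'owner'."""
    def __init__(self):
        self._store = {"b7": {"dc_title": "Rate sheet", "owner": "Kim"}}

    def fetch(self, doc_id: str) -> dict:
        return self._store[doc_id]


# Schema mapping: common field -> native field, per repository (assumed names).
SCHEMA_MAPS = {
    "repo_a": {"title": "name", "author": "creator"},
    "repo_b": {"title": "dc_title", "author": "owner"},
}


class ContentIntegrationLayer:
    """Single API that routes calls to connectors and normalizes metadata."""
    def __init__(self, connectors: dict[str, Connector]):
        self._connectors = connectors

    def get(self, repo: str, doc_id: str) -> Document:
        raw = self._connectors[repo].fetch(doc_id)
        mapping = SCHEMA_MAPS[repo]
        return Document(
            doc_id=doc_id,
            title=raw[mapping["title"]],
            author=raw[mapping["author"]],
        )


layer = ContentIntegrationLayer(
    {"repo_a": RepoAConnector(), "repo_b": RepoBConnector()}
)
```

An application calling `layer.get("repo_b", "b7")` receives the same `Document` shape it would get from any other repository. In this sketch, supporting another system means writing one more connector and one more mapping entry.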

Content integration software supports applications such as customer service, product marketing, contract management and others that require access to unstructured content, and it enables related processes by bringing needed content into workflows. Two-way integration lets users edit and update content as well as read it.

Content integration software is relatively inexpensive and provides prebuilt, quickly deployable connectors to popular document and content management systems as well as portals and enterprise applications. A typical installation costs around $300,000, compared with enterprise application integration projects, which often cost $1 million or more.

Be forewarned that content integration software won't plug and play with every content management system. If you have content repositories that aren't among the top sellers, you may still need to build connectors and customize the solution to your needs.

There are few independent content integration software suppliers, and the field recently narrowed with IBM's October acquisition of Venetica. Among the first and most widely deployed content integration products, Venetica's VeniceBridge software offers connections to more than 20 leading content stores, including IBM's DB2 Content Manager products.

IBM says VeniceBridge will become part of its DB2 Information Integrator product, which is EII software designed to unite structured data and unstructured content sources.

The field of independent, content-to-and-from-anywhere integration suppliers now includes only Context Media and Windfire. Most other content integration products are designed to bring content into a single management system. Several ECM vendors offer software that helps you integrate disparate repositories into their management regimes. FileNet and Interwoven, for example, use VeniceBridge-developed technology to provide integration between these systems and third-party content management silos. Mobius, Vignette, Day, Documentum and others similarly use homegrown content integration modules to gain access to content in other systems.

The development of content integration software parallels a trend of the mid-1990s in which middleware technology overtook databases as the cornerstone of IT architecture, according to Ovum analyst Laurent Lachal. "In increasingly distributed environments, it's the plumbing that counts more than the water tank," he says.

Users of content integration software are hard to find: few companies are aware of the technology, and fewer still use it or see a need for it.

"Right now organizations are trying to develop enterprise content management strategies," says Forrester analyst Connie Moore. "CIOs and information architects are starting ECM projects and looking at lots of issues: governance, architecture, where to start, how to assign metadata, what's the scope, how do they develop requirements across the organization. The next piece behind that is 'how do I integrate my existing systems with the new [systems]?'" Moore predicts content integration will come into its own in 12 to 18 months as companies complete these ECM initiatives.

CMP Media LLC, which publishes Transform along with InformationWeek, Network Computing and 33 other technology and health care magazines, had a content integration challenge. Magazine groups in different locations were using four incompatible content management systems. Editors at the Manhasset headquarters were posting content to their magazines' Web sites via Interwoven; the West Coast-based software development group was using a semi-customized program called Nucleus; the electronics group was using a homegrown system called Mason/CopyDesk; another division was using a program called Continent.

CMP planned to standardize on Interwoven, but "in advance of anything else, we decided to aggregate all the CMP content into one spot and then make it available to each publication and create an archive for CMP, so [we] could search for anything on voice over IP across the entire organization," says Howard Roth, independent consultant to CMP. "We also wanted [to normalize] content to a common XML scheme, categorize by subject and company we're writing about and capture for each article a secure private link that we could license to customers."

In 2003, the company created an enterprisewide taxonomy and began using a hosted service from Context Media. Once each hour, CMP now feeds new content to the ASP service, which normalizes the XML and applies taxonomy software from Nstein that groups the information into 1,600 categories.
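The normalize-then-categorize step described above can be pictured with a small sketch. It is not CMP's, Context Media's or Nstein's implementation. The source tag names, the two-entry taxonomy and the keyword matching are assumptions chosen only to show articles from different systems being mapped to one XML shape and then tagged.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-system tag names for the same two fields.
FIELD_MAPS = {
    "system_a": {"title": "headline", "body": "text"},
    "system_b": {"title": "hed", "body": "copy"},
}

# Toy taxonomy: category -> trigger keywords (all lowercase).
TAXONOMY = {
    "voip": ["voip", "voice over ip"],
    "nanotechnology": ["nanotech", "nanotube"],
}


def normalize(source_xml: str, system: str) -> ET.Element:
    """Map one source article onto a common <article> element."""
    src = ET.fromstring(source_xml)
    article = ET.Element("article")
    for common_name, native_name in FIELD_MAPS[system].items():
        ET.SubElement(article, common_name).text = src.findtext(native_name, default="")
    return article


def categorize(article: ET.Element) -> ET.Element:
    """Append a <category> element for each taxonomy entry that matches."""
    text = " ".join(article.itertext()).lower()
    for category, keywords in TAXONOMY.items():
        if any(k in text for k in keywords):
            ET.SubElement(article, "category").text = category
    return article


raw = "<story><hed>Carriers bet on voice over IP</hed><copy>Rollouts begin.</copy></story>"
article = categorize(normalize(raw, "system_b"))
xml_out = ET.tostring(article, encoding="unicode")
```

Simple keyword matching stands in here for commercial taxonomy software, which would typically use far richer linguistic analysis to assign categories.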

Using CMP-designed XML style sheets, Context Media distributes the content by e-mail, FTP, RSS and the Web. Abstracts or descriptions crafted for magazine sites (sometimes simply the first paragraph of an article) are repurposed in the syndicated offerings.

The content syndication service, called Acumen, was rolled out in May and so far has four corporate customers. Personal subscribers are offered daily customized newsletters called InfoPaks that provide very specific slices of technology coverage, such as all articles on nanotechnology. CMP also sells its feeds to academic libraries.

The virtual archive was recently made available internally and should prove useful to writers who want to know how CMP has covered a subject or organization. With a user name and password, writers can search all CMP articles published in the last three years.

Will RSS eventually compete with such offerings? "The answer is yes, it probably will as it develops," Roth says. "But I use RSS and I'm still pretty unimpressed with the value of the content I wind up getting — it tends to be too broad and has too many errors." CMP does provide an RSS version of its syndicated content.

For now, the company is focusing on new ways of packaging and delivering content over its virtual repository.

Crawling Content with Federated Search

When you simply need to give people enterprisewide access to information, a search-based "integration" approach may be all you need. The crucial question is, do your users need to edit and update content or bring it into workflows? If not, a search engine can crawl repositories, intranets and the Web and bring your users the content they're after. Federated search presents one interface and acts as an intermediary to different content stores, deploying searches, collecting responses and displaying a single list of results.
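The fan-out-and-merge behavior of federated search can be sketched in a few lines. The version below is a toy under stated assumptions: each "repository" is an in-memory dictionary, relevance is a simple term count, and the names `make_source` and `federated_search` are invented for illustration rather than taken from any search product.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Each hit is (score, title, source).
Hit = tuple[float, str, str]
SearchFn = Callable[[str], list[Hit]]


def make_source(name: str, docs: dict[str, str]) -> SearchFn:
    """Build a toy search function over an in-memory 'repository'."""
    def search(query: str) -> list[Hit]:
        terms = query.lower().split()
        hits = []
        for title, text in docs.items():
            words = text.lower().split()
            score = float(sum(words.count(t) for t in terms))
            if score > 0:
                hits.append((score, title, name))
        return hits
    return search


def federated_search(query: str, sources: list[SearchFn]) -> list[Hit]:
    """Send the query to every source, then merge into one ranked list."""
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda s: s(query), sources))
    merged = [hit for hits in per_source for hit in hits]
    # Highest score first; ties broken by title for a stable order.
    return sorted(merged, key=lambda h: (-h[0], h[1]))


notes = make_source("notes", {"Forum thread": "merger precedent merger filing"})
dms = make_source("dms", {"Deal memo": "merger agreement", "Lease": "office lease"})
intranet = make_source("intranet", {"SEC guide": "filing rules"})

results = federated_search("merger filing", [notes, dms, intranet])
```

In practice, scores returned by different engines are rarely comparable, so the merge step would likely need to normalize or re-rank them. The sketch sidesteps that by using the same scoring function for every source.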

The federated search approach was a perfect fit for Cleary, Gottlieb, Steen & Hamilton, a law firm with more than 800 lawyers working out of 12 offices in major cities including New York, London, Paris and Rome. Several offices have the same types of practices, so there's a frequent need to share the same content and documents, including precedents and past deal documents.

"It's very common for lawyers in multiple offices to work together on the same matter," says Brent Miller, director of knowledge management. "We have an extensive knowledge management effort to organize and collect useful information that spans across the offices."

The firm stores documents and 20 practice-specific threaded discussion forums in Lotus Notes databases (the forums are tightly integrated with e-mail so the firm's many BlackBerry users can participate). Other documents are stored in an Interwoven iManage document management system, within which the firm created a "virtual fileroom" for important e-mail messages.

Searching across the individual iManage libraries in each of the 12 offices was impossible, largely due to network performance issues. In addition, the firm needed to provide easy access to thousands of intranet pages and content on SEC and selected other Web sites.

Although the firm had the tools and human resources to handle some categorization of information, users lacked the ability to perform open-ended searches across all content stores. In September 2003, Cleary, Gottlieb chose MindServer federated search software from Recommind to crawl all of these sources.

"Because of the dominance of Lexis/Nexis and Westlaw, every lawyer has been taught how to use search tools," Miller says. "Having strong search capabilities across collections is a natural and necessary tool that lawyers expect."

The MindServer project is still in its pilot stage, but Miller expects the federated search to be rolled out to all 2,000 users by the end of the year. Miller declined to divulge the firm's expenditures, but says the search capabilities were required.

"You have to give users tools to access information themselves so they're not dependent on others," Miller says. "It's not that there will be a tangible ROI that we can document, it's a necessity to make this data accessible."

Portals Open Doorways into Content

Portal software provides doorways (or windows, depending on which metaphor you prefer) to multiple applications. "Portlets" or "gadgets" for different applications and sources deliver portions of data, content or code to the user's desktop. The search function in most portals provides read-only, one-way access to content. While the content isn't integrated, it does coexist in a visual presentation. In some applications, such as executive dashboards of company performance, this on-screen integration is all users need.

Portal technology is an alternative to content integration when you need to access and aggregate information, yet Lachal of Ovum says the two approaches can also be complementary.

"Portal technology can rely on content integration software, enterprise application integration and EII to more easily access back-end systems," he says.

Portals often provide a richer user experience than content management software alone, yet individual portlets are limited in that they only display information from a particular back-end repository. What portlets need to become true content integration tools is a layer of abstraction so you don't have to write one portlet for each back-end management system you want to expose through the portal. With abstraction, you could create a universal integration portlet.
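One way to picture that layer of abstraction is a single portlet class written against a generic interface instead of a particular back end. The sketch below is illustrative only: `ContentSource`, `UniversalPortlet` and the two sample back ends are hypothetical names, not part of any portal vendor's API.

```python
from abc import ABC, abstractmethod
from html import escape


class ContentSource(ABC):
    """Abstraction layer: the only interface the portlet knows about."""

    @abstractmethod
    def list_titles(self) -> list[str]:
        """Return document titles in whatever order the back end provides."""


class FileShareSource(ContentSource):
    """Hypothetical back end standing in for a network file share."""
    def __init__(self, paths: list[str]):
        self._paths = paths

    def list_titles(self) -> list[str]:
        # Use the file name (last path segment) as the title.
        return [p.replace("\\", "/").rsplit("/", 1)[-1] for p in self._paths]


class DatabaseSource(ContentSource):
    """Hypothetical back end standing in for rows from a database table."""
    def __init__(self, rows: list[dict]):
        self._rows = rows

    def list_titles(self) -> list[str]:
        return [row["title"] for row in self._rows]


class UniversalPortlet:
    """One portlet class that renders any ContentSource."""
    def __init__(self, heading: str, source: ContentSource):
        self._heading = heading
        self._source = source

    def render(self) -> str:
        items = "".join(f"<li>{escape(t)}</li>" for t in self._source.list_titles())
        return f"<section><h2>{escape(self._heading)}</h2><ul>{items}</ul></section>"


shares = UniversalPortlet("File shares", FileShareSource(["/eng/specs/pump.pdf"]))
records = UniversalPortlet("Records", DatabaseSource([{"title": "Q3 <draft>"}]))
```

With this design, exposing a new repository means writing a new `ContentSource` subclass. The portlet code itself does not change.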

For now, portal vendors including IBM, Plumtree, Tibco, BEA and others offer plentiful integrations with content management systems. And BEA's WebLogic Portal has a Virtual Content Repository that works with software from FileNet, Documentum and FatWire.

Westinghouse deployed a Plumtree portal four years ago in part to give customers access to documents over its extranet, eliminating mailing of CDs, e-mailing and file transfers. More recently the company has begun turning the portal inward to provide 700 employees working all over the world with access to diverse content. Thus far, about 170,000 documents in Windows NT and Unix network shared files, SQL Server databases and other databases have been exposed through the portal.

"We have huge volumes of information on network file shares that are difficult to access — you have to have permissions, you have to know where everything lives, you have to have a map to them," says software engineer Darlene Daverio, adding that the portal provides faster, easier access.

Westinghouse stores more than a million documents in EMC's Documentum software, the corporate standard for certain classes of documents. The portal normally provides access to those documents and others stored in Lotus Notes, but those connections recently broke when the company upgraded to the latest version of the Plumtree portal. Westinghouse is waiting for new portlets to get these repositories plugged back into the portal. This hiccup aside, the portal has aggregated huge volumes of disparate content in one place.

"The end user doesn't have to know where [content] lives and how to navigate the GUI of each of those back-end systems," Daverio says. "Now we're able to use a standard interface for all different systems and search through all of them from one place. I can do a keyword search against NT file shares, Documentum and Web sites all at the same time. It would be [much more] labor intensive to search those independently."

EII Unites Structured and Unstructured Data

Sometimes your need to access information goes beyond unstructured content; you need to pull information from databases, too. For example, in a mortgage process you may need to access documents such as the loan application and home appraisal report as well as database information such as credit bureau data, current rates and customer account status. In such cases EII software may come into play. EII isn't exactly new; rather, it's the latest incarnation of database integration software formerly known as "heterogeneous distributed database," "virtual centralized database," "federated database," "data integration system" or "enterprise data access."

Ovum defines EII as "a packaged data federation middleware solution that provides access to, and a unified view of, multiple types of data sources."

Just as content integration software forms a middleware layer for accessing different content repositories, EII provides a middleware layer for accessing data from different databases. Some EII vendors are moving toward extending access to unstructured information. Although none have quite succeeded in spanning both data types, some vendors have added free-text indexing that lets users search unstructured data by keywords or concepts.
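The mortgage scenario gives a convenient way to sketch what a unified view means. In the example below, an in-memory SQLite table stands in for the structured side and a dictionary stands in for a document repository. The table layout, field names and the `mortgage_view` function are assumptions made for illustration, not a description of any EII product.

```python
import sqlite3

# Structured side: an in-memory SQLite table stands in for account data.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE accounts ("
    "customer_id TEXT PRIMARY KEY, status TEXT, credit_score INTEGER)"
)
db.execute("INSERT INTO accounts VALUES ('c42', 'current', 710)")

# Unstructured side: a dict stands in for a document repository.
documents = {
    "c42": [
        {"type": "loan_application", "text": "Applicant requests a 30-year loan."},
        {"type": "appraisal_report", "text": "Property appraised in good condition."},
    ]
}


def mortgage_view(customer_id: str) -> dict:
    """Return one unified record drawn from both kinds of source."""
    row = db.execute(
        "SELECT status, credit_score FROM accounts WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        raise KeyError(customer_id)
    status, credit_score = row
    return {
        "customer_id": customer_id,
        "account_status": status,
        "credit_score": credit_score,
        "documents": [d["type"] for d in documents.get(customer_id, [])],
    }


view = mortgage_view("c42")
```

The caller asks one question and gets one record back, without needing to know which fields came from the database and which came from the document store.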

"Historically, companies have focused on the data side, [yet] content is critical to many business processes," says Moore of Forrester, adding that the next level of progression is to integrate both of those silos using the same tool.

Providers of EII that are embracing content include Actuate, Journee, SAP and, with its Venetica purchase, IBM. EMC is also building an EII platform as part of its Information Lifecycle Management framework.

"Our customers were saying, 'it's not just about content, it's about information,'" says Stuart Levinson, president of Venetica. "Customers of IBM's [heretofore database-oriented] DB2 Information Integrator were saying, 'it's not just about structured information.'"

By 2007, a new "organic information abstraction" (OIA) layer will emerge to connect separate environments of data, content and text, according to a recent Forrester report by Laura Orlov. "OIA will provide a set of services and metadata that harness insight from these assets — without complex and brittle customization."

Firms will need a common view across structured and unstructured information to drive new customer service, compliance, and sales and marketing applications, Forrester contends, and it advocates building up expertise in areas such as taxonomy and metadata development that cross both domains.

Bringing Together the Pipelines

While each of these approaches provides a piece of the content integration puzzle, no one option will address all of your information access challenges. Large organizations will need several of these technologies in combination to achieve a free flow of information throughout the enterprise.

"All these paths deliver parts of the answer," says Lachal of Ovum. "The hard work is not so much understanding these technologies and pitting them against each other, but realizing the extent to which mixing a little this and a little that will enable you to fit your needs and budget constraints."

Although each approach might work on its own if your needs are very specific, in the grand scheme of structured and unstructured data, you'll need to understand how all these technologies work together to find the right point of convergence.

Web Links

The Content Integration Imperative, by Forrester analyst Connie Moore, downloads/ The_Content_ Integration_Imperative.pdf

IBM Defends ECM Leadership, Ovum Comments, content/c,49666

Doculabs' Analysis of IBM's Purchase of Venetica Corporation, research/ lspeaks_ibm-venetica.htm
