Apr 25, 2008 (08:04 PM EDT)
Is User Monitoring The Next Wave In Enterprise App Management?
Read the Original Article at InformationWeek
Large enterprises have spent mountains of cash tracking how their complex, multitier applications perform while traversing the network. But unless you're managing end-user data at the packet level, you're not getting the full picture of how customers and employees are affected by performance degradations. And when it comes right down to it, all the high-level metrics in the world don't matter if end users are lighting up the help desk complaining of sluggish performance while customers abandon their shopping carts.
Technologies exist to monitor every aspect of traffic, from keystroke to database, and many of the 30 or so software vendors we talked to for this article believe end-user performance monitoring will be the next big wave in enterprise application management spending.
We're not so sure.
First, few organizations are ready for this shift. We've spent a fortune over the decades on infrastructure management systems that do little to capture the user experience. Moreover, end-user-centric management is harder than conventional infrastructure management. Vendors will try to contradict this, pitching appliances that can be set up in a snap. Don't be fooled: The real work comes in correlating and aggregating mountains of logs, as well as capturing and incorporating app-specific data that requires client- and server-side agents. And if SOA, mashups, and Web 2.0 figure prominently in your network, be aware that management vendors have been slow to adopt standards that use these technologies to correlate complex data streams.
Meanwhile, IT organizations face pressure to limit staff growth, even as the complexity of the architecture increases. One IT manager told us that he sees determining the cause of application problems as more art than science. In his organization, the help desk is the first indicator of a problem. Only after enough calls come in from irate users does troubleshooting begin. By then, there isn't time to coolly assess the impact or weigh this issue versus other problems already in the queue. IT is in firefighting mode.
Most user management and packet-capture technologies collect massive amounts of network data, all the better for troubleshooting. But because they typically don't have agents installed on user desktops or the app server, they provide only limited overall visibility into the cause of a slowdown.
Alternative approaches that use agents, however, also present problems. An agent may be the best way to capture the actual user experience, but that accuracy needs to be weighed against the higher cost and maintenance requirements of this approach. To help you decide which is the best route, here's a rundown of user-monitoring methods.
CAPTURE THE PACKET
Network capture technology has been around for more than 20 years and is found in offerings from Coradiant, Fluke Networks, NetQoS, and others. These products measure traffic from the point in the network where the device is installed to the point where the TCP connection terminates, analyzing end-user activity in real time.
When considering user management tools, drill down to how data is actually captured. Coradiant's TrueSight, for example, is an appliance that can be deployed quickly without changes to the network or application infrastructure. The TrueSight Incident Management system connects via a network tap or through a spanning port on a network switch and captures HTTP/HTTPS request/response pairs for every Web application. The company recently added an automated appliance that baselines all application activity, then separates real issues from noisy Web traffic and prioritizes problems. TrueSight stands out because it can decrypt SSL traffic and analyze low-level metrics, such as TCP retransmissions and out-of-order packets, making it extraordinarily useful for Web operations teams.
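To make the idea of timing request/response pairs concrete, here is a minimal sketch of how a capture tool might turn tapped messages into per-request response times. It is not Coradiant's implementation. The CapturedMessage record, its fields, and the pair_response_times function are hypothetical names invented for illustration, and the sketch assumes one outstanding request per connection (no HTTP pipelining).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CapturedMessage:
    """One HTTP message seen at the tap (hypothetical record layout)."""
    timestamp: float   # seconds, as recorded by the capture device
    connection: str    # illustrative connection id, e.g. "client:port->server:port"
    kind: str          # "request" or "response"
    url: str = ""      # filled in for requests only


def pair_response_times(messages: List[CapturedMessage]) -> List[Tuple[str, float]]:
    """Match each response to the last request on the same connection.

    Assumes at most one outstanding request per connection.
    Returns (url, seconds) pairs in the order the responses arrived.
    """
    pending: Dict[str, CapturedMessage] = {}
    results: List[Tuple[str, float]] = []
    for msg in sorted(messages, key=lambda m: m.timestamp):
        if msg.kind == "request":
            pending[msg.connection] = msg
        elif msg.kind == "response" and msg.connection in pending:
            request = pending.pop(msg.connection)
            results.append((request.url, msg.timestamp - request.timestamp))
        # Responses with no matching request are ignored in this sketch.
    return results
```

A real product would also have to reassemble TCP streams, decrypt SSL, and handle retransmissions before it ever reaches this step, which is where much of the engineering effort goes.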
Like TrueSight, Network Instruments' Observer uses packet capture to provide application-specific details, including error counts, types of transactions, failed transactions, and other user-specific metrics. Observer's ability to present traffic by TCP streams and reconstruct those streams lets you browse through the sequence of requests and responses and quickly jump to files, tables, or e-mails that were transferred. Observer also can monitor applications, such as VoIP, that depend on more than one TCP connection. For example, if VoIP analysis indicates call setup times are increasing, Observer will show whether the client, call manager, or network is causing the problem.
NetScout's nGenius tests the response time of key business apps, providing a broad context for analyzing problems. It checks application traffic against service-level targets and supplies measurements for troubleshooting end-user problems. NetScout's approach provides a context for application-response time that includes traffic volume, utilization, error conditions, alarms, hosts, and packet captures. However, like many of these appliances, nGenius doesn't offer server agents that could indicate which app component is at fault once a performance problem is detected.
Opnet Technologies, with Ace Live, and Fluke Networks, with Visual Performance Manager, also can help IT uncover the root cause of application performance problems that impact end users. Ace Live provides visibility for all transactions and users across the enterprise, with detailed real-time and historical information about performance, utilization, route quality, ISP performance, and end-user response times.
Fluke sees client traffic to Web servers, requests to application servers, and subsequent queries to database servers. In addition to these n-tier applications, it also watches performance to and from streaming media and DNS servers.
These vendors are working to build more analytics and root-cause determination into their product modules. However, while it doesn't get any easier than installing a bunch of appliances in your network, the high cost of these devices--often more than half a million dollars--typically relegates them to environments where the volume of packet capture justifies the technology's cost.
BEYOND NETWORK CAPTURE
Network packet capture products' primary strength is in measuring transaction response times. But there are other ways to approach end-user monitoring.
Synthetic user monitoring: Some vendors, including Nimsoft and Precise Software, employ synthetic user traffic to simulate user data. While sometimes criticized because it's not "actual" user data, synthetic monitoring does have its place in preventive maintenance. For example, synthetic transaction tools are useful to ensure that your applications are working, even if no one is interacting with them at that moment. Think of synthetic transaction monitoring as more proactive, while real-time user monitoring with network packet capture or agents tends to be reactive. There is a role for both in organizations that need to ensure near 100% availability of Web-based apps.
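As a rough illustration of what a synthetic transaction looks like, the sketch below times a scripted request and flags it if it fails or runs too long. It is not based on any vendor's product. The function names, the URL, and the two-second limit are assumptions made up for this example, and the fetch step is passed in as a callable so the check can be exercised without a live application.

```python
import time
import urllib.request
from typing import Callable, Dict, Optional


def run_synthetic_check(
    fetch: Callable[[], int],
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Run one scripted request and judge it against a time limit.

    `fetch` performs the request and returns an HTTP status code.
    The check passes only if fetch returns 200 within `max_seconds`.
    """
    start = clock()
    status: Optional[int] = None
    error: Optional[str] = None
    try:
        status = fetch()
    except Exception as exc:  # any failure counts as a failed check
        error = str(exc)
    elapsed = clock() - start
    ok = error is None and status == 200 and elapsed <= max_seconds
    return {"ok": ok, "status": status, "elapsed": elapsed, "error": error}


def fetch_login_page() -> int:
    """Example fetch step; the URL is a placeholder, not a real service."""
    url = "https://intranet.example.com/login"  # hypothetical address
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.status
```

Scheduling run_synthetic_check(fetch_login_page, max_seconds=2.0) every few minutes, including overnight, is what lets this approach catch an outage before the first user does.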
User agents: Knoa Software, PremiTech, Serden Technologies, and Symphoniq rely on agents. This method of data collection is independent of the network protocol and relies on desktop agents, which are generally better at reflecting the true end-user experience than network capture. It's also true that most IT shops will balk at installing additional desktop agents, but stop and think before you discount this option--these clients provide visibility into system errors and the cumbersome user interfaces that cause navigation problems.
Combination approach: Compuware, CA-Wily, and HP-Mercury are among the relatively few vendors that offer several data collection choices for end-user management, including agentless and synthetic end-user monitoring combined with deep network and server monitoring. This flexibility will help large enterprises that are concerned with scale and custom requirements and don't mind a little extra configuration and maintenance. These combined systems provide back-end transactional analysis, monitoring not only front-end user transactions, but also how those transactions are being delivered within the back-end application architecture, between the Web server and application server, and to the database server.
Since user monitoring is central to application-centric service-level agreements, in-depth end-to-end transaction visibility is required at the business-process level and at the component level of the application and infrastructure. Compuware's ClientVantage focuses on both Web-based (HTTP/HTTPS) and many non-Web applications, including Citrix and Oracle Forms. Like Compuware, CA-Wily can help by offering a broad mix of user management tools. Wily will correlate end-user SLAs and systemwide SLAs into overall health reports. This helps organizations understand the relationship between back-end system downtime and the end-user experience.
Wily's Customer Experience Manager, or CEM, can detect when an end user experiences a performance problem or transaction error. Alerts are generated based on specific errors or group thresholds for users or business transactions; once a transaction violates a threshold, CEM captures it and aggregates similar defective transactions into an incident.
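The threshold-then-aggregate pattern described here can be sketched in a few lines. This is an illustration of the general idea only, not CEM's actual logic. The Transaction record, the threshold table, and the rule of grouping by transaction name and defect type are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Transaction:
    """One observed business transaction (hypothetical record layout)."""
    user: str
    name: str          # business transaction, e.g. "checkout"
    seconds: float     # measured response time
    error: str = ""    # empty string when the transaction succeeded


def group_into_incidents(
    transactions: List[Transaction],
    thresholds: Dict[str, float],
) -> Dict[Tuple[str, str], List[Transaction]]:
    """Collect defective transactions and group similar ones together.

    A transaction is defective if it reported an error or exceeded the
    response-time threshold for its name. Transactions with no configured
    threshold are only flagged on errors.
    """
    incidents: Dict[Tuple[str, str], List[Transaction]] = defaultdict(list)
    for tx in transactions:
        limit = thresholds.get(tx.name)
        if tx.error:
            defect = "error: " + tx.error
        elif limit is not None and tx.seconds > limit:
            defect = "slow"
        else:
            continue  # healthy transaction, nothing to record
        incidents[(tx.name, defect)].append(tx)
    return dict(incidents)
```

Grouping this way means the operations team sees one "checkout is slow" incident affecting many users rather than a separate alert for every slow checkout.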