Service-Level Agreements Come Of Age

Nov 28, 2008 (07:11 PM EST)

In the late 1990s, functional service-level agreements were the holy grail for telecommunication service providers -- highly sought-after, but always just out of reach, as customers demanded high levels of service from their circuits and application services. Although most providers offered SLAs, few were able to monitor them correctly or in real time.

Telecoms have strived to master SLAs, but that's only part of the story. An even more interesting and often unnoticed transformation is occurring as enterprise, government, and academic environments are also moving to SLA-driven services. Industries far removed from telecom now recognize that defining an end-to-end, customer-focused IT department is crucial to providing consistent, reliable, measurable, and usage-based service.

However, as was the case with telecom in the '90s, IT managers in these other sectors seldom have the right elements in place to generate and manage a successful SLA. The data that comprises an effective SLA -- service catalogs, defined processes, and holistic management and monitoring systems -- is still only a dream for many IT organizations.

Put another way, SLA-driven IT shops might be the way of the future, but many IT managers don't know where to start in the definition and management of SLAs. We will try to unravel some of that mystery here.

Getting Started With SLAs
When you think of SLAs, you might also think of penalties, contract termination, "free" services, and other woes that all too often were part of doing business with Internet service providers. It's little wonder that some organizations, attempting to avoid negative connotations, prefer to use the terms SLE (service-level expectation) or SLG (service-level goal).

But whatever they're called, SLAs do one thing: They define a specific level of service that is provided to a customer. These agreements can also define cost, usage levels, or other helpful data points that will allow both sides of the business (provider and user) to be on the same page regarding the level of overall service offered and received. Some examples of SLAs might be 99% availability, 48-hour server provisioning after a request is made, notification of an outage within five minutes of the occurrence, or security patch deployments within 24 hours of their release.
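To make a target such as 99% availability concrete, it helps to convert it into a downtime allowance. The sketch below is illustrative only: the function names and the 30-day reporting period are assumptions, not part of any product mentioned in this article.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period at the given availability target.

    The 30-day default period is an assumption; use whatever period the SLA names.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def measured_availability_pct(downtime_minutes: float, period_days: int = 30) -> float:
    """Availability actually delivered, given the downtime observed in the period."""
    total_minutes = period_days * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must fit within the period")
    return 100 * (1 - downtime_minutes / total_minutes)
```

Under these assumptions, a 99% target over 30 days leaves roughly 432 minutes (about 7.2 hours) of downtime, which is often a more useful number to put in front of a customer than the percentage itself.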

The first step in getting started with SLAs is definition of the service. What exactly are you offering to your customers? Applications? Network capability? IT services? Whatever they are, you should store them in a service catalog that is accessible to your customers. This can be as simple as a Word document or HTML pages; however, software vendors like Digital Fuel, NewScale, and others now offer service catalogs that enable customers to order IT services like they'd order a book from

After you define your services, you need to define the expectations of the user community in the form of service-level requirements. Without these, organizations can never measure and manage the user experience in a meaningful way.

Service-level requirements are a balance of customer desires and operational reality. Your customer may request 100% availability, but this might not be an operational reality. Before offering metrics, you need to take a hard look at several aspects of your organization. What tools do you have on hand to monitor the environment? I saw one organization that had committed to more than 300 SLAs, but only had tools to monitor about half of them. Reporting the rest of the SLAs required an army of people to collect and report on data every month.
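One way to avoid committing to SLAs you cannot measure is to check each proposed requirement against the metrics your tools actually collect. The following is a minimal sketch; the `ServiceLevelRequirement` class, its field names, and the metric names in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelRequirement:
    name: str    # human-readable requirement, e.g. "E-mail availability"
    metric: str  # hypothetical metric identifier the requirement depends on
    target: str  # the level being promised, kept as free text here


def split_by_coverage(requirements, monitored_metrics):
    """Separate requirements a tool can measure from those needing manual reporting."""
    monitored = set(monitored_metrics)
    covered = [r for r in requirements if r.metric in monitored]
    manual = [r for r in requirements if r.metric not in monitored]
    return covered, manual
```

For example, if the tools collect only `mail.uptime`, a requirement tied to a hypothetical `server.provision_hours` metric lands in the manual list, which flags it as a candidate for new tooling or renegotiation before it is signed.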

Many SLAs will involve traditional fault and performance tools that need to provide end-to-end management capabilities of applications, or calculate availability of services. Software vendors such as BMC, CA, EMC, Hewlett-Packard, and IBM all have good monitoring and management solutions for larger organizations. Other vendors, including Ipswitch WhatsUp, Kace, Nimsoft, ScienceLogic, and SolarWinds, also provide enterprise monitoring and SLA reporting for IT environments of all sizes.

Calculating SLAs can be resource-intensive. Using a best-practices framework like ITIL, Six Sigma, eTOM, or other IT-centric process methodology will help maximize the efficiency here. Be sure to calculate the time it takes people to execute components of the SLA and include that in your overall time calculation. Automate your processes as much as possible; this can dramatically save time. Runbook automation products like BMC's RBA, HP's Opsware/PAS, NetIQ's Aegis, Opalis, and others can automate some components of the operations environment.

It's also critical to define, prioritize, and track the progress of each aspect of the SLA, and to monitor SLA operational level requirements (OLRs) for organizations such as suppliers (network, hardware, or application vendors). Different service providers may be involved in different parts of the agreement, so it's essential to ensure that they understand and are accountable for their impact on the end-user experience.

During this process, you should focus on prioritizing OLRs and limiting their scope to key success factors in service delivery. Defining too many OLRs makes management of the environment and SLA overwhelming and ultimately unproductive.
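A rough calculation shows why supplier commitments matter to the end-user SLA. The sketch below assumes the simplest possible model, which the article does not specify: every component sits in series (the service is up only when all of them are up) and failures are independent. The function name is hypothetical.

```python
from math import prod


def end_to_end_availability_pct(component_pcts):
    """Availability of a service whose components are all required (series model).

    Assumes independent failures, so the component availabilities multiply.
    """
    pcts = list(component_pcts)
    if not pcts:
        raise ValueError("at least one component is required")
    if any(not 0 <= p <= 100 for p in pcts):
        raise ValueError("each availability must be between 0 and 100")
    return 100 * prod(p / 100 for p in pcts)
```

Under this model, network, hardware, and application suppliers committed to 99.9%, 99.95%, and 99.5% respectively yield roughly 99.35% end to end, so a 99.5% commitment to the end user would not be supportable even though every supplier meets its own target.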

After you've defined expectations, you need to assess your ability to realistically meet those expectations. In many service provider environments -- to the dismay of IT -- sales or marketing will agree to an SLA to close a deal, then inform IT about the SLA after the fact. This situation rarely results in a happy customer. So, before committing to an SLA, assess your operational readiness and identify areas of improvement to ensure that the SLA strategy can be implemented and supported, both tactically and strategically.

Seven SLA Steps
  • Develop realistic agreements
  • Make sure your customers know what they're agreeing to
  • Map all of the data elements required to launch the service-level agreement
  • Deploy tools that can monitor SLA compliance
  • Put in thresholds to alert IT to issues before they impact an SLA
  • Develop automated SLA reports
  • Monitor the SLAs and seek ways to adjust and improve measurements

With realistic SLA services and expectations defined, the last step is reporting. Although many of the tools discussed here have integrated reporting engines, you may also look to products like Integrein, Managed Objects, Oblicore, or OpTier that focus on end-to-end reporting via integration of existing data sources. For complex environments, these tools focus on reporting, not on the underlying data collection. (Most vendors offer separate products for data collection.)

Regardless of the reporting tools used, customers should have full transparency into the metrics -- ideally, in real time. Organizations that manage and monitor IT operations may cringe at this idea, but customers want to see what is going on and might not want to wait for aggregated monthly reports to be e-mailed to them.

Online executive dashboards, when implemented in the context of SLAs, provide management with focused, actionable views of real-time service assurance information. Understanding how to interpret the SLAs will also drive effective system implementation by feeding the development and training activities related to technology, workflows, and CRM that are required for successful implementation.
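Even without a dedicated reporting product, a basic automated report is not hard to produce once measurements exist. The sketch below is a hypothetical illustration, not the output format of any tool named here; the `SlaResult` class and its fields are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SlaResult:
    service: str
    target_pct: float
    measured_pct: float

    @property
    def met(self) -> bool:
        return self.measured_pct >= self.target_pct


def render_report(results):
    """Return a plain-text table, listing the worst-performing services first."""
    lines = [f"{'Service':<20}{'Target':>10}{'Measured':>10}  Status"]
    # Sort by margin against target so missed SLAs rise to the top.
    for r in sorted(results, key=lambda r: r.measured_pct - r.target_pct):
        status = "MET" if r.met else "MISSED"
        lines.append(
            f"{r.service:<20}{r.target_pct:>9.2f}%{r.measured_pct:>9.2f}%  {status}"
        )
    return "\n".join(lines)
```

Sorting by the margin against target is a design choice: it puts the services that need attention at the top, which suits a dashboard better than alphabetical order.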

SLA Hurdles
In measuring services, the silos of technology necessary for delivering service, such as networks, servers, applications, firewalls, and other devices, need to be transparent. Unfortunately, these silos are typically operated and managed independently, making consolidated end-to-end views more difficult. Products like IBM's Tivoli Business System Manager or CA's Network and Systems Manager aim to intertwine these components to provide a single-service view, but they can be services-intensive to set up, and to maintain when the components change. Still, for large environments, these types of tools are important for SLA management.

The use of a configuration management database (CMDB) or shared information data (SID) model can also help define configuration of items and the relationships these items have to the service levels. BMC's Atrium, CA's CMDB, IBM's Maximo, and Symantec's Altiris have substantial market share here. Another crop of vendors focuses on populating the information to the CMDB. Tideway, IBM Tivoli's Application Dependency Discovery Manager, EMC's Smarts Application Discovery Manager, and other products discover application topology for the SLAs, then populate the CMDB.
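The value of those relationships is easiest to see in a small example. The sketch below models configuration items as a plain dictionary, which is an assumption made for illustration and not the schema of any CMDB product; the item names in the usage note are hypothetical.

```python
from collections import deque


def impacted_items(depends_on, failed_item):
    """Return every item that directly or indirectly depends on failed_item.

    depends_on maps each item to the list of items it relies on.
    """
    # Invert the mapping so we can walk from a failed item up to its dependents.
    dependents = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(item)

    impacted = set()
    queue = deque([failed_item])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted
```

Given a mapping in which an e-mail service depends on a mail server and the mail server depends on a storage array, a failure of the storage array returns both the mail server and the e-mail service, which tells operations which service levels are at risk.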

Because silos are operated and managed independently, many organizations fail to correlate user expectations for service with the technologies that deliver the service. This disconnect creates disjointed, inefficient services -- and ultimately, dissatisfied users. A successful SLA strategy encompasses not only the technical aspects of the infrastructure, but also the operational processes involved in delivering service, such as change, configuration, incident, release, and problem management. Organizations must have established operational processes and workflows rolled out in concert with the SLA solution if they want to be successful.

Operations: The Missing SLA Link
A successful SLA strategy also hinges on organizations working to ensure that operators and engineers can resolve known outages and system failures. Consequently, the strategy must include the ability to collect configuration information on network and server assets, access customer information for business impact analysis, and provide data on all internal or external SLAs.

Understanding your culture, current operational structure, and the roles and responsibilities of key stakeholders is critical for successful SLAs. Many management tools allow you to set up proactive monitoring thresholds that can prevent the violation of SLAs by providing an early warning alert on key issues. With effective processes, operations can proactively reduce or prevent outages by monitoring service degradation throughout the infrastructure, and can restore service before the customer ever notices.
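An early-warning threshold can be as simple as tracking how much of the period's downtime allowance has been consumed. This is a minimal sketch, not how any particular management tool implements thresholds; the function name and the 75% warning level are assumptions.

```python
def sla_status(downtime_minutes: float, budget_minutes: float,
               warn_fraction: float = 0.75) -> str:
    """Classify an SLA by the share of its downtime allowance already used.

    Returns "ok", "warning" (allowance nearly spent), or "violated" (allowance exceeded).
    """
    if budget_minutes <= 0:
        raise ValueError("budget_minutes must be positive")
    used = downtime_minutes / budget_minutes
    if used > 1.0:
        return "violated"
    if used >= warn_fraction:
        return "warning"
    return "ok"
```

With a 432-minute monthly allowance (99% availability over 30 days), 350 minutes of accumulated downtime returns "warning", giving operations a chance to act before the SLA is actually breached.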

Customers also benefit because they can see real-time and historical service views, and the move to a customer-focused service management strategy will help increase customer satisfaction while also reducing the frustration that's often part of the job for IT organizations.

Hope, Change, And SLAs
The move to SLA-driven, service-centric IT organizations has been a gradual progression over the past 10 years, and it has now reached the mainstream of IT management. An entire software industry has been born around SLA management, and the proliferation of ITIL has helped more people use SLAs.

Managing and reporting on SLAs is hard work. In the end, though, IT exists to serve the needs of the users. I remember the first time I heard the joke that IT would be easy if not for the users.

A service-centric approach to SLAs really means a user-centric approach. Ensuring that users know what services IT makes available and what level of service is provided, and that they can verify the level of service they receive, can propel many IT shops into the stratosphere and make the overall relationship between IT and the users radically different than it is in many organizations I see. This reality need not be an impossible dream: With the right tools, processes, and planning, it's possible today.

Impact Assessment: Internal Service-Level Agreements


Michael Biddick is CTO of Windward IT Solutions, a firm that helps organizations improve operational efficiency. He is also a contributing editor for InformationWeek. Write to him at