TechWeb

IBM Answers Big Data Competition

Nov 06, 2013 (04:11 AM EST)

Read the Original Article at http://www.informationweek.com/news/showArticle.jhtml?articleID=240163563


It's easy to lose track of the news streaming out of a large-scale event like this week's 11,000-attendee IBM Information On Demand (IOD) conference in Las Vegas. It's all the more difficult given IBM's penchant for spinning high-concept yarns about "smart," "predictive" and "cognitive" capabilities that can make sense of the unfathomable "2.5 quintillion bytes of data" that IBM says the world generates each day.

In contrast to Microsoft, Oracle and SAP, IBM is much less inclined to talk about discrete products than about capabilities that can be assembled into solutions with the help of IBM Global Business Services consultants with deep industry expertise.

But announcements are kind of obligatory at big annual tech events, and IBM served up plenty of them at IOD, including a mix of recently released and soon-to-be-released big data and analytics services and capabilities. Here are key highlights of what's new, what's coming and what distinguishes IBM's offerings from similar-sounding offerings that already exist.

What's New

IBM SmartCloud Analytics Predictive Insights is software aimed at transforming the high-scale machine data spinning out of IT systems -- networks, servers, storage systems, applications and so on -- into business intelligence. In the past, these log files and event streams were either used for simplistic, stove-pipe monitoring and diagnosis or they were entirely ignored.

In the big data era, some have realized that IT monitoring and event data might reveal leading indicators that can help IT anticipate and prevent problems rather than diagnose failures after the fact. Where many IT monitoring systems are all about setting thresholds and alerts for one system at a time, the idea behind Predictive Insights is to combine large sets of information and find correlations and anomalies in data that yield predictive insights.
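
To make the contrast with one-system-at-a-time thresholds concrete, here is a minimal Python sketch. It is not IBM's algorithm: the two metrics, the sample readings, the CPU_ALERT threshold and the z_limit cutoff are all invented for illustration. The sketch learns how two streams normally move together and flags readings that break that relationship, even when no single threshold is crossed.

```python
from statistics import mean, pstdev

CPU_ALERT = 80  # hypothetical per-system threshold, in percent


def fit_line(xs, ys):
    """Least-squares fit of ys ~ slope * xs + intercept (xs must vary)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_stream_anomalies(history, live, z_limit=3.0):
    """Return indexes in `live` where the two streams stop moving together."""
    hx, hy = zip(*history)
    slope, intercept = fit_line(hx, hy)
    residuals = [y - (slope * x + intercept) for x, y in history]
    spread = pstdev(residuals) or 1e-9  # guard against a perfect fit
    return [
        i for i, (x, y) in enumerate(live)
        if abs(y - (slope * x + intercept)) / spread > z_limit
    ]


# Hypothetical readings: (requests per second, CPU percent)
history = [(100, 11), (200, 19), (300, 31), (400, 39), (500, 51), (600, 59)]
live = [(300, 30), (150, 55), (550, 56)]

threshold_alerts = [i for i, (_, cpu) in enumerate(live) if cpu > CPU_ALERT]
correlation_alerts = cross_stream_anomalies(history, live)
```

In this made-up data, the second live reading shows 55% CPU at a light request load. That is well under the alert threshold, so threshold_alerts stays empty, but it is far off the learned relationship, so correlation_alerts picks it up.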

Consolidated Communications, a cable operator headquartered in Illinois, is using Predictive Insights to track some 80,000 streams of data across its systems to monitor the health of its video delivery network. By spotting anomalies that couldn't be seen by studying systems in isolation, the cable operator reports it has avoided service disruptions and related costs of approximately $300,000 per year.

Splunk has been a pioneer in doing big data analysis across myriad IT system sources, but IBM says the Predictive Insights service is different from Splunk and other offerings in that it's an analytic-correlation and pattern-detection environment rather than an open-ended search-and-discovery tool. In other words, it surfaces conditions worthy of investigation on its own rather than relying on humans to drive the analysis.

Also on the "what's new" list are an update to IBM's SmartCloud Virtual Storage Center and three advances tied to Hadoop. The Storage Center is software for your data center that applies machine learning and analytics to virtualized storage environments to automate complex migration and storage-tiering decisions.

Storage choices typically revolve around the tradeoffs between fast data-access speeds and cost of capacity. By analyzing usage patterns, the Storage Center identifies the best storage choice for a given set of data, automatically making the change without admin assistance or interruptions to data access. Storage Center reportedly helped IBM itself reduce per-terabyte storage costs by 50% at the company's Boulder, Colo., data center.
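
The speed-versus-cost tradeoff can be sketched in a few lines of Python. This is only an illustration of usage-based tiering, not how Storage Center works internally: the tier names, the reads-per-day cutoffs and the sample volumes are assumptions made up for the example.

```python
# Hypothetical tiers, fastest and most expensive first:
# (tier name, minimum reads per day needed to justify it)
TIERS = [("ssd", 1000), ("sas", 50), ("nearline", 0)]


def recommend_tier(reads_per_day):
    """Pick the fastest tier whose usage cutoff the volume meets."""
    for tier, min_reads in TIERS:
        if reads_per_day >= min_reads:
            return tier
    return TIERS[-1][0]


def plan_migrations(volumes):
    """List (volume, current tier, recommended tier) for volumes that should move."""
    plan = []
    for vol in volumes:
        target = recommend_tier(vol["reads_per_day"])
        if target != vol["tier"]:
            plan.append((vol["name"], vol["tier"], target))
    return plan


# Hypothetical volumes with observed usage
volumes = [
    {"name": "orders_db", "tier": "sas", "reads_per_day": 5000},
    {"name": "archive_2009", "tier": "ssd", "reads_per_day": 3},
    {"name": "reports", "tier": "sas", "reads_per_day": 120},
]
migrations = plan_migrations(volumes)
```

Here the busy database is promoted to the fast tier, the rarely read archive is demoted to cheap capacity, and the volume already on a suitable tier is left alone.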

The three Hadoop-related introductions are:

-- IBM PureData System for Hadoop. Released in September, this is IBM's Hadoop appliance incorporating the IBM BigInsights Hadoop distribution and complementary software. IBM says the difference from Apache, Cloudera, Hortonworks and other "standard" Hadoop deployments is four times faster performance thanks to cluster-management and high-performance computing capabilities adapted from IBM's Platform Computing acquisition.

-- InfoSphere Data Privacy for Hadoop. Coming later this quarter, this is a data-masking and data-activity-monitoring system that works across Hadoop as well as NoSQL and relational data sources, according to IBM. Data masking conceals sensitive data such as Social Security numbers at points of replication so companies can go beyond access controls to ensure data privacy. The data-activity-monitoring capability tells administrators who is accessing data and when data-access patterns are atypical -- even for authorized users.

-- InfoSphere Governance Dashboard. Another tool that works across multiple data sources including Hadoop, relational and non-relational databases, this dashboard gives data-management professionals an understanding of the lineage, state of quality and state of governance of data sets under management. The software is said to work hand-in-hand with ETL, data-privacy and data-security tools to ensure that governance policies are enforced.
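
The data-masking idea described for InfoSphere Data Privacy can be illustrated with a short Python sketch. It is a generic example rather than IBM's implementation: the record layout, the field names and the choice to keep the last four digits visible are assumptions for illustration.

```python
import re

# Matches text shaped like a U.S. Social Security number, e.g. 123-45-6789
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text):
    """Hide the first five digits of anything shaped like an SSN."""
    return SSN_PATTERN.sub(r"XXX-XX-\1", text)


def replicate(records, sensitive_fields=("ssn", "notes")):
    """Copy records to a downstream system, masking sensitive fields on the way."""
    copies = []
    for record in records:
        copies.append({
            key: mask_ssn(value)
            if key in sensitive_fields and isinstance(value, str)
            else value
            for key, value in record.items()
        })
    return copies


# Hypothetical source record
source = [{
    "name": "Pat Doe",
    "ssn": "123-45-6789",
    "notes": "Caller confirmed 123-45-6789 by phone",
}]
replica = replicate(source)
```

The source record is left untouched. Only the replicated copy is masked, which is the point of masking at replication: the downstream copy is safe even for users who have legitimate access to it.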

What's Coming

Tableau Software might be the darling of data visualization, but IBM says it's working on a more powerful way to simplify data analysis through Project NEO, which it describes as bringing data discovery to the masses. Currently in IBM's labs, NEO starts by simplifying the hard part of data analysis, which is mixing together disparate data sets without creating a mess.

IBM says NEO makes the data-modeling step a self-service proposition for business users thanks to a built-in ontology engine that handles metadata mapping behind the scenes. Users simply drag and drop desired sources into the NEO framework -- from a data warehouse, operational systems, cloud sources like Salesforce.com, or third-party enrichment databases such as Acxiom or Experian -- and the ontology engine does the mapping work.

A number of vendors have introduced business user-friendly data-mashup capabilities -- Microsoft, Oracle Endeca and MicroStrategy to name a few -- but IBM's NEO project is unique in blending that capability with natural-language query and data visualization.

Once a user chooses desired data sets, NEO's next trick is natural-language query, whereby users can ask questions and make requests in plain English, such as: "Show me the top sellers by region," "Who are the top salespeople?" or "List top customers in the Northeast." The NEO technology automatically pulls in the right data sources and presents the requested data or analysis in a suitable visualization.

NEO's final trick is displaying a series of highlights that cut across the data selected. For example, in addition to the requested visualization, you'll see small visualizations across the top of the interface highlighting related insights such as top sellers, bottom sellers, top customers, average customer spend or distinct counts, such as number of customers or number of sales by region. Built-in algorithms continue to surface new highlights as you explore and drill down in the primary data-visualization window. If one of the highlights grabs your interest, you simply click on the item and it moves to the center of analysis for drill-down exploration.
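
The kinds of highlights described here are simple aggregates, as this Python sketch suggests. It says nothing about how NEO chooses or ranks highlights; the row layout and the sample sales figures are hypothetical.

```python
from collections import defaultdict


def highlights(sales):
    """Summarize (customer, product, region, amount) rows into headline figures."""
    by_product = defaultdict(float)
    by_customer = defaultdict(float)
    by_region = defaultdict(int)
    for customer, product, region, amount in sales:
        by_product[product] += amount
        by_customer[customer] += amount
        by_region[region] += 1
    ranked = sorted(by_product, key=by_product.get, reverse=True)
    return {
        "top_seller": ranked[0],
        "bottom_seller": ranked[-1],
        "top_customer": max(by_customer, key=by_customer.get),
        "average_customer_spend": sum(by_customer.values()) / len(by_customer),
        "customer_count": len(by_customer),
        "sales_by_region": dict(by_region),
    }


# Hypothetical sales rows
sales = [
    ("Acme", "widget", "Northeast", 1200.0),
    ("Acme", "gadget", "Northeast", 300.0),
    ("Birch", "widget", "West", 800.0),
    ("Cedar", "gizmo", "West", 100.0),
]
summary = highlights(sales)
```

Each entry in summary corresponds to one of the small visualizations the article describes, such as the top seller or the number of sales by region.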

NEO will show up as early as January as a beta release before becoming generally available as part of a Cognos release in mid-2014. External data sources are expected to include Salesforce.com, Excel/.CSV uploads and popular third-party enrichment-data sources.

In another preview featured at the Information On Demand event, IBM demonstrated BLU Acceleration for Cloud. This is an upcoming cloud-based version of IBM's BLU Acceleration for DB2 in-memory technology, which was announced in April and released in June.

BLU Acceleration for Cloud is not just a database-as-a-service; it's a complete in-memory data warehousing environment in the cloud. The obvious comparison here is the Amazon Web Services Redshift data warehousing service, but IBM insists there's no comparison given the in-memory, parallel-processing and unique storage and compression capabilities of BLU. IBM says BLU can crunch 10 terabytes down to 1 terabyte, bring that 1 terabyte into memory, and effectively crunch it again down to 10 gigabytes. With data-skipping techniques, BLU then focuses on the 1 gigabyte that matters in a query without wading through the other 9 gigabytes of irrelevant data. The result is performance that is eight to 25 times faster than DB2 without BLU.
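
Data skipping in general can be sketched as follows. This Python example uses a common form of the technique, keeping the minimum and maximum value for each block of a column so that blocks that cannot match a query are never read. It is an assumption-laden illustration, not a description of BLU's internals: the block size, the column and the query range are invented.

```python
def build_synopsis(values, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block), block) for block in blocks]


def count_matches(synopsis, low, high):
    """Count values in [low, high], scanning only blocks whose range overlaps."""
    scanned = 0
    matches = 0
    for block_min, block_max, block in synopsis:
        if block_max < low or block_min > high:
            continue  # no value in this block can match, so skip it unread
        scanned += 1
        matches += sum(low <= value <= high for value in block)
    return matches, scanned


# Hypothetical column: order dates as day numbers, stored in arrival order
order_day = list(range(1, 1001))
synopsis = build_synopsis(order_day, block_size=100)
matches, scanned = count_matches(synopsis, 250, 260)
```

With these invented numbers, the query touches one block out of 10, the same one-in-10 ratio as the 1 gigabyte out of 10 in IBM's example. The benefit depends on related values being stored near each other; on randomly ordered data, most blocks would overlap the query range and few could be skipped.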

BLU compression will reduce cloud-storage costs while in-memory analysis and data skipping will ensure state-of-the-art performance. But the other distinction between BLU Acceleration for Cloud and its competitors is the inclusion of data warehousing tools, including InfoSphere Data Click for data loading, InfoSphere Data Architect and Data Studio for data modeling, and IBM Cognos BI for ad hoc query, dashboarding, data visualization and reporting. IBM did not disclose a release date for BLU Acceleration for Cloud.

Unprecedented Collection

As we've seen with other software giants including Microsoft and Oracle, IBM has often weighed in after the innovators on new market trends in recent years. But when it does weigh in, it brings as much of its enormous software portfolio as possible to bear. This is the pattern once again with this year's Information On Demand announcements.

Companies like Splunk were ahead of the game in connecting the dots across big data from IT systems. But IBM SmartCloud Analytics Predictive Insights, for example, blends IBM's InfoSphere Streams and Tivoli assets in with an analytics-driven, preemptive approach to IT systems analysis.

The likes of Tableau and Tibco Spotfire made data visualization a hot trend, but IBM is now simplifying back-end data blending as well as front-end analysis with natural-language query.

BLU wasn't the first option for in-memory analysis and it's following AWS and others into cloud-based data warehousing. But IBM has brought together an unprecedented collection of compression and performance-enhancing techniques in BLU, and it's adding complementary data-management and BI services for a more complete, single-vendor cloud environment.

In short, IBM's approach is to combine and refine the best of what's out there. The question is whether it is moving fast enough to prevent innovators from becoming entrenched among would-be customers.

Growth is a sign that new products and services are catching on, but growth has been sorely lacking in IBM's financial performance in recent quarters. True, it is hardware losses that have more than offset IBM's growth in categories including cloud and software. But IBM's software growth, at least, has been tepid compared to that of nimble innovators. We'll see if these new and coming offerings give it a much-needed shot in the arm.