TechWeb

Low-Cost Options For Predictive Analytics Challenge SAS, IBM

Jul 26, 2011 (03:07 PM EDT)

Read the Original Article at http://www.informationweek.com/news/showArticle.jhtml?articleID=231002687

Rear-view-mirror reporting of financial and operating performance is old news; forward-looking analytics are where it's at. Sharp companies know this, but graduating from the basics of business intelligence to advanced analytics requires expertise among your people, and software for statistical modeling, data analysis, and scoring.

Can your organization afford analytic expertise and software?

We've covered the people dimension of this issue more than once, addressing both the salary trends among analytics professionals and the need for better analytics education.

On the software side of the question, the vendor with the largest market share by far in advanced analytic tools is SAS, which held 35.2% of the market in 2010, according to IDC figures. IBM with its SPSS unit held the second-highest share at 16.2%, while Microsoft was third with just 1.9% of the market. The also-ran commercial competitors tend to slam SAS in particular as the highest-cost provider. At least one comparison puts the vendor's statistical package at the top of the price heap at about $6,000 per user for the first year and about half that in subsequent years. That's multiples of the cost of some of the other vendors' software.

Among the score of smaller vendors that each have less than 1% of the market are startups Alpine Data Labs and Revolution Analytics, both of which are using low software prices among their competitive weapons, as they try to grab share of a market for advanced analytic tools that grew 8.7% last year, according to IDC stats.

Alpine Data Labs' starting price is $100,000 per year for a 20-user subscription, but that's for a big-data in-database deployment that's tough to compare to the other two. Revolution says a $25,000-per-year deployment on a low-end server will comfortably support 8 to 10 users. Without sharing any of the prices quoted by others, I asked SAS for its latest entry-level pricing structure and the figures weren't as pricey as the competitors suggest. More on that below.

Alpine Miner

Founded last year, Alpine was incubated at and spun out of Greenplum, the massively parallel processing (MPP) database vendor acquired by EMC last year. EMC is now among a handful of venture capital investors in the company, which entered the U.S. market in May.

The company's product is Alpine Miner, and given the company's MPP heritage, it's no surprise that the emphasis is on in-database processing. As the name suggests, this approach handles iterative modeling and scoring steps inside the database, taking advantage of MPP processing power and avoiding cumbersome and time-consuming movement of large data sets from the database off to a separate analytic server for analysis, and then copying results back to the database.
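To make the in-database idea concrete, here is a minimal Python sketch that uses the standard library's sqlite3 module as a stand-in for an MPP database. It is not Alpine Miner's interface; the table, column names, and model coefficients are all hypothetical. The point is only that the scoring step runs as a single SQL statement where the data lives, rather than pulling every row out to a separate analytic server and copying results back.

```python
import sqlite3

# Hypothetical coefficients of an already-fitted linear model (illustrative only).
INTERCEPT, W_VISITS, W_SPEND = 1.0, 0.5, 0.01

# An in-memory SQLite database stands in for the MPP database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, visits REAL, spend REAL, score REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, visits, spend) VALUES (?, ?, ?)",
    [(1, 2, 100.0), (2, 10, 500.0), (3, 0, 0.0)],
)

# In-database scoring: the model is applied by the database engine itself,
# so no rows leave the database until we ask for the finished scores.
conn.execute(
    "UPDATE customers SET score = ? + ? * visits + ? * spend",
    (INTERCEPT, W_VISITS, W_SPEND),
)

# Only the small result set is read back.
scores = dict(conn.execute("SELECT id, score FROM customers ORDER BY id"))
for customer_id, score in scores.items():
    print(customer_id, score)
```

On a real MPP system the same pattern would presumably be spread across many nodes in parallel, which is where the claimed time savings on large data sets would come from.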

The Greenplum database (now featured in EMC-powered appliances) is of course one of the databases Alpine Miner works with, but a 2.0 version of the product released last week added compatibility with Oracle Exadata and the PostgreSQL open-source database. It’s a Java-based product, and the upgrade also added time-series analyses, support for repeatable user-defined functions, and support for C or R programming, in addition to Java.

Alpine says its hallmark is ease of use, with a visual interface that lets "business users" select icons representing various analytic functions that can be run within the database against large data sets. No need for writing code or many of the kludgy steps associated with rival analytic products, says Alpine. I'm guessing those business users will have to be at least hip to the basic concepts and methods of statistical and predictive analytics, even if they don't have to be hard-core data jockeys or code slingers. Alpine Miner costs $100,000 for a 15-user perpetual license plus 22% maintenance per year. If you prefer expensing this cost, you can also subscribe to the on-premises software for $100,000 per year for 20 licenses with no maintenance fees.
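For a rough sense of how the two Alpine pricing options diverge over time, here is a small Python sketch. It assumes the 22% maintenance fee is calculated on the $100,000 license price and is charged every year including the first; Alpine's actual terms may differ. Keep in mind the two options also cover different user counts (15 versus 20), so this is not a like-for-like comparison.

```python
def perpetual_cost(years, license_fee=100_000, maintenance_pct=22):
    """Cumulative cost of the 15-user perpetual license.

    Assumption: maintenance is a flat percentage of the license fee,
    billed in every year including year one.
    """
    return license_fee + (license_fee * maintenance_pct // 100) * years


def subscription_cost(years, annual_fee=100_000):
    """Cumulative cost of the 20-license subscription (no maintenance fees)."""
    return annual_fee * years


for year in (1, 2, 3):
    print(f"Year {year}: perpetual ${perpetual_cost(year):,} "
          f"vs. subscription ${subscription_cost(year):,}")
```

Under those assumptions the subscription is the cheaper route only in the first year; from the second year on, the perpetual license comes out ahead on cumulative cost.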

Just how broad and deep can a newbie product be? There are 15 common modeling techniques available, including sampling, logistic regression, linear regression, decision tree, neural network, time series, and lift analyses. In the scoring vein, predictive operators apply logistic and linear regression, naive Bayes, tree, and neural network models to dataset prediction.

Alpine Miner is essentially a "greatest hits" selection of the most popular algorithms and functions. There's little doubt it will add more functionality, but that's it for now.

SAS's entry-level SAS Analytics Pro product (detailed below for price comparison) supports 16 statistical methods as well as a battery of data-visualization, mapping, and plotting options. But when you're ready to go deeper, SAS has more than 200 software products and applications, with lots more algorithms and techniques available.

The open-source R programming language for statistical computing is even more extensive, with more than 3,000 community-developed analytical applications and more than 4,000 user-created packages available with specialized statistical techniques, graphical devices, import/export capabilities, and reporting tools.

Revolution Analytics

Four-year-old Revolution Analytics provides commercial support for the R programming language as well as tools, integration, consulting and training aimed at making an open-source product enterprise ready.

Riding R's coattails is a good idea. It's used by 43% of data miners, according to Rexer's Annual Data Mining Survey, and it has been embraced and supported by commercial software vendors including SAS, SPSS, Information Builders, and Tibco.

Revolution's core Revolution R Enterprise deployment is said to improve performance over standard R by adding support for multithreading when using multi-processor and multi-core hardware. A RevoScaleR package offers widely used statistical algorithms optimized for big-data analysis (meaning tens of terabytes or more) in clustered environments such as Microsoft Windows HPC Server. Revolution says this high-performance computing approach on commodity hardware far surpasses the speed and scalability of conventional analytic servers at a fraction of the cost.

The R community offers plenty of ready-to-run statistical and data-analysis techniques and analytic applications incorporating those methods. Revolution Analytics, which is run by a bunch of former SPSS and SAS executives, provides development, debugging, and deployment tools as well as the aforementioned support and consulting to keep your people productive.

As far as MPP environments are concerned, Revolution runs in-database on IBM Netezza. That's a very short list, but the company says it's working on similar partnerships with other leading MPP vendors.

A recent distinction for Revolution is the release of R extension packages to work with Hadoop, the open-source storage and data-processing environment. The packages provide connectivity to the HDFS file system and HBase as well as Hadoop streaming so you can create MapReduce jobs in R for iterative, super-high-scale data processing on Hadoop. MapReduce is well suited for processing large-scale unstructured information such as all the comments associated with your brand in Facebook, Twitter or other social networks.
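For readers unfamiliar with the MapReduce pattern, here is a toy sketch in plain Python. It is not Revolution's R packages or the Hadoop API; the comments, brand names, and function names are made up for illustration, and everything runs in a single process. The map step emits a (brand, 1) pair for each brand mention in a comment, a shuffle step groups the pairs by brand, and the reduce step sums each group.

```python
from collections import defaultdict

# Hypothetical social-media comments and brand names (illustrative only).
comments = [
    "Love my new Acme phone",
    "acme support was slow",
    "Thinking about Globex instead of Acme",
]
BRANDS = {"acme", "globex"}


def map_phase(comment):
    """Emit a (brand, 1) pair for every brand mention in one comment."""
    for word in comment.lower().split():
        if word in BRANDS:
            yield (word, 1)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


pairs = [pair for comment in comments for pair in map_phase(comment)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

On a real cluster the map and reduce steps would run in parallel across many machines, with the framework handling the shuffle, which is what lets the approach scale to very large volumes of text.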

Revolution says the cost of a deployment depends on the power and capacity of the server, with deployment on a small, eight-core server costing $25,000 per year, including maintenance support. There's no limit on the number of users, but the company says a conservative approach would reserve one core per user, for a total of eight to ten users.

A SAS Comparison

Alpine and Revolution both pick on SAS when the topic turns to pricing. I've heard a few claims about comparable deployments being a fraction of the cost, so I thought I'd go straight to the source. SAS says SAS Analytics Pro, which includes the Base SAS server, SAS/STAT for statistical analysis, and SAS/GRAPH for data visualization, costs $8,000 per year for the first user and $1,710 for each additional user per year. This includes full support. So that's $19,970 for eight users or $23,390 for ten users.
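The arithmetic behind those SAS totals is easy to reproduce. Here is a short Python sketch using the per-user figures SAS quoted; the function name is mine, and it assumes the $1,710 rate applies uniformly to every user after the first, with no volume discounts or tiers.

```python
def sas_analytics_pro_annual_cost(users, first_user=8_000, additional_user=1_710):
    """Annual cost in dollars for a given number of SAS Analytics Pro users.

    Assumption: a flat rate for every user after the first, per the
    figures quoted in the article.
    """
    if users < 1:
        raise ValueError("at least one user is required")
    return first_user + additional_user * (users - 1)


for n in (8, 10):
    print(f"{n} users: ${sas_analytics_pro_annual_cost(n):,} per year")
```

Set against Revolution's flat $25,000 per year for an eight-core server, the quoted SAS figures come in lower at both eight and ten users.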

That doesn't sound nearly as high as I would have anticipated, given the competitive claims. It also doesn't jibe with the aforementioned Wikipedia-published comparison.

I'm guessing the comment field below will see a few war stories and "yeah, but" analyses, particularly as the size of the deployments scales into the hundreds or thousands of users.

The choice of software and analysis of cost should be driven largely by the professionals who will be asked to use the products. Let the results, not just initial software cost, justify the selection. Familiarity can be a big productivity advantage. The popularity of R speaks volumes, and so, too, does SAS's market share among commercial software providers.

Alpine Miner is a bit of a different animal; it's purpose-built for big-data deployments. Keep this product's scale of data analysis in mind when trying to develop comparable pricing analyses. The number of supported users is in the same league as the other two yet costs are higher. The difference is that Alpine is running in a big-data MPP environment, not on a low-end server.

Having more options is a healthy sign for the analytics market. You'll have a lot to consider when weighing the cost of expertise and software. If you want to focus on what’s next -- the big financial risks ahead, the best customers likely to drive your bottom line, the customers likely to bolt, and the products most likely to sell -- you can't afford not to get into advanced analytics.