Jun 28, 2010 (10:06 AM EDT)
Data Warehousing Shifts to Analytics Arms Race
Read the Original Article at InformationWeek
Netezza, Aster Data Systems and Infobright last week announced their latest salvoes in what's shaping up to be an analytics arms race. Netezza announced new i-Class analytics functionality due in August that will enable companies using its TwinFin data warehousing platform to program in-database analytics in various languages and approaches including Java, C, C++, Python and Hadoop MapReduce. In-database processing speeds analysis because it eliminates the step of moving data out of and results back into the warehouse. Support for non-SQL programming lets developers stick with the languages and approaches they are used to.
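To illustrate why in-database processing avoids data movement, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse. The `sales` table, its columns and both function names are hypothetical and invented for illustration; nothing here reflects Netezza's i-Class interfaces.

```python
import sqlite3

# Hypothetical stand-in "warehouse": an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 300.0), ("west", 50.0)],
)

def totals_client_side(conn):
    """Pull every row out of the database, then aggregate in the client."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def totals_in_database(conn):
    """Aggregate inside the database; only the small result set leaves it."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows)
```

Both functions return the same totals, but the second moves one row per region across the database boundary rather than one row per sale, which is the saving the in-database approach is after.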
The i-Class functionality will also support matrix-manipulation approaches, such as those supported by SAS, IBM SPSS and the R programming language. The support will be provided through integrated Eclipse IDE plug-ins, the R graphical user interface, and a library with 40 starter-kit functions with advanced techniques scalable in TwinFin's massively parallel processing environment.
"We've done a lot of work to abstract the fact that the programmer or analyst using these functions is running on a data warehouse," said Phil Francisco, vice president of product marketing and product management at Netezza.
The second chapter of Netezza's coming upgrade is the 6.0 release of the Netezza Performance Server, also expected in the third quarter. The database is said to double capacity and query performance through improved compression, clustered base tables, and workload management enhancements supporting guaranteed resource allocation for query processing (the latter a Teradata strong suit that many competitors are seeking to match).
Aster Data's latest release of its nCluster server emphasizes MapReduce with the inclusion of more than 30 ready-to-run SQL-MapReduce analytic packages and more than 40 "power-user" MapReduce function sets available in Java or C. MapReduce is a framework introduced by Google that supports distributed computing on large data sets; it offers speed and reliability advantages in certain types of applications scaling from tens of terabytes up to petabytes of data.
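For readers unfamiliar with the MapReduce pattern, the following is a minimal single-process sketch in Python of its three stages (map, shuffle, reduce) applied to a word count. It is an illustration of the general idea only, not Aster Data's SQL-MapReduce or Google's implementation; all function names are hypothetical, and a real framework would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (word, 1) pair for each word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    """Run map, shuffle and reduce sequentially over the input records."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = map_reduce(["data warehouse analytics", "analytics arms race", "data data"])
```

Because each record is mapped independently and each key is reduced independently, both stages can in principle be spread across machines, which is where the scaling advantage comes from.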
Examples of the SQL-MapReduce functions added to nCluster include text processing, cluster analysis and unpack data transformations. Functions supported in Java and C include Monte Carlo simulation, histograms and linear algebra.
"The packages make it easy for mainstream enterprises and partners to leverage more than 1,000 analytic functions out of the box , put some custom code around it and get their applications up and running in a hurry," said Sharmila Mulligan, executive vice president of marketing at Aster Data.
Infobright differs from Netezza and Aster Data in that it focuses on mainstream data volumes ranging from hundreds of gigabytes up to tens of terabytes. This is also the core market for Oracle and Microsoft SQL Server, but Infobright says its alternative database, which is built on MySQL and designed for analytic applications, delivers higher performance at a lower cost and with less administrative work.
"There are no indexes to create in Infobright, no data partitioning and none of the ongoing DBA tuning that Oracle and Microsoft SQL Server have always required," said Susan Davis, vice president of marketing at InfoBright. "Customers tell us they are doing 90% less work and incurring half the cost in terms of license cost and storage requirements."
Infobright's column-store database runs on commodity symmetric multiprocessor hardware from third-party vendors such as Dell. The Infobright Enterprise Edition 3.4 upgrade just released is said to boost query performance up to 500 percent and delete performance by more than 1,600 percent. The upgrade also provides improved query performance while data is loading, enhanced workload management features and support for multi-language data.
Oracle has addressed the scalability challenge posed by competitors such as Netezza and Aster Data with its Exadata appliance. Microsoft, too, will enter the massively parallel processing market with its release of the SQL Server Parallel Data Warehouse Edition later this summer. But Oracle and Microsoft have yet to articulate plans for in-database analytic processing and application support.