Feb 22, 2010 (09:02 AM EST)
Upstarts Keep Data Warehousing Competitive
Read the Original Article at InformationWeek
Vendor consolidation may be the rule in many IT categories, but data warehousing is proving to be an exception. That much is clear as The Data Warehousing Institute (TDWI) World Conference gets underway in Las Vegas this week with a flurry of announcements from upstart vendors.
Taken individually, the headlines aren't earth-shattering: Aster Data has improved support for in-database analytic processing; Kognitio has landed GroupM as a major new customer; ParAccel has partnered with Fusion-io to support flash-memory-supercharged processing; Vertica has upgraded its query workload and resource management features. But taken together, the announcements underscore that the data warehousing universe is expanding, with alternative providers getting stronger despite the pressures of a weak global economy.
Competition has been stable at the top of the data warehousing market for years, with Teradata, Oracle and IBM once again topping the "leaders' quadrant" in the latest Gartner Magic Quadrant (MQ) report for data warehousing, released earlier this month. Microsoft and Sybase are in the leaders' quadrant, too, but the upstarts are hoping to follow in the footsteps of Netezza. The only alternative provider that has made it into the top-right corner of Gartner's report, Netezza has used the combination of competitive pricing and fast query performance to win more than 300 customers to date.
Growth in data warehousing is being fueled by the so-called "big-data era." With Web sites, enterprise applications and networks cranking out terabytes of data per day (and sometimes per hour), there appears to be more than enough room for up-and-coming vendors promising high performance and petabyte-level scalability at a lower cost -- at least compared to what Teradata, Oracle or IBM might charge.
Aster Data, among the newest additions to Gartner's MQ, today announced Aster Data nCluster 4.5, an upgrade of its core product featuring a combined SQL/MapReduce visual development environment, a suite of prebuilt analytics modules, support for Fusion-io flash-memory drives, and a new management console for optimizing query performance.
Popularized by Google, MapReduce has quickly become the default choice for many kinds of data transformations and analytic processes, while SQL remains the prevailing query language. Aster Data isn't the only vendor to support MapReduce -- Greenplum and Cloudera support the approach as well -- but by offering a unified, visual environment for both SQL and MapReduce development, Aster is hoping to win over the many firms that combine the two approaches.
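For readers unfamiliar with the MapReduce pattern, the following is a minimal single-process sketch in Python. It is not Aster Data's SQL/MapReduce interface or any vendor's API; the click-log records and all function names are hypothetical and invented for illustration. It counts clicks per user, roughly what a SQL query such as SELECT user, COUNT(*) FROM clicks GROUP BY user would return.

```python
from collections import defaultdict

# Hypothetical click-log records (user, page), invented for illustration.
CLICKS = [("ann", "/home"), ("bob", "/home"), ("ann", "/cart"), ("ann", "/home")]


def map_phase(records):
    """Emit one (key, 1) pair per input record."""
    for user, _page in records:
        yield user, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def clicks_per_user(records):
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records)))


result = clicks_per_user(CLICKS)
```

In a real deployment the map and reduce steps would run in parallel across many nodes; this sketch only shows the shape of the computation.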
Aster Data's calling card is in-database analytic processing, an increasingly popular approach that speeds processing by running applications next to the data rather than extracting data and processing in the application environment. Teradata and Netezza have led the way in making in-database processing a reality, but Gartner says Aster's four-tier architecture is particularly well suited to the approach.
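The difference between extracting data and processing it in place can be sketched with Python's built-in sqlite3 module. This is only an illustration of the general idea, not nCluster or any vendor's product; the bets table, its contents and the function names are hypothetical.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a small, made-up bets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bets (player TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO bets VALUES (?, ?)",
        [("ann", 10.0), ("bob", 250.0), ("ann", 40.0), ("bob", 300.0)],
    )
    return conn


def totals_extracted(conn):
    """Pull every row into the application, then aggregate there."""
    totals = {}
    for player, amount in conn.execute("SELECT player, amount FROM bets"):
        totals[player] = totals.get(player, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Let the database aggregate; only one row per player leaves it."""
    query = "SELECT player, SUM(amount) FROM bets GROUP BY player"
    return dict(conn.execute(query))


conn = build_db()
```

Both functions return the same totals, but the second moves one row per player out of the database instead of every bet, which is where the savings would come from at warehouse scale.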
Citing the example of an online gaming site, Aster Data says it helped the customer move a Java-based risk-analysis application inside nCluster for faster and more complete analysis.
"It used to take them 90 minutes to do the analysis on a subset of the data," says Sharmila Mulligan, executive vice president of global marketing at Aster Data. "The app now runs every 15 minutes against the entire data set, and it returns results within 90 seconds."
Most data warehouse upstart vendors -- including Aster, Greenplum, Kognitio, ParAccel and Vertica -- have designed their database software for massively parallel processing on industry-standard hardware from the likes of Dell, EMC, HP and IBM. Adding a high-end performance option to that mix, ParAccel, a column-store database provider, today announced support for solid state drives (SSDs) from Fusion-io (a move also announced today by Aster Data). SSDs replace spinning, mechanical disks with flash memory chips for faster performance. ParAccel, another vendor that made the Gartner MQ report for the first time this year, is promising 15 times faster query performance with optional flash SSDs installed by the hardware supplier of the customer's choice.
"Flash per megabyte is quite expensive, but in terms of dollars-per-megabyte-per-second it's very close to conventional disk prices now," says Barry Zane, chief technology officer at ParAccel. "In other words, performance-per-dollar is attractive and in keeping with our column-store database performance."
Column-store databases are typically faster than conventional, row-oriented databases (such as Oracle, IBM DB2 and Microsoft SQL Server) in analytic applications because they can query specific data attributes in columns -- such as zip codes, product stock numbers or transaction totals -- while skipping all the other data, row-by-row, that might not be relevant to a query. Some relational vendors, including Oracle, have added columnar data compression, but Zane says they have not matched all the performance advantages of column-oriented databases.
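The row-versus-column distinction can be illustrated with a toy Python sketch. It makes no claim about how ParAccel or any other product stores data; the sales records and function names are hypothetical, and a real column store would also involve compression and on-disk layout that this omits.

```python
# Hypothetical sales records, invented for illustration.
ROWS = [
    {"zip": "10001", "sku": "A1", "total": 19.99},
    {"zip": "94105", "sku": "B2", "total": 5.00},
    {"zip": "10001", "sku": "C3", "total": 42.50},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per attribute."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_totals_row_store(rows):
    """Row layout: every full record is visited to read one field."""
    return sum(row["total"] for row in rows)


def sum_totals_column_store(columns):
    """Column layout: only the 'total' column is read."""
    return sum(columns["total"])


COLUMNS = to_columns(ROWS)
```

Summing transaction totals in the columnar layout reads a single list and never touches zip codes or stock numbers, which is the effect the column-store vendors are counting on at much larger scale.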
"We don't need the indexes, projections and materialized views that relational databases need to perform at scale, so we don't suffer from bloat and end up penalizing performance," he explains.
The data warehousing industry leaders have already embraced flash technology. Oracle put a huge emphasis on flash memory in the Sun-Oracle V2 upgrade of Exadata announced last October. Teradata's SSD-based Extreme Performance Server 4555 was also announced last October, and it's slated for general availability in the first half of this year. IBM has demonstrated an SSD-based test appliance, but it has yet to announce or ship a production SSD-enabled data warehouse appliance.
Vertica stands out among alternative vendors in that it has more than 120 customers and is the largest column-store database vendor after Sybase, with its Sybase IQ product. Tomorrow, Vertica will announce upgrades including better workload and resource management features aimed at handling mixed query workloads and many more users.
"The biggest obstacle to tackling enterprise data warehouse workloads is having big batch jobs competing with smaller queries that could otherwise be answered instantaneously," says David Menninger, Vertica's vice president of marketing and product management. "We've created resource pools that administrators can dedicate to different users and types of activities, so you can guarantee that resources are available when crucial queries come in."
Rounding out this week's announcements, Kognitio today announced that GroupM, a division of advertising giant WPP, has asked it to build an analytical environment to match and monitor advertising placements on a global scale. GroupM operates in more than 80 countries, and data from these local operations will ultimately be fed into a centralized Kognitio database that will reportedly analyze almost one-third of the world's advertising spend across multiple media outlets.
Up-and-coming new-media companies, telcos and financial firms are typical of the customers selecting alternative database providers. What they all have in common is huge data processing demands. It's clear why young firms would be willing to try upstart vendors: they often have little capital to spare, and they are less likely to be deeply invested in technologies from incumbents such as Teradata, Oracle and IBM.
But even well-established firms such as Verizon and JP Morgan (Vertica), Intuit and ComScore (Aster Data), OfficeMax (ParAccel) and GE Capital (Kognitio) are turning to the new wave of data warehouse vendors to meet their ever-growing data analysis needs. Whether the selections were made based on price, performance or personal attention from the highest levels at these upstart vendors, it's clear that competition is alive and well in the data warehousing market.