Aug 29, 2012 (09:08 AM EDT)
Aerospike Vies To Advance NoSQL Database
Read the Original Article at InformationWeek
Real-time database vendor Aerospike hit most of the major developmental trends in big data this week by announcing a new round of funding, the release of a new open source edition of its version of the NoSQL database, and the acquisition of a smaller vendor with more big-data-specific functions than Aerospike's own code.
It also changed its name from Citrusleaf, which it had used since its founding in 2009, to Aerospike as a reference to the rapid growth of big data and the company's own ambition for fast growth, according to a statement.
Aerospike's main product is a distributed hash-table database designed as a NoSQL data store that processes transactions in real time, manages unstructured data as efficiently as traditional data, and scales horizontally across clusters of commodity server and storage hardware.
Its primary purpose is to serve Web-based apps with strict latency requirements and huge volumes of data to access--gaming sites, advertising-driven sites that must display relevant ads within milliseconds, and other applications with high performance requirements and unpredictable load levels.
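To make the "distributed hash table" idea concrete, here is a minimal sketch of how a key can be hashed to decide which server in a cluster stores it. This is an illustration only, not Aerospike's actual partitioning scheme: the names node_for_key and TinyDHT are hypothetical, the node names are invented, and the nodes are simulated as in-memory dictionaries.

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick the node that owns `key` by hashing the key (hypothetical helper)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    bucket = int.from_bytes(digest[:8], "big")
    # Simple modulo placement; an assumption made for brevity.
    return nodes[bucket % len(nodes)]


class TinyDHT:
    """Toy key-value store that spreads keys over simulated nodes."""

    def __init__(self, nodes: list[str]) -> None:
        self.nodes = list(nodes)
        # Each "node" is just a local dict standing in for a remote server.
        self.stores: dict[str, dict[str, object]] = {n: {} for n in self.nodes}

    def put(self, key: str, value: object) -> None:
        self.stores[node_for_key(key, self.nodes)][key] = value

    def get(self, key: str) -> object:
        return self.stores[node_for_key(key, self.nodes)].get(key)


cluster = TinyDHT(["node-a", "node-b", "node-c"])
cluster.put("user:42", {"last_ad": "sneakers"})
print(cluster.get("user:42"))
```

Because every client computes the same hash, any client can find a key's owner without consulting a central index, which is what lets this kind of store add servers to spread load. Modulo placement as sketched here would move most keys whenever the node count changes; production systems typically use consistent hashing or a fixed partition map to avoid that.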
The company touts three primary features for its NoSQL database--speed, scalability, and reliability--that are traditional virtues in data management, but are particularly acute needs in big data analytics, according to Shalini Das, research director of the CIO Executive Board consultancy in Washington, D.C.
The ability to store unstructured data, including text and images, and to run analytics that automatically add metadata, so that even images can be used in external apps or found using standard queries, is a basic prerequisite for big data; doing so with fast response times for both data intake and data reporting is a major advantage, Das said.
Sixty percent of companies with big data projects in process use relational databases as at least part of their data store, however, so it's not enough that a big data project use a single multifaceted database, according to Mike Boyarski, director of product marketing for Jaspersoft, another NoSQL vendor that surveyed open source big data users for an August report.
While the report showed a higher than expected percentage of companies launching big data projects as production systems rather than pilots, it also found the tools available to gather, process, structure, and search unstructured data alongside relational data (or in combination with relational databases) are far too weak to satisfy existing requirements.
"There's a lot of uncertainty of the value proposition of the tools at your disposal right now to take advantage of big data," Boyarski said. "It's a little surprising so many companies are moving forward into production despite the tools available."
Aerospike addressed the need for multi-format data support by acquiring startup database specialist Alchemy Database, whose AlchemyDB is designed to combine a relational database management system (RDBMS) with a document store, graphing capabilities, and a Redis open source key-value data store.
The combination will give Aerospike a good NoSQL key-value store and extensive data management capabilities, according to a statement from Aerospike that focused on the performance aspects of the combination.
Performance is important because many of the data sets are so large, according to Das.
SAP, Oracle, Microsoft, IBM, and a host of other major companies are building big data features into their existing databases and applications already, however, so there are plenty of big data platforms available, Das said.
And there is no shortage of NoSQL data stores, most of them open source and optimized for big data.
What is really in short supply are tools that integrate neatly with those data-crunching platforms to gather, clean, tag, store, and index unstructured data that few companies have ever tried to incorporate into their master databases, she said.
"Most of the activity in that area is from startups right now, so you won't see many of them in major products until there are some more acquisitions or until the market matures to point that these functions are more widely available," Das said.
Aerospike itself is a startup, which announced a new round of funding at the same time it announced its new product features and acquisition. Aerospike's Series B funding round raised an undisclosed amount from New Enterprise Associates, Draper Associates, and Alsop Louie Partners; the latter two were key backers during the company's first round.
Most of the best-regarded products are also either open source software or are based on open source with layers of proprietary enhancements to add new functions, Boyarski said.
Jaspersoft's main NoSQL product set is open source, as is Aerospike's, though Aerospike's proprietary enhancements shift the whole suite into the world of commercial software.
To keep from alienating open source developers and keep the latest enhancements moving into its products as well, Aerospike reversed course by releasing an open sourced version of its NoSQL database, the Aerospike Community Edition.
It shares most functions with the Enterprise version, but comes with a free unlimited license, supports a single cluster of two nodes in one data center, and has an upper data-storage limit of 200 GB.
The Enterprise Edition supports multiple data centers, multiple server clusters of any size, and replication between data centers, and offers 24-hour support.
Despite its real-time performance and claims of high throughput, Aerospike will face considerable competition from other commercial software companies, open source software, and traditional applications and databases tweaked to provide big data-like benefits, according to Das' evaluation of the big data software market.
Jaspersoft's survey confirms the competition from unusual directions. Relational databases are the most common data store cited by respondents to the survey, followed by MongoDB (cited by 19%), Hadoop (18%), analytic databases from Teradata, Vertica, and similar vendors (11%), Google's BigQuery (8%), HBase (8%), and Cassandra (7%).