Sep 19, 2012 (04:09 AM EDT)
The Ins And Outs Of In-Memory Analytics
Download the InformationWeek September special issue on in-memory analytics, distributed in an all-digital format as part of our Green Initiative
The high-performance computing market is expected to reach $220 billion by 2020, according to a study by Market Research Media, and in-memory computing is one of the fastest-growing components of that market. We believe that what has so far been an expensive niche database technology is poised to go mainstream in a big way, just in time to help businesses put to use all the big data that's piling up. When it comes to business agility, every millisecond will soon matter, if it doesn't already.
And we're hardly speed demons today. Our InformationWeek 2012 Big Data Survey of 231 business technology pros, all from companies managing at least 10 TB of data, shows that their No. 1 concern is speed of data access. Yet 85% still rely on disk for physical storage.
Let's be clear: Disk-based databases, with their high-latency I/O bottlenecks, place a severe constraint on how fast your business can move. Average response times for conventional relational database management systems are measured in seconds for online transactions and in hours for batch processing. Disk I/O is the weakest link in IT's efforts to reduce latency in high-speed analytics and transactional applications.
In some similarly demanding areas, companies have reduced delays with distributed caching systems such as Memcached or Oracle Coherence, building lower-latency transactional systems. Even with distributed caching, however, the persistence layer is still a disk-based database: a slice of the records expected to be needed is cached in main memory to speed up queries, but updates must still be written to disk. As a result, most distributed caching systems offer three- or four-second response times.
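To make the caching pattern concrete, here is a minimal write-through cache sketched in Python. It is illustrative only: the hypothetical `DiskStore` class (one JSON file per key, flushed with `os.fsync`) stands in for the disk-based database, and the hypothetical `WriteThroughCache` is not the API of Memcached, Oracle Coherence, or any other product. Reads of cached records are served from RAM, but every update still waits on a synchronous disk write, which is the bottleneck the paragraph describes.

```python
import json
import os
import tempfile


class DiskStore:
    """Hypothetical disk-based persistence layer: one JSON file per key."""

    def __init__(self, directory):
        self.directory = directory
        self.reads = 0  # counts (slow) disk reads, including misses

    def _path(self, key):
        return os.path.join(self.directory, f"{key}.json")

    def get(self, key):
        self.reads += 1
        try:
            with open(self._path(key)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def put(self, key, value):
        with open(self._path(key), "w") as f:
            json.dump(value, f)
            f.flush()
            os.fsync(f.fileno())  # force the update all the way to disk


class WriteThroughCache:
    """Hypothetical cache: a slice of records in RAM, every update still hits disk."""

    def __init__(self, store):
        self.store = store
        self.memory = {}  # the cached slice of records

    def get(self, key):
        if key in self.memory:       # cache hit: served from RAM
            return self.memory[key]
        value = self.store.get(key)  # cache miss: pay for disk I/O
        if value is not None:
            self.memory[key] = value
        return value

    def put(self, key, value):
        self.store.put(key, value)   # synchronous disk write comes first
        self.memory[key] = value     # then the in-memory copy is updated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        store = DiskStore(directory)
        cache = WriteThroughCache(store)
        cache.put("acct-1", {"balance": 100})
        print(cache.get("acct-1"), store.reads)  # {'balance': 100} 0
```

The read is free of disk I/O only because the record was already cached; the `put` call still paid for a full disk write, so write-heavy workloads see little benefit from the cache.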
That's not bad, but it's not going to let us mine big data stores in near real time.
In contrast, in-memory databases either eliminate disk I/O bottlenecks and related CPU-intensive activities or move them to DRAM. These activities include indexing; hashing, which is used for efficient indexing; list management, which resolves the logical and physical locations of data on disk; cache management; and all the interprocess communication required for the many handoffs involved. As a result, an application can communicate with the database directly from its own address space, because the database now resides in RAM. Subsecond response times and throughput reaching hundreds of thousands of transactions per second are the hallmarks of in-memory transactional systems.
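By contrast, the sketch below keeps an entire table and its indexes in the process's own memory, so a lookup is a plain hash-table operation with no disk I/O, buffer-cache management, or interprocess handoff. It assumes Python dicts as the hash indexes, and `InMemoryTable` is a hypothetical class written for illustration; it omits the durability (logging, snapshots) and concurrency control a real in-memory database must provide, and it does not reflect how any particular product is built.

```python
from collections import defaultdict


class InMemoryTable:
    """Hypothetical in-memory table: rows and indexes live in the process's RAM."""

    def __init__(self, key_column):
        self.key_column = key_column
        self.rows = {}       # primary hash index: key -> row
        self.secondary = {}  # column -> {value -> set of keys}

    def create_index(self, column):
        # Build a secondary hash index over the rows already loaded.
        index = defaultdict(set)
        for key, row in self.rows.items():
            index[row[column]].add(key)
        self.secondary[column] = index

    def upsert(self, row):
        key = row[self.key_column]
        old = self.rows.get(key)
        for column, index in self.secondary.items():
            if old is not None:
                index[old[column]].discard(key)  # drop the stale index entry
            index[row[column]].add(key)
        self.rows[key] = dict(row)  # store a copy so callers can't bypass indexes

    def get(self, key):
        row = self.rows.get(key)  # one hash lookup, no disk access
        return dict(row) if row is not None else None

    def find(self, column, value):
        keys = self.secondary[column].get(value, set())
        rows = (dict(self.rows[k]) for k in keys)
        return sorted(rows, key=lambda r: r[self.key_column])


if __name__ == "__main__":
    orders = InMemoryTable("id")
    orders.create_index("region")
    orders.upsert({"id": 1, "region": "east", "total": 10})
    orders.upsert({"id": 2, "region": "west", "total": 5})
    orders.upsert({"id": 1, "region": "west", "total": 12})  # update moves row 1
    print([r["id"] for r in orders.find("region", "west")])  # [1, 2]
```

Because both the primary and secondary indexes are ordinary in-process hash tables, an update touches only memory; persisting those changes safely is the separate problem real in-memory systems solve with logging or snapshots.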
Now, in-memory databases have been around for 30-plus years (remember IBM's IMS/VS Fast Path, circa 1978?). But only now are business need, technology advances, economics, and scalability aligning to drive mainstream adoption, as reflected by the 60% of respondents to our Big Data Survey who say they're somewhat or very likely to invest in technologies to manage big data initiatives within the next year. Their goal is to organize data into a fabric that can be searched, browsed, navigated, analyzed, and visualized, while adding standardization and scalability.
Our full report on in-memory analytics and databases is available free with registration.
This report includes 15 pages of action-oriented analysis and 6 informative charts: