TechWeb

What's At Stake in the Big Data Revolution?

Aug 19, 2010 (04:08 AM EDT)

Read the Original Article at http://www.informationweek.com/news/showArticle.jhtml?articleID=226800004


For practitioners, big-data success could be the key to survival or to opening up vast new markets. Among vendors, the incumbents have the most to lose. But fresh survey data suggests that MySQL may be a spoiler for Oracle.

In case you missed "Fast and Big" in the August 9 issue of InformationWeek, you can download it here and read my trend analysis on in-database processing, in-memory innovations and new(ish) alternatives including MapReduce and Hadoop. Here I share some additional observations along with details from a ParAccel customer interview that wasn't completed in time for the feature.

Practitioner Stakes

I wrote a lot about Barnes & Noble in my story, and important context recently emerged when the company announced it's considering putting itself up for sale. The Aster Data deployment I wrote about has everything to do with the company's current digital predicament.

Barnes & Noble's new CEO, William Lynch, ran the BN.com site before taking the company's top post about a year ago. Marc Parrish, the VP of retention and loyalty, made it very clear to me that the company is bent on evolving from a brick-and-mortar retailer into a technology-led firm.

As various news accounts have described, BN faces a tough transition. If the company succeeds in joining the e-reading craze with its Nook and smartphone e-readers (Parrish says the company already has 20% of that market), it will then have to figure out how to keep its stores in business with fewer retail book sales.

The company hopes to find a happy mix across an ecosystem that includes stores, in-store cafes, e-book downloads, affinity club membership and the BN.com Web site -- thus the enterprise data warehouse (EDW) deployment that replaced nine separate Oracle warehouses that provided siloed domains of analysis.

BN is cutting back on books in its stores and restocking with toys, games and other items that can't be downloaded through e-readers. So now, e-mail campaigns might include a coupon for something other than a book to lure you to visit a store. BN stores now offer free Wi-Fi to entice you to linger once you're there. Maybe you'll buy a cup of coffee, one of those toys or games or, in a fit of nostalgia, a good old-fashioned physical book.

"We know from our analysis that if people come in and buy something at the cafe, their average order value in the store goes up," said Parrish. "And when they start buying books on the Nook, we'll be able to tie that insight into the whole ecosystem. That's the thing we're working hardest on -- getting algorithms that are cross-channel."

My latest big-data customer interview offers an example of potentially huge business opportunities being uncovered.

Provisio, a medical-research support firm based in Tennessee, needs to quickly query health claims and medical records on more than 41 million U.S. citizens. (Don't worry, identities are abstracted and the database is HIPAA compliant, though Provisio can contact patients indirectly through their doctors.)

Provisio was struggling with long-running queries on a Microsoft SQL Server cluster deployment. Late last year, it switched to a ParAccel database running on HP servers.

ParAccel provides both a column-store approach and massively parallel processing. The product's compression capabilities have dramatically reduced Provisio's formerly 7-terabyte database and corresponding storage needs, according to Sean Harrison, the company's chief security officer and senior information architect. He says total costs including hardware and database licenses were in the $150,000 range.

Most importantly, a drug trial "site proximity" analysis that used to take a week or more on the old platform now takes 10 minutes, and customers can run it themselves, self-service style, through the company's iTrails Web site.

"We're not only doing what we used to do much faster, we're dreaming up new services," says Harrison. Spotting disease hot spots by zip code is one possibility, he says. Another idea is helping life insurers or health insurers fine-tune rates by zip code based on statistical disease frequencies or other factors, such as occurrences of industrial accidents.
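
For readers wondering why a column store can shrink a database the way Harrison describes, here is a minimal Python sketch. It is not ParAccel's implementation: the sample claims rows, the column names and the run_length_encode and to_columns helpers are all hypothetical, and run-length encoding is simply one easy-to-read compression scheme chosen for illustration.

```python
from itertools import groupby

# Hypothetical sample data: (patient_id, state, diagnosis_code) rows.
ROWS = [
    (1, "TN", "E11"),
    (2, "TN", "E11"),
    (3, "TN", "I10"),
    (4, "KY", "I10"),
    (5, "KY", "I10"),
    (6, "KY", "I10"),
]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


if __name__ == "__main__":
    ids, states, diagnoses = to_columns(ROWS)
    # Low-cardinality columns collapse into a handful of runs.
    print(run_length_encode(states))
    print(run_length_encode(diagnoses))
    # A unique-ID column gains nothing from this particular scheme.
    print(len(run_length_encode(ids)))
```

When values are stored column by column, repeated entries such as a state or a diagnosis code sit next to each other and collapse into a few runs. Stored row by row, the same values are interleaved with other fields, so long runs rarely form.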

Vendor Stakes

Among the eight companies I interviewed for the big-data feature, three replaced Oracle databases (one with IBM's Smart Analytic System, one with Aster Data and one with an upgrade to Oracle Exadata). One firm replaced DB2 with Teradata. One firm replaced Microsoft SQL Server with ParAccel. One firm replaced a first-generation Netezza appliance with Greenplum. Two other customers didn't divulge what they were using before, but I'm guessing it was Oracle that was displaced by Netezza and Hadoop, respectively.

To state the obvious, Oracle, Microsoft and IBM are the incumbents in data warehousing, and they have the most to lose from new competition. You could argue that the whole big data pie is growing, which will benefit all vendors. But more than any other vendor, Oracle is cited as the key competitor whenever a niche player like Teradata, Sybase (with Sybase IQ), Netezza, Greenplum, ParAccel, Vertica, Kognitio or Infobright is in a competitive bid for a big-data deployment.

That says a lot about Oracle's strength, but it also speaks to its vulnerability as many practitioners upgrade their platforms in the years ahead. Exadata will have to prove its performance advantages, its manageability and its affordability.

In a major database survey we just completed, it was telling that 580 respondents placed Microsoft SQL Server and Oracle/Oracle RAC neck-and-neck as their "primary" data warehousing database, each with 34%. IBM was third with 14%, split between DB2 for Unix/Linux/Windows and DB2 for System Z. The surprise fourth-place finisher was MySQL with 7%, ahead of Teradata with 5%.

This survey question wasn't intended to provide a scientific marketshare analysis, but MySQL in fourth place in data warehousing? That had us scratching our heads, too. We defined our terms very clearly, so somebody would have to be a real greenhorn to report incorrectly. MySQL isn't unknown to data warehousing, but our survey results suggest that its influence is stronger than many would suspect. That might mean it's an ace in the hole for Oracle: An entry-level play for more than just the LAMP stack that Oracle can use -- cautiously, for fear of turning off the open-source crowd -- against Microsoft SQL Server. MySQL may not play a role in the big-data story, but perhaps it could help Oracle keep Microsoft at bay in OLTP and data warehousing while freeing it to concentrate on the high end of both markets.

We're examining our survey results by customer size, and database expert Richard Winter is leading the analysis. Another interesting (though hardly shocking) finding was the top-three influences cited in database selection: total cost of operation, higher data availability, and ease of ongoing maintenance. It will be interesting to see how marketshare shifts over the next five years.