Physical Vs. Virtual: Oracle, Others Redefine Appliances

Aug 15, 2011 (04:08 AM EDT)

Physical appliances are fighting to hold their ground in the enterprise data center. The strength of virtualization seems unassailable: By abstracting processes from the underlying infrastructure, applications can be customized to fit user needs and business demands, rather than having to conform to hardware. Freed from physical constraints, applications can dynamically run wherever is most efficient. Technologies from firewalls to WAN optimizers to network management systems to routers that used to live on dedicated boxes are moving to virtual servers.

The result is what every business is looking for: enormous flexibility and economies of scale. Why not go virtual whenever and wherever possible?

Because there are still cases where physical appliances pay off, thanks to specialization and customization. Without the overhead of virtualization or superfluous software processes, a dedicated piece of hardware will almost always perform a given task faster than software. And hardware that's custom-designed for a single purpose--be it running the Oracle 11g database or examining packets for malicious content at wire speed--usually delivers the highest performance, albeit at a cost of less flexibility and a slower development cycle.

Like clothes dryers and air conditioners, data center appliances are optimized to do one thing very well. Most aren't truly plug and play, but some come pretty close to the ideal of a black box whose inner workings IT needn't worry about.

At the ends of the spectrum, we're not wrestling with this decision. If an unvirtualized system uses just a fraction of the available CPU and I/O power, or if a workload needs to move across data centers or into the public cloud, virtualization is an easy choice. If the task at hand requires hardware purpose-built for a single function, as with high-speed deep-packet inspection, very-high-performance routing, or large-scale OLTP or OLAP processing, purpose-built appliances win hands down.

It's the middle ground where we're struggling. Many companies take the approach that if a system isn't broken, IT shouldn't spend time and money to fix it. Then there's the law of unintended consequences. Virtualizing components of an application stack can introduce variables that affect services in unexpected ways. Meanwhile, performance management systems and some security controls will no longer work exactly as they once did. Just getting back to a stable and manageable state may imply some serious work for already overtaxed staffs.

Emblematic of this difficult decision is the database portion of the application stack. Not long ago, production databases were strictly off the virtualization track because of their tendency to gobble I/O and processing power. But recent hardware and software improvements have made virtualization a far more viable option for the database management system. The questions now: Which databases, when, why, and how?

Many IT organizations still deploy refrigerator-size, self-contained DBMS servers. Abandoning them in favor of virtualized servers on high-speed LANs using networked storage implies changes to everything from how IT teams are structured to policies around how the systems get funded and maintained. That's why, when it comes to database management systems, vendors and IT architects alike are pushing back against the "virtualize everything" movement. Their reasons shed light on the wisdom of going all virtual, all the time.

Big Data, Big Hardware

The market for integrated database systems, known collectively as data warehouse appliances, is booming, driven by companies that need to manage explosive growth in data volumes and by vendors embracing massively parallel processing (MPP), which spreads processing across the many CPU cores of their appliances.

Spats over terminology also shed some light. Oracle eschews the "warehouse" and "appliance" monikers, saying that a data warehouse is fundamentally no different from any other database and that the word "appliance" is a misnomer. The former claim is dubious in light of Oracle's lack of a dedicated data warehouse offering, but the company might have a point about "appliance." If you think of a data warehouse appliance as a simple plug-and-play device, you'll get a rude awakening given the integration and customization required. Exadata, for instance, has to be configured for either OLAP or OLTP. Still, most vendors of data warehousing appliances say customers can get their products up and running in days, where conventional databases take weeks or months. And as with any dedicated hardware, there's a trade-off between (relative) simplicity and (relative) flexibility. A data warehouse appliance is more flexible and complex than a typical 1U networking box, but it's simpler and less versatile than a full-featured database.

One area where IT can gain an edge with an appliance is in giving business units the power to mine large amounts of new and historical data. Today's increased storage capacity means that companies can save every mouse click or GPS coordinate, but such a trove is valuable only if it's accessible to analytics applications. This is where appliances shine: With software and hardware designed specifically for reading large volumes, data that might previously have been archived to removable media becomes available for analysis.

Most integrated systems target data warehousing. Teradata pioneered the concept, convincing customers that a data warehouse is different enough from a conventional OLTP system to require a separate installation. Netezza and Greenplum had a similar vision, and these startups made inroads by cutting the cost of tightly integrated hardware and software. Retailers use data warehouse appliances to track customer buying patterns, mostly to improve marketing and demand forecasts. Banks use these appliances to detect fraud, phone companies to plan cellular coverage based on calling patterns, and airlines to price fares. Any task that depends on demand forecasting will likely benefit from a data warehouse.

Warehouse vs. OLTP
Key differences:
OLTP processes fresh data, which usually means small data sets and a roughly equal proportion of reads and writes.
Data warehousing usually handles older data, which implies a much larger volume and a far higher proportion of reads vs. writes.
Before building or buying a database system, ask:
What quantity of data is involved?
What type of operations will be performed?
Will it be used for data warehousing or online transaction processing?
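The contrast in the sidebar can be sketched in a few lines of Python with SQLite; the table, data, and queries here are our own illustrations of the two access patterns, not drawn from any vendor's product:

```python
import sqlite3

# Hypothetical schema and data, chosen only to contrast the workloads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("acme", 120.0), ("acme", 75.5), ("globex", 310.0), ("initech", 42.0)],
)

# OLTP pattern: touch one fresh row, with a mix of reads and writes.
conn.execute("UPDATE orders SET amount = 130.0 WHERE id = 1")
row = conn.execute("SELECT amount FROM orders WHERE id = 1").fetchone()

# Warehouse pattern: read-mostly scan and aggregation over the whole table.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(row, totals)
```

The first query touches a single row by primary key; the second reads every row. At terabyte scale, that difference in access pattern is what separates the two system designs.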

Exadata is designed to run any Oracle database, and the company pitches Exadata as the optimum platform for both OLTP and data warehousing--even though it started out as a data warehouse platform, and Oracle still compares it with rivals' data warehouse appliances. Oracle's competitors counter that the design approach that lets Exadata perform both functions has compromised its ability to handle data warehousing.

"It's a diminution of the importance of data warehousing and analytics," says Luke Lonergan, VP and CTO of EMC Data Computing, the division responsible for the Greenplum technology EMC acquired last year. Rivals don't dispute Exadata's ability to handle OLTP, but they do say data warehousing is a fundamentally different problem, pointing to enormous storage requirements and a need for fast reads. Oracle retorts that warehousing is neither that different nor that big a deal. "It's an easier problem," says Tim Shetler, VP of Exadata product management. Still, Oracle is trying to enhance its data warehousing chops by moving aggressively into storage. Its goal is to expand the capacity of Exadata so that a warehouse can scale to analyze data that previously would have been archived to removable media.

So who's right? Data warehousing specialists such as EMC Greenplum, IBM Netezza, and Teradata have scores of reference customers analyzing hundreds of terabytes, even petabytes, of data. Oracle says more than 1,000 customers have deployed Exadata (for both OLTP and OLAP), but we haven't seen it publish data warehousing references breaking into the hundreds of terabytes. Ultimately, the only way to know if an appliance will handle your workloads is through testing. Most would-be data warehouse appliance customers have already outgrown conventional deployments of leading databases such as Oracle, IBM DB2, and Microsoft SQL Server and need the scalability and MPP power of appliances. "Nobody buys Teradata for the fun of it," says Ed White, general manager of Teradata. "They all started with IBM and Oracle."

However, IBM's acquisition of Netezza and Oracle's launch of Exadata mean IT has another option: Migrate to an appliance while sticking with your existing vendor. Netezza is as tightly focused on data warehousing as the other specialist appliance vendors, and Oracle says that almost all Exadata customers were 11g users. Thus, whether one appliance architecture can suffice for both OLTP and data warehousing is really a question that applies only to Oracle customers. If you are one, and you need the next level of performance and scalability, Exadata may be a good solution--provided you don't mind being locked into Oracle and that you can afford it. The lowest-cost Exadata full rack lists for $1 million. If you'd rather let performance, scalability, and price determine your selection, test your data and query loads on a short list of platforms. EMC Greenplum, IBM Netezza, Oracle Exadata, and Teradata are market leaders, but you might also consider Hewlett-Packard's Vertica, Infobright, Microsoft, ParAccel, and SAP's Sybase IQ.

A Hard Sell

In our InformationWeek Analytics Oracle/Sun Merger Survey, fielded six months after the acquisition closed, 28% of 454 respondents said they wouldn't buy an integrated database/data warehouse appliance, compared with just 8% who said they were sold on the concept. Of course, that left the majority in evaluation mode, and the key benefits that vendors tout for these products track what CIOs say they want now: high availability, lower total cost of ownership, and easier maintenance. The trick for vendors is to prove their lofty claims.

For IT teams making the physical vs. virtual call, specific areas to investigate include the benefit of specialized silicon and tight hardware/software integration, ease of setup, scalability, and exactly what functionality is off-loaded to hardware.

Paradoxically, integrated-system vendors claim to offer many of the same business benefits of virtualization--notably, a more agile IT department and more efficient use of hardware. Respondents to our InformationWeek Analytics State of Database Technology Survey cited both of these attributes as critical. However, there are differences between what IT wants in OLTP deployments vs. data warehouses, notably in the area of cost: TCO is tied with high availability as the most important factor in data warehousing, cited by 39% of respondents with one or more data warehouses; in operational databases, it sinks down to sixth place, at 24%, behind features such as agility and fast deployment of new databases. The most important factor for an operational database is ease of maintenance, cited by 37% of survey respondents.

Whereas virtualization reduces the need for hardware configuration by doing more in software, integrated systems ease setup by shipping devices preconfigured. And whereas virtualization uses assets more efficiently, integrated systems are designed to harmonize software and hardware for specific workloads. They also cut costs by reducing time spent on integration.

In addition, integrated systems promise to make more efficient use of a previously underutilized but highly valuable asset: the data itself.

Some companies simply salt away data for compliance purposes, but aggressive retailers, banks, and telcos in particular aim to wring value out of it. "Everyone wants every single piece of information they can get on each customer," says Teradata's White.

In this respect, the term "data warehousing" is a misnomer. The analytics applications that integrated systems are designed for transcend just sticking data in a storehouse. They demand that the data be actively worked--analyzed many times over by pattern-matching algorithms or queried by automated searches. This is the promise of Greenplum, Netezza, and Teradata, not just Exadata. It's also the promise of SAP's HANA technology, which speeds analysis by storing data in memory. SAP says it will deliver a full data warehouse product based on HANA by year's end.

Plug And Play? No Way

Integrated database systems typically require many hours of configuration. "Our target is three days to get up and running," says EMC's Lonergan. IBM's target for its Netezza appliance is 48 hours. Oracle offers similar estimates--one day to set up the Exadata hardware and another day or two to get its software running. However, we've talked with customers that took weeks to tune and tweak the database for peak performance.

Even three days may seem like an eternity to IT teams accustomed to the instant gratification of other classes of appliance, but it's a vast improvement over building out a conventional database deployment from scratch, which can easily take weeks or months.

Physically, database appliances also aren't the simple devices that the name implies. Most VPN or WAN optimization appliances are self-contained boxes that occupy at most a few rack units and perhaps give IT the option of upgrading a switch port or hard drive. In contrast, data warehouse "appliances" may comprise 100 or more separate boxes occupying 10 racks and strung together using high-speed Ethernet or InfiniBand. Most start at a minimum size of one-quarter rack, generally including servers configured either for computing or as storage nodes (at least one of each) and a switch to link them. For example, the Oracle Exadata X2-2 quarter-rack system uses four storage servers and two compute servers to create a system with 96 TB of raw disk capacity and 48 CPU cores.

This is a far cry from one main benefit of pervasive virtualization--namely, consolidating servers and using less physical space in the data center.

The great compensating benefit of this design, however, is scalability: IT can start with a quarter-rack system and plug in new storage or server nodes--or entire racks--as necessary. Again, being able to add on in such a piecemeal fashion is a deviation from what we usually think of as an appliance, but the intention is that each upgrade will be at least as easy as the initial installation. Because of that, some integrated system vendors argue about whether their products should even be called appliances. Phil Francisco, VP of product management at IBM Netezza, says the term is apt. Oracle's Shetler says the company doesn't call Exadata an appliance "because that suggests something that's relatively small or that requires minimal administrative support."

[Chart: How will the offer of highly integrated software and hardware systems affect your future purchase plans?]

Architecture Choices

Whatever they call their products, integrated database vendors can be divided into two categories: those that base their systems on standard x86 hardware and those that make extra tweaks. Teradata and EMC use the same hardware as general-purpose computing platforms, while Netezza adds a bit of proprietary wizardry.

IBM Netezza's key innovation is to implement SQL in hardware using custom chips to speed queries. In addition, MPP schemes spread workloads across multiple blades; for example, segments of a SQL query can be processed simultaneously for maximum performance. The whole system is orchestrated by host servers that split processes across blades and manage onboard storage.
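The partial-results pattern described above can be sketched in Python. This is a toy illustration of the MPP idea, not Netezza code: threads stand in for blades, the function names are our own, and a simple SUM stands in for a SQL query segment.

```python
from concurrent.futures import ThreadPoolExecutor

def blade_partial_sum(rows):
    # The work each "blade" does locally against its own slice of the data.
    return sum(rows)

def host_query(rows, n_blades=4):
    # The "host" splits the workload across blades, runs the segments
    # simultaneously, then merges the partial results into one answer.
    slices = [rows[i::n_blades] for i in range(n_blades)]
    with ThreadPoolExecutor(max_workers=n_blades) as pool:
        partials = pool.map(blade_partial_sum, slices)
    return sum(partials)

print(host_query(list(range(1, 101))))  # 5050, same answer as a single-node scan
```

The merge step works because SUM decomposes cleanly into partial sums; real MPP engines apply the same split-and-merge logic to joins, sorts, and aggregations.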

In contrast, Oracle's Exadata appliance uses no custom chips and departs from the approach used by most of its competitors in three ways: It runs Oracle 11g database code on storage servers, it adds flash caches that speed data access, and it implements proprietary compression throughout the system. By having Oracle database code on the storage servers, those servers know exactly what to access, cutting the amount of data that needs to be sent over the internal Exadata network and then analyzed. So, instead of scanning the entire database, a query for, say, customers with more than $1 million in purchases can be executed against only the specific tables and blocks of data that are relevant, saving time and speeding useful results.
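The storage-side filtering idea can be sketched as follows. This is our simplification, not Oracle code; the customer records and function names are invented, and the row counts stand in for traffic on the internal network.

```python
# Invented sample data standing in for rows on a storage server.
CUSTOMERS = [
    {"name": "acme", "purchases": 2_500_000},
    {"name": "globex", "purchases": 640_000},
    {"name": "initech", "purchases": 1_100_000},
    {"name": "hooli", "purchases": 90_000},
]

def scan_without_offload(threshold=1_000_000):
    # Naive storage ships every row; the compute tier does the filtering.
    shipped = list(CUSTOMERS)
    result = [c["name"] for c in shipped if c["purchases"] > threshold]
    return result, len(shipped)

def scan_with_offload(threshold=1_000_000):
    # Query-aware storage applies the predicate itself, so only the
    # relevant rows cross the interconnect to the compute tier.
    shipped = [c for c in CUSTOMERS if c["purchases"] > threshold]
    result = [c["name"] for c in shipped]
    return result, len(shipped)
```

Both paths return the same answer; the difference is how many rows travel over the network before the answer emerges, which is where the performance claim lives.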

Flash caching and compression sound fairly self-explanatory, but Oracle says that it uses these techs more effectively because the system is aware of the specific needs of an Oracle database. "We compress data everywhere: in flash, on disk, even in memory buffers," says Shetler. Where most competitors claim two- to four-times compression, Oracle and many of its customers report 10-times compression, which greatly improves storage capacity and reduces the total cost of the system. However, the capacity any company will achieve is application- and data-dependent.

SAP's HANA takes yet another approach, storing as much data as possible in main-system RAM to avoid the latency involved in disk writes. "The main use case is an operational database next to a warehouse," says Prakash Darji, VP and general manager of data warehouse solutions at SAP Labs. HANA instantly mirrors and lets you analyze real-time transactional information from SAP applications--the data held in the operational database. And the business can correlate this information with historical data in the warehouse to spot important trends and opportunities to take action. In this respect, HANA is radically different from Exadata, which, when used for data warehousing, is as dependent upon (typically batch-oriented) data integration from transactional systems as any conventional data warehousing platform.
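The mirroring pattern Darji describes can be reduced to a toy sketch. This is our own illustration of the concept, not SAP code: the class and method names are invented, and a Python dict stands in for the in-memory analytic store.

```python
class InMemoryMirror:
    """Toy model: every transactional write also updates an in-RAM
    analytic copy, so queries see fresh data with no batch load step."""

    def __init__(self):
        self.operational = []   # stands in for the OLTP store
        self.analytic = {}      # in-memory aggregate, updated on every write

    def record_sale(self, product, amount):
        # One write path feeds both copies, so analytics lag by nothing.
        self.operational.append((product, amount))
        self.analytic[product] = self.analytic.get(product, 0) + amount

    def total_for(self, product):
        # Analytic read: no scan of the operational log, no disk latency.
        return self.analytic.get(product, 0)
```

Contrast this with the batch-oriented integration the article describes for conventional warehouses, where the analytic copy is only as fresh as the last extract-and-load run.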

SAP argues that running applications directly on HANA will improve performance by cutting out some complexity, but its claims are still unproven because SAP's in-memory technology has yet to become a platform for transaction processing, or even a replacement for the data warehouse. The current HANA product does have customers using it for analytics. For example, Adobe uses it to sift through customer data in search of unlicensed software use. British gas and electric utility Centrica uses it to process data from millions of smart meters, forecasting demand and pricing. Though HANA is designed to be an integrated appliance, SAP doesn't sell hardware itself, instead licensing HANA to partners including Cisco, Dell, Fujitsu, and HP.

Powered Up

Of course, complex analysis takes major horsepower. So why do Teradata and EMC Greenplum eschew Netezza's custom hardware or Oracle's storage-tier techniques in favor of pure MPP power? "It's proven and scalable," says EMC's Lonergan. He points to supercomputers that use commodity components and open source operating systems and says that Greenplum can match Netezza's performance in many applications without using specialized silicon. The EMC Greenplum system also scales to more racks.

Another benefit of a standard architecture is that the vendor community is more comfortable embracing it as a platform. Havas Digital sells analytics software that runs on top of the Greenplum system and enables data-driven marketing. Katrin Ribant, executive VP at Havas Digital, says it chose Greenplum in part because it uses a lot of open source software that's easy for Havas engineers to understand.

Teradata's pitch is similar to Greenplum's, using parallel processing across multiple standard x86 cores. Like EMC Greenplum, Teradata has never tried to compete in the OLTP market and doesn't intend to, instead designing its systems to work well with companies' existing gear. "We got our start extracting data from an IBM mainframe," says White. "People bought us because we could do it better than IBM could."

Ultimately, these decisions aren't just about physical vs. virtual or standard vs. proprietary. They're about crafting a diverse IT strategy that makes the best use of resources. This is the opportunity that Oracle sees in database consolidation, and not coincidentally, it's the driving force behind virtualization. We've lived through--and literally paid the price for--customizing every system to every application. It's not just the hardware that's expensive; unique systems require unique expertise and expensive white-glove maintenance contracts, and they still represent critical points of failure. A virtualized, homogenized infrastructure addresses these concerns by making hardware and operating systems easily replaced commodities, but IT is still left managing complex applications. The challenge, then, is to strike the right balance, using a homogenized virtualized infrastructure where possible and purpose-built systems where necessary to deliver the services the business needs to thrive.