A No-Sacrifice, Affordable Data Warehouse

Oct 27, 2004 (09:10 AM EDT)

It's not surprising that business intelligence and data warehouse solutions are very pricey. The number of users has been rising, data volumes have been growing, the variety of required data sources has been widening, and refresh rates have quickened from weekly to near real time. The big question is whether you're getting enough bang for your buck.

This article picks up on a theme I introduced in "BI on a Budget." There, I examined three ways to maximize your BI budget: nontraditional combinations of technologies, Web services, and in-database functions. All of these remain important strategies for getting the most out of the money you spend, but there are other means as well. This article explores project tasks you can execute and technologies you can implement to squeeze the most out of your BI budget.

Mitigate Risk

If you're really serious about maximizing your BI money, then start with your project planning. I can't tell you how many wonderfully detailed, well-researched, and well-documented plans I've seen fail or at least stall. The failure is often attributable to a lack of explicit consideration of risk. This is a very important point. BI and data warehouse projects are initiated to support analytic applications. And, as we should all know, analytic applications are some of the trickiest applications to develop. Why? Well, put simply, analytic applications are difficult to specify. Typical IT development projects, like building an order entry system, are much easier to define than analytic applications, which are stuffed with concepts like slice-and-dice, ad hoc, data pivoting, and drill-through. All of these capabilities must support an analyst's dynamic, exploratory questioning of the data.

Therefore, the risk associated with analytic applications is that users often don't know exactly what they want you to build until they start seeing part of the application. In other words, you must build an application before it's fully defined and specified. Risky, right?

So, in your project planning you must consider risk explicitly. To that end, there are two project tasks you must include and execute early in your plans: conducting a data quality audit and creating prototypes. Both will reduce much of the risk associated with analytic applications while saving considerable money.

The data quality audit answers the fundamental question of whether or not the source data supports the analysis required. Please read two of my columns for detail on how to conduct such an audit: "Data Quality Discipline" and "The Architecture of Enterprise Data Quality." For now, suffice it to say that if you don't have the data necessary to support the BI application requirements, there are only three options available:

  • Clean the data at the source before you spend any money on the BI/DW application.
  • Attempt to clean the data during the ETL (extract, transform, and load) portion of the application, if possible. The audit will tell you whether you can achieve this and give you a good sense of what the effort might entail. You can then go to the user community, armed with quantified information, and examine how that effort affects the budget before you start any work.
  • Adjust the scope of your BI/DW application.

In all cases, you save your company money, time, and resources that would have otherwise been spent trying to complete a project that was destined to fail from the start.
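To make the audit concrete, here is a minimal Python sketch of one small piece of it: profiling a sample extract for missing and invalid values per column and flagging columns whose usable share falls below what the analysis needs. It is not the audit method described in the columns cited above; the record layout, the column names (`region`, `revenue`), the validity rules, and the 95% threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Mapping


@dataclass
class ColumnProfile:
    """Counts gathered for one source column during the audit."""
    column: str
    rows: int = 0
    missing: int = 0   # None or blank values
    invalid: int = 0   # present but failing the column's validity rule

    @property
    def usable_ratio(self) -> float:
        """Share of rows whose value is both present and valid."""
        if self.rows == 0:
            return 0.0
        return (self.rows - self.missing - self.invalid) / self.rows


def audit(records: Iterable[Mapping[str, Any]],
          rules: Mapping[str, Callable[[Any], bool]]) -> Dict[str, ColumnProfile]:
    """Profile each column named in `rules` across the sample records."""
    profiles = {col: ColumnProfile(col) for col in rules}
    for rec in records:
        for col, is_valid in rules.items():
            p = profiles[col]
            p.rows += 1
            value = rec.get(col)
            if value is None or (isinstance(value, str) and not value.strip()):
                p.missing += 1
            elif not is_valid(value):
                p.invalid += 1
    return profiles


def failing_columns(profiles: Mapping[str, ColumnProfile], threshold: float) -> List[str]:
    """Columns whose usable share falls below the threshold the analysis needs."""
    return sorted(c for c, p in profiles.items() if p.usable_ratio < threshold)


def is_number(value: Any) -> bool:
    """Validity rule: the value parses as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical sample extract and rules, for illustration only.
records = [
    {"customer_id": "C1", "region": "East", "revenue": "1200.50"},
    {"customer_id": "C2", "region": "", "revenue": "980"},
    {"customer_id": "C3", "region": "West", "revenue": "n/a"},
    {"customer_id": "C4", "region": "Esat", "revenue": "430"},
]
rules = {
    "region": lambda v: v in {"East", "West", "North", "South"},
    "revenue": is_number,
}
profiles = audit(records, rules)
needs_attention = failing_columns(profiles, threshold=0.95)
```

Each flagged column then feeds one of the three decisions above: clean it at the source, clean it in ETL, or drop the dependent analysis from scope. In this made-up sample both columns fall short (`region` is 50% usable, `revenue` 75%).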

Another important task you can perform is prototyping. Your goal is to source the data quickly into a rough prototype of the deliverable. The prototype can be cobbled together by sidestepping all the formal processing and persistent data structures ultimately built for the final product. Its value, however, is often immeasurable. For example, it gives users a sense of what you're building and educates you about the challenges you'll face during the main project. Both the BI planners and the users now have an opportunity to modify the requirements, timeline, and budget before any project effort starts in earnest. This saves money!
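A throwaway prototype can be very small. The following Python sketch, assuming a hypothetical `sales` extract with region, quarter, product, and amount fields, loads sample rows into an in-memory SQLite database (so nothing persistent is built) and supports a crude version of the slice-and-dice and drill-through behavior analysts expect. SQLite simply stands in here for whatever quick tooling your team has at hand.

```python
import sqlite3

# Dimensions a user may slice by in this hypothetical prototype.
DIMENSIONS = {"region", "quarter", "product"}


def build_prototype(rows):
    """Load a sample extract into a throwaway in-memory database (nothing persisted)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return conn


def slice_and_dice(conn, dim1, dim2):
    """Aggregate amount by two user-chosen dimensions (a crude pivot)."""
    if dim1 not in DIMENSIONS or dim2 not in DIMENSIONS:
        raise ValueError("unknown dimension")
    # Column names cannot be bound as parameters, so they are whitelisted above.
    sql = (f"SELECT {dim1}, {dim2}, SUM(amount) FROM sales "
           f"GROUP BY {dim1}, {dim2} ORDER BY {dim1}, {dim2}")
    return conn.execute(sql).fetchall()


def drill_through(conn, region, quarter):
    """Return the detail rows behind one aggregated region/quarter cell."""
    return conn.execute(
        "SELECT product, amount FROM sales WHERE region = ? AND quarter = ? ORDER BY product",
        (region, quarter),
    ).fetchall()


# Hypothetical sample rows for illustration.
SAMPLE_ROWS = [
    ("East", "Q1", "Widgets", 100.0),
    ("East", "Q1", "Gadgets", 50.0),
    ("East", "Q2", "Widgets", 75.0),
    ("West", "Q1", "Widgets", 20.0),
]
conn = build_prototype(SAMPLE_ROWS)
```

Restricting dimension names to a fixed set keeps the interpolated GROUP BY clause safe, while the drill-through query uses ordinary parameter binding. When users react to the output, changing a column or a query costs minutes, not a redesign.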

Open Source Software: FUD Vs. Reality

Fear, uncertainty, and doubt (FUD): the marketing strategy established vendors execute when technically viable alternatives are available at incredible savings. FUD is used to raise concern in customers' minds about adopting a technology with a smaller market presence rather than one from a behemoth whose own product lacks equivalent features or pricing. And, in many cases, it works. No decision maker wants to invest in a technology that's destined to fail or recommend a solution that's not readily recognized by company executives. Open source products are certainly nontraditional technology, which makes them an easy target for FUD.

Project planners and architects must recognize FUD when they hear it from vendors. The fact is that several prominent open source products have been effective tools in IT for years. Forward-thinking, enterprise-minded planners even have a name for a software stack they explore: LAMP (Linux, Apache, MySQL, and PHP/Python/Perl). Each product in the stack has a growing body of evidence to substantiate it as proven technology. Everyone in the technical community has heard of them, and a couple of the products are even familiar to executives. Put simply: Open source works. These products shouldn't be discarded as ineffective or unstable.

Of this LAMP software stack, I want to specifically draw your attention to two great open source alternative technologies: Linux and MySQL. Both of these products can support small and large BI-centric projects, and both are especially important to companies on tight budgets. With no reservations, I believe that any BI-on-a-budget effort should have these two candidates on its short list. Their presence in your project will have a positive, if not dramatic, impact on your budget.

Microsoft, among other vendors, often uses FUD to raise concerns about Linux. Just stop listening to this market hype! Linux works, period. And it's an excellent option for small, midsized, and even large companies. It works on single-CPU systems as well as clusters. But don't take my word for it; look at the growing body of evidence. For example, Linux clusters are used by some of the largest firms in the world, such as ETrade, OfficeMax, AT&T, J.C. Penney, Google, Yahoo, and American Express, just to name a few.

With a Linux/Intel architecture, you not only have an operating system that avoids the sluggishness of the Windows file system, but also an environment that's considerably less expensive than competing technology such as pSeries/AIX or Sun/Solaris. Essentially, you have the best of both worlds: a solid, Unix-like operating system that's relatively inexpensive.

Private industry and governments alike are taking advantage of Linux. In the private sector, companies find the total cost of ownership much lower with Linux implementations than with typical technology offerings. Moreover, Linux is more efficient: companies can customize the operating system to their particular needs, eliminating resource overhead for functions that may be irrelevant to the task at hand. With products like Windows, that kind of customization is impossible. There's simply no such flexibility in most operating systems on the market today, except for Linux.

Governments are embracing Linux as well, for many of the same reasons the private sector is. Linux represents better value and offers more flexibility. The Israeli government recently announced it's standardizing on Linux. China, Germany, and other countries have announced or are in the middle of migration plans, while Russia, the United Kingdom, and Brazil are exploring the technology.

Moreover, as Linux gains momentum, the number of applications that run on it grows. At this point, virtually every leading Unix application runs on Linux. The list includes all the big names, such as IBM and Oracle products, as well as a growing number of smaller players that offer important technology, like Viador's BI portal.

Of course, many will argue that much of the year-to-year savings in Linux-based systems comes from relentless cost cutting by hardware and database vendors rather than from the operating system alone. While there's some truth to that argument, the simple fact remains that Linux provides undeniable savings. A SuSE Linux license for an eight-way, 2GHz, 64-bit AMD Opteron system costs less than $3,000. That's a huge savings in anyone's book!

If your budget is tight and your requirements are significant, ignore the FUD ranting and look closely at a Linux/Intel platform.

MySQL: Open Source, High Quality

Another open source offering with good features and solid performance is MySQL. Like other open source products, MySQL is considerably less expensive than its competitors and, for many applications, just as functional. Companies such as Google, Toyota, Intel, DaimlerChrysler, Bayer, Colgate, and Yamaha (just to name a few) have all effectively used MySQL in their applications.

There are several reasons why MySQL is so popular. For example:

Performance: It's fast, stable, and easy to use.

Proven: The product at last count had more than four million active installations.

Inexpensive: As an open source product, it's developed and marketed at a fraction of what vendors like Oracle and Microsoft spend on their databases. These savings are generally passed on to the customer.

Charles Garry, an analyst at Meta Group, described MySQL as "a disruptive technology," upsetting the entire database market. It's easy to see why, given its millions of users, solid performance and stability, and very low price. As Figure 1 shows, single-CPU prices for standard and enterprise licenses from Oracle, IBM, and Microsoft range from about $5,000 to $40,000. By contrast, MySQL Pro costs $595 per server.
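The gap is easier to appreciate with a concrete configuration. This Python sketch applies the Figure 1 prices to a hypothetical cluster of identical servers. It assumes the commercial products are licensed strictly per CPU and MySQL Pro strictly per server, and it ignores hardware, support, and discounts, so treat the result as an order-of-magnitude comparison rather than a quote.

```python
# License prices as quoted in the article (2004 list prices, US dollars).
# Treating commercial licenses as strictly per-CPU is a simplifying assumption.
COMMERCIAL_PER_CPU_LOW = 5_000
COMMERCIAL_PER_CPU_HIGH = 40_000
MYSQL_PRO_PER_SERVER = 595


def license_cost(servers: int, cpus_per_server: int) -> dict:
    """Rough database license cost for a cluster of identical servers."""
    if servers < 0 or cpus_per_server < 1:
        raise ValueError("need a non-negative server count and at least one CPU per server")
    cpus = servers * cpus_per_server
    return {
        "commercial_low": cpus * COMMERCIAL_PER_CPU_LOW,
        "commercial_high": cpus * COMMERCIAL_PER_CPU_HIGH,
        "mysql_pro": servers * MYSQL_PRO_PER_SERVER,
    }


# Hypothetical configuration: two four-CPU servers.
estimate = license_cost(servers=2, cpus_per_server=4)
```

For two four-CPU servers, the sketch gives $40,000 to $320,000 in commercial licenses versus $1,190 for MySQL Pro.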

Cox Communications, for instance, points to an application based on MySQL costing less than $90,000, for everything from hardware to annual licenses and support. An Oracle database license by itself was estimated at $300,000. Now that's savings! But Cox isn't the only story available. A NASA procurement office migrated from Oracle to MySQL because a license upgrade for Oracle was going to cost twice as much as its entire budget.

But customers aren't the only ones who see the value of MySQL. SAP, for instance, has partnered with MySQL AB, which now markets SAP's certified database, formerly SAP DB, under the name MaxDB. With MaxDB, you can dramatically reduce the cost of your SAP implementations without sacrificing scalability (which is in the terabyte range), performance, or the administration features we've all come to expect from enterprise database technology.

Figure 1: Single-CPU prices for standard and enterprise licenses from Oracle, IBM, and Microsoft fall in the $5,000 to $40,000 range. MySQL Pro costs $595 per server.

Two Ways to Save

Whether you're trying to squeeze ambitious BI objectives into a tight budget or you simply want to maximize every dollar you spend on your BI effort, remember the following two things.

First, by knowing more about the data you source (with a data quality audit) and the application you're attempting to build (with a prototype), you reduce risk. And, by reducing risk, you save money. Period. Never, ever confuse risk reduction with more work or more expenses. That's simply wrong. Any risk mitigation should be performed early in your project effort, before you're fully committed. Only then can you modify your budget, requirements, or both. The wrong time to negotiate change is at the back end of design and development.

Second, open source products such as Linux and MySQL are solid, proven, and relatively inexpensive technology for BI efforts. Therefore, these products should be on your short list. Don't let vendors feed you FUD. As you know, vendors are often driven by self-interest and don't have your best interests at heart. Open source products may not ultimately fit your environment or culture, but that should be your decision.

Michael L. Gonzales is the president of The Focus Group Ltd., a consulting firm specializing in data warehousing. He has written several books, including IBM Data Warehousing (Wiley, 2003). He speaks frequently at industry user conferences and conducts data warehouse courses internationally.