Security, the Cloud and the Data Warehouse

May 29, 2008 (11:05 AM EDT)

James Dixon left a comment on my services/cloud post that is worth exploring, because it raises a fundamental criticism that has been around since the first ASP started years ago:

"Doesn't DW-in-the-cloud suffer from the same fundamental problem as DW-as-a-Service in that you have to pump all of your proprietary, strategic, highly sensitive data outside of the firewall onto someone else's hardware?"

I think that's a valid argument, provided your company has no external network connectivity. If you have an external connection, then all bets are off. It's worth looking at some networking pre-history to see why this has been true for decades.

Back before the commercial Internet, I had a map in my cube of the entire North American frame relay network for one of the telcos, including secure private leased lines that were supposed to be invisible to anyone except the lessee. This was the same telco my company leased private lines from. How did I get it? I found a reference to company frame relay PADs and default open configurations in 2600 magazine. The data of many corporations was flowing freely across these lines, and probably 20 percent of the network was accessible to anyone with a modem and a few simple tools.

Companies routinely sent sensitive data between data centers over these links, ignorant of just how open they were. It's not much different than today, except that today there's more awareness of the need for internal security in addition to the network links.

If you're old enough to remember the Internet circa November 3, 1988, then you probably recall the Morris Internet worm. I do because I came to work that morning to find half my servers slowing to a crawl. It took a week to clean up the mess and figure out how to prevent similar problems in the future. We've been in a hacker arms race ever since.

What caught most people by surprise was the extent to which internal networks were interconnected or exposed to the outside. Servers we thought were safely locked behind routers and gateways had more than one network path and got infected.

Today we have a thousand times the interconnectedness and a million times the number of network nodes. This is a very roundabout way of saying that there is not much difference between your data in a third-party data center and your data in your own data center.

The data warehouse might be behind a dozen firewalls, but if you have a lone PC with connectivity to the Internet and a client-based query tool, that database is only one hop removed from the external Internet. Going from your PC to a third-party data center is likely as secure as connecting internally.

I'm really talking about a basic misunderstanding of the nature of data security: the feeling of control gives the illusion of security. Using an internal data center feels safer, in the same way driving a car can feel safer than flying, but it may actually be riskier.

Why is it that we don't think twice about using commerce or banking Web sites but we're afraid of putting our warehouse database elsewhere? Why would you trust an investment bank to provide access to all your finances via a Web site? In these cases we're talking real money as opposed to data about business operations. If that banking site is cracked, that business is going to be in a lot more pain than if your data warehouse were broken into.

Perceived risk versus actual risk is discussed constantly in the security community, yet people react mostly to perceived risks. Hence the insanity of the US airport security theater. The enterprise data reality is that most data leaks and thefts are due to poor process, lack of controls and, occasionally, bad actors, all of them problems caused by people inside the company rather than outside.

As the data warehouse becomes central to more business practices, we need to begin paying attention to its security. A few years ago I plugged a credit card security hole in a data warehouse that the administrators didn't know was there. A table accessible to end users contained unencrypted credit card numbers along with the security code, and this data was linkable to the owner's name and address. Anyone could write a report listing customers and their credit card info.
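A problem like the one above can often be spotted with a simple scan of sampled rows. What follows is only a minimal sketch, not how that warehouse was actually audited: the function names, the regular expression and the sample rows are all assumptions made for illustration. It flags values that look like card numbers and pass the Luhn checksum that real card numbers satisfy.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like_values(rows):
    """Return (row_index, column_name) pairs whose value looks like a card number.

    `rows` is assumed to be a list of dicts, e.g. rows sampled from a table.
    """
    hits = []
    for idx, row in enumerate(rows):
        for col, val in row.items():
            for match in CARD_PATTERN.finditer(str(val)):
                digits = re.sub(r"\D", "", match.group())
                if 13 <= len(digits) <= 19 and luhn_valid(digits):
                    hits.append((idx, col))
                    break  # one hit per column value is enough
    return hits


# Illustrative sample rows; 4111 1111 1111 1111 is a well-known test number.
sample_rows = [
    {"name": "Ann", "note": "card 4111 1111 1111 1111"},
    {"name": "Bob", "note": "order 1234567812345678"},
]
flagged = find_card_like_values(sample_rows)
```

A scan like this only finds candidates. Someone still has to decide whether the column belongs in the warehouse at all, and whether the source system has the same exposure.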

The irony is that the data was removed from the warehouse, but it was months before the source system was fixed. The source? The database underlying the internal e-commerce site. The data was simply replicated from there to the data warehouse. The risk of staff accessing the data from the warehouse was removed, but the source database still stored that data behind a Web site.

I believe that a third party offering database services is probably more aware of data security, and of the Swiss cheese of enterprise data security loopholes, than most internal IT managers and developers. With data from many companies on their servers, they face a bigger liability if they are broken into, so they are motivated to take more care than we are.

The data security question is rooted more in lack of visibility to the total IT risk picture than it is in the reality of data being inside or outside the firewall.