Nov 27, 2009 (07:11 PM EST)
Understanding Private Cloud Storage
Read the Original Article at InformationWeek
Mention cloud storage to most IT professionals and they think of Internet services like Amazon S3 and Nirvanix that store your data in their data centers.
But a storage cloud doesn't have to be public. A wide range of private cloud storage products have been introduced by vendors, including name-brand companies such as EMC, with its Atmos line, and smaller players like ParaScale and Bycast. Other vendors are slapping the "cloud" label on existing product lines. Given the amorphous definitions surrounding all things cloud, that label may or may not be accurate. What's more important than semantics, however, is finding the right architecture to suit your storage needs.
A prototypical cloud storage system is made up of a number of x86 servers, each with its own storage, most commonly using four to 16 SATA drives. Users and their applications access the system through standard file access protocols like CIFS and NFS or via object storage and retrieval protocols like SOAP and REST.
The storage nodes in a private cloud are linked together with a layer of smart software, which performs several functions. First, it maintains a global name space that allows all the storage in the cluster to be accessed as a single entity, so that administrators can add storage capacity on the back end without having to tell applications at the front end how to reach it. The software also handles drive failures and keeps data available to applications and end users.
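As a rough illustration of the global name space idea, the sketch below maps a logical path to the storage nodes that should hold its copies, so callers use the same path no matter how many nodes sit behind it. The node names, the `GlobalNamespace` class, and the use of rendezvous hashing are all assumptions made for illustration; the article does not say how any particular product implements its name space.

```python
import hashlib


class GlobalNamespace:
    """Toy global name space: one logical path, many back-end nodes (hypothetical)."""

    def __init__(self, nodes, copies=2):
        self.nodes = set(nodes)
        self.copies = copies

    def add_node(self, node):
        # Capacity grows on the back end; callers keep using the same paths.
        self.nodes.add(node)

    def remove_node(self, node):
        # Simulates a failed or retired node.
        self.nodes.discard(node)

    def _score(self, node, path):
        # Rendezvous hashing: every (node, path) pair gets a stable score.
        digest = hashlib.sha256(f"{node}:{path}".encode()).hexdigest()
        return int(digest, 16)

    def locate(self, path):
        """Return the nodes that should hold copies of `path`, best score first."""
        ranked = sorted(self.nodes, key=lambda n: self._score(n, path), reverse=True)
        return ranked[: self.copies]


ns = GlobalNamespace(["node-a", "node-b", "node-c"])
print(ns.locate("/images/scan-001.dcm"))
ns.add_node("node-d")  # the application still asks for the same path
print(ns.locate("/images/scan-001.dcm"))
```

With this scheme, adding a node can only move a copy onto the new node, and removing a node only affects the objects that node held, which is one way to get the "add capacity without telling the front end" behavior described above.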
A private cloud storage infrastructure should also be able to scale from hundreds of terabytes to multiple petabytes. That level of scalability is achieved not with a forklift upgrade, but simply by adding more servers as they're needed.
This architecture provides two major benefits. First, storage administrators can configure and provision new storage nodes quickly and inexpensively. Second, administrators can add capacity only as demand requires, instead of purchasing additional disk space to meet anticipated future growth and then having that capacity sit idle in the present.
However, there are also trade-offs. For one, cloud storage is best suited to unstructured data, such as medical images, engineering drawings, and Office documents. For another, because each x86 server isn't as reliable as a high-end enterprise disk array, a private cloud must store copies of the data on multiple nodes. This requires more raw disk space than an enterprise disk array using RAID-5 or RAID-6. For example, if you set a policy for your private cloud to keep three copies of a 60-GB file for data protection, it would require 180 GB of disk, whereas a 6+2 RAID-6 system would need just 80 GB.
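The capacity comparison above is simple arithmetic, and a short sketch makes it easy to try other policies. The function names here are made up for illustration, and the model ignores real-world overhead such as spares, metadata, and file system formatting.

```python
def raw_gb_for_replication(data_gb, copies):
    """Raw disk needed when every object is stored `copies` times."""
    return data_gb * copies


def raw_gb_for_parity_raid(data_gb, data_drives, parity_drives):
    """Raw disk needed for a RAID set with the given data and parity drive counts."""
    return data_gb * (data_drives + parity_drives) / data_drives


print(raw_gb_for_replication(60, copies=3))                        # 180
print(raw_gb_for_parity_raid(60, data_drives=6, parity_drives=2))  # 80.0
```

Three-way replication costs 3x the data size, while a 6+2 RAID-6 set costs 8/6, or about 1.33x.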
Beyond Low Cost
Several other vendors include location-aware policy engines that copy data to nodes in specific geographical locations. Data Direct Networks' Web Object Store, Bycast's StorageGrid, and EMC's Atmos systems can specify that two copies of each object in a folder should be stored in New York and Los Angeles, and that copies also should be stored in two other locations.
This not only protects data from data center failures but can also put objects on storage clusters close to the users who need them. Bycast's policy engine takes this notion one step further by including elements, such as storage tiering, that can migrate objects from more-expensive to less-expensive disk, and even to and from tape.
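A location-aware policy of the kind described above might be sketched as follows. This is not how StorageGrid, Atmos, or Web Object Store express their policies; the `Node` type, the site names, the policy format (site name mapped to a required copy count), and the "most free space first" tie-breaker are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    site: str
    free_gb: float


def place_copies(nodes, policy, size_gb):
    """Pick nodes for an object given a policy of {site: copies_required}.

    Within each site, nodes with the most free space are chosen first.
    Raises ValueError if a site cannot hold the required number of copies.
    """
    placement = []
    for site, count in policy.items():
        candidates = sorted(
            (n for n in nodes if n.site == site and n.free_gb >= size_gb),
            key=lambda n: n.free_gb,
            reverse=True,
        )
        if len(candidates) < count:
            raise ValueError(f"not enough nodes with free space in {site}")
        placement.extend(candidates[:count])
    return placement


nodes = [
    Node("ny-1", "New York", 500),
    Node("ny-2", "New York", 900),
    Node("la-1", "Los Angeles", 700),
    Node("chi-1", "Chicago", 800),
]
policy = {"New York": 1, "Los Angeles": 1}  # hypothetical folder policy
for node in place_copies(nodes, policy, size_gb=60):
    print(node.name, node.site)
```

A real policy engine would also have to re-evaluate placements when a site fails or a policy changes; the sketch covers only the initial placement decision.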
Organizations planning to offer private cloud storage services to internal departments may want to consider multitenant features that allow storage to be partitioned among different groups. For example, IT could carve out one section of the private cloud for HR and another for marketing, and then charge those departments based on usage. This means having delegated administration models and/or virtual servers that restrict each group's access and visibility to only their own data and the resources assigned to them. A multitenant storage system should also include accounting features that collect usage data, such as peak utilization, that will help IT in determining chargebacks.
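To make the accounting idea concrete, here is a minimal sketch that turns periodic usage readings into per-department charges. The sample readings, the rate, and the choice between billing on peak or average utilization are hypothetical; a real multitenant system would define its own metering and billing rules.

```python
from collections import defaultdict


def summarize_usage(samples):
    """Reduce (tenant, gb_used) readings to peak and average usage per tenant."""
    by_tenant = defaultdict(list)
    for tenant, gb_used in samples:
        by_tenant[tenant].append(gb_used)
    return {
        tenant: {"peak_gb": max(values), "avg_gb": sum(values) / len(values)}
        for tenant, values in by_tenant.items()
    }


def chargeback(summary, rate_per_gb, basis="peak_gb"):
    """Charge each tenant `rate_per_gb` for its peak or average usage."""
    return {
        tenant: round(stats[basis] * rate_per_gb, 2)
        for tenant, stats in summary.items()
    }


# Hypothetical readings taken at two points in the billing period.
samples = [("HR", 100), ("HR", 140), ("Marketing", 300), ("Marketing", 260)]
summary = summarize_usage(samples)
print(chargeback(summary, rate_per_gb=0.5))                   # billed on peak
print(chargeback(summary, rate_per_gb=0.5, basis="avg_gb"))   # billed on average
```

Billing on peak utilization rewards departments that keep their footprint steady, while billing on the average is more forgiving of short bursts.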
Given the attention that cloud computing garners these days, some vendors are rebranding existing offerings as private cloud options. This can be frustrating for potential buyers, but religious arguments over what constitutes a cloud are less important than features, capabilities, and cost.
Caringo and HDS have repositioned their content addressable storage (CAS) and redundant array of independent nodes (RAIN) systems as private cloud storage. There are some similarities. For instance, CAS/RAIN architectures tend to be built with less-expensive disks than you'd find in an enterprise SAN.
However, vendors have traditionally positioned CAS/RAIN architectures for archiving and compliance. Those use cases require more-advanced features than most private cloud providers offer, such as deduplication, the ability to set retention and disposition policies, or the use of hash algorithms to demonstrate that objects haven't been changed after they're saved. These advanced features let vendors charge a premium, which starts to push these products outside the low-cost boundary of a private cloud. In addition, CAS/RAIN storage systems are usually intended to hold smaller amounts of data, and to meet lower performance requirements, than a private cloud architecture.
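The hash-based integrity check mentioned above can be sketched in a few lines. In this toy content-addressable store, an object's address is the SHA-256 digest of its bytes, so any later change to the stored bytes shows up as a mismatch. The `ContentStore` class and the choice of SHA-256 are assumptions for illustration, not a description of any vendor's CAS product.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an object's address from its content."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy in-memory content-addressable store (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        # Identical content hashes to the same address, so it is stored once.
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def verify(self, address: str) -> bool:
        """True if the stored bytes still hash to the address they were saved under."""
        return content_address(self._objects[address]) == address

    def __len__(self):
        return len(self._objects)


store = ContentStore()
addr = store.put(b"signed contract, final version")
print(store.verify(addr))
```

Because the address depends only on the content, saving the same bytes twice yields one stored object, which is also a simple form of the deduplication these systems advertise.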
While cluster file systems can deliver impressive performance, their reliance on expensive back-end storage makes them relatively pricey compared with RAIN architectures. Cluster file systems are more appropriate to applications, like render farms, that require high performance for individual clients.
Pick A Package
Organizations that want to get private cloud storage off the ground quickly, or prefer the comfort of one throat to choke, should consider integrated systems like Hitachi's Content Platform, EMC's Atmos, or Data Direct Networks' Web Object Store. These products come complete with storage hardware, software, processors--and in the case of Atmos, even the rack.
Those looking for cloud economics may prefer software like Bycast's StorageGrid, ParaScale's Storage Cloud, or Caringo's CAStor. Because these vendors charge for their software on a per-gigabyte basis, users can easily match capacity to cost. Meanwhile, Cleversafe sells pre-configured access, storage, and management nodes, and the adventurous can use the open source community version from Cleversafe.org.
Private cloud storage systems can bring cloud economics to the data center, allowing corporate IT to retain control over data, security, and reliability. These new architectures promise to not only reduce the up-front cost of storing many terabytes of unstructured data but also reduce the amount of manpower required to manage it.
Howard Marks is chief scientist at Networks Are Our Lives, a consulting firm.