Inside Analysis

Data Compression: An Opportunity and a Dilemma

With the release of its Real-time Compression Appliances, IBM threw another variable into the data center equation. Roughly speaking, you can view the situation in the following way:

Data centers are expensive in terms of space, power, cooling and, particularly, labor. And yet the amount of data we store increases by roughly 55% per year. We may talk about Big Data, and indeed many data centers are assembling big pools of data, but the reality is that the data growth rate has stayed fairly constant for decades. The consequence is that every year data centers need more storage capacity (and more processing capacity, too, to process all that data).

The idea of pushing all that data into the public cloud to ease the strain on the data center is appealing, but sadly a good deal of data center data cannot be pushed out in that direction, for a variety of reasons, most of which relate to service levels. For most data centers, the outcome is to depend on technology to solve the problem. So naturally, data compression is a compelling idea.

In simple terms, data compression allows you to store more data on the same disk than before, and the level of compression – depending on what data is being compressed – can be very high.
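Just how much the level of compression depends on the data is easy to demonstrate. The sketch below uses Python's standard zlib (DEFLATE) library purely as a stand-in – it is not the algorithm any particular appliance uses – to compare repetitive, structured data against random data:

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return the fraction of space saved by DEFLATE compression."""
    compressed = zlib.compress(data, level=6)
    return 1 - len(compressed) / len(data)


# Repetitive, structured data (typical of logs or database pages)
# compresses extremely well...
text = b"2012-10-07 17:46:01 INFO request served\n" * 1000

# ...while random data (e.g. encrypted or already-compressed files)
# barely compresses at all, and may even expand slightly.
noise = os.urandom(40_000)

print(f"repetitive text saves {compression_ratio(text):.0%}")
print(f"random bytes save {compression_ratio(noise):.0%}")
```

The spread between the two cases is why vendors quote compression figures as ranges tied to data type rather than a single number.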

IBM’s Real-Time Data Compression

IBM’s real-time data compression appliances are a neat idea. You add them to the data center network in an appropriate location and they sit there compressing data 24/7 with, IBM claims, no downside. The technology compresses data as it is written to a storage device and decompresses it when it is read, without any need to change applications. This is, by the way, unique patented technology that is baked into IBM’s own Storwize devices, but it will happily work with competitors’ storage technology as well.

Critically, it imposes no overhead in performing the compression. Application performance is either unaffected or improved, and storage space is significantly reduced. The level of compression varies with the type of data, but for most data you get somewhere between a 70% and 80% reduction in data volume – and that includes database data as well as office applications and CAD/CAM data. The appliance sits between the application and the storage device, scrunching up the data on its way to disk and unscrunching it on its way back to the application and, in tests IBM has done, it either imposes no overhead or improves performance. Conceptually, you can think of it like this: the appliance compresses data very fast, in a stream. That obviously takes some time, but the disk then takes less time to write the smaller volume of data away, and the outcome is either a wash or a performance gain. The same is true for decompression on reading data.
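The key property – the application reads and writes plain data while only the compressed form ever hits the disk – can be sketched as a toy model. The class below is purely illustrative (the names and the use of zlib are my own; IBM's actual technology is proprietary), but it shows the transparency the appliance provides:

```python
import zlib


class CompressedStore:
    """Toy model of transparent, in-band compression: callers write and
    read plain bytes, while the store keeps only the compressed form -
    the way an appliance compresses on the write path and decompresses
    on the read path with no change to the application."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> int:
        self._blocks[key] = zlib.compress(data)
        return len(self._blocks[key])  # bytes actually stored on "disk"

    def read(self, key: str) -> bytes:
        return zlib.decompress(self._blocks[key])


store = CompressedStore()
payload = b"SELECT * FROM orders;\n" * 500
stored = store.write("page-1", payload)

# The application sees exactly the data it wrote...
assert store.read("page-1") == payload
# ...but far fewer bytes were physically written.
print(f"{len(payload)} logical bytes held in {stored} physical bytes")
```

Because the physical write is so much smaller, the time spent compressing can be repaid by the time saved on disk I/O – which is the intuition behind "a wash or a performance gain."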

A Philosophical Note

Previous attempts by the IT industry to build this kind of data compression capability failed to avoid a performance penalty and thus had a far more limited area of application. It is nevertheless important to note that such technology does exist and is already deployed in some data centers. And in case you had forgotten, there are a fair number of databases beyond the popular Oracle, DB2 and SQL Server products. Many of the newer scale-out databases (Vertica, ParAccel, etc.) compress data as part of their performance strategy.

Of course this does not invalidate IBM’s technology, since most data center data is not database data at all. But it does pose the question of whether it is better to do data compression in hardware or in software. It is a curious question, for instance, whether IBM’s appliance with Oracle’s database compression turned off is preferable to using Oracle’s data compression alone. I don’t know the answer, and I don’t think IBM does either, at the moment.

“Data compression in the iron” is an idea that has legs. IBM will likely make a success of this family of devices and, in the longer term, the industry may well decide to make data compression a hardware rather than a software feature – hopefully to the point where it attains “commodity status.”

Of course, I doubt whether any of this will serve as a long-term solution to handling data growth. Data likes to grow. It always has. It always will.

Robin Bloor

About Robin Bloor

Robin is co-founder and Chief Analyst of The Bloor Group. He has more than 30 years of experience in the world of data and information management. He is the creator of the Information-Oriented Architecture, which is to data what the SOA is to services. He is the author of several books including The Electronic B@zaar: From the Silk Road to the eRoad, a book on e-commerce, and three IT books in the Dummies series on SOA, Service Management and The Cloud. He is an international speaker on information management topics. As an analyst for Bloor Research and The Bloor Group, Robin has written scores of white papers, research reports and columns on a wide range of topics, from database evaluation to networking options and comparisons to the enterprise in transition.
