Making de-dupe more efficient

By Datalink
4/22/2011

Hi, I’m Mike Spindler. I’m the practice manager for all things data protection at Datalink. I’ll be joining my counterparts in this blog space and will offer data protection insights from time to time. Most often, my posts will reflect conversations that I’m having with customers, partners, or peers. Data de-duplication continues to be a hot topic. So, I’ll start there.

It’s interesting how the de-duplication market has evolved over the relatively short time that the technology has been around. Just a short time ago, IT folks were asking me questions regarding “inline versus post process,” “hashing versus delta difference,” “fixed versus variable block,” “source versus target,” or “disk versus VTL.” Most of these questions came up as people were exploring whether to bring the technology into their data centers.
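
For readers newer to the technology, here is a minimal sketch of the idea all of those questions revolve around: hash-based, fixed-block de-duplication on the target side. The block size, hash choice, function names, and in-memory store are illustrative assumptions only, not how any particular product works; shipping products layer variable-length chunking, delta differencing, and source-side options on top of this basic loop.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-length chunking; variable-length chunkers pick boundaries from the data instead


def write_backup(data: bytes, store: dict[bytes, bytes]) -> int:
    """Split data into fixed blocks and store only blocks whose hash is new.

    Returns how many blocks actually had to be written.
    """
    written = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()
        if digest not in store:      # a duplicate block becomes a reference, not a second copy
            store[digest] = block
            written += 1
    return written


if __name__ == "__main__":
    store: dict[bytes, bytes] = {}
    backup = b"".join(bytes([i]) * BLOCK_SIZE for i in range(16))   # 16 distinct 4 KB blocks
    print(write_backup(backup, store))                              # 16: first backup, every block is new
    print(write_backup(backup, store))                              # 0: an identical second backup stores nothing
    print(write_backup(backup + b"\xff" * BLOCK_SIZE, store))       # 1: only the added block lands on disk
```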

Now, one or two years later, IT folks are asking “how do I optimize de-duplication?” and “what types of data de-dupe well and which de-dupe poorly?” Since every user’s data is different, and the mix of data types is unique to each environment, de-duplication levels can vary. Beyond the data mix, a number of other factors affect de-dupe efficiency, including retention, compression, and encryption, as well as how the technology is implemented and managed.
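
To make the compression point concrete, here is a rough, hypothetical experiment using the same simplified fixed-block hashing as the sketch above and synthetic log data (all names and numbers are illustrative): two backups that differ by a single record de-dupe almost perfectly in raw form, but once each backup is compressed before it reaches the de-dupe engine, that one small change reshuffles the compressed stream and block hashes rarely line up.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096


def new_blocks(data: bytes, seen: set[bytes]) -> int:
    """Return how many fixed-size blocks of data are not already in the store."""
    count = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        digest = hashlib.sha256(data[offset:offset + BLOCK_SIZE]).digest()
        if digest not in seen:
            seen.add(digest)
            count += 1
    return count


# Two nightly backups that differ by one record near the start (synthetic log data).
backup_1 = b"".join(b"2011-04-22 host app event id=%06d\n" % i for i in range(50_000))
backup_2 = backup_1.replace(b"id=000010", b"id=999999", 1)

raw: set[bytes] = set()
print("raw night 1:", new_blocks(backup_1, raw))   # every block is new
print("raw night 2:", new_blocks(backup_2, raw))   # only the one changed block is new

# Compressing each backup before de-duplication: the one-record change alters the
# compressed stream from that point on, so block hashes rarely match.
packed: set[bytes] = set()
print("compressed night 1:", new_blocks(zlib.compress(backup_1), packed))
print("compressed night 2:", new_blocks(zlib.compress(backup_2), packed))  # typically most blocks are new again
```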

I’ll address each of these factors in upcoming posts and videos. Check out my first video to hear my thoughts on how compression (whether of files or databases) impacts de-duplication efficiency and how to address it.