A Simple Approach to Wiping out Dirty Data

Thursday, March 8, 2018

When you gather long lists of customer data organically or combine databases, you will almost always accumulate some bad data. Bad data can take many forms.


Bad data can cause you a lot of problems. Keeping it from ever entering your database is almost impossible. Fortunately, techniques do exist for cleaning up bad data. Here are some of the best:

1. Drop duplicates with software

The easiest way to deal with duplicates is to keep them from ever making it into your database in the first place. That won't always happen, though. Without fail, when you start building up a sizable database of leads, duplicates will start to find their way in.

Once duplicate data has actually found its way into your database, you need to delete it. Keep any good data, though. When you find a duplicate, look at both duplicate records and merge them together into one record that contains any important un-duplicated data from both records. You won't always need to merge records; if one record is out of date, you may be able to delete it without losing any important information.

No one wants to sift through a large database record by record to find duplicates. Your CRM software, whether its data lives in a cloud data warehouse like Amazon Redshift or on your personal computer, should be capable of detecting duplicates and merging customer records. Run your lists through the software to clear the lion's share of duplicates.
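To make the merge step concrete, here is a minimal sketch in Python. It is not how any particular CRM does it. It assumes each record is a dictionary and that a normalized `email` field acts as the unique key; the function name `merge_duplicates` and the field names are hypothetical. When two records share a key, the first non-empty value for each field wins.

```python
def merge_duplicates(records, key="email"):
    """Collapse records that share the same key into one record.

    Assumption: the first non-empty value seen for a field is kept.
    A real merge might prefer the most recently updated record instead.
    """
    merged = {}
    unkeyed = []  # records with no usable key are passed through unchanged
    for record in records:
        identifier = (record.get(key) or "").strip().lower()
        if not identifier:
            unkeyed.append(dict(record))
            continue
        target = merged.setdefault(identifier, {})
        for field, value in record.items():
            # Keep data the merged record does not have yet; skip blanks.
            if value not in (None, "") and field not in target:
                target[field] = value
    return list(merged.values()) + unkeyed


records = [
    {"email": "Ann@example.com", "name": "Ann Lee", "phone": ""},
    {"email": "ann@example.com ", "name": "", "phone": "555-0100"},
    {"email": "bob@example.com", "name": "Bob", "phone": None},
]
cleaned = merge_duplicates(records)
```

In this example the two Ann records collapse into one that carries both the name and the phone number, so no good data is lost in the cleanup.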

2. Use a semi-interactive approach

You should wipe out as many duplicates by automated means as you can. If a computer can link two records by a unique identifier such as a Social Security number, it can treat them as duplicates with near certainty. Often the situation isn't that cut and dried.

A computer can't always tell for sure when two records are duplicates, but with fuzzy matching it can detect that two records might be duplicates. Yet without the right knowledge, a computer can't make the call. That's where you come in.

Use computer software to detect and flag near-duplicates, and then look at them yourself. Is one record a misspelling of another? Does a record point to the same person at a different address? If so, tell the computer to merge the duplicates.
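The flag-then-review workflow can be sketched with Python's standard `difflib` module. This is only an illustration: the `name` field, the 0.85 threshold, and the function names are assumptions, and dedicated matching tools use more sophisticated scoring. The code compares every pair of records, so it suits small lists rather than very large databases.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0..1 similarity score for two strings, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def flag_near_duplicates(records, field="name", threshold=0.85):
    """List index pairs whose field values look alike, for a human to review.

    Nothing is merged here; the threshold is an assumed starting point.
    """
    flagged = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        score = similarity(left[field], right[field])
        if score >= threshold:
            flagged.append((i, j, score))
    return flagged


people = [
    {"name": "Jonathan Smith"},
    {"name": "Jonathon Smith"},
    {"name": "Maria Garcia"},
]
candidates = flag_near_duplicates(people)
```

Here the two spellings of Jonathan Smith are flagged as a candidate pair, and the decision to merge them stays with the person reviewing the list.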

3. Delete dead records

If you have a lot of inactive contacts, you should carefully consider what to do with them. You might want to keep some of them in your database, but you may find that many should be removed. Contacts who aren't interested might not want to hear from you, and you won't want to waste time and money pursuing clients who don't receive your message.
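One cautious way to handle inactive contacts is to set them aside for review instead of deleting them outright. The sketch below assumes each contact carries a `last_activity` date and treats a year of silence as inactive; both the field name and the cutoff are hypothetical choices, not rules from any specific tool.

```python
from datetime import date, timedelta


def split_inactive(contacts, today, max_idle_days=365):
    """Separate contacts into active and inactive lists.

    Assumption: a contact with no recorded activity counts as inactive.
    """
    cutoff = today - timedelta(days=max_idle_days)
    active, inactive = [], []
    for contact in contacts:
        last_seen = contact.get("last_activity")
        if last_seen is not None and last_seen >= cutoff:
            active.append(contact)
        else:
            inactive.append(contact)
    return active, inactive


contacts = [
    {"name": "Ann", "last_activity": date(2018, 1, 15)},
    {"name": "Bob", "last_activity": date(2015, 6, 1)},
    {"name": "Cy", "last_activity": None},
]
active, inactive = split_inactive(contacts, today=date(2018, 3, 8))
```

The inactive list is a review queue: you can still keep any contact in it that you have a reason to hold on to.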

4. Canonicalize the data

It's a good idea to canonicalize all of the data in your database, that is, to convert it to one standard form that follows consistent rules. For example, you might want to run all addresses past the post office to make sure that they are valid. Free-text country fields can be reduced to country codes. Dates can be formatted uniformly.

You can make a lot of these changes with a spreadsheet, but it's even better to have CRM software do it for you. Once your data is in a standard form, it will be a lot easier to segment and model.
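As a small illustration of putting fields into one standard form, the sketch below maps free-text country names to two-letter ISO codes and rewrites dates as `YYYY-MM-DD`. The lookup table and the list of accepted date formats are deliberately tiny and are assumptions made for this example. Dates written with slashes are read month first, which is a US convention and will be wrong for data entered day first.

```python
from datetime import datetime

# Illustrative lookup only; a real table would cover every country.
COUNTRY_CODES = {
    "united states": "US", "usa": "US", "u.s.a.": "US", "us": "US",
    "united kingdom": "GB", "uk": "GB", "great britain": "GB",
    "germany": "DE", "deutschland": "DE",
}

# Assumed input formats: ISO dates and US-style month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_country(raw):
    """Return a two-letter country code, or None if the text is unrecognized."""
    return COUNTRY_CODES.get(raw.strip().lower())


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for anything unrecognized leaves those values visible for manual review, which is safer than guessing.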

5. Cut database cruft

Database cruft exists in the form of garbage records such as fake email addresses entered by Internet users who just want to make a dialog box disappear. You might be flagged as a spammer if you email thousands of fake email addresses, so make sure that your marketing software checks which email addresses are real. Remove or suppress the fake ones so you don't target them with email campaigns.
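A first pass at screening out obviously fake addresses can be sketched as follows. This checks only the shape of the address and a short, illustrative list of placeholder domains; it cannot prove that a mailbox exists, which is what a verification service or a confirmation email is for. The pattern, the domain list, and the function names are all assumptions for this example.

```python
import re

# Loose shape check: something@something.tld, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]{2,}$")

# Illustrative placeholder domains; extend this list for your own data.
PLACEHOLDER_DOMAINS = {"example.com", "example.org", "test.com"}


def looks_fake(address):
    """Return True if the address is malformed or uses a placeholder domain."""
    address = address.strip().lower()
    if not EMAIL_PATTERN.match(address):
        return True
    domain = address.rsplit("@", 1)[1]
    return domain in PLACEHOLDER_DOMAINS


def split_addresses(addresses):
    """Separate addresses into a mailing list and a suppression list."""
    keep, suppress = [], []
    for address in addresses:
        (suppress if looks_fake(address) else keep).append(address)
    return keep, suppress


keep, suppress = split_addresses(
    ["ann@company.org", "asdf", "a@b", "nobody@example.com"]
)
```

Addresses on the suppression list are held back from campaigns and not silently deleted, so you can still audit what was filtered out.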


Bad data is a problem that you will deal with throughout your career. Through a continuous, systematic approach, you can keep your database squeaky clean.


By Robert Cordray

About the Author - Robert Cordray is a former business consultant and entrepreneur with over 20 years of experience and a wide variety of knowledge in multiple areas of the industry. He currently resides in the Southern California area and spends his time helping consumers and business owners alike try to be successful. When he’s not reading or writing, he’s most likely with his beautiful wife and three children.


