February 3, 2025

Best Practices: Creating Deduplication Filters

Here at Cloudingo, we're often asked which set of filters works best for deduping. The answer is always, "it depends." Every organization's data is unique and shaped by its specific operational needs.

That said, there is a core set of filter configurations we recommend to help users get started. These configurations also serve as a foundation when we work directly with customers.

The Goal of Deduplication

The primary objective of deduplication is to identify and merge duplicate records. Cloudingo excels at this by matching field values using customizable filters. These filters can leverage any combination of fields and offer flexible matching algorithms, enabling users to fine-tune their approach based on their data.

When developing a data management strategy, we recommend starting with a high-level view. Instead of trying to create a single "perfect" filter, think of data cleansing as a sequential process. Start with tight filters, then gradually loosen them as you refine your results. This method maximizes automation while minimizing the need for manual review. The ultimate goal is to achieve clean data quickly and efficiently.
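To make the tight-to-loose idea concrete, here is a minimal Python sketch of running match passes in sequence. It is not Cloudingo's implementation. It assumes records are plain dictionaries with hypothetical email, first, and last keys, and it treats records grouped in an earlier pass as already merged, so each looser pass only sees what is left. The case-insensitive comparison in the first pass is also an assumption.

```python
from collections import defaultdict
from typing import Callable, Optional

Record = dict[str, str]
KeyFn = Callable[[Record], Optional[tuple]]


def exact_email_name(rec: Record) -> Optional[tuple]:
    # Tight pass: Email + Last Name + First Name (case-insensitive here, an assumption).
    fields = (rec.get("email", ""), rec.get("last", ""), rec.get("first", ""))
    if not all(fields):
        return None  # a blank field can't take part in this filter
    return tuple(f.strip().lower() for f in fields)


def loose_name(rec: Record) -> Optional[tuple]:
    # Looser pass: first 4 characters of Last Name + First Name.
    last, first = rec.get("last", ""), rec.get("first", "")
    if not (last and first):
        return None
    return (last.strip().lower()[:4], first.strip().lower())


def run_passes(records: list[Record], passes: list[KeyFn]) -> list[list[list[Record]]]:
    """Run passes from tightest to loosest; matched records are set aside after each pass."""
    remaining = list(records)
    results = []
    for key_fn in passes:
        groups: dict[tuple, list[Record]] = defaultdict(list)
        for rec in remaining:
            key = key_fn(rec)
            if key is not None:
                groups[key].append(rec)
        dupes = [g for g in groups.values() if len(g) > 1]
        matched = {id(r) for g in dupes for r in g}
        remaining = [r for r in remaining if id(r) not in matched]
        results.append(dupes)
    return results


records = [
    {"email": "sam@acme.com", "first": "Sam", "last": "Smith"},
    {"email": "SAM@acme.com", "first": "Sam", "last": "Smith"},
    {"email": "", "first": "Jo", "last": "Johnson"},
    {"email": "jo@example.com", "first": "Jo", "last": "Johnston"},
]

results = run_passes(records, [exact_email_name, loose_name])
for i, dupes in enumerate(results, start=1):
    print(f"Pass {i}: {[[r['last'] for r in g] for g in dupes]}")
```

Only the looser second pass pairs Johnson with Johnston, which is exactly the kind of group that deserves a manual look before merging.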

Categories of Filters

There are three main categories to consider when configuring filters:

  1. Leads and Contacts: These objects often share key data points useful for duplicate matching.
  2. Accounts: Focus on fields unique to accounts, like company name and address.
  3. Lead-to-Contact or Lead-to-Account Matching: Helps consolidate records as leads progress through your system.

To get started, here are the most reliable key fields for each category:

Leads & Contacts

  • Email
  • Last Name
  • First Name
  • Phone
  • Company or Account Name

Accounts

  • Company or Account Name
  • Phone
  • Address (billing or shipping)
  • Website

Getting Started with Filters

Start with filters that use exact matches across key fields. These filters offer high reliability and are great for cleaning the "low-hanging fruit" of bad data. Once the initial pass is complete, you can adjust matching algorithms to uncover deeper matches that may require review.

Recommended Initial Filter Configurations:

Leads & Contacts

  • Email + Last Name + First Name
  • Email + Account Name
  • Last Name + First Name + Account Name
  • Last Name + First Name + Phone
  • Email + Last Name
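As a rough illustration of what an exact-match configuration does, the sketch below groups contact records on each Leads & Contacts combination listed above. The field names (Id, Email, LastName, FirstName, AccountName, Phone) are hypothetical stand-ins for your own, and skipping records with a blank filter field is our own design choice so that two empty emails never count as a match.

```python
from collections import defaultdict

# Hypothetical field names; map them to your own Lead/Contact fields.
FILTERS = {
    "Email + Last Name + First Name": ("Email", "LastName", "FirstName"),
    "Email + Account Name": ("Email", "AccountName"),
    "Last Name + First Name + Account Name": ("LastName", "FirstName", "AccountName"),
    "Last Name + First Name + Phone": ("LastName", "FirstName", "Phone"),
    "Email + Last Name": ("Email", "LastName"),
}


def exact_groups(records, fields):
    """Return lists of record Ids whose values match exactly on every field."""
    groups = defaultdict(list)
    for rec in records:
        values = tuple((rec.get(f) or "").strip() for f in fields)
        if all(values):  # skip records with a blank value in any filter field
            groups[values].append(rec["Id"])
    return [ids for ids in groups.values() if len(ids) > 1]


contacts = [
    {"Id": "c1", "Email": "ann@acme.com", "FirstName": "Ann", "LastName": "Lee",
     "AccountName": "Acme", "Phone": "555-0100"},
    {"Id": "c2", "Email": "ann@acme.com", "FirstName": "Ann", "LastName": "Lee",
     "AccountName": "Acme Inc.", "Phone": ""},
    {"Id": "c3", "Email": "", "FirstName": "Ann", "LastName": "Lee",
     "AccountName": "Acme", "Phone": "555-0100"},
]

for name, fields in FILTERS.items():
    print(f"{name}: {exact_groups(contacts, fields)}")
```

The same records can match under one configuration and not another: "Acme" and "Acme Inc." fail the exact Email + Account Name filter, which is the kind of variation the Company Clean algorithm described later is meant to handle.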

Accounts

  • Account Name + Address
  • Account Name + Phone
  • Account Name + Website

Lead-to-Contact Matching

  • Email + Last Name + First Name
  • Email + Last Name
  • Email + Account Name
  • Last Name + First Name + Account Name

Lead-to-Account Matching

  • Account Name
  • Email / Website
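One way to picture the Email / Website pairing is to compare the domain of a lead's email address with the domain of each account's website. This is a minimal sketch under assumptions: the Id, Email, and Website field names are hypothetical, and the small free-mail list is our own illustrative addition so that every gmail.com lead doesn't attach to the same account.

```python
from urllib.parse import urlparse

# Illustrative only; a real exclusion list would be longer.
FREE_MAIL = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def email_domain(email):
    """Return the lowercased part after '@', or '' if there is no '@'."""
    _, sep, domain = (email or "").strip().lower().rpartition("@")
    return domain if sep else ""


def website_domain(url):
    """Reduce a website to its host, dropping scheme, path, and a leading 'www.'."""
    url = (url or "").strip().lower()
    if not url:
        return ""
    if "://" not in url:
        url = "http://" + url  # urlparse only fills in the host when a scheme is present
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def match_leads_to_accounts(leads, accounts):
    """Map each lead Id to the account Ids whose website domain equals the lead's email domain."""
    by_domain = {}
    for acct in accounts:
        d = website_domain(acct.get("Website"))
        if d:
            by_domain.setdefault(d, []).append(acct["Id"])
    matches = {}
    for lead in leads:
        d = email_domain(lead.get("Email"))
        if d and d not in FREE_MAIL and d in by_domain:
            matches[lead["Id"]] = by_domain[d]
    return matches


accounts = [
    {"Id": "a1", "Name": "Acme", "Website": "https://www.acme.com/contact"},
    {"Id": "a2", "Name": "Globex", "Website": "globex.com"},
]
leads = [
    {"Id": "l1", "Email": "pat@acme.com"},
    {"Id": "l2", "Email": "kim@globex.com"},
    {"Id": "l3", "Email": "lee@gmail.com"},
]
print(match_leads_to_accounts(leads, accounts))  # {'l1': ['a1'], 'l2': ['a2']}
```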

Adjusting Matching Algorithms

After your initial pass with exact matches, refine your filters by loosening the matching algorithms. Here are our recommendations for commonly used fields:

  • Email: Keep this set to Exact, as email addresses are typically unique.
  • Last Name: Use a First N Characters match, where "N" is user-defined. Review results carefully to avoid false positives.
  • First Name: Add Synonym matching. This matches formal and informal variants, like Stephen, Steven, and Steve. Cloudingo's synonym lexicon is fully customizable.
  • Phone: Use Numeric Only matching to handle formatting variations (e.g., (972) 241-1534 vs. 9722411534).
  • Company/Account Name: Use the proprietary Company Clean algorithm, which normalizes variations in business names (e.g., Acme, Acme Inc., The Acme Company).
  • Website: Use the Domain algorithm to focus on the core portion of web addresses (e.g., cloudingo.com).
  • City: Use Exact or Fuzzy ("sounds like") matching.
  • State/Country: Use Synonym matching (e.g., US/United States, TX/Texas).
  • Street: Use Fuzzy matching to account for variations like "1234 North Street" and "1234 N. St." That said, general Fuzzy algorithms are loose, so avoid them during early-stage data cleaning.
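To show what several of these algorithms compare, the sketch below approximates them in plain Python. These are simplified stand-ins written for illustration, not Cloudingo's code: the synonym tables are tiny samples, the company-name normalizer is a rough substitute for the proprietary Company Clean algorithm, and Soundex is just one common "sounds like" encoding, which may differ from what Cloudingo uses.

```python
import re


def numeric_only(phone):
    """Numeric Only: keep digits so formatting differences disappear."""
    return re.sub(r"\D", "", phone or "")


def first_n(value, n=4):
    """First N Characters: compare only the first n characters (n is user-defined)."""
    return (value or "").strip().lower()[:n]


# Tiny illustrative lexicons; a real synonym list would be far larger.
FIRST_NAME_SYNONYMS = {"steven": "stephen", "steve": "stephen", "bob": "robert", "rob": "robert"}
STATE_SYNONYMS = {"texas": "tx", "united states": "us", "usa": "us"}


def synonym(value, lexicon):
    """Synonym: map each known variant to one canonical form before comparing."""
    v = (value or "").strip().lower()
    return lexicon.get(v, v)


# Rough stand-in for company-name normalization; NOT the Company Clean algorithm.
COMPANY_FILLER = {"the", "inc", "incorporated", "llc", "ltd", "corp", "corporation", "co", "company"}


def company_key(name):
    """Lowercase, drop punctuation, and remove common filler words and suffixes."""
    words = re.findall(r"[a-z0-9]+", (name or "").lower())
    return " ".join(w for w in words if w not in COMPANY_FILLER)


# American Soundex letter codes; vowels, h, w, and y get no code.
_SOUNDEX = {c: d for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                    "4": "l", "5": "mn", "6": "r"}.items() for c in letters}


def soundex(word):
    """One common 'sounds like' encoding: first letter plus three digit codes."""
    letters = [c for c in (word or "").lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX.get(c, "")
        if code and code != prev:
            result += code
        if c not in "hw":  # h and w do not separate two letters with the same code
            prev = code
    return (result + "000")[:4]


print(numeric_only("(972) 241-1534"))                   # 9722411534
print(first_n("Johnson"), first_n("Johnston"))          # john john
print(synonym("Steve", FIRST_NAME_SYNONYMS))            # stephen
print([company_key(n) for n in ("Acme", "Acme Inc.", "The Acme Company")])  # ['acme', 'acme', 'acme']
print(soundex("Dallas"), soundex("Dalas"))              # D420 D420
```

The First N example also shows why the text above urges caution: Johnson and Johnston collapse to the same key even though they may be different people.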

Experimentation and Review

One of Cloudingo's strengths is its ability to let you experiment without risk. Filters act as reports, showing you potential duplicates without altering your data until you submit a merge request. Don't hesitate to explore different configurations to see what works best for your data.

That said, always consider what is unique about your data model and operational needs. You may have additional key fields relevant to your organization, or you might need to create subsets of data using Cloudingo's scope settings (e.g., isolating records by record type).
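Conceptually, a scope is a condition applied before any matching happens. In this minimal sketch, RecordTypeName and the other field names are hypothetical; scoping to Customer records keeps a Partner account from being grouped with customers that share its name and phone.

```python
from collections import defaultdict


def scoped_exact_groups(records, fields, scope):
    """Apply a scope predicate first, then group exact (case-insensitive) matches on fields."""
    groups = defaultdict(list)
    for rec in filter(scope, records):
        values = tuple((rec.get(f) or "").strip().lower() for f in fields)
        if all(values):  # skip records with a blank value in any filter field
            groups[values].append(rec["Id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def customers_only(rec):
    # Hypothetical scope: only records whose record type is "Customer".
    return rec.get("RecordTypeName") == "Customer"


accounts = [
    {"Id": "a1", "Name": "Acme", "Phone": "5550100", "RecordTypeName": "Customer"},
    {"Id": "a2", "Name": "Acme", "Phone": "5550100", "RecordTypeName": "Partner"},
    {"Id": "a3", "Name": "Acme", "Phone": "5550100", "RecordTypeName": "Customer"},
]

print(scoped_exact_groups(accounts, ("Name", "Phone"), customers_only))  # [['a1', 'a3']]
```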

Best Practices

  1. Start Tight: Begin with exact matches to clean the most obvious duplicates.
  2. Loosen Gradually: Refine filters to uncover hidden matches, but proceed with caution.
  3. Review Thoroughly: Load filter results in Cloudingo's UI and inspect duplicate groups before merging.
  4. Test First: Test your filters and merges on a small subset of records before executing mass cleanups.

By following these best practices and leveraging Cloudingo's powerful matching algorithms, you can elevate the quality of your data quickly and confidently. Remember, clean data is the foundation for better decision-making, improved AI output, and operational efficiency.
