Revenue Wizards

Services

RevOps Course

Resources

About us

Book a Call

Services

RevOps Course

Resources

About us

Book a Call

How to Python 🐍: Update a CRM and avoid duplicates

Python can be helpful in Revenue Operations. I like to use it when I have to handle data.

Problem: 300 million pairs to check

We had recently the case, that we had to match a list of 30k accounts to a CRM that already had 10k accounts. You can bet there were tons of duplicates.

The main issue is that the CRM used often the brand name (The Lego Group) vs the legal name (Lego A/S). The new file had only legal names. Bummer.

The issue is we have to compare 300 million possible pairs. This crashes graphical tools like Excel or Google Sheets (at least, it did so for me).

Tip: Before importing any data into the CRM, you want to make sure that your data set is in the best possible shape. Once in the CRM it will become way more time intensive (expensive) to make correction.

Solution: Python & ChatGPT to the rescue

Let me guess: Python sounds intimidating? I thought the same! My coding understanding is for from good. Today, it is easier to learn than ever and ChatGPT is a game changer. In a few months, one can get enough understanding to use it for business use cases (like fuzzy matching).

Approach

You will need the two source files to match. In our case:

file1 - source data from CRM with record ID
file2 - new data from data provider

Tip: Make sure there are no duplicates within each file!

Matching variable

Next, each file should have a matching criteria. In our example they all have a field called "Account Name".

Folder Structure

You want to place both files into the same folder with the script. The results will also be later in the folder.

Fuzzy matching

Fuzzy matching is a method for approximately matching string. In English: it matches similar words. Like in the picture:

Here a detailed explanation of the concept.

For our purposes, you only need to know that for a fuzzy match, we need to set a matching %. The percentage indicates the max similarity of the words. Examples:

Too high, e.g. 97% then you will miss matches
Too low, e.g. 84% then you have a higher chance of wrong matches.

A good practice is to use 95-90%. See the 90 in the code. That is the number you want to play with.

# Performing the fuzzy matching
dict_list = []
for name in df3['Account Name']:
    match = match_name(name, df4['Accont Name'], 90)

You can go below 90 but then I would do a few test runs and check the data manually for some odd matches. It all depends on your source data.

Full code

Below you will find the full script. You only need to make a few changes:

Replace the file structure of each file
Replace the matching variable ('Account Name' in this example)
Set the fuzzy matching %

Once you run the code it might take a few minutes to complete.

Hope this was helpful! It saved me countless hours of work and investing into a tall to solve the problem.

import pandas as pd
from fuzzywuzzy import fuzz, process

# Function to match name using fuzzy matching
def match_name(name, list_names, min_score=0):
    max_score = -1
    max_name = ""
    for name2 in list_names:
        score = fuzz.ratio(name, name2)
        if (score > min_score) & (score > max_score):
            max_name = name2
            max_score = score
    return max_name, max_score

# Reading the data - insert your file structure in there
df3 = pd.read_csv('/Users/harisodobasic/Projects/duplication/file1', encoding='ISO-8859-1')
df4 = pd.read_csv('/Users/harisodobasic/Projects/duplication/file2', encoding='ISO-8859-1')

# Convert the 'Account Name' (insert here your matching variable) columns to strings (had to be done as in some cases strange strings in were in the name)
df3['Account Name'] = df3['Account Name'].apply(lambda x: str(x) if not pd.isnull(x) else '')
df4['Accont Name'] = df4['Accont Name'].apply(lambda x: str(x) if not pd.isnull(x) else '')

# Performing the fuzzy matching
dict_list = []
for name in df3['Account Name']:
    match = match_name(name, df4['Accont Name'], 90)
    dict_ = {
        "Account_Name_df3": name,
        "match_name_df4": match[0],
        "score": match[1]
    }
    dict_list.append(dict_)

merge_table = pd.DataFrame(dict_list)

# Merging the fuzzy match results with the original data from df3
final_data = pd.merge(merge_table, df3, how='left', left_on='Account_Name_df3', right_on='Account Name')

# Merging the above result with df4 using an 'outer' join to include unmatched rows from df4
final_data = pd.merge(final_data, df4, how='outer', left_on='match_name_df4', right_on='Accont Name')

# Removing redundant columns
final_data.drop(columns=['Account_Name_df3', 'match_name_df4'], inplace=True)

# Exporting the result to a CSV file
final_data.to_csv('result.csv', index=False)

print("Fuzzy matching completed and results saved to 'result.csv'")

Python can be helpful in Revenue Operations. I like to use it when I have to handle data.

Problem: 300 million pairs to check

We had recently the case, that we had to match a list of 30k accounts to a CRM that already had 10k accounts. You can bet there were tons of duplicates.

The main issue is that the CRM used often the brand name (The Lego Group) vs the legal name (Lego A/S). The new file had only legal names. Bummer.

The issue is we have to compare 300 million possible pairs. This crashes graphical tools like Excel or Google Sheets (at least, it did so for me).

Solution: Python & ChatGPT to the rescue

Approach

You will need the two source files to match. In our case:

file1 - source data from CRM with record ID
file2 - new data from data provider

Tip: Make sure there are no duplicates within each file!

Matching variable

Next, each file should have a matching criteria. In our example they all have a field called "Account Name".

Folder Structure

You want to place both files into the same folder with the script. The results will also be later in the folder.

Fuzzy matching

Fuzzy matching is a method for approximately matching string. In English: it matches similar words. Like in the picture:

Here a detailed explanation of the concept.

For our purposes, you only need to know that for a fuzzy match, we need to set a matching %. The percentage indicates the max similarity of the words. Examples:

Too high, e.g. 97% then you will miss matches
Too low, e.g. 84% then you have a higher chance of wrong matches.

A good practice is to use 95-90%. See the 90 in the code. That is the number you want to play with.

# Performing the fuzzy matching
dict_list = []
for name in df3['Account Name']:
    match = match_name(name, df4['Accont Name'], 90)

You can go below 90 but then I would do a few test runs and check the data manually for some odd matches. It all depends on your source data.

Full code

Below you will find the full script. You only need to make a few changes:

Replace the file structure of each file
Replace the matching variable ('Account Name' in this example)
Set the fuzzy matching %

Once you run the code it might take a few minutes to complete.

Hope this was helpful! It saved me countless hours of work and investing into a tall to solve the problem.

import pandas as pd
from fuzzywuzzy import fuzz, process

# Function to match name using fuzzy matching
def match_name(name, list_names, min_score=0):
    max_score = -1
    max_name = ""
    for name2 in list_names:
        score = fuzz.ratio(name, name2)
        if (score > min_score) & (score > max_score):
            max_name = name2
            max_score = score
    return max_name, max_score

# Reading the data - insert your file structure in there
df3 = pd.read_csv('/Users/harisodobasic/Projects/duplication/file1', encoding='ISO-8859-1')
df4 = pd.read_csv('/Users/harisodobasic/Projects/duplication/file2', encoding='ISO-8859-1')

# Convert the 'Account Name' (insert here your matching variable) columns to strings (had to be done as in some cases strange strings in were in the name)
df3['Account Name'] = df3['Account Name'].apply(lambda x: str(x) if not pd.isnull(x) else '')
df4['Accont Name'] = df4['Accont Name'].apply(lambda x: str(x) if not pd.isnull(x) else '')

# Performing the fuzzy matching
dict_list = []
for name in df3['Account Name']:
    match = match_name(name, df4['Accont Name'], 90)
    dict_ = {
        "Account_Name_df3": name,
        "match_name_df4": match[0],
        "score": match[1]
    }
    dict_list.append(dict_)

merge_table = pd.DataFrame(dict_list)

# Merging the fuzzy match results with the original data from df3
final_data = pd.merge(merge_table, df3, how='left', left_on='Account_Name_df3', right_on='Account Name')

# Merging the above result with df4 using an 'outer' join to include unmatched rows from df4
final_data = pd.merge(final_data, df4, how='outer', left_on='match_name_df4', right_on='Accont Name')

# Removing redundant columns
final_data.drop(columns=['Account_Name_df3', 'match_name_df4'], inplace=True)

# Exporting the result to a CSV file
final_data.to_csv('result.csv', index=False)

print("Fuzzy matching completed and results saved to 'result.csv'")

Blog

Stay updated with our Revenue Blog

See all Posts

Do you need a ticketing system in RevOps?

May 23, 2024

Streamline RevOps with an efficient ticketing system! Manage Marketing’s reports, Sales’ territories, and Customer Success’s scores effortlessly. Discover how to optimize requests, prioritize tasks, and maintain backlog health for smooth operations.

HubSpot vs Salesforce: Which CRM to choose in 2024

Dec 12, 2023

Salesforce vs. HubSpot: Uncover the Best CRM for Your Business - Insights on User-Friendliness, Complexity, and Ideal User Profiles

How Can RevOps Help with Fundraising?

Oct 1, 2023

RevOps is strategic, close to the business and data-driven. This blog shows how RevOps can help to close the next funding round.

Image of founder looking for Product-Market Fit

6 Ways RevOps Makes Product-Market Fit Inevitable

Sep 20, 2023

PMF it tough. 6 things how RevOps can help you reach it faster and not run out of money.

Where Should RevOps Report to? It is not what you think

Sep 8, 2023

Reporting lines are important for Revenue Operations. They will have an impact on how effective RevOps will be in your organization.

Revenue Wizards

Revenue Wizards

How to Python 🐍: Update a CRM and avoid duplicates

How to Python 🐍: Update a CRM and avoid duplicates