code-example
code-example
code-example

Aug 30, 2023

How to Python 🐍: Update a CRM and avoid duplicates

How to Python 🐍: Update a CRM and avoid duplicates

How to Python 🐍: Update a CRM and avoid duplicates

Python can be helpful in Revenue Operations. I like to use it when I have to handle data.

Problem: 300 million pairs to check

We had recently the case, that we had to match a list of 30k accounts to a CRM that already had 10k accounts. You can bet there were tons of duplicates.

The main issue is that the CRM used often the brand name (The Lego Group) vs the legal name (Lego A/S). The new file had only legal names. Bummer.

The issue is we have to compare 300 million possible pairs. This crashes graphical tools like Excel or Google Sheets (at least, it did so for me).

Tip: Before importing any data into the CRM, you want to make sure that your data set is in the best possible shape. Once in the CRM it will become way more time intensive (expensive) to make correction.


Solution: Python & ChatGPT to the rescue

Let me guess: Python sounds intimidating? I thought the same! My coding understanding is for from good. Today, it is easier to learn than ever and ChatGPT is a game changer. In a few months, one can get enough understanding to use it for business use cases (like fuzzy matching).


Approach

You will need the two source files to match. In our case:

  • file1 - source data from CRM with record ID

  • file2 - new data from data provider

Tip: Make sure there are no duplicates within each file!

Matching variable

Next, each file should have a matching criteria. In our example they all have a field called "Account Name".

Folder Structure

You want to place both files into the same folder with the script. The results will also be later in the folder.

Fuzzy matching

Fuzzy matching is a method for approximately matching string. In English: it matches similar words. Like in the picture:

Here a detailed explanation of the concept.

For our purposes, you only need to know that for a fuzzy match, we need to set a matching %. The percentage indicates the max similarity of the words. Examples:

  • Too high, e.g. 97% then you will miss matches

  • Too low, e.g. 84% then you have a higher chance of wrong matches.

A good practice is to use 95-90%. See the 90 in the code. That is the number you want to play with.

# Performing the fuzzy matching
dict_list = []
for name in df3['Account Name']:
    match = match_name(name, df4['Accont Name'], 90)

You can go below 90 but then I would do a few test runs and check the data manually for some odd matches. It all depends on your source data.


Full code

Below you will find the full script. You only need to make a few changes:

  1. Replace the file structure of each file

  2. Replace the matching variable ('Account Name' in this example)

  3. Set the fuzzy matching %

Once you run the code it might take a few minutes to complete.

Hope this was helpful! It saved me countless hours of work and investing into a tall to solve the problem.

import pandas as pd
from fuzzywuzzy import fuzz, process

# Function to match name using fuzzy matching
def match_name(name, list_names, min_score=0):
    max_score = -1
    max_name = ""
    for name2 in list_names:
        score = fuzz.ratio(name, name2)
        if (score > min_score) & (score > max_score):
            max_name = name2
            max_score = score
    return max_name, max_score

# Reading the data - insert your file structure in there
df3 = pd.read_csv('/Users/harisodobasic/Projects/duplication/file1', encoding='ISO-8859-1')
df4 = pd.read_csv('/Users/harisodobasic/Projects/duplication/file2', encoding='ISO-8859-1')

# Convert the 'Account Name' (insert here your matching variable) columns to strings (had to be done as in some cases strange strings in were in the name)
df3['Account Name'] = df3['Account Name'].apply(lambda x: str(x) if not pd.isnull(x) else '')
df4['Accont Name'] = df4['Accont Name'].apply(lambda x: str(x) if not pd.isnull(x) else '')

# Performing the fuzzy matching
dict_list = []
for name in df3['Account Name']:
    match = match_name(name, df4['Accont Name'], 90)
    dict_ = {
        "Account_Name_df3": name,
        "match_name_df4": match[0],
        "score": match[1]
    }
    dict_list.append(dict_)

merge_table = pd.DataFrame(dict_list)

# Merging the fuzzy match results with the original data from df3
final_data = pd.merge(merge_table, df3, how='left', left_on='Account_Name_df3', right_on='Account Name')

# Merging the above result with df4 using an 'outer' join to include unmatched rows from df4
final_data = pd.merge(final_data, df4, how='outer', left_on='match_name_df4', right_on='Accont Name')

# Removing redundant columns
final_data.drop(columns=['Account_Name_df3', 'match_name_df4'], inplace=True)

# Exporting the result to a CSV file
final_data.to_csv('result.csv', index=False)

print("Fuzzy matching completed and results saved to 'result.csv'")

Python can be helpful in Revenue Operations. I like to use it when I have to handle data.

Problem: 300 million pairs to check

We had recently the case, that we had to match a list of 30k accounts to a CRM that already had 10k accounts. You can bet there were tons of duplicates.

The main issue is that the CRM used often the brand name (The Lego Group) vs the legal name (Lego A/S). The new file had only legal names. Bummer.

The issue is we have to compare 300 million possible pairs. This crashes graphical tools like Excel or Google Sheets (at least, it did so for me).

Tip: Before importing any data into the CRM, you want to make sure that your data set is in the best possible shape. Once in the CRM it will become way more time intensive (expensive) to make correction.


Solution: Python & ChatGPT to the rescue

Let me guess: Python sounds intimidating? I thought the same! My coding understanding is for from good. Today, it is easier to learn than ever and ChatGPT is a game changer. In a few months, one can get enough understanding to use it for business use cases (like fuzzy matching).


Approach

You will need the two source files to match. In our case:

  • file1 - source data from CRM with record ID

  • file2 - new data from data provider

Tip: Make sure there are no duplicates within each file!

Matching variable

Next, each file should have a matching criteria. In our example they all have a field called "Account Name".

Folder Structure

You want to place both files into the same folder with the script. The results will also be later in the folder.

Fuzzy matching

Fuzzy matching is a method for approximately matching string. In English: it matches similar words. Like in the picture:

Here a detailed explanation of the concept.

For our purposes, you only need to know that for a fuzzy match, we need to set a matching %. The percentage indicates the max similarity of the words. Examples:

  • Too high, e.g. 97% then you will miss matches

  • Too low, e.g. 84% then you have a higher chance of wrong matches.

A good practice is to use 95-90%. See the 90 in the code. That is the number you want to play with.

# Performing the fuzzy matching
dict_list = []
for name in df3['Account Name']:
    match = match_name(name, df4['Accont Name'], 90)

You can go below 90 but then I would do a few test runs and check the data manually for some odd matches. It all depends on your source data.


Full code

Below you will find the full script. You only need to make a few changes:

  1. Replace the file structure of each file

  2. Replace the matching variable ('Account Name' in this example)

  3. Set the fuzzy matching %

Once you run the code it might take a few minutes to complete.

Hope this was helpful! It saved me countless hours of work and investing into a tall to solve the problem.

import pandas as pd
from fuzzywuzzy import fuzz, process

# Function to match name using fuzzy matching
def match_name(name, list_names, min_score=0):
    max_score = -1
    max_name = ""
    for name2 in list_names:
        score = fuzz.ratio(name, name2)
        if (score > min_score) & (score > max_score):
            max_name = name2
            max_score = score
    return max_name, max_score

# Reading the data - insert your file structure in there
df3 = pd.read_csv('/Users/harisodobasic/Projects/duplication/file1', encoding='ISO-8859-1')
df4 = pd.read_csv('/Users/harisodobasic/Projects/duplication/file2', encoding='ISO-8859-1')

# Convert the 'Account Name' (insert here your matching variable) columns to strings (had to be done as in some cases strange strings in were in the name)
df3['Account Name'] = df3['Account Name'].apply(lambda x: str(x) if not pd.isnull(x) else '')
df4['Accont Name'] = df4['Accont Name'].apply(lambda x: str(x) if not pd.isnull(x) else '')

# Performing the fuzzy matching
dict_list = []
for name in df3['Account Name']:
    match = match_name(name, df4['Accont Name'], 90)
    dict_ = {
        "Account_Name_df3": name,
        "match_name_df4": match[0],
        "score": match[1]
    }
    dict_list.append(dict_)

merge_table = pd.DataFrame(dict_list)

# Merging the fuzzy match results with the original data from df3
final_data = pd.merge(merge_table, df3, how='left', left_on='Account_Name_df3', right_on='Account Name')

# Merging the above result with df4 using an 'outer' join to include unmatched rows from df4
final_data = pd.merge(final_data, df4, how='outer', left_on='match_name_df4', right_on='Accont Name')

# Removing redundant columns
final_data.drop(columns=['Account_Name_df3', 'match_name_df4'], inplace=True)

# Exporting the result to a CSV file
final_data.to_csv('result.csv', index=False)

print("Fuzzy matching completed and results saved to 'result.csv'")

Blog

Blog

Stay updated with our Revenue Blog

Stay updated with our Revenue Blog

See all Posts

bdr stats
bdr stats

The BDR Productivity Blueprint: Ramp Faster, Retain Longer, Close More

The BDR Productivity Blueprint: Ramp Faster, Retain Longer, Close More

Mar 3, 2024

Boost BDR productivity and revenue by cutting ramp time and boosting tenure with AI, mentorship, culture, and strategic sales enablement.

hubspot vs salesforce
hubspot vs salesforce

HubSpot vs Salesforce: Which CRM to choose in 2024

HubSpot vs Salesforce: Which CRM to choose in 2024

Dec 12, 2023

Salesforce vs. HubSpot: Uncover the Best CRM for Your Business - Insights on User-Friendliness, Complexity, and Ideal User Profiles

picture of an email
picture of an email

Top Email Outbound Practices for 2024

Top Email Outbound Practices for 2024

Nov 16, 2023

Best practices on how to send outbound cold emails in 2024. As a response to googles new sender guidelines

what is revenue operations
what is revenue operations

What is Revenue Operations (RevOps)?

What is Revenue Operations (RevOps)?

Oct 19, 2023

Learn how Revenue Operations partners with Go-to-Market Teams to grow revenue, reduce GTM costs, and increase profitability.

revops help fundrainig
revops help fundrainig

How Can RevOps Help with Fundraising?

How Can RevOps Help with Fundraising?

Oct 1, 2023

RevOps is strategic, close to the business and data-driven. This blog shows how RevOps can help to close the next funding round.

Image of founder looking for Product-Market Fit
Image of founder looking for Product-Market Fit

6 Ways RevOps Makes Product-Market Fit Inevitable

6 Ways RevOps Makes Product-Market Fit Inevitable

Sep 20, 2023

PMF it tough. 6 things how RevOps can help you reach it faster and not run out of money.

Chart on where RevOps should report to
Chart on where RevOps should report to

Where Should RevOps Report to? It is not what you think

Where Should RevOps Report to? It is not what you think

Sep 8, 2023

Reporting lines are important for Revenue Operations. They will have an impact on how effective RevOps will be in your organization.

code-example
code-example

How to Python 🐍: Update a CRM and avoid duplicates

How to Python 🐍: Update a CRM and avoid duplicates

Aug 30, 2023

Python can be helpful in Revenue Operations. This blog looks at a live case how we helped a client dealing with duplicated and updating the CRM.

7 RevOps best practices
7 RevOps best practices

7 RevOps Best Practices for Faster Growth That You Can't Miss

7 RevOps Best Practices for Faster Growth That You Can't Miss

Aug 24, 2023

Highlighting 7 Revenue Operations (RevOps) best practices anyone should know. Point 3 will surprise you.

Founder In Space
Founder In Space

The Founders Guide to RevOps: Strategy, Processes, Technology & Analytics [2/2]

The Founders Guide to RevOps: Strategy, Processes, Technology & Analytics [2/2]

Aug 24, 2023

Revevnue Operations can be broken down into four main parts: Strategy, Processes, Technology & Analytics. Learn how each of them helps business grow.

illustration founder
illustration founder

The Founder's Guide to Revenue Operations [1/2]: Structure & Hiring

The Founder's Guide to Revenue Operations [1/2]: Structure & Hiring

Aug 14, 2023

What is RevOps? Does it really decrease in GTM expenses by 30%? How do I get started. Check this blog with answers.

illustration silo
illustration silo

Breaking Down the Marketing Silo with RevOps

Breaking Down the Marketing Silo with RevOps

Aug 8, 2023

Breaking Down the Silo Series: We know that collaboration creates growth. This episode is about how to align the Marketing team.

illustration data
illustration data

Revenue Operations vs Sales Operations

Revenue Operations vs Sales Operations

Aug 4, 2023

How does Revenue Operations (RevOps) differ from Sales Operations? RevOps is more strategic. This blog explains why.