Excel Guides · March 28, 2026

How to Remove Duplicates in Excel (4 Methods)

Four ways to find and remove duplicate rows in Excel — from the built-in tool to formulas, Power Query, and AI.

You merge two customer lists. You import last quarter's sales data alongside this quarter's. You pull records from a CRM export and paste them below yesterday's version. Somewhere in those operations, duplicates crept in — and now your totals are wrong, your mail merge is sending two letters to the same person, and your pivot table counts one sale twice.

Duplicates are the most common data quality problem in Excel, and one of the most dangerous. They don't announce themselves. They sit quietly in your data, inflating numbers and corrupting analysis until someone notices the revenue figure doesn't match the bank statement.

Here are four ways to find and remove them — from the simplest click to the most powerful workflow.

Method 1: The Built-In "Remove Duplicates" Button

Excel has a dedicated tool for this, and for straightforward cases it's all you need.

Steps:

  1. Click anywhere inside your data range.
  2. Go to Data tab → Remove Duplicates.
  3. Excel selects all columns by default. Uncheck any columns you want to ignore when comparing rows (for example, you might ignore a "Date Added" column and match only on name and email).
  4. Click OK.
  5. Excel tells you how many duplicates were removed and how many unique values remain.

What to know:

  • Excel keeps the first occurrence and deletes subsequent matches. If you want to keep the most recent entry instead, sort your data by date (newest first) before running the tool.
  • This is destructive — duplicates are deleted immediately. Save a copy of your file first or work on a duplicate sheet.
  • It compares the displayed cell value and ignores letter case, so "John Smith" and "john smith" count as the same entry. Anything else must match exactly: "123 Main St" and "123 Main Street" are different, and so are values with stray leading or trailing spaces. This matters more than you'd think.
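If it helps to see the keep-first logic spelled out, here is a minimal Python sketch (plain standard library, with made-up sample data) of deduplicating rows on selected key columns while keeping the first occurrence, the way the built-in tool does:

```python
def remove_duplicates(rows, key_columns):
    """Keep the first occurrence of each key; drop later matches.

    Mirrors the Remove Duplicates behavior: the first row wins, and only
    the columns you leave checked are compared. Unlike the worksheet tool,
    this sketch compares values exactly (including case).
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Hypothetical sample: match on name and email, ignore "date_added"
rows = [
    {"name": "Ana", "email": "ana@x.com", "date_added": "2026-01-02"},
    {"name": "Ana", "email": "ana@x.com", "date_added": "2026-03-01"},
    {"name": "Bo",  "email": "bo@x.com",  "date_added": "2026-02-11"},
]
print(remove_duplicates(rows, ["name", "email"]))  # keeps the Jan 2 row for Ana
```

Sorting the input by date (newest first) before the call is the same trick as in Excel: it changes which occurrence counts as "first".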

For most one-time cleanup jobs, this is the right method. Fast, built-in, no formulas needed.

Method 2: COUNTIF to Flag Duplicates Before Deleting

Sometimes you don't want to delete duplicates blindly — you want to see them first, decide which to keep, or understand why they exist. COUNTIF lets you flag duplicates without touching your data.

Steps:

  1. Add a helper column next to your data (e.g., column F, header: "Duplicate?").
  2. In cell F2, enter this formula:
=IF(COUNTIF($A$2:$A2,A2)>1,"Duplicate","")

Replace $A$2:$A2 with the column you're checking for duplicates (e.g., email addresses). The mixed reference $A$2:$A2 expands as you drag down — this is what makes it detect the second occurrence onward.

  3. Drag the formula down to the last row of your data.
  4. Filter column F for "Duplicate" to review flagged rows.
  5. Delete the flagged rows manually, or keep them if they turn out to be legitimate.

Checking multiple columns:

To flag duplicates based on a combination of columns (e.g., first name + last name + email), use COUNTIFS with one expanding range and criteria pair per column:

=IF(COUNTIFS($A$2:$A2,A2,$B$2:$B2,B2,$C$2:$C2,C2)>1,"Duplicate","")
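The expanding-range trick boils down to "count how many times this key has appeared so far, and flag everything after the first hit." A Python sketch of the same logic (standard library only, sample values hypothetical):

```python
from collections import Counter

def flag_duplicates(keys):
    """Return "Duplicate" for the 2nd occurrence onward, "" for the first.

    Same behavior as =IF(COUNTIF($A$2:$A2,A2)>1,"Duplicate","") dragged
    down a column: each row's count only looks at rows up to that row.
    """
    seen = Counter()
    flags = []
    for key in keys:
        seen[key] += 1
        flags.append("Duplicate" if seen[key] > 1 else "")
    return flags

# Single column (e.g., emails)
print(flag_duplicates(["a@x.com", "b@x.com", "a@x.com"]))
# -> ['', '', 'Duplicate']

# Multi-column, like the COUNTIFS version: treat each row as one tuple
rows = [("Ana", "Diaz", "a@x.com"), ("Ana", "Diaz", "a@x.com")]
print(flag_duplicates(rows))
# -> ['', 'Duplicate']
```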

Why use this over Method 1: You get to review before deleting. In datasets where duplicates might be legitimate (same name, different person), this extra step prevents data loss.

Method 3: Conditional Formatting to Highlight Duplicates

If you want a visual scan rather than a formula column, conditional formatting highlights duplicate values directly in the cells.

Steps:

  1. Select the range you want to check (e.g., the email column, A2:A5000).
  2. Go to Home tab → Conditional Formatting → Highlight Cells Rules → Duplicate Values.
  3. Choose a formatting style (the default red fill works well).
  4. Click OK.

Every cell containing a value that appears more than once in the range is now highlighted — including the first occurrence.

Limitations:

  • The built-in rule compares individual cells, not whole rows. You can't highlight based on a combination of columns without a custom formula rule.
  • It highlights all occurrences, not just the extras. You still need to decide which one to keep.
  • It's visual only — it doesn't delete anything or create a filterable flag.

When this is useful: Quick data audits. When you're eyeballing a list and want to spot patterns — maybe the same customer appears five times because five different salespeople entered them. That's a process problem, not just a data problem.

Method 4: Power Query for Deduplication at Scale

When you're working with tens of thousands of rows, refreshable data sources, or recurring imports, Power Query is the professional-grade tool.

Steps:

  1. Select your data and go to Data tab → From Table/Range to load it into Power Query.
  2. In the Power Query Editor, select the column(s) you want to deduplicate on.
  3. Go to Home tab → Remove Rows → Remove Duplicates.
  4. Click Close & Load to send the cleaned data back to Excel.

Why Power Query is different:

  • Non-destructive. Your source data stays untouched. The output is a separate, refreshable table.
  • Repeatable. Next time you update the source data, just click Refresh. The deduplication runs again automatically.
  • Chainable. You can combine deduplication with other transformations — rename columns, filter rows, merge tables — in a single pipeline. This pairs well with broader Excel reporting automation patterns.
  • Handles large datasets. Power Query processes data outside the worksheet engine, so it handles hundreds of thousands of rows without the lag you'd get with COUNTIF formulas.
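The "chainable pipeline" idea is the key difference: the cleanup is a recipe you re-run, not an edit you make once. A rough Python sketch of the same pattern (standard library only; field names are made up for illustration):

```python
def dedupe_on(rows, key_columns):
    """Keep the first row for each combination of the key columns."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def clean(source_rows):
    """One pipeline: filter, then dedupe. The source data is never modified.

    Re-running clean() on refreshed source data repeats every step in
    order, which is the same idea as clicking Refresh on a query.
    """
    active = [r for r in source_rows if r.get("status") == "active"]
    return dedupe_on(active, ["email"])

source = [
    {"email": "a@x.com", "status": "active"},
    {"email": "a@x.com", "status": "active"},
    {"email": "b@x.com", "status": "closed"},
]
print(clean(source))  # one active row per email; source stays untouched
```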

The catch: Power Query still does exact matching, and unlike the worksheet tool it is case-sensitive by default, so "ABC" and "abc" survive as separate rows. "Jon Smith" and "John Smith" remain two separate records. For the kind of data that arrives from multiple supplier files or PDF extractions, exact matching isn't always enough.

The Edge Cases Excel Can't Handle

All four methods above share the same fundamental limitation: they compare values character by character. If two cells aren't identical strings, Excel treats them as different records.

Real-world data doesn't work that way.

Near-duplicate names: "John Smith", "Jon Smith", "J. Smith", "Smith, John" — these are probably the same person. Excel has no way to know that.

Address variations: "123 Main St", "123 Main Street", "123 Main St.", "123 Main St, Apt 2" — same building, different strings.

Inconsistent formatting: "ABC-1234" vs "ABC 1234" vs "abc1234". Phone numbers with and without country codes. Dates as text in three different formats.

Merged datasets with different schemas: When you combine data from two systems, the same customer might have different IDs, slightly different names, and addresses entered by different people.

You can write increasingly complex formulas to handle some of these — TRIM, LOWER, SUBSTITUTE chains, fuzzy matching with helper columns. But each rule you add is brittle, handles one specific variation, and misses others. This is exactly the kind of maintenance spiral described in The Hidden Cost of Excel Automation.
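To make the brittleness concrete, here is a sketch of that approach in Python: a small normalization chain (the analogue of TRIM/LOWER/SUBSTITUTE helper columns) plus a similarity score from the standard library's difflib. Every rule in normalize() is hand-written and handles exactly one variation:

```python
import difflib
import re

def normalize(s):
    """Rough analogue of a TRIM/LOWER/SUBSTITUTE helper-column chain."""
    s = s.strip().lower()
    s = re.sub(r"[^a-z0-9 ]", "", s)    # drop punctuation
    s = re.sub(r"\bstreet\b", "st", s)  # one brittle, hand-written rule
    return re.sub(r"\s+", " ", s)       # collapse repeated spaces

def similarity(a, b):
    """0.0..1.0 similarity between two strings after normalization."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()

print(similarity("123 Main St", "123 Main Street"))  # 1.0 after normalization
print(similarity("John Smith", "Jon Smith"))         # high, but below 1.0
```

Note what the "street" rule does not cover: "Avenue" vs "Ave", "Apt 2" vs "#2", "Smith, John" vs "John Smith". Each miss needs another rule, and the list never ends.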

If your data were always clean and consistent, you wouldn't have a duplicate problem in the first place. The messiness that creates duplicates is the same messiness that makes them hard to find.

Let AI Handle the Messy Deduplication

An AI agent doesn't compare strings. It understands meaning.

Tell it: "Deduplicate this customer list. Match on name and address, even if they're slightly different. Flag uncertain matches for my review."

The agent reads "Jon Smith at 123 Main St" and "John Smith at 123 Main Street" and recognizes them as the same person. It sees "ABC-1234" and "abc 1234" and knows they're the same product code. It identifies that two records share a phone number but have different names and flags it instead of guessing.

What this looks like in practice:

"Found 847 exact duplicates and removed them. Identified 23 probable matches based on name and address similarity — see the Review tab. Flagged 4 records that share an email address but have different names. Original data preserved in the Backup sheet."

You review 27 edge cases instead of scanning 10,000 rows. The obvious duplicates are already gone. The ambiguous ones are waiting for your judgment, with the evidence laid out.

For recurring imports, this gets even more valuable. If you pull data from a PDF extraction or merge supplier files weekly, duplicates aren't a one-time problem — they're a weekly one. Instead of running Remove Duplicates every Monday and hoping you caught everything, describe the rule once:

"Every time new data arrives, deduplicate against the master list. Exact matches: merge automatically. Fuzzy matches: flag for review. Log everything."

The agent applies the same judgment every time, learns which matches you approve, and gets better at resolving ambiguous cases. No formulas to maintain. No VBA to debug. No fragile rules that break when the data changes — and it always does.

For a broader look at what this kind of automation without VBA looks like across different Excel workflows, see how teams are replacing scripts with plain-language instructions.


Still cleaning data by hand?

Exact duplicates are easy. It's the near-duplicates, the fuzzy matches, the "is this the same customer?" decisions that eat your time. That's the part worth automating.

See how Reflexion handles data deduplication — send us a sample file and we'll show you exactly what the agent finds. Or book a quick call to walk through your specific data cleanup challenge.

Cite this article

How to Remove Duplicates in Excel (4 Methods), Reflexion Labs: https://www.reflexion-labs.com/blog/how-to-remove-duplicates-in-excel