ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Discussion (Misc queries) (https://www.excelbanter.com/excel-discussion-misc-queries/)
-   -   Finding Similar Data in worksheets (https://www.excelbanter.com/excel-discussion-misc-queries/174694-finding-similar-data-worksheets.html)

MEAD5432

Finding Similar Data in worksheets
 
I have a list of contacts and would like to compare it to another list from a
different department to find duplicates.

The issue is that some of the information on both lists are recorded with
slight variations (one includes Inc. while other doesn't, etc...). Normally,
I could mark duplicates using conditional formatting but it only considers
exact duplicates rather than looking for similarity.

Does anyone know of any good plugins to accomplish this or a way to create a
find duplicate function that has a bit more tolerance for variation?

Thanks.

Tfrup12

Finding Similar Data in worksheets
 
Why not try scrubbing the data first, using find & replace to take out Inc,
LLC, or other unwanted characters in both tables.

Then in one of the tables add an additional column that uses a vlookup
formula to compare the Company Name. If it gets a match the Company Name
should pop up in the column, indicating a match. ...If you have other
columns of data such as phone number, url, or email address you might not
have to scrub the data first?



"MEAD5432" wrote:

I have a list of contacts and would like to compare it to another list from a
different department to find duplicates.

The issue is that some of the information on both lists are recorded with
slight variations (one includes Inc. while other doesn't, etc...). Normally,
I could mark duplicates using conditional formatting but it only considers
exact duplicates rather than looking for similarity.

Does anyone know of any good plugins to accomplish this or a way to create a
find duplicate function that has a bit more tolerance for variation?

Thanks.


MEAD5432

Finding Similar Data in worksheets
 
Thanks for the suggestion.

Removing certain parts of the name would probably improve my ability to
check both lists and I am definitely going to give that a shot.

There is, however, the pesky problem of catching names that are spelled
slightly different; things like transposed letters or misplaced hyphens.

This is where is gets problematic. Maybe this is a problem that simply
requires human logic that can't easily be programmed...

Thanks again.

"Tfrup12" wrote:

Why not try scrubbing the data first, using find & replace to take out Inc,
LLC, or other unwanted characters in both tables.

Then in one of the tables add an additional column that uses a vlookup
formula to compare the Company Name. If it gets a match the Company Name
should pop up in the column, indicating a match. ...If you have other
columns of data such as phone number, url, or email address you might not
have to scrub the data first?



"MEAD5432" wrote:

I have a list of contacts and would like to compare it to another list from a
different department to find duplicates.

The issue is that some of the information on both lists are recorded with
slight variations (one includes Inc. while other doesn't, etc...). Normally,
I could mark duplicates using conditional formatting but it only considers
exact duplicates rather than looking for similarity.

Does anyone know of any good plugins to accomplish this or a way to create a
find duplicate function that has a bit more tolerance for variation?

Thanks.


Tfrup12

Finding Similar Data in worksheets
 
Perhaps once you have identified the ones that do not have an exact match you
can sort them out, create two new tables out of the remainder, and then use
'true' in your vlookup formula, (vs. false which yields only exact matches.)

=VLOOKUP(D12,B$12:B$18,1,TRUE)

Make sure you sort your table array 'B$12:B$18' into alphabetical order, or
else you will get miss-matches...which you might get regardless, but its the
only step that I am aware of inside of excel that lies between exact matches
and 'human logic'.

Good Luck!


"MEAD5432" wrote:

Thanks for the suggestion.

Removing certain parts of the name would probably improve my ability to
check both lists and I am definitely going to give that a shot.

There is, however, the pesky problem of catching names that are spelled
slightly different; things like transposed letters or misplaced hyphens.

This is where is gets problematic. Maybe this is a problem that simply
requires human logic that can't easily be programmed...

Thanks again.

"Tfrup12" wrote:

Why not try scrubbing the data first, using find & replace to take out Inc,
LLC, or other unwanted characters in both tables.

Then in one of the tables add an additional column that uses a vlookup
formula to compare the Company Name. If it gets a match the Company Name
should pop up in the column, indicating a match. ...If you have other
columns of data such as phone number, url, or email address you might not
have to scrub the data first?



"MEAD5432" wrote:

I have a list of contacts and would like to compare it to another list from a
different department to find duplicates.

The issue is that some of the information on both lists are recorded with
slight variations (one includes Inc. while other doesn't, etc...). Normally,
I could mark duplicates using conditional formatting but it only considers
exact duplicates rather than looking for similarity.

Does anyone know of any good plugins to accomplish this or a way to create a
find duplicate function that has a bit more tolerance for variation?

Thanks.


Tfrup12

Finding Similar Data in worksheets
 
I can't emphasize enough how careful you have to be when using 'True' in a
vlookup formula for this purpose. It will attempt to match everything, at
which point you will need to visually check your results. But if you have
legitimate similarities only off by a letter or two it will do a pretty good
job of pointing out the potential relationship. However, it will go too far
in most cases.

"MEAD5432" wrote:

Thanks for the suggestion.

Removing certain parts of the name would probably improve my ability to
check both lists and I am definitely going to give that a shot.

There is, however, the pesky problem of catching names that are spelled
slightly different; things like transposed letters or misplaced hyphens.

This is where is gets problematic. Maybe this is a problem that simply
requires human logic that can't easily be programmed...

Thanks again.

"Tfrup12" wrote:

Why not try scrubbing the data first, using find & replace to take out Inc,
LLC, or other unwanted characters in both tables.

Then in one of the tables add an additional column that uses a vlookup
formula to compare the Company Name. If it gets a match the Company Name
should pop up in the column, indicating a match. ...If you have other
columns of data such as phone number, url, or email address you might not
have to scrub the data first?



"MEAD5432" wrote:

I have a list of contacts and would like to compare it to another list from a
different department to find duplicates.

The issue is that some of the information on both lists are recorded with
slight variations (one includes Inc. while other doesn't, etc...). Normally,
I could mark duplicates using conditional formatting but it only considers
exact duplicates rather than looking for similarity.

Does anyone know of any good plugins to accomplish this or a way to create a
find duplicate function that has a bit more tolerance for variation?

Thanks.



All times are GMT +1. The time now is 01:10 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com