View Single Post
  #5   Report Post  
Posted to microsoft.public.excel.programming
Pete_UK Pete_UK is offline
external usenet poster
 
Posts: 8,856
Default Comparing Similarity of Strings and Returning Best Match

I thought I responded to this yesterday, but the post seems to have
disappeared. A search of Google Groups for Fuzzy match revealed (among
others) this post from a few years ago:

http://groups.google.com/group/micro...e82a2ec bd04b

The macro from Harlan Grove seems to be on the track of what you are
looking for.

Hope this helps.

Pete

On Dec 14, 11:37*pm, AlanR wrote:
Dear all
I have been tasked with a very manual process comparing thousands of
rows of data from 2 data dumps. I have to locate a 'best match'
Product Code from one data dump and copy its value to another data
dump. One data dump has the following columns: PO No, PO line No,
Product Code, Value. The second data dump has only PO No and Product
Code. The idea is on the second data dump to look up the relevant PO
number and most similar Product code and return the value from the
first data dump. The trouble is the product codes in both lists are
slightly different. Sometimes the beginning of the code is different,
sometimes the middle and sometimes the end! However, the characters in
both types of product codes are around 90-95% similar and easily
spottable by manual comparison. To automate this, I have tried a
number of methods so far to do a closest match, for example a vlookup
using TRUE, and I have also tried the 'FuzzyMatch' user-defined-
function that I saw on Mr Excel. Unfortunately neither is providing me
with a workable solution. Therefore I thought I'd ask the experts to
see if what I want is possible via a VBA macro. If so, I would be very
grateful of your help as doing this exercise manually is very
disheartening! Thank you, AlanR.
Example of the first data dump:
PO No * Line No Product Code * *Value
45001 * 1 * * * AB-ZZZ-HHH-45T *100
45001 * 2 * * * TY-55555-99-ZA *110
45001 * 3 * * * CCC-MODEL-XX-YYY * * * *120
45001 * 4 * * * YYYYY_35-KLMN * 130
45002 * 1 * * * TY-55555-99-ZA *140
45002 * 2 * * * CCC-MODEL-XX-YYY * * * *150
45002 * 3 * * * AB-ZZZ-HHH-45T *160
45003 * 1 * * * YYYYY_35-KLMN * 170
45003 * 2 * * * AB-ZZZ-HHH-45T *180
45004 * 1 * * * CCC-MODEL-XX-YYY * * * *190
45004 * 2 * * * KIT-MODEL678 * *200
45004 * 3 * * * HYT-JJJ-TOP10 * 210
45004 * 4 * * * AB-ZZZ-HHH-45T *220
45004 * 5 * * * GTO-GTOP-25L * *230
45004 * 6 * * * YYYYY_35-KLMN * 240
45004 * 7 * * * TY-55555-99-ZA *250

Example of the second data dump (including the 4 columns I would like
to auto-populate using VBA or formulae)
PO No. *Product Code * * * * * * * *Closest Match Line No. * *Closest
Match Product Code * Similarity% * Value
45001 * CCC-MODEL-XX-YYY-35
45001 * 45-AB-ZZZ-HHH-45T
45001 * YYYYY_35-MODEL-KLMN
45001 * TY-55555-99C-ZA
45002 * CCC-MODEL-XX-YYY-35
45002 * 45-AB-ZZZ-HHH-45T
45002 * TY-55555-99C-ZA
45003 * 45-AB-ZZZ-HHH-45T
45003 * YYYYY_35-MODEL-KLMN
45004 * KIT-MODEL678_A
45004 * CCC-MODEL-XX-YYY-35
45004 * 45-AB-ZZZ-HHH-45T
45004 * HYT-JJJ-TOP10
45004 * YYYYY_35-MODEL-KLMN
45004 * GTO-GTOP-25L
45004 * TY-55555-99C-ZA