Home |
Search |
Today's Posts |
#1
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Dear all
I have been tasked with a very manual process comparing thousands of rows of data from 2 data dumps. I have to locate a 'best match' Product Code from one data dump and copy its value to another data dump. One data dump has the following columns: PO No, PO line No, Product Code, Value. The second data dump has only PO No and Product Code. The idea is on the second data dump to look up the relevant PO number and most similar Product code and return the value from the first data dump. The trouble is the product codes in both lists are slightly different. Sometimes the beginning of the code is different, sometimes the middle and sometimes the end! However, the characters in both types of product codes are around 90-95% similar and easily spottable by manual comparison. To automate this, I have tried a number of methods so far to do a closest match, for example a vlookup using TRUE, and I have also tried the 'FuzzyMatch' user-defined- function that I saw on Mr Excel. Unfortunately neither is providing me with a workable solution. Therefore I thought I'd ask the experts to see if what I want is possible via a VBA macro. If so, I would be very grateful of your help as doing this exercise manually is very disheartening! Thank you, AlanR. Example of the first data dump: PO No Line No Product Code Value 45001 1 AB-ZZZ-HHH-45T 100 45001 2 TY-55555-99-ZA 110 45001 3 CCC-MODEL-XX-YYY 120 45001 4 YYYYY_35-KLMN 130 45002 1 TY-55555-99-ZA 140 45002 2 CCC-MODEL-XX-YYY 150 45002 3 AB-ZZZ-HHH-45T 160 45003 1 YYYYY_35-KLMN 170 45003 2 AB-ZZZ-HHH-45T 180 45004 1 CCC-MODEL-XX-YYY 190 45004 2 KIT-MODEL678 200 45004 3 HYT-JJJ-TOP10 210 45004 4 AB-ZZZ-HHH-45T 220 45004 5 GTO-GTOP-25L 230 45004 6 YYYYY_35-KLMN 240 45004 7 TY-55555-99-ZA 250 Example of the second data dump (including the 4 columns I would like to auto-populate using VBA or formulae) PO No. Product Code Closest Match Line No. Closest Match Product Code Similarity% Value 45001 CCC-MODEL-XX-YYY-35 45001 45-AB-ZZZ-HHH-45T 45001 YYYYY_35-MODEL-KLMN 45001 TY-55555-99C-ZA 45002 CCC-MODEL-XX-YYY-35 45002 45-AB-ZZZ-HHH-45T 45002 TY-55555-99C-ZA 45003 45-AB-ZZZ-HHH-45T 45003 YYYYY_35-MODEL-KLMN 45004 KIT-MODEL678_A 45004 CCC-MODEL-XX-YYY-35 45004 45-AB-ZZZ-HHH-45T 45004 HYT-JJJ-TOP10 45004 YYYYY_35-MODEL-KLMN 45004 GTO-GTOP-25L 45004 TY-55555-99C-ZA |
#2
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
If you ask me - you're up for an uphill struggle.
Try googling for: regexp vba might give you some clues. n Dec 14, 11:37*pm, AlanR wrote: Dear all I have been tasked with a very manual process comparing thousands of rows of data from 2 data dumps. I have to locate a 'best match' Product Code from one data dump and copy its value to another data dump. One data dump has the following columns: PO No, PO line No, Product Code, Value. The second data dump has only PO No and Product Code. The idea is on the second data dump to look up the relevant PO number and most similar Product code and return the value from the first data dump. The trouble is the product codes in both lists are slightly different. Sometimes the beginning of the code is different, sometimes the middle and sometimes the end! However, the characters in both types of product codes are around 90-95% similar and easily spottable by manual comparison. To automate this, I have tried a number of methods so far to do a closest match, for example a vlookup using TRUE, and I have also tried the 'FuzzyMatch' user-defined- function that I saw on Mr Excel. Unfortunately neither is providing me with a workable solution. Therefore I thought I'd ask the experts to see if what I want is possible via a VBA macro. If so, I would be very grateful of your help as doing this exercise manually is very disheartening! Thank you, AlanR. Example of the first data dump: PO No * Line No Product Code * *Value 45001 * 1 * * * AB-ZZZ-HHH-45T *100 45001 * 2 * * * TY-55555-99-ZA *110 45001 * 3 * * * CCC-MODEL-XX-YYY * * * *120 45001 * 4 * * * YYYYY_35-KLMN * 130 45002 * 1 * * * TY-55555-99-ZA *140 45002 * 2 * * * CCC-MODEL-XX-YYY * * * *150 45002 * 3 * * * AB-ZZZ-HHH-45T *160 45003 * 1 * * * YYYYY_35-KLMN * 170 45003 * 2 * * * AB-ZZZ-HHH-45T *180 45004 * 1 * * * CCC-MODEL-XX-YYY * * * *190 45004 * 2 * * * KIT-MODEL678 * *200 45004 * 3 * * * HYT-JJJ-TOP10 * 210 45004 * 4 * * * AB-ZZZ-HHH-45T *220 45004 * 5 * * * GTO-GTOP-25L * *230 45004 * 6 * * * YYYYY_35-KLMN * 240 45004 * 7 * * * TY-55555-99-ZA *250 Example of the second data dump (including the 4 columns I would like to auto-populate using VBA or formulae) PO No. *Product Code * * * * * * * *Closest Match Line No. * *Closest Match Product Code * Similarity% * Value 45001 * CCC-MODEL-XX-YYY-35 45001 * 45-AB-ZZZ-HHH-45T 45001 * YYYYY_35-MODEL-KLMN 45001 * TY-55555-99C-ZA 45002 * CCC-MODEL-XX-YYY-35 45002 * 45-AB-ZZZ-HHH-45T 45002 * TY-55555-99C-ZA 45003 * 45-AB-ZZZ-HHH-45T 45003 * YYYYY_35-MODEL-KLMN 45004 * KIT-MODEL678_A 45004 * CCC-MODEL-XX-YYY-35 45004 * 45-AB-ZZZ-HHH-45T 45004 * HYT-JJJ-TOP10 45004 * YYYYY_35-MODEL-KLMN 45004 * GTO-GTOP-25L 45004 * TY-55555-99C-ZA |
#3
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Dec 15, 6:08*am, AB wrote:
If you ask me - you're up for an uphill struggle. Try googling for: regexp vba might give you some clues. n Dec 14, 11:37*pm, AlanR wrote: Dear all I have been tasked with a very manual process comparing thousands of rows of data from 2 data dumps. I have to locate a 'best match' Product Code from one data dump and copy its value to another data dump. One data dump has the following columns: PO No, PO line No, Product Code, Value. The second data dump has only PO No and Product Code. The idea is on the second data dump to look up the relevant PO number and most similar Product code and return the value from the first data dump. The trouble is the product codes in both lists are slightly different. Sometimes the beginning of the code is different, sometimes the middle and sometimes the end! However, the characters in both types of product codes are around 90-95% similar and easily spottable by manual comparison. To automate this, I have tried a number of methods so far to do a closest match, for example a vlookup using TRUE, and I have also tried the 'FuzzyMatch' user-defined- function that I saw on Mr Excel. Unfortunately neither is providing me with a workable solution. Therefore I thought I'd ask the experts to see if what I want is possible via a VBA macro. If so, I would be very grateful of your help as doing this exercise manually is very disheartening! Thank you, AlanR. Example of the first data dump: PO No * Line No Product Code * *Value 45001 * 1 * * * AB-ZZZ-HHH-45T *100 45001 * 2 * * * TY-55555-99-ZA *110 45001 * 3 * * * CCC-MODEL-XX-YYY * * * *120 45001 * 4 * * * YYYYY_35-KLMN * 130 45002 * 1 * * * TY-55555-99-ZA *140 45002 * 2 * * * CCC-MODEL-XX-YYY * * * *150 45002 * 3 * * * AB-ZZZ-HHH-45T *160 45003 * 1 * * * YYYYY_35-KLMN * 170 45003 * 2 * * * AB-ZZZ-HHH-45T *180 45004 * 1 * * * CCC-MODEL-XX-YYY * * * *190 45004 * 2 * * * KIT-MODEL678 * *200 45004 * 3 * * * HYT-JJJ-TOP10 * 210 45004 * 4 * * * AB-ZZZ-HHH-45T *220 45004 * 5 * * * GTO-GTOP-25L * *230 45004 * 6 * * * YYYYY_35-KLMN * 240 45004 * 7 * * * TY-55555-99-ZA *250 Example of the second data dump (including the 4 columns I would like to auto-populate using VBA or formulae) PO No. *Product Code * * * * * * * *Closest Match Line No. * *Closest Match Product Code * Similarity% * Value 45001 * CCC-MODEL-XX-YYY-35 45001 * 45-AB-ZZZ-HHH-45T 45001 * YYYYY_35-MODEL-KLMN 45001 * TY-55555-99C-ZA 45002 * CCC-MODEL-XX-YYY-35 45002 * 45-AB-ZZZ-HHH-45T 45002 * TY-55555-99C-ZA 45003 * 45-AB-ZZZ-HHH-45T 45003 * YYYYY_35-MODEL-KLMN 45004 * KIT-MODEL678_A 45004 * CCC-MODEL-XX-YYY-35 45004 * 45-AB-ZZZ-HHH-45T 45004 * HYT-JJJ-TOP10 45004 * YYYYY_35-MODEL-KLMN 45004 * GTO-GTOP-25L 45004 * TY-55555-99C-ZA I am assuming that the entire product code and PO are one long string. If they are not, skip this code. First thing you could do is use the following code. Dim txt As String Dim x As Variant Dim i As Long Dim q As Integer txt = Cells(1, 1).Value x = Split(txt, " ") For i = 0 To UBound(x) Cells(i+1, 2) = x(i) Next i Use this code on one string in cells(1,1). This code will break the string at every space. So your first value of 45001 1 AB-ZZZ-HHH-45T 100 would come out as 45001 1 AB-ZZZ-HHH-45T 100 Now you can separate out the product code. Step 2 would be to compare the product code by chunks of strings to a database of known product codes. Consider the comparison of 45-AB-ZZZ- HHH-45T to a known product code of AB-ZZZ-HHH-45T. If you wrote some code to take 10 characters at a time and see if they exist in a another string, you'd probably get it. MyString = Cells(1, 1) X=len(MyString) for startval=1 to 4 for endval=11 to 15 partial=Mid(MyString, startval, endval) ' Here you would compare "partial" to all known product codes. If TRUE then move on to next cell. next next So your code would take the following strings and look for a match. 45-AB-ZZZ- FALSE 4-AB-ZZZ-H FALSE -AB-ZZZ-HH FALSE AB-ZZZ-HHH TRUE If the codes are unique enough that a match of 10 characters is sufficient, then this would work. You will need to determine exactly how many characters in a row have to be Good luck |
#4
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
"AlanR" wrote in message
... Dear all I have been tasked with a very manual process comparing thousands of rows of data from 2 data dumps. I have to locate a 'best match' <snip From a recent post in another group ... you might get some ideas: <quote ... What it cant do is tell you if something is 'real' or not. For example, does Brand X really exist or is it just a mispelled Brand Y? You can get close with heuristic lookups and comparisons, and I will be using Levenshtein distances etc to find nearest fits when a fields value is unknown - but this is only a guess. In the end a user must make the decision and 'approve' the fields value. </quote http://groups.google.com/group/comp....1a5cc4b4?hl=en -- Clif McIrvin (clare reads his mail with moe, nomail feeds the bit bucket :-) |
#5
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I thought I responded to this yesterday, but the post seems to have
disappeared. A search of Google Groups for Fuzzy match revealed (among others) this post from a few years ago: http://groups.google.com/group/micro...e82a2ec bd04b The macro from Harlan Grove seems to be on the track of what you are looking for. Hope this helps. Pete On Dec 14, 11:37*pm, AlanR wrote: Dear all I have been tasked with a very manual process comparing thousands of rows of data from 2 data dumps. I have to locate a 'best match' Product Code from one data dump and copy its value to another data dump. One data dump has the following columns: PO No, PO line No, Product Code, Value. The second data dump has only PO No and Product Code. The idea is on the second data dump to look up the relevant PO number and most similar Product code and return the value from the first data dump. The trouble is the product codes in both lists are slightly different. Sometimes the beginning of the code is different, sometimes the middle and sometimes the end! However, the characters in both types of product codes are around 90-95% similar and easily spottable by manual comparison. To automate this, I have tried a number of methods so far to do a closest match, for example a vlookup using TRUE, and I have also tried the 'FuzzyMatch' user-defined- function that I saw on Mr Excel. Unfortunately neither is providing me with a workable solution. Therefore I thought I'd ask the experts to see if what I want is possible via a VBA macro. If so, I would be very grateful of your help as doing this exercise manually is very disheartening! Thank you, AlanR. Example of the first data dump: PO No * Line No Product Code * *Value 45001 * 1 * * * AB-ZZZ-HHH-45T *100 45001 * 2 * * * TY-55555-99-ZA *110 45001 * 3 * * * CCC-MODEL-XX-YYY * * * *120 45001 * 4 * * * YYYYY_35-KLMN * 130 45002 * 1 * * * TY-55555-99-ZA *140 45002 * 2 * * * CCC-MODEL-XX-YYY * * * *150 45002 * 3 * * * AB-ZZZ-HHH-45T *160 45003 * 1 * * * YYYYY_35-KLMN * 170 45003 * 2 * * * AB-ZZZ-HHH-45T *180 45004 * 1 * * * CCC-MODEL-XX-YYY * * * *190 45004 * 2 * * * KIT-MODEL678 * *200 45004 * 3 * * * HYT-JJJ-TOP10 * 210 45004 * 4 * * * AB-ZZZ-HHH-45T *220 45004 * 5 * * * GTO-GTOP-25L * *230 45004 * 6 * * * YYYYY_35-KLMN * 240 45004 * 7 * * * TY-55555-99-ZA *250 Example of the second data dump (including the 4 columns I would like to auto-populate using VBA or formulae) PO No. *Product Code * * * * * * * *Closest Match Line No. * *Closest Match Product Code * Similarity% * Value 45001 * CCC-MODEL-XX-YYY-35 45001 * 45-AB-ZZZ-HHH-45T 45001 * YYYYY_35-MODEL-KLMN 45001 * TY-55555-99C-ZA 45002 * CCC-MODEL-XX-YYY-35 45002 * 45-AB-ZZZ-HHH-45T 45002 * TY-55555-99C-ZA 45003 * 45-AB-ZZZ-HHH-45T 45003 * YYYYY_35-MODEL-KLMN 45004 * KIT-MODEL678_A 45004 * CCC-MODEL-XX-YYY-35 45004 * 45-AB-ZZZ-HHH-45T 45004 * HYT-JJJ-TOP10 45004 * YYYYY_35-MODEL-KLMN 45004 * GTO-GTOP-25L 45004 * TY-55555-99C-ZA |
#6
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]() |
#7
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
You're welcome.
Pete On Dec 16, 12:10*pm, " wrote: Thanks Pete for the link. http://groups.google.com/group/micro...amming/bro...- Hide quoted text - - Show quoted text - |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Comparing text strings | Excel Worksheet Functions | |||
comparing text strings | Excel Programming | |||
problem with comparing strings in VBA | Excel Discussion (Misc queries) | |||
Comparing slightly different strings | Excel Programming | |||
Evaluating similarity of text strings | Excel Worksheet Functions |