Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.misc
|
|||
|
|||
Text comparison - closest match
I have data in 2 worksheets, say WS1, WS2 WS1 contains following plus other data (here i used : to separate cells) -------------------------------------- FirstName : LastName : UserID -------------------------------------- Serine S H : Koay : WKOAS Roger : Groebi : WGROR Jill : Neubronner : WNEUJ Su Ann : Low : WLOWV -------------------------------------- WS2 contains following plus other data (here i used : to separate cells) -------------------------------------- FirstName : LastName : UserID -------------------------------------- Serine : Koay : Roger A : Groebi : Neubronner : Jill : Su : Ann Low : -------------------------------------- I need to find userID in WS2 for corresponding names from WS1. I joined First and Lastnames in both the worksheets, used VLOOKUP and got the userid in WS2, if the names matches exactly (not shown here). But if the names doesn't match exactly, as above, VLOOKUP doesn't work. Eventhough we can see the names are there, but either contain additional/less initials, reversed first and last name etc. Is there any other way I can get the result? +-------------------------------------------------------------------+ |Filename: TextStringComparison.txt | |Download: http://www.excelforum.com/attachment.php?postid=4977 | +-------------------------------------------------------------------+ -- pappu ------------------------------------------------------------------------ pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041 View this thread: http://www.excelforum.com/showthread...hreadid=558347 |
#2
Posted to microsoft.public.excel.misc
|
|||
|
|||
Text comparison - closest match
Vllokup looks for an exact match if the Range_lookup argument is set to
False. If you set it to true it will look for the closest march but your table must be sorted on the first column. I'm not sure it will do what you want but give it a try. HTH SHeila www.c-i-m-s.com "pappu" wrote: I have data in 2 worksheets, say WS1, WS2 WS1 contains following plus other data (here i used : to separate cells) -------------------------------------- FirstName : LastName : UserID -------------------------------------- Serine S H : Koay : WKOAS Roger : Groebi : WGROR Jill : Neubronner : WNEUJ Su Ann : Low : WLOWV -------------------------------------- WS2 contains following plus other data (here i used : to separate cells) -------------------------------------- FirstName : LastName : UserID -------------------------------------- Serine : Koay : Roger A : Groebi : Neubronner : Jill : Su : Ann Low : -------------------------------------- I need to find userID in WS2 for corresponding names from WS1. I joined First and Lastnames in both the worksheets, used VLOOKUP and got the userid in WS2, if the names matches exactly (not shown here). But if the names doesn't match exactly, as above, VLOOKUP doesn't work. Eventhough we can see the names are there, but either contain additional/less initials, reversed first and last name etc. Is there any other way I can get the result? +-------------------------------------------------------------------+ |Filename: TextStringComparison.txt | |Download: http://www.excelforum.com/attachment.php?postid=4977 | +-------------------------------------------------------------------+ -- pappu ------------------------------------------------------------------------ pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041 View this thread: http://www.excelforum.com/showthread...hreadid=558347 |
#3
Posted to microsoft.public.excel.misc
|
|||
|
|||
Text comparison - closest match
Here's a formulas play which might narrow it down a bit ..
I've assumed your 2 sheets are named simply as: 1, 2 where 1 houses the reference data that you're trying to match & extract into 2 A sample construct is available at: http://www.savefile.com/files/6207746 Fuzzy TextString Comparison.xls In sheet: 1, the reference source data is assumed in cols A to C, data from row2 down to a max expected row100 (say) Label in D1: ConcatName Put in D2: =TRIM(A2&" "&B2) Copy down to D100 In sheet: 2, In C2, array-entered*: =INDEX(E2:N2,MATCH(1,--ISTEXT(E2:N2),0)) In D2: =TRIM(A2&" "&B2) In E2, array-entered*: =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(D2),TRIM('1'!$D$2:$D$100))))*('1'!$D$2:$D$100< ""),0)) In F2, array-entered*: =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM('1'!$D$2:$D$100),TRIM(B2))))*('1'!$D$2:$D$100< ""),0)) In G2, array-entered*, G2 copied to H2: =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(A2),TRIM('1'!$A$2:$A$100))))*('1'!$A$2:$A$100< ""),0)) In I2, array-entered*, I2 copied to J2: =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(A2),TRIM('1'!$B$2:$B$100))))*('1'!$B$2:$B$100< ""),0)) In K2, array-entered*, K2 copied to L2: =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM('1'!$A$2:$A$100),TRIM(A2))))*('1'!$A$2:$A$100< ""),0)) In M2, array-entered*, M2 copied to N2: =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM('1'!$B$2:$B$100),TRIM(A2))))*('1'!$B$2:$B$100< ""),0)) *Press CTRL+SHIFT+ENTER to confirm the formula (instead of just pressing ENTER) Then just select C2:N2, fill down as far as required. Col C would return the "best fuzzy comparison" results (i.e. the UserIDs from sheet: 1) Col labels placed in D1:N1 : ConcatName 2D vs 1D 1D vs 2D 2A vs 1A 2B vs 1A 2A vs 1B 2B vs 1B 1A vs 2A 1A vs 2B 1B vs 2A 1B vs 2B Col label meanings: ConcatName (concatenated names, trimmed) 2D vs 1D (compare 2's col D against 1's col D) 1D vs 2D (compare 1's col D against 2's col D, the reverse) 2A vs 1A (compare 2's col A against 1's col A) 2B vs 1A (compare 2's col B against 1's col A) 2A vs 1B (compare 2's col A against 1's col B) 2B vs 1B (compare 2's col B against 1's col B) 1A vs 2A (compare 1's col A against 2's col A) 1A vs 2B (compare 1's col A against 2's col B) 1B vs 2A (compare 1's col B against 2's col A) 1B vs 2B (compare 1's col B against 2's col B) (yup, the whole battery of possible comparisons are done <g) The array formulas in cols E to N will extract the UserIDs from sheet: 1 if the comparisons (using SEARCH) are successful. As the array formula in col C will simply grab the 1st matching text (i.e. leftmost) within cols E to N, this would yield v.roughly the "best" results, since the concat string search results (search on the longest, "most complete" name strings - the most stringent searches) are intentionally placed in the "leftmost" cols E & F. Single "FirstName" or "LastName" col searches which are less reliable due to possibility of persons having the same FirstNames / LastNames are placed further to the right. We could also try replacing SEARCH with FIND in the formulas if we wanted it to be case sensitive (SEARCH is not case sensitive). Albeit using FIND would be even more stringent a search (since the case must match as well), for your posted sample data, using FIND yields the same results [FIND is implemented in sheet: 2 (2) in the sample file]. Adapt to suit .. -- Max Singapore http://savefile.com/projects/236895 xdemechanik --- "pappu" wrote: I have data in 2 worksheets, say WS1, WS2 WS1 contains following plus other data (here i used : to separate cells) -------------------------------------- FirstName : LastName : UserID -------------------------------------- Serine S H : Koay : WKOAS Roger : Groebi : WGROR Jill : Neubronner : WNEUJ Su Ann : Low : WLOWV -------------------------------------- WS2 contains following plus other data (here i used : to separate cells) -------------------------------------- FirstName : LastName : UserID -------------------------------------- Serine : Koay : Roger A : Groebi : Neubronner : Jill : Su : Ann Low : -------------------------------------- I need to find userID in WS2 for corresponding names from WS1. I joined First and Lastnames in both the worksheets, used VLOOKUP and got the userid in WS2, if the names matches exactly (not shown here). But if the names doesn't match exactly, as above, VLOOKUP doesn't work. Eventhough we can see the names are there, but either contain additional/less initials, reversed first and last name etc. Is there any other way I can get the result? +-------------------------------------------------------------------+ |Filename: TextStringComparison.txt | |Download: http://www.excelforum.com/attachment.php?postid=4977 | +-------------------------------------------------------------------+ -- pappu ------------------------------------------------------------------------ pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041 View this thread: http://www.excelforum.com/showthread...hreadid=558347 |
#4
Posted to microsoft.public.excel.misc
|
|||
|
|||
Text comparison - closest match
Max wrote:
.... The array formulas in cols E to N will extract the UserIDs from sheet: 1 if the comparisons (using SEARCH) are successful. As the array formula in col C will simply grab the 1st matching text (i.e. leftmost) within cols E to N, this would yield v.roughly the "best" results, since the concat string search results (search on the longest, "most complete" name strings - the most stringent searches) are intentionally placed in the "leftmost" cols E & F. Single "FirstName" or "LastName" col searches which are less reliable due to possibility of persons having the same FirstNames / LastNames are placed further to the right. Just a clarification: Other than what's generally mentioned above on the intentional leftmost placements of the concat string search formulas in cols E & F, the placement sequence of the formulas in the other cols G to N is arbitrary (how I drew it up is just one sequence which can be tried). Even the sequence for cols E & F in itself is arbitrary. It doesn't necessarily mean that the searches by the formulas in col E would always return better/more reliable results than those in col F. Just that cols E and F (collectively) should return results better/more reliable than those by the other cols to the right. Play around the fuzziness to taste <g .. -- Max Singapore http://savefile.com/projects/236895 xdemechanik --- |
#5
Posted to microsoft.public.excel.misc
|
|||
|
|||
Text comparison - closest match
Looks like it will solve my problem. Thanks a lot. But my data is in almost 15000 rows. Will it be applicable? Sorry, I don't understand about Excel Array. Will check further details on it. -- pappu ------------------------------------------------------------------------ pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041 View this thread: http://www.excelforum.com/showthread...hreadid=558347 |
#6
Posted to microsoft.public.excel.misc
|
|||
|
|||
Text comparison - closest match
"pappu" wrote:
Looks like it will solve my problem. Thanks a lot. But my data is in almost 15000 rows. Will it be applicable? It should still work, notwithstanding ... As the calculation overhead is going v.heavy, try switching the book's calc mode to Manual first via clicking: Tools Options Calculation tab Check Manual OK Then when all formulas are filled, just press F9 to recalc (Go for a short break, it might take awhile for recalc to complete <g) Then just adapt the ranges within the topline formulas for cols E to N to suit the reference source data extent (15000?). For example: In E2, array-entered*: =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(D2),TRIM('1'!$D$2:$D$100))))*('1'!$D$2:$D$100< ""),0)) would become: =INDEX('1'!$C$2:$C$15000,MATCH(1,(ISNUMBER(SEARCH( TRIM(D2),TRIM('1'!$D$2:$D$15000))))*('1'!$D$2:$D$1 5000<""),0)) Then fill down the topline formulas in C2:N2 as far as you have data in sheet: 2 (I don't know how many lines you have in sheet: 2) Sorry, I don't understand about Excel Array. The special way to confirm/register the array formulas in cols E to N and in col C via pressing CTRL+SHIFT+ENTER (instead of just pressing ENTER) is crucial. If correctly confirmed, Excel will then insert/wrap curly braces { } around the formula. You can see/verify this in the formula bar. Note: *Do not* type these curly braces into the formulas -- Max Singapore http://savefile.com/projects/236895 xdemechanik --- |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
need to select closest match using vlookup if it higher or lower | Excel Discussion (Misc queries) | |||
Find the closest match to a reference number in a row of unsorted | Excel Worksheet Functions | |||
I need to use a number in one worksheet to match text in another | Excel Worksheet Functions | |||
How can I do a "best match" for text entries? | Excel Worksheet Functions | |||
Read Text File into Excel Using VBA | Excel Discussion (Misc queries) |