Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 1
Default Text comparison - closest match


I have data in 2 worksheets, say WS1, WS2

WS1 contains following plus other data (here i used : to separate
cells)
--------------------------------------
FirstName : LastName : UserID
--------------------------------------
Serine S H : Koay : WKOAS
Roger : Groebi : WGROR
Jill : Neubronner : WNEUJ
Su Ann : Low : WLOWV
--------------------------------------

WS2 contains following plus other data (here i used : to separate
cells)
--------------------------------------
FirstName : LastName : UserID
--------------------------------------
Serine : Koay :
Roger A : Groebi :
Neubronner : Jill :
Su : Ann Low :
--------------------------------------

I need to find userID in WS2 for corresponding names from WS1. I joined
First and Lastnames in both the worksheets, used

VLOOKUP and got the userid in WS2, if the names matches exactly (not
shown here). But if the names doesn't match exactly, as

above, VLOOKUP doesn't work. Eventhough we can see the names are there,
but either contain additional/less initials,

reversed first and last name etc.

Is there any other way I can get the result?


+-------------------------------------------------------------------+
|Filename: TextStringComparison.txt |
|Download: http://www.excelforum.com/attachment.php?postid=4977 |
+-------------------------------------------------------------------+

--
pappu
------------------------------------------------------------------------
pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041
View this thread: http://www.excelforum.com/showthread...hreadid=558347

  #2   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 65
Default Text comparison - closest match

Vllokup looks for an exact match if the Range_lookup argument is set to
False. If you set it to true it will look for the closest march but your
table must be sorted on the first column. I'm not sure it will do what you
want but give it a try. HTH
SHeila
www.c-i-m-s.com


"pappu" wrote:


I have data in 2 worksheets, say WS1, WS2

WS1 contains following plus other data (here i used : to separate
cells)
--------------------------------------
FirstName : LastName : UserID
--------------------------------------
Serine S H : Koay : WKOAS
Roger : Groebi : WGROR
Jill : Neubronner : WNEUJ
Su Ann : Low : WLOWV
--------------------------------------

WS2 contains following plus other data (here i used : to separate
cells)
--------------------------------------
FirstName : LastName : UserID
--------------------------------------
Serine : Koay :
Roger A : Groebi :
Neubronner : Jill :
Su : Ann Low :
--------------------------------------

I need to find userID in WS2 for corresponding names from WS1. I joined
First and Lastnames in both the worksheets, used

VLOOKUP and got the userid in WS2, if the names matches exactly (not
shown here). But if the names doesn't match exactly, as

above, VLOOKUP doesn't work. Eventhough we can see the names are there,
but either contain additional/less initials,

reversed first and last name etc.

Is there any other way I can get the result?


+-------------------------------------------------------------------+
|Filename: TextStringComparison.txt |
|Download: http://www.excelforum.com/attachment.php?postid=4977 |
+-------------------------------------------------------------------+

--
pappu
------------------------------------------------------------------------
pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041
View this thread: http://www.excelforum.com/showthread...hreadid=558347


  #3   Report Post  
Posted to microsoft.public.excel.misc
Max Max is offline
external usenet poster
 
Posts: 9,221
Default Text comparison - closest match

Here's a formulas play which might narrow it down a bit ..

I've assumed your 2 sheets are named simply as: 1, 2
where 1 houses the reference data that you're trying to match & extract into 2

A sample construct is available at:
http://www.savefile.com/files/6207746
Fuzzy TextString Comparison.xls

In sheet: 1,
the reference source data is assumed in cols A to C, data from row2 down to
a max expected row100 (say)

Label in D1: ConcatName
Put in D2: =TRIM(A2&" "&B2)
Copy down to D100

In sheet: 2,

In C2, array-entered*:
=INDEX(E2:N2,MATCH(1,--ISTEXT(E2:N2),0))

In D2:
=TRIM(A2&" "&B2)

In E2, array-entered*:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(D2),TRIM('1'!$D$2:$D$100))))*('1'!$D$2:$D$100< ""),0))

In F2, array-entered*:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM('1'!$D$2:$D$100),TRIM(B2))))*('1'!$D$2:$D$100< ""),0))

In G2, array-entered*, G2 copied to H2:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(A2),TRIM('1'!$A$2:$A$100))))*('1'!$A$2:$A$100< ""),0))

In I2, array-entered*, I2 copied to J2:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(A2),TRIM('1'!$B$2:$B$100))))*('1'!$B$2:$B$100< ""),0))

In K2, array-entered*, K2 copied to L2:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM('1'!$A$2:$A$100),TRIM(A2))))*('1'!$A$2:$A$100< ""),0))

In M2, array-entered*, M2 copied to N2:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM('1'!$B$2:$B$100),TRIM(A2))))*('1'!$B$2:$B$100< ""),0))

*Press CTRL+SHIFT+ENTER to confirm the formula
(instead of just pressing ENTER)

Then just select C2:N2, fill down as far as required.

Col C would return the "best fuzzy comparison" results
(i.e. the UserIDs from sheet: 1)

Col labels placed in D1:N1 :

ConcatName
2D vs 1D
1D vs 2D
2A vs 1A
2B vs 1A
2A vs 1B
2B vs 1B
1A vs 2A
1A vs 2B
1B vs 2A
1B vs 2B

Col label meanings:

ConcatName (concatenated names, trimmed)
2D vs 1D (compare 2's col D against 1's col D)
1D vs 2D (compare 1's col D against 2's col D, the reverse)
2A vs 1A (compare 2's col A against 1's col A)
2B vs 1A (compare 2's col B against 1's col A)
2A vs 1B (compare 2's col A against 1's col B)
2B vs 1B (compare 2's col B against 1's col B)
1A vs 2A (compare 1's col A against 2's col A)
1A vs 2B (compare 1's col A against 2's col B)
1B vs 2A (compare 1's col B against 2's col A)
1B vs 2B (compare 1's col B against 2's col B)

(yup, the whole battery of possible comparisons are done <g)

The array formulas in cols E to N will extract the UserIDs from sheet: 1 if
the comparisons (using SEARCH) are successful. As the array formula in col
C will simply grab the 1st matching text (i.e. leftmost) within cols E to N,
this would yield v.roughly the "best" results, since the concat string search
results (search on the longest, "most complete" name strings - the most
stringent searches) are intentionally placed in the "leftmost" cols E & F.
Single "FirstName" or "LastName" col searches which are less reliable due to
possibility of persons having the same FirstNames / LastNames are placed
further to the right.

We could also try replacing SEARCH with FIND in the formulas if we wanted it
to be case sensitive (SEARCH is not case sensitive). Albeit using FIND would
be even more stringent a search (since the case must match as well), for your
posted sample data, using FIND yields the same results [FIND is implemented
in sheet: 2 (2) in the sample file].

Adapt to suit ..
--
Max
Singapore
http://savefile.com/projects/236895
xdemechanik
---
"pappu" wrote:
I have data in 2 worksheets, say WS1, WS2

WS1 contains following plus other data (here i used : to separate
cells)
--------------------------------------
FirstName : LastName : UserID
--------------------------------------
Serine S H : Koay : WKOAS
Roger : Groebi : WGROR
Jill : Neubronner : WNEUJ
Su Ann : Low : WLOWV
--------------------------------------

WS2 contains following plus other data (here i used : to separate
cells)
--------------------------------------
FirstName : LastName : UserID
--------------------------------------
Serine : Koay :
Roger A : Groebi :
Neubronner : Jill :
Su : Ann Low :
--------------------------------------

I need to find userID in WS2 for corresponding names from WS1. I joined
First and Lastnames in both the worksheets, used

VLOOKUP and got the userid in WS2, if the names matches exactly (not
shown here). But if the names doesn't match exactly, as

above, VLOOKUP doesn't work. Eventhough we can see the names are there,
but either contain additional/less initials,

reversed first and last name etc.

Is there any other way I can get the result?


+-------------------------------------------------------------------+
|Filename: TextStringComparison.txt |
|Download: http://www.excelforum.com/attachment.php?postid=4977 |
+-------------------------------------------------------------------+

--
pappu
------------------------------------------------------------------------
pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041
View this thread: http://www.excelforum.com/showthread...hreadid=558347


  #4   Report Post  
Posted to microsoft.public.excel.misc
Max Max is offline
external usenet poster
 
Posts: 9,221
Default Text comparison - closest match

Max wrote:
....
The array formulas in cols E to N will extract the UserIDs from sheet: 1 if
the comparisons (using SEARCH) are successful. As the array formula in col
C will simply grab the 1st matching text (i.e. leftmost) within cols E to N,
this would yield v.roughly the "best" results, since the concat string search
results (search on the longest, "most complete" name strings - the most
stringent searches) are intentionally placed in the "leftmost" cols E & F.
Single "FirstName" or "LastName" col searches which are less reliable due to
possibility of persons having the same FirstNames / LastNames are placed
further to the right.


Just a clarification:
Other than what's generally mentioned above on the intentional leftmost
placements of the concat string search formulas in cols E & F, the placement
sequence of the formulas in the other cols G to N is arbitrary (how I drew it
up is just one sequence which can be tried). Even the sequence for cols E & F
in itself is arbitrary. It doesn't necessarily mean that the searches by the
formulas in col E would always return better/more reliable results than those
in col F. Just that cols E and F (collectively) should return results
better/more reliable than those by the other cols to the right. Play around
the fuzziness to taste <g ..
--
Max
Singapore
http://savefile.com/projects/236895
xdemechanik
---
  #5   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 1
Default Text comparison - closest match


Looks like it will solve my problem. Thanks a lot. But my data is in
almost 15000 rows. Will it be applicable?

Sorry, I don't understand about Excel Array. Will check further details
on it.


--
pappu
------------------------------------------------------------------------
pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041
View this thread: http://www.excelforum.com/showthread...hreadid=558347



  #6   Report Post  
Posted to microsoft.public.excel.misc
Max Max is offline
external usenet poster
 
Posts: 9,221
Default Text comparison - closest match

"pappu" wrote:
Looks like it will solve my problem. Thanks a lot. But my data is in
almost 15000 rows. Will it be applicable?


It should still work, notwithstanding ...

As the calculation overhead is going v.heavy, try switching the book's calc
mode to Manual first via clicking: Tools Options Calculation tab Check
Manual OK
Then when all formulas are filled, just press F9 to recalc (Go for a short
break, it might take awhile for recalc to complete <g)

Then just adapt the ranges within the topline formulas for cols E to N to
suit the reference source data extent (15000?). For example:

In E2, array-entered*:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(D2),TRIM('1'!$D$2:$D$100))))*('1'!$D$2:$D$100< ""),0))

would become:
=INDEX('1'!$C$2:$C$15000,MATCH(1,(ISNUMBER(SEARCH( TRIM(D2),TRIM('1'!$D$2:$D$15000))))*('1'!$D$2:$D$1 5000<""),0))

Then fill down the topline formulas in C2:N2 as far as you have data in
sheet: 2
(I don't know how many lines you have in sheet: 2)

Sorry, I don't understand about Excel Array.


The special way to confirm/register the array formulas in cols E to N and in
col C via pressing CTRL+SHIFT+ENTER (instead of just pressing ENTER) is
crucial. If correctly confirmed, Excel will then insert/wrap curly braces { }
around the formula. You can see/verify this in the formula bar.

Note: *Do not* type these curly braces into the formulas
--
Max
Singapore
http://savefile.com/projects/236895
xdemechanik
---
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
need to select closest match using vlookup if it higher or lower vlookup help pls Excel Discussion (Misc queries) 1 March 1st 06 07:30 PM
Find the closest match to a reference number in a row of unsorted Nick Krill Excel Worksheet Functions 3 January 1st 06 08:33 PM
I need to use a number in one worksheet to match text in another gillesda Excel Worksheet Functions 1 July 21st 05 11:45 PM
How can I do a "best match" for text entries? NeedDataHelp Excel Worksheet Functions 1 February 25th 05 12:27 AM
Read Text File into Excel Using VBA Willie T Excel Discussion (Misc queries) 13 January 8th 05 12:37 AM


All times are GMT +1. The time now is 02:43 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"