View Single Post
  #3   Report Post  
Posted to microsoft.public.excel.misc
Max Max is offline
external usenet poster
 
Posts: 9,221
Default Text comparison - closest match

Here's a formulas play which might narrow it down a bit ..

I've assumed your 2 sheets are named simply as: 1, 2
where 1 houses the reference data that you're trying to match & extract into 2

A sample construct is available at:
http://www.savefile.com/files/6207746
Fuzzy TextString Comparison.xls

In sheet: 1,
the reference source data is assumed in cols A to C, data from row2 down to
a max expected row100 (say)

Label in D1: ConcatName
Put in D2: =TRIM(A2&" "&B2)
Copy down to D100

In sheet: 2,

In C2, array-entered*:
=INDEX(E2:N2,MATCH(1,--ISTEXT(E2:N2),0))

In D2:
=TRIM(A2&" "&B2)

In E2, array-entered*:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(D2),TRIM('1'!$D$2:$D$100))))*('1'!$D$2:$D$100< ""),0))

In F2, array-entered*:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM('1'!$D$2:$D$100),TRIM(B2))))*('1'!$D$2:$D$100< ""),0))

In G2, array-entered*, G2 copied to H2:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(A2),TRIM('1'!$A$2:$A$100))))*('1'!$A$2:$A$100< ""),0))

In I2, array-entered*, I2 copied to J2:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM(A2),TRIM('1'!$B$2:$B$100))))*('1'!$B$2:$B$100< ""),0))

In K2, array-entered*, K2 copied to L2:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM('1'!$A$2:$A$100),TRIM(A2))))*('1'!$A$2:$A$100< ""),0))

In M2, array-entered*, M2 copied to N2:
=INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TR IM('1'!$B$2:$B$100),TRIM(A2))))*('1'!$B$2:$B$100< ""),0))

*Press CTRL+SHIFT+ENTER to confirm the formula
(instead of just pressing ENTER)

Then just select C2:N2, fill down as far as required.

Col C would return the "best fuzzy comparison" results
(i.e. the UserIDs from sheet: 1)

Col labels placed in D1:N1 :

ConcatName
2D vs 1D
1D vs 2D
2A vs 1A
2B vs 1A
2A vs 1B
2B vs 1B
1A vs 2A
1A vs 2B
1B vs 2A
1B vs 2B

Col label meanings:

ConcatName (concatenated names, trimmed)
2D vs 1D (compare 2's col D against 1's col D)
1D vs 2D (compare 1's col D against 2's col D, the reverse)
2A vs 1A (compare 2's col A against 1's col A)
2B vs 1A (compare 2's col B against 1's col A)
2A vs 1B (compare 2's col A against 1's col B)
2B vs 1B (compare 2's col B against 1's col B)
1A vs 2A (compare 1's col A against 2's col A)
1A vs 2B (compare 1's col A against 2's col B)
1B vs 2A (compare 1's col B against 2's col A)
1B vs 2B (compare 1's col B against 2's col B)

(yup, the whole battery of possible comparisons are done <g)

The array formulas in cols E to N will extract the UserIDs from sheet: 1 if
the comparisons (using SEARCH) are successful. As the array formula in col
C will simply grab the 1st matching text (i.e. leftmost) within cols E to N,
this would yield v.roughly the "best" results, since the concat string search
results (search on the longest, "most complete" name strings - the most
stringent searches) are intentionally placed in the "leftmost" cols E & F.
Single "FirstName" or "LastName" col searches which are less reliable due to
possibility of persons having the same FirstNames / LastNames are placed
further to the right.

We could also try replacing SEARCH with FIND in the formulas if we wanted it
to be case sensitive (SEARCH is not case sensitive). Albeit using FIND would
be even more stringent a search (since the case must match as well), for your
posted sample data, using FIND yields the same results [FIND is implemented
in sheet: 2 (2) in the sample file].

Adapt to suit ..
--
Max
Singapore
http://savefile.com/projects/236895
xdemechanik
---
"pappu" wrote:
I have data in 2 worksheets, say WS1, WS2

WS1 contains following plus other data (here i used : to separate
cells)
--------------------------------------
FirstName : LastName : UserID
--------------------------------------
Serine S H : Koay : WKOAS
Roger : Groebi : WGROR
Jill : Neubronner : WNEUJ
Su Ann : Low : WLOWV
--------------------------------------

WS2 contains following plus other data (here i used : to separate
cells)
--------------------------------------
FirstName : LastName : UserID
--------------------------------------
Serine : Koay :
Roger A : Groebi :
Neubronner : Jill :
Su : Ann Low :
--------------------------------------

I need to find userID in WS2 for corresponding names from WS1. I joined
First and Lastnames in both the worksheets, used

VLOOKUP and got the userid in WS2, if the names matches exactly (not
shown here). But if the names doesn't match exactly, as

above, VLOOKUP doesn't work. Eventhough we can see the names are there,
but either contain additional/less initials,

reversed first and last name etc.

Is there any other way I can get the result?


+-------------------------------------------------------------------+
|Filename: TextStringComparison.txt |
|Download: http://www.excelforum.com/attachment.php?postid=4977 |
+-------------------------------------------------------------------+

--
pappu
------------------------------------------------------------------------
pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041
View this thread: http://www.excelforum.com/showthread...hreadid=558347