![]() |
Find position of first non-matching character
I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg |
Find position of first non-matching character
Try this small UDF:
Function divergence(r1 As Range, r2 As Range) As Integer v1 = r1.Value v2 = r2.Value divergence = 0 i = Application.WorksheetFunction.Max(Len(v1), Len(v2)) For j = 1 To i ch1 = Mid(v1, j, 1) ch2 = Mid(v2, j, 1) If ch1 = ch2 Then Else divergence = j Exit Function End If Next End Function If the two strings exactly match, it will return a zero. It will also catch case changes as well. -- Gary''s Student - gsnu2007i "Greg Lovern" wrote: I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg |
Find position of first non-matching character
You could try this UDF:
Option Explicit Public Function MissMatch(String1 As Variant, String2 As Variant) Dim aByte1() As Byte Dim aByte2() As Byte Dim k1 As Long Dim k2 As Long aByte1 = CStr(String1) aByte2 = CStr(String2) MissMatch = 0 If Len(CStr(String1)) 0 Then For k1 = LBound(aByte1) To UBound(aByte1) Step 2 k2 = k2 + 1 If k1 UBound(aByte2) Then MissMatch = k2 Exit For Else If aByte1(k1) < aByte2(k1) Then MissMatch = k2 Exit For End If End If Next k1 End If End Function Charles __________________________________________________ The Excel Calculation Site http://www.decisionmodels.com "Greg Lovern" wrote in message ... I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg |
Find position of first non-matching character
Thanks, but I should have mentioned that I can do it in a UDF too.
What I want is to do it with Excel's built-in worksheet functions. Can it be done that way? Thanks, Greg On May 30, 9:55 am, Gary''s Student wrote: Try this small UDF: Function divergence(r1 As Range, r2 As Range) As Integer v1 = r1.Value v2 = r2.Value divergence = 0 i = Application.WorksheetFunction.Max(Len(v1), Len(v2)) For j = 1 To i ch1 = Mid(v1, j, 1) ch2 = Mid(v2, j, 1) If ch1 = ch2 Then Else divergence = j Exit Function End If Next End Function If the two strings exactly match, it will return a zero. It will also catch case changes as well. -- Gary''s Student - gsnu2007i "Greg Lovern" wrote: I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg |
Find position of first non-matching character
Greg,
Array enter (enter using Ctrl-Shift-Enter): =SUM(MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),1 )<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),1),ROW(IND IRECT("A1:A"&LEN(A2))),LEN(A2)))) With those strings in cells A2 and A3. HTH, Bernie MS Excel MVP "Greg Lovern" wrote in message ... I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg |
Find position of first non-matching character
Greg,
Yes, it can be done with an array function. See my post. HTH, Bernie MS Excel MVP "Greg Lovern" wrote in message ... Thanks, but I should have mentioned that I can do it in a UDF too. What I want is to do it with Excel's built-in worksheet functions. Can it be done that way? Thanks, Greg On May 30, 9:55 am, Gary''s Student wrote: Try this small UDF: Function divergence(r1 As Range, r2 As Range) As Integer v1 = r1.Value v2 = r2.Value divergence = 0 i = Application.WorksheetFunction.Max(Len(v1), Len(v2)) For j = 1 To i ch1 = Mid(v1, j, 1) ch2 = Mid(v2, j, 1) If ch1 = ch2 Then Else divergence = j Exit Function End If Next End Function If the two strings exactly match, it will return a zero. It will also catch case changes as well. -- Gary''s Student - gsnu2007i "Greg Lovern" wrote: I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg |
Find position of first non-matching character
Thanks, that's exactly what I was looking for.
Greg On May 30, 10:13 am, "Bernie Deitrick" <deitbe @ consumer dot org wrote: Greg, Array enter (enter using Ctrl-Shift-Enter): =SUM(MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),1 )<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),1),ROW(IND IRECT("A1:A"&LEN(A2))),LEN(A2)))) With those strings in cells A2 and A3. HTH, Bernie MS Excel MVP "Greg Lovern" wrote in message ... I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg |
Find position of first non-matching character
Hi Bernie,
Thanks again -- BTW, I'm not sure what the SUM is doing; it seems to work as well for me like this: =MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))), 1)<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))), 1),ROW(INDIRECT("A1:A"&LEN(A2))),LEN(A2))) Thanks, Greg On May 30, 10:13 am, "Bernie Deitrick" <deitbe @ consumer dot org wrote: Greg, Array enter (enter using Ctrl-Shift-Enter): =SUM(MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),1 )<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),1),ROW(IND IRECT("A1:A"&LEN(A2))),LEN(A2)))) With those strings in cells A2 and A3. HTH, Bernie MS Excel MVP "Greg Lovern" wrote in message ... I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg |
Find position of first non-matching character
Greg,
You're right - <Blush I modified another function that used the SUM and forgot to take it off of there... Bernie MS Excel MVP "Greg Lovern" wrote in message ... Hi Bernie, Thanks again -- BTW, I'm not sure what the SUM is doing; it seems to work as well for me like this: =MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))), 1)<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))), 1),ROW(INDIRECT("A1:A"&LEN(A2))),LEN(A2))) Thanks, Greg On May 30, 10:13 am, "Bernie Deitrick" <deitbe @ consumer dot org wrote: Greg, Array enter (enter using Ctrl-Shift-Enter): =SUM(MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),1 )<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),1),ROW(IND IRECT("A1:A"&LEN(A2))),LEN(A2)))) With those strings in cells A2 and A3. HTH, Bernie MS Excel MVP "Greg Lovern" wrote in message ... I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg |
Find position of first non-matching character
On Fri, 30 May 2008 09:35:58 -0700 (PDT), Greg Lovern wrote:
I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg You could use this **array** function. Make sure the "99" is greater than the length of your longest string. =MATCH(FALSE,MID(A1,ROW($1:$99),1)=MID(A2,ROW($1:$ 99),1),0) To enter an **array** function, hold down <ctrl<shift while hitting <enter. Excel will place braces {...} around the formula. --ron |
Find position of first non-matching character
Excellent, thanks! :-)
Greg On May 30, 11:26 am, Ron Rosenfeld wrote: On Fri, 30 May 2008 09:35:58 -0700 (PDT), Greg Lovern wrote: I have two strings that start with the same characters, but at some unknown and varying point they begin to differ. I want the position number of the first non-matching character. For example: -- table and chair -- table plus chair -- position of first non-matching character is 7 (the "a" in "and", and the "p" in "plus"). In the above example, both strings start out "table ", then start to differ beginning with the 7th character. How can I return the position of the first non-matching character, which in this example is 7? Thanks, Greg You could use this **array** function. Make sure the "99" is greater than the length of your longest string. =MATCH(FALSE,MID(A1,ROW($1:$99),1)=MID(A2,ROW($1:$ 99),1),0) To enter an **array** function, hold down <ctrl<shift while hitting <enter. Excel will place braces {...} around the formula. --ron |
Find position of first non-matching character
On Fri, 30 May 2008 11:42:48 -0700 (PDT), Greg Lovern wrote:
Excellent, thanks! :-) Greg You're welcome. Glad to help. By the way, the formula I gave you will return a #NA error if the two strings match exactly. There are a variety of methods of handling this issue if that is not desired. Probably the simplest is to just embed it in an IF statement. e.g: =IF(A1=A2,"Match",formula) Also note that the formula, as provided, is NOT case sensitive. If you wanted a case-sensitive formula, and also to check for a complete Match, then something like: =IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE, EXACT(MID(A1,ROW(1:99),1),MID(A2,ROW(1:99),1)),0)) should work. --ron |
Find position of first non-matching character
On Fri, 30 May 2008 15:29:05 -0400, Ron Rosenfeld
wrote: Should be: =IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE, EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99),1)) ,0)) --ron |
Find position of first non-matching character
Ron Rosenfeld wrote...
Should be: =IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE, EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99),1) ),0)) Variation on this. First, it's convenient to use a defined name like seq referring to a formula like =ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$ 65536,256,1)) or also define a name like NCHARS referring to 256 or something else and define seq as =ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$ 65536,NCHARS,1)) Then use an array formula like =MATCH(FALSE,MID(A1&"x",seq,1)=MID(A2&"y",seq,1),0 ) for case insensitive matches or =MATCH(FALSE,EXACT(MID(A1&"x",seq,1),MID(A2&"y",se q,1)),0) for case sensitive matches. These return LEN(A1)+1 if A1 = A2. |
Find position of first non-matching character
On Fri, 30 May 2008 15:25:35 -0700 (PDT), Harlan Grove
wrote: Ron Rosenfeld wrote... Should be: =IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE, EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99),1 )),0)) Variation on this. First, it's convenient to use a defined name like seq referring to a formula like =ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1: $65536,256,1)) or also define a name like NCHARS referring to 256 or something else and define seq as =ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1: $65536,NCHARS,1)) Then use an array formula like =MATCH(FALSE,MID(A1&"x",seq,1)=MID(A2&"y",seq,1), 0) for case insensitive matches or =MATCH(FALSE,EXACT(MID(A1&"x",seq,1),MID(A2&"y",s eq,1)),0) for case sensitive matches. These return LEN(A1)+1 if A1 = A2. Certainly a different approach, to return LEN(A1)+1 if A1=A2 I have a question, Harlan: If using a defined name like seq to return a sequential array of numbers, what is the advantage of using your longer formula: =ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$ 65536,256,1)) versus =ROW($1:$256) At least in Excel 2007, it seems that your formula requires that Sheet1 exist in the workbook at the time the defined name is created or else it will give a #REF error. --ron |
Find position of first non-matching character
"Ron Rosenfeld" wrote in message
... On Fri, 30 May 2008 15:25:35 -0700 (PDT), Harlan Grove wrote: Ron Rosenfeld wrote... Should be: =IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE, EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99), 1)),0)) Variation on this. First, it's convenient to use a defined name like seq referring to a formula like =ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1 :$65536,256,1)) or also define a name like NCHARS referring to 256 or something else and define seq as =ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1 :$65536,NCHARS,1)) Then use an array formula like =MATCH(FALSE,MID(A1&"x",seq,1)=MID(A2&"y",seq,1) ,0) for case insensitive matches or =MATCH(FALSE,EXACT(MID(A1&"x",seq,1),MID(A2&"y", seq,1)),0) for case sensitive matches. These return LEN(A1)+1 if A1 = A2. Certainly a different approach, to return LEN(A1)+1 if A1=A2 I have a question, Harlan: If using a defined name like seq to return a sequential array of numbers, what is the advantage of using your longer formula: =ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$ 65536,256,1)) versus =ROW($1:$256) At least in Excel 2007, it seems that your formula requires that Sheet1 exist in the workbook at the time the defined name is created or else it will give a #REF error. --ron My interpretation.... ROW($1:$256) is susceptible to row insertions. Using INDIRECT takes care of that but is volatile: ROW(INDIRECT("1:256")) ROW(INDEX(....)) is an alternative to using INDIRECT. -- Biff Microsoft Excel MVP |
Find position of first non-matching character
On Sat, 31 May 2008 01:47:37 -0400, "T. Valko" wrote:
My interpretation.... ROW($1:$256) is susceptible to row insertions. Using INDIRECT takes care of that but is volatile: ROW(INDIRECT("1:256")) ROW(INDEX(....)) is an alternative to using INDIRECT. -- Biff Microsoft Excel MVP Good point. Thanks --ron |
Find position of first non-matching character
That's true but there are several other issues with the defined name method:
- Using a reference to all of the sheet is less efficient since the formula recalculates whenever any change is made anywhere on the same sheet. - Dynamic ranges recalculate at startup, causing the save changes prompt to appear by default when a workbook is closed (even if nothing has changed). - If you delete all cells on the sheet by right-clicking the top left corner and clicking delete, the name will return #ref!. A simple way around might be to refer to a hidden sheet where no changes are made but this causes additional overhead. Another option is to use an active sheet name: =row(!$1:$256) This is more efficient for recalculation and robust to row insertions, etc. but can cause an error when copied between workbooks. You could also try using a relatively defined name: =row(1:256)-row(1:1) but this causes wraparound problems when you reach the end. In summary all methods have drawbacks and the straightforward method often works fine provided few changes are planned. |
Find position of first non-matching character
Lori wrote...
That's true but there are several other issues with the defined name method: And there are counterissues when using alternatives. - Using a reference to all of the sheet is less efficient since the formula recalculates whenever any change is made anywhere on the same sheet. As also happens when using INDIRECT. However, formulas referring to the entire worksheet don't recalculate when making changes in OTHER worksheets, but formulas using INDIRECT still do. There's no way to eliminate problems with row/column insertion/ deletion other than using INDIRECT or INDEX:INDEX. - Dynamic ranges recalculate at startup, causing the save changes prompt to appear by default when a workbook is closed (even if nothing has changed). Same using either INDIRECT or INDEX:INDEX, but there's no way to avoid this AND protect against row/column insertion/deletion. It's a trade- off, but if macros were allowed, this could be addressed by something like this. Private Sub Workbook_Open() Me.Saved = True End Sub - If you delete all cells on the sheet by right-clicking the top left corner and clicking delete, the name will return #ref!. Yes, and that's the ONLY way to affect references to $1:$65536 (Excel 97 to Excel 2003) other than deleting the entire worksheet. If you have users that make a habit of deleting all cells in worksheets, you have my sympathy, but in my experience this happens rarely enough that it doesn't require special handling. But if you must guard against this, once again event handlers to the rescue. Private Sub Workbook_SheetSelectionChange(ByVal Sh As Object, ByVal Target As Range) Application.EnableCancelKey = xlDisabled Application.EnableEvents = False If Target.Address(0, 0, xlA1, 0) = "1:65536" Then MsgBox Prompt:="You may not select the entire worksheet.", _ Title:="No! No! No! Bad user!!" Target.Cells(1, 1).Select End If Application.EnableEvents = True Application.EnableCancelKey = xlInterrupt End Sub A simple way around might be to refer to a hidden sheet where no changes are made but this causes additional overhead. This is a good idea. There are usually many things that could and should be stored in hidden worksheets, and the overhead isn't all that much as long as you don't to any formatting in that hidden worksheet. Another option is to use an active sheet name: =row(!$1:$256) This is more efficient for recalculation and robust to row insertions, etc. but can cause an error when copied between workbooks. There are other problems besides copying between workbooks. You could also try using a relatively defined name: =row(1:256)-row(1:1) but this causes wraparound problems when you reach the end. This is also susceptible to selecting all cells and deleting. In summary all methods have drawbacks and the straightforward method often works fine provided few changes are planned. Using a hidden worksheet is the best of these possibilities. The overhead should be negligible compared to the benefits. |
All times are GMT +1. The time now is 03:12 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com