ExcelBanter - Find position of first non-matching character

ExcelBanter

ExcelBanter (https://www.excelbanter.com/)

- Excel Worksheet Functions (https://www.excelbanter.com/excel-worksheet-functions/)

- - Find position of first non-matching character (https://www.excelbanter.com/excel-worksheet-functions/189425-find-position-first-non-matching-character.html)

Greg Lovern

Find position of first non-matching character

I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

Gary''s Student

Find position of first non-matching character

Try this small UDF:

Function divergence(r1 As Range, r2 As Range) As Integer
v1 = r1.Value
v2 = r2.Value
divergence = 0
i = Application.WorksheetFunction.Max(Len(v1), Len(v2))
For j = 1 To i
ch1 = Mid(v1, j, 1)
ch2 = Mid(v2, j, 1)
If ch1 = ch2 Then
Else
divergence = j
Exit Function
End If
Next
End Function

If the two strings exactly match, it will return a zero. It will also catch
case changes as well.
--
Gary''s Student - gsnu2007i

"Greg Lovern" wrote:

I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

Charles Williams

Find position of first non-matching character

You could try this UDF:

Option Explicit

Public Function MissMatch(String1 As Variant, String2 As Variant)
Dim aByte1() As Byte
Dim aByte2() As Byte

Dim k1 As Long
Dim k2 As Long

aByte1 = CStr(String1)
aByte2 = CStr(String2)
MissMatch = 0
If Len(CStr(String1)) 0 Then
For k1 = LBound(aByte1) To UBound(aByte1) Step 2
k2 = k2 + 1
If k1 UBound(aByte2) Then
MissMatch = k2
Exit For
Else
If aByte1(k1) < aByte2(k1) Then
MissMatch = k2
Exit For
End If
End If
Next k1
End If
End Function

Charles
__________________________________________________
The Excel Calculation Site
http://www.decisionmodels.com
"Greg Lovern" wrote in message
...
I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

Greg Lovern

Find position of first non-matching character

Thanks, but I should have mentioned that I can do it in a UDF too.

What I want is to do it with Excel's built-in worksheet functions. Can
it be done that way?

Thanks,

Greg

On May 30, 9:55 am, Gary''s Student
wrote:
Try this small UDF:

Function divergence(r1 As Range, r2 As Range) As Integer
v1 = r1.Value
v2 = r2.Value
divergence = 0
i = Application.WorksheetFunction.Max(Len(v1), Len(v2))
For j = 1 To i
ch1 = Mid(v1, j, 1)
ch2 = Mid(v2, j, 1)
If ch1 = ch2 Then
Else
divergence = j
Exit Function
End If
Next
End Function

If the two strings exactly match, it will return a zero. It will also catch
case changes as well.
--
Gary''s Student - gsnu2007i

"Greg Lovern" wrote:
I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

Bernie Deitrick

Find position of first non-matching character

Greg,

Array enter (enter using Ctrl-Shift-Enter):

=SUM(MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),1 )<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),1),ROW(IND IRECT("A1:A"&LEN(A2))),LEN(A2))))

With those strings in cells A2 and A3.

HTH,
Bernie
MS Excel MVP

"Greg Lovern" wrote in message
...
I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

Bernie Deitrick

Find position of first non-matching character

Greg,

Yes, it can be done with an array function. See my post.

HTH,
Bernie
MS Excel MVP

"Greg Lovern" wrote in message
...
Thanks, but I should have mentioned that I can do it in a UDF too.

What I want is to do it with Excel's built-in worksheet functions. Can
it be done that way?

Thanks,

Greg

On May 30, 9:55 am, Gary''s Student
wrote:
Try this small UDF:

Function divergence(r1 As Range, r2 As Range) As Integer
v1 = r1.Value
v2 = r2.Value
divergence = 0
i = Application.WorksheetFunction.Max(Len(v1), Len(v2))
For j = 1 To i
ch1 = Mid(v1, j, 1)
ch2 = Mid(v2, j, 1)
If ch1 = ch2 Then
Else
divergence = j
Exit Function
End If
Next
End Function

If the two strings exactly match, it will return a zero. It will also catch
case changes as well.
--
Gary''s Student - gsnu2007i

"Greg Lovern" wrote:
I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

Greg Lovern

Find position of first non-matching character

Thanks, that's exactly what I was looking for.

Greg

On May 30, 10:13 am, "Bernie Deitrick" <deitbe @ consumer dot org
wrote:
Greg,

Array enter (enter using Ctrl-Shift-Enter):

=SUM(MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),1 )<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),1),ROW(IND IRECT("A1:A"&LEN(A2))),LEN(A2))))

With those strings in cells A2 and A3.

HTH,
Bernie
MS Excel MVP

"Greg Lovern" wrote in message

...

I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

Greg Lovern

Find position of first non-matching character

Hi Bernie,

Thanks again -- BTW, I'm not sure what the SUM is doing; it seems to
work as well for me like this:

=MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),
1)<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),
1),ROW(INDIRECT("A1:A"&LEN(A2))),LEN(A2)))

Thanks,

Greg

On May 30, 10:13 am, "Bernie Deitrick" <deitbe @ consumer dot org
wrote:
Greg,

Array enter (enter using Ctrl-Shift-Enter):

=SUM(MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),1 )<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),1),ROW(IND IRECT("A1:A"&LEN(A2))),LEN(A2))))

With those strings in cells A2 and A3.

HTH,
Bernie
MS Excel MVP

"Greg Lovern" wrote in message

...

I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

Bernie Deitrick

Find position of first non-matching character

Greg,

You're right - <Blush I modified another function that used the SUM and forgot to take it off of
there...

Bernie
MS Excel MVP

"Greg Lovern" wrote in message
...
Hi Bernie,

Thanks again -- BTW, I'm not sure what the SUM is doing; it seems to
work as well for me like this:

=MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),
1)<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),
1),ROW(INDIRECT("A1:A"&LEN(A2))),LEN(A2)))

Thanks,

Greg

On May 30, 10:13 am, "Bernie Deitrick" <deitbe @ consumer dot org
wrote:
Greg,

Array enter (enter using Ctrl-Shift-Enter):

=SUM(MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),1 )<MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),1),ROW(IND IRECT("A1:A"&LEN(A2))),LEN(A2))))

With those strings in cells A2 and A3.

HTH,
Bernie
MS Excel MVP

"Greg Lovern" wrote in message

...

I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

Ron Rosenfeld

Find position of first non-matching character

On Fri, 30 May 2008 09:35:58 -0700 (PDT), Greg Lovern wrote:

I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

You could use this **array** function.

Make sure the "99" is greater than the length of your longest string.

=MATCH(FALSE,MID(A1,ROW($1:$99),1)=MID(A2,ROW($1:$ 99),1),0)

To enter an **array** function, hold down <ctrl<shift while hitting <enter.
Excel will place braces {...} around the formula.

--ron

Greg Lovern

Find position of first non-matching character

Excellent, thanks! :-)

Greg

On May 30, 11:26 am, Ron Rosenfeld wrote:
On Fri, 30 May 2008 09:35:58 -0700 (PDT), Greg Lovern wrote:
I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?

Thanks,

Greg

You could use this **array** function.

Make sure the "99" is greater than the length of your longest string.

=MATCH(FALSE,MID(A1,ROW($1:$99),1)=MID(A2,ROW($1:$ 99),1),0)

To enter an **array** function, hold down <ctrl<shift while hitting <enter.
Excel will place braces {...} around the formula.

--ron

Ron Rosenfeld

Find position of first non-matching character

On Fri, 30 May 2008 11:42:48 -0700 (PDT), Greg Lovern wrote:

Excellent, thanks! :-)

Greg

You're welcome. Glad to help.

By the way, the formula I gave you will return a #NA error if the two strings
match exactly. There are a variety of methods of handling this issue if that
is not desired.

Probably the simplest is to just embed it in an IF statement. e.g:

=IF(A1=A2,"Match",formula)

Also note that the formula, as provided, is NOT case sensitive. If you wanted
a case-sensitive formula, and also to check for a complete Match, then
something like:

=IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE,
EXACT(MID(A1,ROW(1:99),1),MID(A2,ROW(1:99),1)),0))

should work.
--ron

Ron Rosenfeld

Find position of first non-matching character

On Fri, 30 May 2008 15:29:05 -0400, Ron Rosenfeld
wrote:

Should be:

=IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE,
EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99),1)) ,0))

--ron

Harlan Grove[_2_]

Find position of first non-matching character

Ron Rosenfeld wrote...
Should be:

=IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE,
EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99),1) ),0))

Variation on this. First, it's convenient to use a defined name like
seq referring to a formula like

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$ 65536,256,1))

or also define a name like NCHARS referring to 256 or something else
and define seq as

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$ 65536,NCHARS,1))

Then use an array formula like

=MATCH(FALSE,MID(A1&"x",seq,1)=MID(A2&"y",seq,1),0 )

for case insensitive matches or

=MATCH(FALSE,EXACT(MID(A1&"x",seq,1),MID(A2&"y",se q,1)),0)

for case sensitive matches. These return LEN(A1)+1 if A1 = A2.

Ron Rosenfeld

Find position of first non-matching character

On Fri, 30 May 2008 15:25:35 -0700 (PDT), Harlan Grove
wrote:

Ron Rosenfeld wrote...
Should be:

=IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE,
EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99),1 )),0))

Variation on this. First, it's convenient to use a defined name like
seq referring to a formula like

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1: $65536,256,1))

or also define a name like NCHARS referring to 256 or something else
and define seq as

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1: $65536,NCHARS,1))

Then use an array formula like

=MATCH(FALSE,MID(A1&"x",seq,1)=MID(A2&"y",seq,1), 0)

for case insensitive matches or

=MATCH(FALSE,EXACT(MID(A1&"x",seq,1),MID(A2&"y",s eq,1)),0)

for case sensitive matches. These return LEN(A1)+1 if A1 = A2.

Certainly a different approach, to return LEN(A1)+1 if A1=A2

I have a question, Harlan:

If using a defined name like seq to return a sequential array of numbers, what
is the advantage of using your longer formula:

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$ 65536,256,1))

versus

=ROW($1:$256)

At least in Excel 2007, it seems that your formula requires that Sheet1 exist
in the workbook at the time the defined name is created or else it will give a
#REF error.
--ron

T. Valko

Find position of first non-matching character

"Ron Rosenfeld" wrote in message
...
On Fri, 30 May 2008 15:25:35 -0700 (PDT), Harlan Grove
wrote:

Ron Rosenfeld wrote...
Should be:

=IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE,
EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99), 1)),0))

Variation on this. First, it's convenient to use a defined name like
seq referring to a formula like

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1 :$65536,256,1))

or also define a name like NCHARS referring to 256 or something else
and define seq as

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1 :$65536,NCHARS,1))

Then use an array formula like

=MATCH(FALSE,MID(A1&"x",seq,1)=MID(A2&"y",seq,1) ,0)

for case insensitive matches or

=MATCH(FALSE,EXACT(MID(A1&"x",seq,1),MID(A2&"y", seq,1)),0)

for case sensitive matches. These return LEN(A1)+1 if A1 = A2.

Certainly a different approach, to return LEN(A1)+1 if A1=A2

I have a question, Harlan:

If using a defined name like seq to return a sequential array of numbers,
what
is the advantage of using your longer formula:

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$ 65536,256,1))

versus

=ROW($1:$256)

At least in Excel 2007, it seems that your formula requires that Sheet1
exist
in the workbook at the time the defined name is created or else it will
give a
#REF error.
--ron

My interpretation....

ROW($1:$256) is susceptible to row insertions. Using INDIRECT takes care of
that but is volatile:

ROW(INDIRECT("1:256"))

ROW(INDEX(....)) is an alternative to using INDIRECT.

--
Biff
Microsoft Excel MVP

Ron Rosenfeld

Find position of first non-matching character

On Sat, 31 May 2008 01:47:37 -0400, "T. Valko" wrote:

My interpretation....

ROW($1:$256) is susceptible to row insertions. Using INDIRECT takes care of
that but is volatile:

ROW(INDIRECT("1:256"))

ROW(INDEX(....)) is an alternative to using INDIRECT.

--
Biff
Microsoft Excel MVP

Good point. Thanks
--ron

Lori

Find position of first non-matching character

That's true but there are several other issues with the defined name method:

- Using a reference to all of the sheet is less efficient since the formula
recalculates whenever any change is made anywhere on the same sheet.

- Dynamic ranges recalculate at startup, causing the save changes prompt to
appear by default when a workbook is closed (even if nothing has changed).

- If you delete all cells on the sheet by right-clicking the top left corner
and clicking delete, the name will return #ref!.

A simple way around might be to refer to a hidden sheet where no changes are
made but this causes additional overhead.

Another option is to use an active sheet name: =row(!$1:$256)
This is more efficient for recalculation and robust to row insertions, etc.
but can cause an error when copied between workbooks.

You could also try using a relatively defined name: =row(1:256)-row(1:1) but
this causes wraparound problems when you reach the end.

In summary all methods have drawbacks and the straightforward method often
works fine provided few changes are planned.

Harlan Grove[_2_]

Find position of first non-matching character

Lori wrote...
That's true but there are several other issues with the defined name method:

And there are counterissues when using alternatives.

- Using a reference to all of the sheet is less efficient since the formula
recalculates whenever any change is made anywhere on the same sheet.

As also happens when using INDIRECT. However, formulas referring to
the entire worksheet don't recalculate when making changes in OTHER
worksheets, but formulas using INDIRECT still do.

There's no way to eliminate problems with row/column insertion/
deletion other than using INDIRECT or INDEX:INDEX.

- Dynamic ranges recalculate at startup, causing the save changes prompt to
appear by default when a workbook is closed (even if nothing has changed).

Same using either INDIRECT or INDEX:INDEX, but there's no way to avoid
this AND protect against row/column insertion/deletion. It's a trade-
off, but if macros were allowed, this could be addressed by something
like this.

Private Sub Workbook_Open()
Me.Saved = True
End Sub

- If you delete all cells on the sheet by right-clicking the top left corner
and clicking delete, the name will return #ref!.

Yes, and that's the ONLY way to affect references to $1:$65536 (Excel
97 to Excel 2003) other than deleting the entire worksheet. If you
have users that make a habit of deleting all cells in worksheets, you
have my sympathy, but in my experience this happens rarely enough that
it doesn't require special handling. But if you must guard against
this, once again event handlers to the rescue.

Private Sub Workbook_SheetSelectionChange(ByVal Sh As Object, ByVal
Target As Range)
Application.EnableCancelKey = xlDisabled
Application.EnableEvents = False
If Target.Address(0, 0, xlA1, 0) = "1:65536" Then
MsgBox Prompt:="You may not select the entire worksheet.", _
Title:="No! No! No! Bad user!!"
Target.Cells(1, 1).Select
End If
Application.EnableEvents = True
Application.EnableCancelKey = xlInterrupt
End Sub

A simple way around might be to refer to a hidden sheet where no changes are
made but this causes additional overhead.

This is a good idea. There are usually many things that could and
should be stored in hidden worksheets, and the overhead isn't all that
much as long as you don't to any formatting in that hidden worksheet.

Another option is to use an active sheet name: =row(!$1:$256)
This is more efficient for recalculation and robust to row insertions, etc.
but can cause an error when copied between workbooks.

There are other problems besides copying between workbooks.

You could also try using a relatively defined name: =row(1:256)-row(1:1) but
this causes wraparound problems when you reach the end.

This is also susceptible to selecting all cells and deleting.

In summary all methods have drawbacks and the straightforward method often
works fine provided few changes are planned.

Using a hidden worksheet is the best of these possibilities. The
overhead should be negligible compared to the benefits.

All times are GMT +1. The time now is 03:12 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com