View Single Post
  #34   Report Post  
Posted to microsoft.public.excel.misc
Rick Rothstein Rick Rothstein is offline
external usenet poster
 
Posts: 5,934
Default Need formula to extract a numeric value from a free-format text

I had the damnedest time trying to find what RegExp considered a word
boundary (basically, each sight just kept say use \w without listing what it
consider the boundary). When I finally found the one I cited, I figured it
was a universal definition. Now I'm guessing there might be version
differences between the various RegExp engines. Yes, I fix to account for
the underbar is as you have shown it. For the archives, here is the UDF with
the change you indicated...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]"
Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

--
Rick (MVP - Excel)


"Ron Rosenfeld" wrote in message
...
On Sat, 18 Jul 2009 12:33:47 -0400, "Rick Rothstein"
wrote:

The reason I thought that is because
of the "dot" that followed the 7-digit number in the OP's posted example
text. In thinking about it, I'm guessing you took that to be a period at
the
end of a sentence.


Yes, I did.


I found this definition of a word boundary in Regular Expressions...

"A word boundary represents the spot where a letter or
number meets a space, apostrophe, a period, or anything
else that isn't a letter or number"


Not quite what I understand it to be (but close)./

The definitions I've seen indicate that a word boundary "Matches at the
position between a word character (anything matched by \w) and a non-word
character (anything matched by [^\w] or \W) as well as at the start and/or
end
of the string if the first and/or last characters in the string are word
characters."

And, at least in VBScript, a word character is a digit, letter or
underscore
e.g: [A-Za-z0-9_]



Given that, this modification of my function should do what your RegExp
solution does...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]"
Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function


It comes pretty close. Just change this line to include the underscores:

If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]" Then

--ron