View Single Post
  #22   Report Post  
Posted to microsoft.public.excel.programming
Walter Briscoe Walter Briscoe is offline
external usenet poster
 
Posts: 279
Default Regular Expression Help on syntax

In message of Sun, 10 Jan 2010 04:55:10 in
microsoft.public.excel.programming, joel
writes

Ron: have you ever looked at the 1970's unix manual volume 2b under the
YACC topic. See this webpage. Look for the link for the PDF files and
use the link : v7vol2b.pdf (819KB)

Webpage
'7th Edition Manual PDF'
(http://plan9.bell-labs.com/7thEdMan/bswv7.html)

pdf file
http://plan9.bell-labs.com/7thEdMan/v7vol2b.pdf



It is the only good description of pattern matching that I have ever
seen.

The question mark indicates any single character.


Joel, How do you conclude that? (I expect "." to match any single
character within a line and ".|\s" to match any single character).

I found this table on page 49/250 (Joel is probably looking elsewhere):

Regular expressions in Lex use the following operators:

....
x? an optional x.
x* 0,1,2, ... instances of x.
x+ 1,2,3, ... instances of x.
x?y an x or a y.
....

I am confused by that x?y. I think it means an optional x followed by a
literal y. I think there may be a glyph confusion. I suspect it is
intended to be x|y where the character - for which I know no name -
between x and y means "or" as & - ampersand - means "and".

The symbols used by Set RE = CreateObject("VBScript.RegExp")
RE.Pattern = ... are a superset of the BRE described in <http://opengrou
p.org/onlinepubs/007908775/xbd/re.html. My own view is that thinks like
\d is an unnecessary shorthand for [0-9]. I concede it is reasonable if
a range is not allowed in a character set. i.e. if [0123456789] is
needed. In VBA, I refer to <http://msdn.microsoft.com/en-
us/library/ms974570.aspx which specifies everything I want to know
other than the meanings of the exceptions given.
<http://msdn.microsoft.com/en-us/library/xe43cc8d%28VS.85%29.aspx
specifies 5019, which I hit yesterday. ;)

For my part, I would start with "-?\d+" to grab a whole number. i.e a
whole number is a minus which is optional followed by a digit one or
more times. I found the OP's situation too complicated to want to follow
and offer a suggestion.
--
Walter Briscoe