View Single Post
  #10   Report Post  
Posted to microsoft.public.excel.programming
Tushar Mehta Tushar Mehta is offline
external usenet poster
 
Posts: 1,071
Default Regular Expressions in VBA?

Thank you for your comments. Rest assured that both the technical
comments and those of a more personal nature will get all the respect
they deserve.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

In article ,
says...
"Tushar Mehta" wrote...
If the contents of the file are a fixed number of columns, each of a
fixed character width, you should follow through on David's suggestion.

If you still need Regular Expression support, in addition to the other
suggestions, you might also want to check
http://www.tmehta.com/regexp/

You have a subtle error in one of your examples in

http://www.tmehta.com/regexp/examples.htm

specifically, "The pattern that recognizes all characters other than
letters, numbers, and the underscore character is [^\w+]." You should use
\W+ (upper case W rather than lower case w, representing the Unix mindset
that case matters). Your regexp is actually an error because it doesn't
strip out + characters, and your claim about what the + char does in your
regexp is wrong because closure operators don't apply within character
classes.

Just try

=RegExpSubstitute("a+1+_","[^\w+]","")

and you'll see the result is "a+1+_". Inside character classes, *, + and ?
are treated as literal characters, so your regexp is the complement of the
SINGLE CHARACTER class made up of 'word' (token) characters *AND* +. So your
regexp also does a lot more work than it should. While you could use [^\w]+,
that's foolish compared to \W+.

You have another subtle error in "The sub-expression ((\d|.)+) isolates the
number, which may contain decimal values, and the next sub-expression (CR)
isolates the CR." Your subexpression matches any string of one or more
periods. The canonical way to express unsigned decimal numbers possibly with
decimal fractional parts is

(\d*\.)?\d+

While your subexpression may work on your test data and may work most of the
time, it doesn't take much to make it work *ALL* of the time.

You need to invest some money buying Jeffrey Friedl's "Mastering Regular
Expressions" and some time reading and understanding it. You may be an Excel
MVP, but you're a regexp newbie. If you think I'm harsh, try learning how to
use regexps for real by posting replies in comp.unix.shell, comp.lang.awk
and comp.lang.awk, where the regexp experts hang out.