View Single Post
  #1   Report Post  
Posted to microsoft.public.excel.programming
ker_01 ker_01 is offline
external usenet poster
 
Posts: 395
Default Use 'like' or regex to match number-letter pattern substrings

There has to be an easier way that the one I'm currently working on...

I'm writing a UDF to help a colleague parse some key data out of some
inconsistently formatted raw data strings. One seach will be for machine part
numbers in the raw data string. All part numbers are in the format
Alpha-Alpha-Number-Number-Number, as shown in the following string:
"I pulled the engine apart and found that part MX452 was ok, but MV119 was
worn and needed to be replaced"

I've made the UDF with an input string parameter that must only contain the
characters "#" and "A", for example, "##AAA" for the above part numbers.
After performing it's VBA magic, the UDF should then return "MX452, MV119"

The actual pattern may differ for different searches; for example, searches
might be for an employee ID (#AAA), a tool ID (#######), a test number
(AAA#), or other (sub)strings. There may be a few where there will be a
non-alphanumeric character in a specific position (###-#A#).

One option is Regex, which I've used for fixed search strings, but my brain
hurts thinking about how to code Regex to dynamically deal with a user-input
string.

The other option (and what I've started) is to look at every substring in
the source string, and then compare each substring to the pattern to see if
it matches. I don't think this is very efficient, and given that there will
be thousands of records, it will also probably be painfully slow.

Example, based on the above string: Comparison string is 5 characters, so
check each 5-character substring ("I pul", " pull", "pulle", "ulled", lled ",
etc...)

I've looked at the "like" statement, and that seems to be a good lead. I can
use the user-input string to create something like:
[!0-9][!0-9][!A-Z][!A-Z][!A-Z]

And maybe use that to match each substring ("I pul", " pull", "pulle",
"ulled", lled ", etc...) but that still seems inefficient. However, that
seems to be the only way to know where the match is (to actually return it),
and to see if there is more than one match in the raw data string.

Are there any easier approaches to finding (and pulling) every substring
that matches a user-designated pattern?

Thanks,
Keith