#1
Posted to microsoft.public.excel.programming
Basic regular expression question
I'm still a newbie at all this regular expression stuff, so forgive this newb question....

My input data strings have the following format:

"Item1 scissors"
"Item2 notebooks"
"I3 pens"
"itm4 keyboards"
......

So, each line is basically formatted like this:

[string of characters] [several whitespace(s)] [string of characters]

Using regular expressions, how can I store each pair of items in separate variables?? For example, if I read in the first line above, I would like my variable sNum to store the "Item1" string, and a variable named sObj would store "scissors".

I guess I'm really trying to parse each pair of items and store them in variables using regular expressions, but I don't fully understand how to create my own regular expression pattern strings yet.

Thanks!
#2
Posted to microsoft.public.excel.programming
On Sat, 16 Apr 2011 15:09:58 -0700, "Robert Crandal" wrote:

So, each line is basically formatted like this: [string of characters] [several whitespace(s)] [string of characters] Using regular expressions, how can I store each pair of items in separate variables??

What you show is two words separated by space(s). Assuming that the words contain only letters, digits and possibly an underscore, and that there are only two words in each line, the regex is fairly simple:

^(\w+)\s+(\w+)

which means:

Assert position at the beginning of the string «^»
Match the regular expression below and capture its match into backreference number 1 «(\w+)»
   Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below and capture its match into backreference number 2 «(\w+)»
   Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

A sample VBA macro that captures the item number into the first element of a two-dimensional array, and the object into the second element, might look like:

=================================
Option Explicit

Sub foo()
    Dim re As Object, mc As Object
    ' Anchored at the start of the string, matching the pattern explained above
    Const sPat As String = "^(\w+)\s+(\w+)"
    Dim InputData(0 To 3) As String
    Dim i As Long
    Dim Results() As String

    InputData(0) = "Item1 scissors"
    InputData(1) = "Item2 notebooks"
    InputData(2) = "I3 pens"
    InputData(3) = "itm4 keyboards"

    Set re = CreateObject("vbscript.regexp")
    re.Pattern = sPat
    re.Global = True

    For i = 0 To UBound(InputData)
        If re.test(InputData(i)) = True Then
            Set mc = re.Execute(InputData(i))
            ' ReDim Preserve can only resize the last dimension
            ReDim Preserve Results(0 To 1, 0 To i)
            Results(0, i) = mc(0).submatches(0)   ' item number
            Results(1, i) = mc(0).submatches(1)   ' object
        End If
    Next i
End Sub
==========================

Hope this helps.
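To get the two captures into the sNum and sObj variables named in the original question, rather than into an array, the same two-word pattern can be wrapped in a small helper. This is only a minimal sketch: the names ParsePair and DemoParsePair are hypothetical, and it assumes late binding to VBScript.RegExp, as the macro in this post does.

```vba
Option Explicit

' Hypothetical helper: splits a line such as "Item1 scissors" into its two words.
' Returns True and fills sNum/sObj when the line matches; returns False otherwise.
Function ParsePair(ByVal sLine As String, ByRef sNum As String, ByRef sObj As String) As Boolean
    Dim re As Object, mc As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^(\w+)\s+(\w+)"       ' two words separated by whitespace

    Set mc = re.Execute(sLine)
    If mc.Count > 0 Then
        sNum = mc(0).SubMatches(0)      ' first capture group
        sObj = mc(0).SubMatches(1)      ' second capture group
        ParsePair = True
    End If
End Function

Sub DemoParsePair()
    Dim sNum As String, sObj As String

    If ParsePair("Item1 scissors", sNum, sObj) Then
        Debug.Print sNum, sObj          ' shows both parts in the Immediate window
    End If
End Sub
```

Returning a Boolean lets the caller skip lines that do not fit the format instead of reading stale values from sNum and sObj.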
#3
Posted to microsoft.public.excel.programming
Hello Ron! I just wanted to say thanks so much for your excellent help again. That code is working great!

I have a new question now. I just realized that the third element can actually contain multiple word elements. So, my data might actually look like this:

"Item1 scissors"
"Item2 red notebooks"
"Item3 number #2 pencils"

So, the data format really is:

[single string of characters] [whitespace(s)] [any string of characters and optional whitespace(s)]

So.... I plan to basically reuse the code you gave me previously, but I need to modify the regular expression pattern so that the variable mc(0).submatches(1) would get assigned strings like "scissors", or "red notebooks", or "number #2 pencils".

How should I change the pattern string? Thankx!

"Ron Rosenfeld" wrote in message ...
On Sat, 16 Apr 2011 15:09:58 -0700, "Robert Crandal" wrote:
What you show is two words separated by space(s).
#4
Posted to microsoft.public.excel.programming
I plan to basically reuse the code you gave me previously, but I need to modify the regular expression pattern so that the variable mc(0).submatches(1) would get assigned strings like "scissors", or "red notebooks", or "number #2 pencils" How should I change the pattern string?

I'm not sure if this will be helpful to you or not, as I think you are attempting to learn how to program with Regular Expressions; however, assuming those "whitespaces" you mentioned are simply normal spaces, you can do what you have asked without using Regular Expressions... straight VB is enough. Here is Ron's macro revised to work without Regular Expressions and modified to handle the multiply-spaced data you just posted about...

Sub FooToo()
    Dim i As Long, InputData(0 To 3) As String, Parts() As String, Results() As String

    InputData(0) = "Item1 scissors"
    InputData(1) = "Item2 notebooks"
    InputData(2) = "Item2 red notebooks"
    InputData(3) = "Item3 number #2 pencils"

    For i = 0 To UBound(InputData)
        If InStr(InputData(i), " ") Then
            ' WorksheetFunction.Trim collapses runs of spaces to one space;
            ' the limit of 2 keeps everything after the first space together
            Parts = Split(WorksheetFunction.Trim(InputData(i)), " ", 2)
            ReDim Preserve Results(0 To 1, 0 To i)
            Results(0, i) = Parts(0)
            Results(1, i) = Parts(1)
        End If
    Next i
End Sub

Rick Rothstein (MVP - Excel)
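Rick's version assumes the separators are ordinary spaces. If tabs might also appear in the data (an assumption, not something stated in this thread), one possible variation is to convert them to spaces before splitting. The sketch below also checks for a space after trimming, so a line such as "Item1 " with nothing after the separator does not raise a subscript error. The names SplitFirstWord and DemoSplitFirstWord are hypothetical.

```vba
Option Explicit

' Hypothetical helper: returns a 2-element array (first word, remainder),
' or an empty array when the cleaned line contains no separator.
Function SplitFirstWord(ByVal sLine As String) As Variant
    Dim sClean As String

    ' Assumption: tabs should be treated like spaces.
    sClean = Replace(sLine, vbTab, " ")
    ' WorksheetFunction.Trim strips the ends and collapses runs of spaces.
    sClean = WorksheetFunction.Trim(sClean)

    If InStr(sClean, " ") > 0 Then
        SplitFirstWord = Split(sClean, " ", 2)   ' limit of 2 keeps the remainder whole
    Else
        SplitFirstWord = Array()                 ' nothing to split; UBound is -1
    End If
End Function

Sub DemoSplitFirstWord()
    Dim v As Variant

    v = SplitFirstWord("Item3" & vbTab & "number #2 pencils")
    If UBound(v) = 1 Then Debug.Print v(0); " | "; v(1)
End Sub
```

Like the macro in this post, the sketch relies on WorksheetFunction.Trim, so it needs to run inside Excel.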
#5
Posted to microsoft.public.excel.programming
On Sat, 16 Apr 2011 22:37:36 -0700, "Robert Crandal" wrote:

I plan to basically reuse the code you gave me previously, but I need to modify the regular expression pattern so that the variable mc(0).submatches(1) would get assigned strings like "scissors", or "red notebooks", or "number #2 pencils" How should I change the pattern string? Thankx!

If the "object" will be all on the same line:

"^(\w+)\s+(.+)"

However, because of the peculiarities of the MS implementation in VBA, if the "object" might span a second line, then you should use:

"^(\w+)\s+([\s\S]+)"

As an aid to writing and testing regular expressions, I would suggest a program titled RegexBuddy (www.regexbuddy.com).

And, as Rick is so fond of pointing out, you can do most anything using built-in VBA methods without using Regular Expressions, and they will often run more quickly if that is an issue. However, once you become fluent in Regular Expressions, it takes much less time to develop complex string manipulations using them than using VBA.

Of course, if speed is paramount, I suppose we should be writing in machine language <g>.
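The difference between the two patterns can be seen by running both against an "object" that contains a line feed. This is a rough sketch, not part of Ron's original reply: the names SecondCapture and DemoDotVersusClass are hypothetical, and the test string is made up for illustration.

```vba
Option Explicit

' Hypothetical helper: returns the second capture group for a given pattern,
' or an empty string when the pattern does not match.
Function SecondCapture(ByVal sPat As String, ByVal sLine As String) As String
    Dim re As Object, mc As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = sPat

    Set mc = re.Execute(sLine)
    If mc.Count > 0 Then SecondCapture = mc(0).SubMatches(1)
End Function

Sub DemoDotVersusClass()
    Dim sLine As String

    ' Made-up input whose "object" spans two lines.
    sLine = "Item3 number #2" & vbLf & "pencils"

    ' "." does not match a line feed, so the capture stops at the end of line one.
    Debug.Print SecondCapture("^(\w+)\s+(.+)", sLine) = "number #2"

    ' [\s\S] matches any character, so the capture runs across the line feed.
    Debug.Print SecondCapture("^(\w+)\s+([\s\S]+)", sLine) = "number #2" & vbLf & "pencils"
End Sub
```

The character class [\s\S] ("whitespace or non-whitespace") is a common workaround in regex flavors that offer no option to make the dot match line breaks.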