Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Old January 30th 13, 10:14 AM posted to microsoft.public.excel.programming
external usenet poster
 
First recorded activity by ExcelBanter: Aug 2010
Posts: 158
Default Another regular expression question

My input data strings have roughly the following format:

"Item1 scissors"
"Item2 two notebooks"
"item3"
"itm4 keyboards and scissors"
"item_5 glue,paper,scissors"

My strings begin with an "item number" string followed by a
description of the item(s). It is possible that an item description
could be missing, as seen in "item3" above. (Assume that the
"item number" token will always be present)

What is a good regular expression that I can use to save the
item number in a variable named "$ItemNum" and save the
description in a variable named "$Description"?? (If the
description string is missing I want the "$Description" variable
to be set to the empty string.

Thank you.




  #2   Report Post  
Old January 30th 13, 01:04 PM posted to microsoft.public.excel.programming
external usenet poster
 
First recorded activity by ExcelBanter: Jun 2010
Posts: 1,045
Default Another regular expression question

On Wed, 30 Jan 2013 03:14:13 -0700, "Robert Crandal" wrote:

My input data strings have roughly the following format:

"Item1 scissors"
"Item2 two notebooks"
"item3"
"itm4 keyboards and scissors"
"item_5 glue,paper,scissors"

My strings begin with an "item number" string followed by a
description of the item(s). It is possible that an item description
could be missing, as seen in "item3" above. (Assume that the
"item number" token will always be present)

What is a good regular expression that I can use to save the
item number in a variable named "$ItemNum" and save the
description in a variable named "$Description"?? (If the
description string is missing I want the "$Description" variable
to be set to the empty string.

Thank you.



Your question requires clarification for me to respond.

Since this is an Excel programming group, I would normally assume you are writing about vbscript (essentially the same as Javascript) flavor of regular expressions. Subgroup naming is not supported. But you could use subgroups to save the relevant portions of your string, and then assign them to variables within your VBA routine. However, VBA does not support names which begin with "$".

If you are in the wrong group, there are flavors of regex that do support variable naming but, at least in the .NET flavor, they do not begin with the "$" as you have shown above. For example, in the .NET flavor, you might have a named capturing group called ItemNum and would use it in a replace string as $[ItemNum].

Also, there is considerable variation in your "item number" tokens, and it is not clear if you want to capture only the number (which presumeably is an integer), or the entire word. Nor is it clear what the extent of variability in your item number tokens might be, or whether we could just identify it more simply as merely (^"\w+\d+)

Item
item
itm
item_

  #3   Report Post  
Old January 30th 13, 07:01 PM posted to microsoft.public.excel.programming
external usenet poster
 
First recorded activity by ExcelBanter: Aug 2010
Posts: 158
Default Another regular expression question

Hi Ron.... I only posted this question on the Excel VBA board
because I'm using regular expressions in Excel. I forgot that
I should NOT use "$" in my variable names. For now, assume that
my variable names will be "sItem" and "sDescription".

And yes, there is variation in my "item number", and I want to capture
the ENTIRE first token string, whether it be "Item4", "item_4", or even
"b6".
Our data format looks roughly like this:

"item number string" [option string description]

So, real examples a

Item1 scissors
Item2 two notebooks
item3
i4 keyboards and scissors
item_5 glue,paper,scissors
b6 broken keyboard


Sorry about the confusion. I hope I made it more clear now. Please
let me know if you need further clarification. Thank you.

Robert




"Ron Rosenfeld" wrote in message
...

Your question requires clarification for me to respond.

Since this is an Excel programming group, I would normally assume you are
writing about vbscript (essentially the same as Javascript) flavor of
regular expressions. Subgroup naming is not supported. But you could use
subgroups to save the relevant portions of your string, and then assign
them to variables within your VBA routine. However, VBA does not support
names which begin with "$".

If you are in the wrong group, there are flavors of regex that do support
variable naming but, at least in the .NET flavor, they do not begin with
the "$" as you have shown above. For example, in the .NET flavor, you
might have a named capturing group called ItemNum and would use it in a
replace string as $[ItemNum].

Also, there is considerable variation in your "item number" tokens, and it
is not clear if you want to capture only the number (which presumeably is
an integer), or the entire word. Nor is it clear what the extent of
variability in your item number tokens might be, or whether we could just
identify it more simply as merely (^"\w+\d+)

Item
item
itm
item_


  #4   Report Post  
Old January 31st 13, 01:08 AM posted to microsoft.public.excel.programming
external usenet poster
 
First recorded activity by ExcelBanter: Jun 2010
Posts: 1,045
Default Another regular expression question

On Wed, 30 Jan 2013 12:01:20 -0700, "Robert Crandal" wrote:

Hi Ron.... I only posted this question on the Excel VBA board
because I'm using regular expressions in Excel. I forgot that
I should NOT use "$" in my variable names. For now, assume that
my variable names will be "sItem" and "sDescription".

And yes, there is variation in my "item number", and I want to capture
the ENTIRE first token string, whether it be "Item4", "item_4", or even
"b6".
Our data format looks roughly like this:

"item number string" [option string description]

So, real examples a

Item1 scissors
Item2 two notebooks
item3
i4 keyboards and scissors
item_5 glue,paper,scissors
b6 broken keyboard


Sorry about the confusion. I hope I made it more clear now. Please
let me know if you need further clarification. Thank you.

Robert




Well, the only commonality I can see is the your
Item token is always the first word.

Then there are some itervening spaces, followed by the rest of the string which comprises the description.

That being the case, a regex such as:

^(\S+)(?:\s+(.+))?

should be all that you need.

Code might look like:

============================
Option Explicit
Public sItem As String
Public sDescription As String
Sub ItemDescr(s As String)
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
With re
.Pattern = "^(\S+)(?:\s+(.+))?"
.MultiLine = False
.Global = True
If .test(s) = True Then
Set mc = .Execute(s)
sItem = mc(0).submatches(0)
sDescription = mc(0).submatches(1)
End If
End With
End Sub
=======================

You could call it in a variety of ways. Here's one, if your strings are in column A:

================================
Sub test()
Dim c As Range
For Each c In Range("A1", Cells(Rows.Count, "A").End(xlUp))
ItemDescr (c)
Debug.Print sItem, sDescription
Next c
End Sub
==================================
  #5   Report Post  
Old February 2nd 13, 07:29 AM posted to microsoft.public.excel.programming
external usenet poster
 
First recorded activity by ExcelBanter: Aug 2010
Posts: 158
Default Another regular expression question

"Ron Rosenfeld" wrote in message
...

That being the case, a regex such as:

^(\S+)(?:\s+(.+))?

should be all that you need.


Hi Ron. Just out of curiousity, what is the meaning of the colon
in the above pattern string?




  #6   Report Post  
Old February 2nd 13, 11:44 AM posted to microsoft.public.excel.programming
external usenet poster
 
First recorded activity by ExcelBanter: Jun 2010
Posts: 1,045
Default Another regular expression question

On Sat, 2 Feb 2013 00:29:55 -0700, "Robert Crandal" wrote:

"Ron Rosenfeld" wrote in message
.. .

That being the case, a regex such as:

^(\S+)(?:\s+(.+))?

should be all that you need.


Hi Ron. Just out of curiousity, what is the meaning of the colon
in the above pattern string?


You should have access to a comprehensive definition of the syntax for the relevant flavor if you are going to be using regular expressions.
Here is a link to the syntax used in the VBscript flavor: http://msdn.microsoft.com/en-us/libr...=vs.84%29.aspx

From that link:

==========================
(?attern)

Matches pattern but does not save the match, that is, the match is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|).

industr(?:y|ies) is equivalent to industry|industries.
===========================

So using this makes the regex more efficient, as it does not have to store the part of the match that includes the spaces between the two elements that we want to match.


Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
R: Basic regular expression question r[_4_] Excel Programming 0 April 17th 11 11:08 PM
R: Basic regular expression question r[_4_] Excel Programming 0 April 17th 11 10:59 PM
Basic regular expression question Robert Crandal[_2_] Excel Programming 4 April 17th 11 12:10 PM
Can someone help me with this regular expression? [email protected] Excel Discussion (Misc queries) 3 March 10th 09 07:36 PM
Help with regular expression PO Excel Programming 3 May 2nd 07 01:39 PM


All times are GMT +1. The time now is 10:20 AM.

Powered by vBulletin® Copyright ©2000 - 2019, Jelsoft Enterprises Ltd.
Copyright 2004-2019 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"

 

Copyright © 2017