Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
Need help on regular expression, where we are trying to grab number values as
well as minus sign if negative. Currently we can grab all the numbers correctly, just if numbers are negative, we only grab the number not the number and the minus sign: Example for: -10 we get 10 but want -10 of course I am new to Regular Expressions and have played with the syntax but cannot get it work properly. Currently we are using "\D+(\d+)" |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Thu, 7 Jan 2010 23:27:01 -0800, Jason
wrote: Need help on regular expression, where we are trying to grab number values as well as minus sign if negative. Currently we can grab all the numbers correctly, just if numbers are negative, we only grab the number not the number and the minus sign: Example for: -10 we get 10 but want -10 of course I am new to Regular Expressions and have played with the syntax but cannot get it work properly. Currently we are using "\D+(\d+)" I note that your expression will only capture integers. And also will not capture an integer at the start of a line. For integers with an optional sign, and to allow capture of positive integers at the beginning of a line, try: "[-+]?\b\d+\b" --ron |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
Hello Dan,
Based on my researches, if we want to identify the negative number, we can use the following regular expressions, ^(-)?\D+(\d+) Best regards, Ji Zhou Microsoft Online Community |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
And for more Regular Expression pattern, as well as a list of symbol usage,
you can refer the following MSDN link, http://msdn.microsoft.com/en-us/library/ms974570.aspx Hope this helps! Best regards, Ji Zhou Microsoft Online Community Support |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
Neither of these suggestions would grab the negative sign, so maybe there is
more to it. I have posted what I believe to be the more relevant portions of the code below: c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb") c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb") c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=gb") Private Function RegexMid(s As String, sDate As String, sTempType As String) As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Global = True re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType & "\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If Set re = Nothing End Function ***Here is an example of the data from the source page that we are trying to pull from </tr </tbody <tbody <tr <td<a href="/history/airport/KSTP/2010/1/1/DailyHistory.html"1</a</td <td class="bl gb" 8 </td <td class="gb" 0 </td <td class="br gb" -8 </td Thank you! |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
You need to OR to expressions like I did below "(\d+|([-+]\d+))" the pipe character is an or function to either have the minus sign or not have the minus sign -- joel ------------------------------------------------------------------------ joel's Profile: 229 View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=168082 Microsoft Office Help |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
Joel,
I tried using your suggestion, but it won't pull any numbers, much less the negatives. The closest I got was using: "(\D+(-)?\d+)" but it pulls "-8 instead of -8 or -8 instead of -8 "joel" wrote: You need to OR to expressions like I did below "(\d+|([-+]\d+))" the pipe character is an or function to either have the minus sign or not have the minus sign -- joel ------------------------------------------------------------------------ joel's Profile: 229 View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=168082 Microsoft Office Help . |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
I don't know what the \D (capital D does). "(\D+(-)?\d+)" 1) \d+ the backslash d say take one digit, the plus says take all didgits folowing the first digit (digit is 0 to 9) 2) ? says match any character before the string of digits 0 to 9. 3) "(\d+|([-+]\d+))" the first part is the same as the \d+ above. I added the pipe to have an Or'd conditions. Te square brackets say to match either character the plus sign or the minus sign. I think you have a space in front of the minus sign. Then you need this "(\d+|([-+]\b\d+))" You could also try this which will accept either a space between the plus and minus sign of no space. "(\d+|([-+]\b\d+))|([-+]\d+)" -- joel ------------------------------------------------------------------------ joel's Profile: 229 View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=168082 Microsoft Office Help |
#9
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Fri, 8 Jan 2010 10:41:01 -0800, Jason
wrote: Joel, I tried using your suggestion, but it won't pull any numbers, much less the negatives. The closest I got was using: "(\D+(-)?\d+)" but it pulls "-8 instead of -8 or -8 instead of -8 I'm just curious about what happened with my suggestion? "[-+]?\b\d+\b" --ron |
#10
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
Still will not pull any numbers with either syntax you suggested.
|
#11
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
"[-+]?\b\d+\b"
This doesn't seem to grab any numbers at all. Perhaps there is more info I can provide from the script below: c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb") c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb") c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=gb") Private Function RegexMid(s As String, sDate As String, sTempType As String) As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Global = True re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType & "\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If Set re = Nothing End Function ***Here is an example of the data from the source page that we are trying to pull from </tr </tbody <tbody <tr <td<a href="/history/airport/KSTP/2010/1/1/DailyHistory.html"1</a</td <td class="bl gb" 8 </td <td class="gb" 0 </td <td class="br gb" -8 </td Thank you! |
#12
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
Ron,
Sorry I posted this reply earlier. "[-+]?\b\d+\b" This doesn't seem to grab any numbers at all. Perhaps there is more info I can provide from the script below: c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb") c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb") c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=gb") Private Function RegexMid(s As String, sDate As String, sTempType As String) As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Global = True re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType & "\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If Set re = Nothing End Function ***Here is an example of the data from the source page that we are trying to pull from </tr </tbody <tbody <tr <td<a href="/history/airport/KSTP/2010/1/1/DailyHistory.html"1</a</td <td class="bl gb" 8 </td <td class="gb" 0 </td <td class="br gb" -8 </td Thank you! |
#13
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
I put the string into a file c:\temp\test.txt then ran this code. the pattern I was using worked. You have to move sequentially through the file to get the data. Look at what I did below. Sub UseScript() Const ReadFile = "c:\temp\test.txt" Const ForReading = 1, ForWriting = 2, _ ForAppending = 3 Set fs = CreateObject("Scripting.FileSystemObject") Set fin = fs.OpenTextFile(ReadFile, _ ForReading, TristateFalse) MyStr = fin.readall fin.Close URLdate = DateValue("1/1/2010") sURLdate = Format(URLdate, "/yyyy/d/m") Set c = Range("A1") 'find html header FindStr = "/history/airport/KSTP" & sURLdate & "/DailyHistory.html" StartSearch = 1 SearchStr = Mid(MyStr, StartSearch) Set Results = RegexMid(SearchStr, FindStr) If IsNull(Results) Then MsgBox ("Did not find string : " & FindStr) Exit Sub End If 'find Bl gb number Firstbyte = Results.Item(0).firstIndex NumberBytes = Results.Item(0).Length FindStr = "bl gb" StartSearch = StartSearch + Firstbyte + NumberBytes SearchStr = Mid(MyStr, StartSearch) Set Results = RegexMid(SearchStr, FindStr) If IsNull(Results) Then MsgBox ("Did not find string : " & FindStr) Exit Sub End If Firstbyte = Results.Item(0).firstIndex NumberBytes = Results.Item(0).Length FindStr = "(\d+|([-+]\d+))" StartSearch = StartSearch + Firstbyte + NumberBytes SearchStr = Mid(MyStr, StartSearch) Set Results = RegexMid(Mid(MyStr, StartSearch), FindStr) If IsNull(Results) Then MsgBox ("Did not find string : " & FindStr) Exit Sub End If c.Offset(0, i + 1).Value = Results.Item(0).Value 'find class=gb number Firstbyte = Results.Item(0).firstIndex NumberBytes = Results.Item(0).Length FindStr = "class=""gb""" StartSearch = StartSearch + Firstbyte + NumberBytes SearchStr = Mid(MyStr, StartSearch) Set Results = RegexMid(SearchStr, FindStr) If IsNull(Results) Then MsgBox ("Did not find string : " & FindStr) Exit Sub End If Firstbyte = Results.Item(0).firstIndex NumberBytes = Results.Item(0).Length FindStr = "(\d+|([-+]\d+))" StartSearch = StartSearch + Firstbyte + NumberBytes SearchStr = Mid(MyStr, StartSearch) Set Results = RegexMid(Mid(MyStr, StartSearch), FindStr) If IsNull(Results) Then MsgBox ("Did not find string : " & FindStr) Exit Sub End If c.Offset(0, i + 2).Value = Results.Item(0).Value 'find Br gb number Firstbyte = Results.Item(0).firstIndex NumberBytes = Results.Item(0).Length FindStr = "Br gb" StartSearch = StartSearch + Firstbyte + NumberBytes SearchStr = Mid(MyStr, StartSearch) Set Results = RegexMid(SearchStr, FindStr) If Results Is Nothing Then MsgBox ("Did not find string : " & FindStr) Exit Sub End If Firstbyte = Results.Item(0).firstIndex NumberBytes = Results.Item(0).Length FindStr = "(\d+|([-+]\d+))" StartSearch = StartSearch + Firstbyte + NumberBytes SearchStr = Mid(MyStr, StartSearch) Set Results = RegexMid(Mid(MyStr, StartSearch), FindStr) If Results Is Nothing Then MsgBox ("Did not find string : " & FindStr) Exit Sub End If c.Offset(0, i + 3).Value = Results.Item(0).Value End Sub Private Function RegexMid( _ ByVal s As String, _ ByVal FindStr) As Variant Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Global = True re.Pattern = FindStr If re.test(s) = True Then Set mc = re.Execute(s) Set RegexMid = mc Else Set RefexMid = Nothing End If Set re = Nothing End Function -- joel ------------------------------------------------------------------------ joel's Profile: 229 View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=168082 Microsoft Office Help |
#14
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Fri, 8 Jan 2010 14:20:01 -0800, Jason
wrote: Ron, Sorry I posted this reply earlier. "[-+]?\b\d+\b" This doesn't seem to grab any numbers at all. Perhaps there is more info I can provide from the script below: c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb") c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb") c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=gb") Private Function RegexMid(s As String, sDate As String, sTempType As String) As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Global = True re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType & "\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If Set re = Nothing End Function ***Here is an example of the data from the source page that we are trying to pull from </tr </tbody <tbody <tr <td<a href="/history/airport/KSTP/2010/1/1/DailyHistory.html"1</a</td <td class="bl gb" 8 </td <td class="gb" 0 </td <td class="br gb" -8 </td Thank you! There is no question that the regex I provided should return all signed and unsigned integers, as I understand things. There is also no question that your regex of \D+(\d+) will never capture a negative integer into the capturing group, since the "-" will always be captured by the \D. The problem is in the rest of your regex. It could be that either sDate or sTempType is not properly formed. For example, the following three regex's return the desired numbers in capturing group 1 when run against your test data: "\b/2010/1/1/DailyHistory[\s\S]+?""bl gb""[\s\S]+?-?\b(\d+)\b" -- 8 "\b/2010/1/1/DailyHistory[\s\S]+?""gb""[\s\S]+?-?\b(\d+)\b" -- 0 "\b/2010/1/1/DailyHistory[\s\S]+?""br gb""[\s\S]+?(-?\b\d+)\b" -- -8 Above is what your regex should look like after you finish concatenating things together. So in addition to your problem with the part of your regex that captures the integer, which should be corrected by using my suggestion, you need to evaluate the rest of the regex and how it is being constructed. --ron |
#15
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
I tested Ron's code and it didn't return what you really need. Ron's string returns everthing to the end of the number you are looking for. I finally figure it out. You need to put parenthsis around the two sub parts of the search string. the first part is the tag and the second part is the number. Then use the submatches prperty to get the 2nd submatch. See code below Sub UseScript() Const ReadFile = "c:\temp\test.txt" Const ForReading = 1, ForWriting = 2, _ ForAppending = 3 Set fs = CreateObject("Scripting.FileSystemObject") Set fin = fs.OpenTextFile(ReadFile, _ ForReading, TristateFalse) MyStr = fin.readall fin.Close URLdate = DateValue("1/1/2010") sURLdate = Format(URLdate, "/yyyy/d/m") Set c = Range("A1") FindStr = "(bl gb[\s\S]+?)(\d+|([-+]\d+))" SearchStr = MyStr Results = RegexMid(SearchStr, FindStr) c.Offset(0, i + 1).Value = Results FindStr = "(class=""gb""[\s\S]+?)(\d+|([-+]\d+))" SearchStr = MyStr Results = RegexMid(SearchStr, FindStr) c.Offset(0, i + 2).Value = Results FindStr = "(Br gb[\s\S]+?)(\d+|([-+]\d+))" SearchStr = MyStr Results = RegexMid(SearchStr, FindStr) c.Offset(0, i + 3).Value = Results End Sub Private Function RegexMid( _ ByVal s As String, _ ByVal FindStr) Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Global = True re.Pattern = FindStr If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc.Item(0).submatches(1) End If Set re = Nothing End Function -- joel ------------------------------------------------------------------------ joel's Profile: 229 View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=168082 Microsoft Office Help |
#16
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Sat, 9 Jan 2010 17:29:42 +0000, joel wrote:
I tested Ron's code and it didn't return what you really need. Ron's string returns everthing to the end of the number you are looking for. My "code" was merely a regex which is quite different from your code. If implemented properly, it returns only the number. If you are returning only what you say, then you probably are not implementing it properly. To demonstrate some code segments, we could set up the following, with the data in an Excel worksheet. And code similar to what the OP posted, with a few minor changes to correct his errors. I just put MyStr into A1 for testing, not having access to the rest of the OP's code. And, by setting c=A2 and leaving i=0, the OP's sub would place the results in B2:D2. A1: (the sample text given by the OP </tr </tbody <tbody <tr <td<a href="/history/airport/KSTP/2010/1/1/DailyHistory.html"1</a</td <td class="bl gb" 8 </td <td class="gb" 0 </td <td class="br gb" -8 </td Then use this routine -- quite similar to that of the OP: ==================================== Option Explicit Sub TestExtract() Dim i As Long Dim c As Range Dim myStr As String Dim sURLdate As String Set c = Range("A2") myStr = Range("A1").Value sURLdate = Format(CDate("1/1/2010"), "/yyyy/m/d/") c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb") c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb") c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=""gb""") End Sub '------------------------------------------------------------------ Private Function RegexMid(s As String, sDate As String, sTempType As String) _ As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.ignorecase = True re.Pattern = sDate & "DailyHistory[\s\S]+?" & _ sTempType & "[\s\S]+?(-?\b\d+)\b" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If End Function ================================= As expected, this returns into B2, C2 and D2 the signed integers following the sTempType variables. Note also the extra quote marks required in the third call, c/w the OP's. Also note the terminal regex representing the signed integer. I finally figure it out. You need to put parenthsis around the two sub parts of the search string. the first part is the tag and the second part is the number. Then use the submatches prperty to get the 2nd submatch. Note that your code for a signed integer: (\d+|([-+]\d+)) could be more simply expressed as ([-+]?\d+) Why did you choose to use alternation? --ron |
#17
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Sat, 09 Jan 2010 16:11:36 -0500, Ron Rosenfeld
wrote: On Sat, 9 Jan 2010 17:29:42 +0000, joel wrote: I tested Ron's code and it didn't return what you really need. Ron's string returns everthing to the end of the number you are looking for. My "code" was merely a regex which is quite different from your code. If implemented properly, it returns only the number. If you are returning only what you say, then you probably are not implementing it properly. To demonstrate some code segments, we could set up the following, with the data in an Excel worksheet. And code similar to what the OP posted, with a few minor changes to correct his errors. I just put MyStr into A1 for testing, not having access to the rest of the OP's code. And, by setting c=A2 and leaving i=0, the OP's sub would place the results in B2:D2. A1: (the sample text given by the OP </tr </tbody <tbody <tr <td<a href="/history/airport/KSTP/2010/1/1/DailyHistory.html"1</a</td <td class="bl gb" 8 </td <td class="gb" 0 </td <td class="br gb" -8 </td Then use this routine -- quite similar to that of the OP: ==================================== Option Explicit Sub TestExtract() Dim i As Long Dim c As Range Dim myStr As String Dim sURLdate As String Set c = Range("A2") myStr = Range("A1").Value sURLdate = Format(CDate("1/1/2010"), "/yyyy/m/d/") c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb") c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb") c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=""gb""") End Sub '------------------------------------------------------------------ Private Function RegexMid(s As String, sDate As String, sTempType As String) _ As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.ignorecase = True re.Pattern = sDate & "DailyHistory[\s\S]+?" & _ sTempType & "[\s\S]+?(-?\b\d+)\b" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If End Function ================================= As expected, this returns into B2, C2 and D2 the signed integers following the sTempType variables. Note also the extra quote marks required in the third call, c/w the OP's. Also note the terminal regex representing the signed integer. I finally figure it out. You need to put parenthsis around the two sub parts of the search string. the first part is the tag and the second part is the number. Then use the submatches prperty to get the 2nd submatch. Note that your code for a signed integer: (\d+|([-+]\d+)) could be more simply expressed as ([-+]?\d+) Why did you choose to use alternation? --ron In order to get a proper match I have to use the \ character before the / characters in the Format function, like this sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/") Maybe this has something to do with the Regional and Language settings. I also have a question. What is the function of the two \b in the regexp? I get a match even without them. And the negative number is also matched. Lars-Åke |
#18
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Sat, 09 Jan 2010 23:47:45 +0100, Lars-Åke Aspelin
wrote: On Sat, 09 Jan 2010 16:11:36 -0500, Ron Rosenfeld wrote: On Sat, 9 Jan 2010 17:29:42 +0000, joel wrote: I tested Ron's code and it didn't return what you really need. Ron's string returns everthing to the end of the number you are looking for. My "code" was merely a regex which is quite different from your code. If implemented properly, it returns only the number. If you are returning only what you say, then you probably are not implementing it properly. To demonstrate some code segments, we could set up the following, with the data in an Excel worksheet. And code similar to what the OP posted, with a few minor changes to correct his errors. I just put MyStr into A1 for testing, not having access to the rest of the OP's code. And, by setting c=A2 and leaving i=0, the OP's sub would place the results in B2:D2. A1: (the sample text given by the OP </tr </tbody <tbody <tr <td<a href="/history/airport/KSTP/2010/1/1/DailyHistory.html"1</a</td <td class="bl gb" 8 </td <td class="gb" 0 </td <td class="br gb" -8 </td Then use this routine -- quite similar to that of the OP: ==================================== Option Explicit Sub TestExtract() Dim i As Long Dim c As Range Dim myStr As String Dim sURLdate As String Set c = Range("A2") myStr = Range("A1").Value sURLdate = Format(CDate("1/1/2010"), "/yyyy/m/d/") c.Offset(0, i + 1).Value = RegexMid(myStr, sURLdate, "bl gb") c.Offset(0, i + 2).Value = RegexMid(myStr, sURLdate, "br gb") c.Offset(0, i + 3).Value = RegexMid(myStr, sURLdate, "class=""gb""") End Sub '------------------------------------------------------------------ Private Function RegexMid(s As String, sDate As String, sTempType As String) _ As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.ignorecase = True re.Pattern = sDate & "DailyHistory[\s\S]+?" & _ sTempType & "[\s\S]+?(-?\b\d+)\b" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If End Function ================================= As expected, this returns into B2, C2 and D2 the signed integers following the sTempType variables. Note also the extra quote marks required in the third call, c/w the OP's. Also note the terminal regex representing the signed integer. I finally figure it out. You need to put parenthsis around the two sub parts of the search string. the first part is the tag and the second part is the number. Then use the submatches prperty to get the 2nd submatch. Note that your code for a signed integer: (\d+|([-+]\d+)) could be more simply expressed as ([-+]?\d+) Why did you choose to use alternation? --ron In order to get a proper match I have to use the \ character before the / characters in the Format function, like this sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/") Maybe this has something to do with the Regional and Language settings. I also have a question. What is the function of the two \b in the regexp? I get a match even without them. And the negative number is also matched. Lars-Åke I found the explanation to \b (word boundary) http://msdn.microsoft.com/en-us/library/ms974570.aspx Lars-Åke |
#19
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Sat, 09 Jan 2010 23:47:45 +0100, Lars-Åke Aspelin
wrote: In order to get a proper match I have to use the \ character before the / characters in the Format function, like this sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/") Maybe this has something to do with the Regional and Language settings. I don't know. AFAIK, all characters except [\^$.|?*+() should get matched literally. Preceding a character (that has no special meaning) by a backslash ("\") merely represents a single instance of the second character. If you type, into the immediate window: ?CDate("1/1/2010") what is returned? I also have a question. What is the function of the two \b in the regexp? I get a match even without them. And the negative number is also matched. \b represents a word boundary, or, more specifically, it matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string (or line) if the first and/or last characters in the string (or line) are word characters. --ron |
#20
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
Ron: have you ever looked at the 1970's unix manual volume 2b under the YACC topic. See this webpage. Look for the link for the PDF files and use the link : v7vol2b.pdf (819KB) Webpage '7th Edition Manual PDF' (http://plan9.bell-labs.com/7thEdMan/bswv7.html) pdf file http://plan9.bell-labs.com/7thEdMan/v7vol2b.pdf It is the only good description of pattern matching that I have ever seen. The question mark indicates any single character. "[-+]?\b\d+\b" There are a number of problems with the above search string 1) why do you care that a blank occurs immediately after the last digit. what happens if there is a return, tab or end of file 2) The match is looking for a positive or negative sign folowed by any character followed by a blank. Most people don't have a blank. \d+|([-+]\d+) It is where I found the saying If you can't bring Mohammed to the mountain, bring the mountain to Mohammed. -- joel ------------------------------------------------------------------------ joel's Profile: 229 View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=168082 Microsoft Office Help |
#21
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Sat, 09 Jan 2010 22:50:16 -0500, Ron Rosenfeld
wrote: On Sat, 09 Jan 2010 23:47:45 +0100, Lars-Åke Aspelin wrote: In order to get a proper match I have to use the \ character before the / characters in the Format function, like this sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/") Maybe this has something to do with the Regional and Language settings. I don't know. AFAIK, all characters except [\^$.|?*+() should get matched literally. Preceding a character (that has no special meaning) by a backslash ("\") merely represents a single instance of the second character. If you type, into the immediate window: ?CDate("1/1/2010") what is returned? I also have a question. What is the function of the two \b in the regexp? I get a match even without them. And the negative number is also matched. \b represents a word boundary, or, more specifically, it matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string (or line) if the first and/or last characters in the string (or line) are word characters. --ron If I write ?CDate("1/1/2010") in the immediate window the following is returned 2010-01-01 If I write ?Format(CDate("1/1/2010"), "/yyyy/m/d/") the following is returned -2010-1-1- Only if I write ?Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/" the wanted result /2010/1/1/ is achieved. So, with my settings the / character seems to generate a - . To have a / generated the / has to be escaped with \. I found the explanation in the Excel help for Format: "(/) Date separator. In some locales, other characters may be used to represent the date separator. The date separator separates the day, month, and year when date values are formatted. The actual character used as the date separator in formatted output is determined by your system settings." And in my settings - (hyphen) is used as the date separator, in accordance with ISO 8601 extended format http://en.wikipedia.org/wiki/ISO_8601 Lars-Åke |
#22
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
In message of Sun, 10 Jan 2010 04:55:10 in
microsoft.public.excel.programming, joel writes Ron: have you ever looked at the 1970's unix manual volume 2b under the YACC topic. See this webpage. Look for the link for the PDF files and use the link : v7vol2b.pdf (819KB) Webpage '7th Edition Manual PDF' (http://plan9.bell-labs.com/7thEdMan/bswv7.html) pdf file http://plan9.bell-labs.com/7thEdMan/v7vol2b.pdf It is the only good description of pattern matching that I have ever seen. The question mark indicates any single character. Joel, How do you conclude that? (I expect "." to match any single character within a line and ".|\s" to match any single character). I found this table on page 49/250 (Joel is probably looking elsewhere): Regular expressions in Lex use the following operators: .... x? an optional x. x* 0,1,2, ... instances of x. x+ 1,2,3, ... instances of x. x?y an x or a y. .... I am confused by that x?y. I think it means an optional x followed by a literal y. I think there may be a glyph confusion. I suspect it is intended to be x|y where the character - for which I know no name - between x and y means "or" as & - ampersand - means "and". The symbols used by Set RE = CreateObject("VBScript.RegExp") RE.Pattern = ... are a superset of the BRE described in <http://opengrou p.org/onlinepubs/007908775/xbd/re.html. My own view is that thinks like \d is an unnecessary shorthand for [0-9]. I concede it is reasonable if a range is not allowed in a character set. i.e. if [0123456789] is needed. In VBA, I refer to <http://msdn.microsoft.com/en- us/library/ms974570.aspx which specifies everything I want to know other than the meanings of the exceptions given. <http://msdn.microsoft.com/en-us/library/xe43cc8d%28VS.85%29.aspx specifies 5019, which I hit yesterday. ;) For my part, I would start with "-?\d+" to grab a whole number. i.e a whole number is a minus which is optional followed by a digit one or more times. I found the OP's situation too complicated to want to follow and offer a suggestion. -- Walter Briscoe |
#23
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Sun, 10 Jan 2010 08:05:08 +0100, Lars-Åke Aspelin
wrote: On Sat, 09 Jan 2010 22:50:16 -0500, Ron Rosenfeld wrote: On Sat, 09 Jan 2010 23:47:45 +0100, Lars-Åke Aspelin wrote: In order to get a proper match I have to use the \ character before the / characters in the Format function, like this sURLdate = Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/") Maybe this has something to do with the Regional and Language settings. I don't know. AFAIK, all characters except [\^$.|?*+() should get matched literally. Preceding a character (that has no special meaning) by a backslash ("\") merely represents a single instance of the second character. If you type, into the immediate window: ?CDate("1/1/2010") what is returned? I also have a question. What is the function of the two \b in the regexp? I get a match even without them. And the negative number is also matched. \b represents a word boundary, or, more specifically, it matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string (or line) if the first and/or last characters in the string (or line) are word characters. --ron If I write ?CDate("1/1/2010") in the immediate window the following is returned 2010-01-01 If I write ?Format(CDate("1/1/2010"), "/yyyy/m/d/") the following is returned -2010-1-1- Only if I write ?Format(CDate("1/1/2010"), "\/yyyy\/m\/d\/" the wanted result /2010/1/1/ is achieved. So, with my settings the / character seems to generate a - . To have a / generated the / has to be escaped with \. I found the explanation in the Excel help for Format: "(/) Date separator. In some locales, other characters may be used to represent the date separator. The date separator separates the day, month, and year when date values are formatted. The actual character used as the date separator in formatted output is determined by your system settings." And in my settings - (hyphen) is used as the date separator, in accordance with ISO 8601 extended format http://en.wikipedia.org/wiki/ISO_8601 Lars-Åke Well, that is an interesting difference, and clearly related to Excel/VBA/Regional Settings and not to the regex engine per se, as I originally thought. When I place ?Cdate("1/1/2010") in the immediate window, it returns 1/1/2010. So it would seem that for code that will run properly, regardless of the Windows Regional settings, it might be best to use "\/yyyy\/m\/d" in the Format command to properly format the date string to match with the URL. Here in the US, I don't generally have to deal with international requirements. Thanks for pointing that out. --ron |
#24
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Sun, 10 Jan 2010 04:55:10 +0000, joel wrote:
Ron: have you ever looked at the 1970's unix manual volume 2b under the YACC topic. See this webpage. Look for the link for the PDF files and use the link : v7vol2b.pdf (819KB) Webpage '7th Edition Manual PDF' (http://plan9.bell-labs.com/7thEdMan/bswv7.html) pdf file http://plan9.bell-labs.com/7thEdMan/v7vol2b.pdf It is the only good description of pattern matching that I have ever seen. The question mark indicates any single character. I have not looked at that reference. However, if that is what it says, it is wrong. The question mark "?": makes the preceding item optional. Greedy, so the optional item is included in the match if possible. ================================= Looking at the reference you provided, I see that you have misread it. It does *NOT* read, as you claim, that "the question mark indicates any single character". It states the almost the SAME definition as *I* posted above: "Optional Expressions . The operator ? indicates an optional element of an expression. Thus ab?c matches either ac or abc." =================================== There are some instances where "?" represents any single character, but it does NOT do this with regard to regular expressions. Perhaps that is where you got confused. There are a number of problems with the above search string 1) why do you care that a blank occurs immediately after the last digit. what happens if there is a return, tab or end of file I don't know what you mean by a "blank". Perhaps you are confusing the use of the "\b" token? Check that definition (which I previously posted). It is also does NOT seem to be part of the LEX regular expression flavor which you are quoting. Rather, in LEX, it would mean a backspace; but that is not the case with more commonly used flavors. 2) The match is looking for a positive or negative sign folowed by any character followed by a blank. Most people don't have a blank. No it is not. See above. You are misinterpreting "\b".\ \d+|([-+]\d+) It is where I found the saying If you can't bring Mohammed to the mountain, bring the mountain to Mohammed. If Mohammed goes to the wrong mountain, he may get the wrong information. You might try some of these "mountains", if the links are still valid: Regular Expressions http://www.regular-expressions.info/reference.html http://support.microsoft.com/default...02&Product=vbb http://msdn2.microsoft.com/en-us/library/6wzad2b2.aspx http://msdn2.microsoft.com/en-us/library/ms974619.aspx http://www.regex-guru.info/ --ron |
#25
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
On Sun, 10 Jan 2010 07:15:53 +0000, Walter Briscoe
wrote: For my part, I would start with "-?\d+" to grab a whole number. i.e a whole number is a minus which is optional followed by a digit one or more times. I found the OP's situation too complicated to want to follow and offer a suggestion. I agree with a lot of what you wrote. I would point out that the above construct will also extract whole numbers that are embedded within other strings. That may or may not be desirable. In other words, it can extract both integers from the string below: abc123abc 123 --ron |
#26
Posted to microsoft.public.excel.programming
|
|||
|
|||
Regular Expression Help on syntax
Ron: I don't think in this case you care if you get multiple number using your expression. Assume this is a webpage (html). The requirement is to return the number associated with a tag item in a large documents where you don't know the number of tags in the document. Some tags may occur more than once and you may need to get the number that is associated with the 5th time the tag appeared. Walter Briscoe: There are utilities in the Unix manual where the ? means a single character. In YACC expresions it meas an optional character. So you can use [-+]?\d+ Lars-Åke Aspelin: The backslash before for special characters comes from Bell Labs (the Unix manuals). The scientist at Bell Labs developed a lot of the standards that are used today for regular expressions. the reasearch was needed to develope algorithms that cold be used to find peole names in phone books and to develope softwae compilers for programming languages. both tasks the phone company was interested in doing and used similar algorithms. Computers in the 1970's were expensive to operate and developing efficient fast algortihms that use minimum amount of memory (memory very expensive). Companiers today don't have the same concerns. The people are more expensive than the hardware which is fast and inexpensive so algorithms are developed as quickly as possible with lots of errors that sometimes gets fixed if enough people complain. -- joel ------------------------------------------------------------------------ joel's Profile: 229 View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=168082 Microsoft Office Help |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Can someone help me with this regular expression? | Excel Discussion (Misc queries) | |||
Regular Expression Conditionals (?(if)then|else) in VBA? | Excel Programming | |||
Help with regular expression | Excel Programming | |||
Regular Expression | Excel Discussion (Misc queries) | |||
Regular expression searching problem | Excel Programming |