Home |
Search |
Today's Posts |
#1
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I have an excel sheet with strings in the cells in column C. In the
corresponding cell in column D, I want words of the type "TKCDT" or "TKR25" or "02587" or "25YHT" ie words having capital letters and/or numbers to be displayed. It does not matter whether this is accomplished using a fucnction or vba. Thanks in advance for all the help. Rajendra |
#2
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I think you are going to have to provide a little more detail of what you
want to do for us. What kind of string values do you have in column C and what link is there between those strings and the "words" you want to place in column D? For example, what in column C dictates that TKCDT is put on column D? Same question for TKR25 or 02587 etc. Rick "Raj" wrote in message ... I have an excel sheet with strings in the cells in column C. In the corresponding cell in column D, I want words of the type "TKCDT" or "TKR25" or "02587" or "25YHT" ie words having capital letters and/or numbers to be displayed. It does not matter whether this is accomplished using a fucnction or vba. Thanks in advance for all the help. Rajendra |
#3
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Nov 29, 9:25 pm, "Rick Rothstein \(MVP - VB\)"
wrote: I think you are going to have to provide a little more detail of what you want to do for us. What kind of string values do you have in column C and what link is there between those strings and the "words" you want to place in column D? For example, what in column C dictates that TKCDT is put on column D? Same question for TKR25 or 02587 etc. Rick "Raj" wrote in message ... I have an excel sheet with strings in the cells in column C. In the corresponding cell in column D, I want words of the type "TKCDT" or "TKR25" or "02587" or "25YHT" ie words having capital letters and/or numbers to be displayed. It does not matter whether this is accomplished using a fucnction or vba. Thanks in advance for all the help. Rajendra- Hide quoted text - - Show quoted text - Thanks Rick. Cell C2 (and other cells in column C) have sentences like "The TKRTC value in the 765TW field is not the default.". In cell D2(corresponding cell in column D), I want TKRTC and 765TW to be extracted ie words made up of capital letters and/or numbers in the sentence in cell C2. And repeat the same for all cells in column C. (Maybe I was wrong in using "string" in place of sentence in my previous post). Thanks in advance. |
#4
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I think this macro will do what you want. Right click the sheet tab where
your data is, select View Code from the popup menu that appears and copy/paste the macro below into the code window that appears. Run it from either the VBA editor environment or click Alt+F8 from the sheet where your data is and run it from there. Rick Sub GetCapWords() Dim I As Long Dim J As Long Dim Result As String Dim Words() As String Const FirstDataRow = 2 Const DataColumn = 3 'Column C Const CopyColumn = 4 'Column D For I = FirstDataRow To Cells(Rows.Count, DataColumn).End(xlUp).Row Words = Split(Cells(I, DataColumn).Value, " ") For J = 0 To UBound(Words) If Words(J) = UCase(Words(J)) Then Result = Result & ", " & Words(J) End If Next If Len(Result) 0 Then Result = Mid(Result, 3) End If Cells(I, CopyColumn).Value = Result Result = "" Next End Sub "Raj" wrote in message ... On Nov 29, 9:25 pm, "Rick Rothstein \(MVP - VB\)" wrote: I think you are going to have to provide a little more detail of what you want to do for us. What kind of string values do you have in column C and what link is there between those strings and the "words" you want to place in column D? For example, what in column C dictates that TKCDT is put on column D? Same question for TKR25 or 02587 etc. Rick "Raj" wrote in message ... I have an excel sheet with strings in the cells in column C. In the corresponding cell in column D, I want words of the type "TKCDT" or "TKR25" or "02587" or "25YHT" ie words having capital letters and/or numbers to be displayed. It does not matter whether this is accomplished using a fucnction or vba. Thanks in advance for all the help. Rajendra- Hide quoted text - - Show quoted text - Thanks Rick. Cell C2 (and other cells in column C) have sentences like "The TKRTC value in the 765TW field is not the default.". In cell D2(corresponding cell in column D), I want TKRTC and 765TW to be extracted ie words made up of capital letters and/or numbers in the sentence in cell C2. And repeat the same for all cells in column C. (Maybe I was wrong in using "string" in place of sentence in my previous post). Thanks in advance. |
#5
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Thu, 29 Nov 2007 07:47:27 -0800 (PST), Raj wrote:
I have an excel sheet with strings in the cells in column C. In the corresponding cell in column D, I want words of the type "TKCDT" or "TKR25" or "02587" or "25YHT" ie words having capital letters and/or numbers to be displayed. It does not matter whether this is accomplished using a fucnction or vba. Thanks in advance for all the help. Rajendra Here is a UDF that will return the words specified, separated by a <space. If you want a different separator, or none, the UDF can be modified to allow that. To enter it, <alt-F11 opens the VB Editor. Ensure your project is highlighted in the Project Explorer window, then Insert/Module and paste the code below into the window that opens. To use this, enter the formula =reSpecial(cell_ref) in the cell in column D where you want the results. cell_ref is the corresponding cell in column C. ==================================== Option Explicit Function reSpecial(str As String) As String Dim re As Object, mc As Object Dim I As Long Const sPat As String = "\b[A-Z0-9]+\b" Set re = CreateObject("vbscript.regexp") re.Global = True re.Pattern = sPat re.ignorecase = False If re.test(str) = True Then Set mc = re.Execute(str) For I = 0 To mc.Count - 1 reSpecial = reSpecial & mc(I) & " " '<space as separator Next I reSpecial = Trim(reSpecial) End If End Function ======================================= --ron |
#6
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
"Ron Rosenfeld" wrote in message
... On Thu, 29 Nov 2007 07:47:27 -0800 (PST), Raj wrote: I have an excel sheet with strings in the cells in column C. In the corresponding cell in column D, I want words of the type "TKCDT" or "TKR25" or "02587" or "25YHT" ie words having capital letters and/or numbers to be displayed. It does not matter whether this is accomplished using a fucnction or vba. Thanks in advance for all the help. Rajendra Here is a UDF that will return the words specified, separated by a <space. If you want a different separator, or none, the UDF can be modified to allow that. To enter it, <alt-F11 opens the VB Editor. Ensure your project is highlighted in the Project Explorer window, then Insert/Module and paste the code below into the window that opens. To use this, enter the formula =reSpecial(cell_ref) in the cell in column D where you want the results. cell_ref is the corresponding cell in column C. ==================================== Option Explicit Function reSpecial(str As String) As String Dim re As Object, mc As Object Dim I As Long Const sPat As String = "\b[A-Z0-9]+\b" Set re = CreateObject("vbscript.regexp") re.Global = True re.Pattern = sPat re.ignorecase = False If re.test(str) = True Then Set mc = re.Execute(str) For I = 0 To mc.Count - 1 reSpecial = reSpecial & mc(I) & " " '<space as separator Next I reSpecial = Trim(reSpecial) End If End Function ======================================= --ron For those who might be interested in a non-RegEx solution, here is my offering for a UDF (also using a space delimiter between found words)... Function GetCapWords(R As Range) As String Dim X As Long Dim Words() As String Words = Split(R, " ") For X = 0 To UBound(Words) If Words(X) = UCase(Words(X)) Then GetCapWords = GetCapWords & " " & Words(X) End If Next If Len(GetCapWords) 0 Then GetCapWords = Mid(GetCapWords, 3) End If End Function What surprises me a little, Ron, is that it looks shorter than the RegEx solution... not what I would have expected. Rick |
#7
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Thu, 29 Nov 2007 15:39:15 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: What surprises me a little, Ron, is that it looks shorter than the RegEx solution... not what I would have expected. Well, here's a shorter, possibly better optimized version: =============================== Function reSpecial(str As String) As String Dim re As Object, mc As Object, m As Object Set re = CreateObject("vbscript.regexp") re.Global = True re.Pattern = "\b[A-Z0-9]+\b" Set mc = re.Execute(str) For Each m In mc reSpecial = reSpecial & m & " " Next m reSpecial = Trim(reSpecial) End Function ====================================== But I think the big advantage is in development time. If the OP wants to change a parameter, it's probably just going to involve changing the regex. --ron |
#8
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Thu, 29 Nov 2007 15:39:15 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: Function GetCapWords(R As Range) As String Dim X As Long Dim Words() As String Words = Split(R, " ") For X = 0 To UBound(Words) If Words(X) = UCase(Words(X)) Then GetCapWords = GetCapWords & " " & Words(X) End If Next If Len(GetCapWords) 0 Then GetCapWords = Mid(GetCapWords, 3) End If End Function What surprises me a little, Ron, is that it looks shorter than the RegEx solution... not what I would have expected. Rick, Given this string: The TKRTC value in the 765TW field is not the default Your routine, on my machine, returns: KRTC 765TW (The initial T is missing) Also, given a multiline string: The TKRTC value in the 765TW field is not the default (with no space after TKRTC), your routine returns: 65TW missing TKRTC and also again missing the first character. --ron |
#9
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
As for the missing first letter... that's what happens when you rush. My
brother is due any minute for dinner, so I had to get ready... I made a last minute change from my original delimiter of a comma-space to just the space that you used and I forgot to adjust the code for the length difference. I change my code to include a Delimiter constant (set to equal a space; but changeable to any combination of characters) and put code in to account for it. As for the multi-line problem... I never even gave it a thought originally. Below is a revised function that handles both problems (thanks for catching them)... Function GetCapWords(R As Range) As String Dim X As Long Dim Words() As String Const Delimiter = " " Words = Split(Replace(Replace(R.Value, vbLf, " "), vbCr, " ")) For X = 0 To UBound(Words) If Words(X) = UCase(Words(X)) Then GetCapWords = GetCapWords & Delimiter & Words(X) End If Next If Len(GetCapWords) 0 Then GetCapWords = Mid(GetCapWords, 1 + Len(Delimiter)) End If End Function Rick "Ron Rosenfeld" wrote in message ... On Thu, 29 Nov 2007 15:39:15 -0500, "Rick Rothstein \(MVP - VB\)" wrote: Function GetCapWords(R As Range) As String Dim X As Long Dim Words() As String Words = Split(R, " ") For X = 0 To UBound(Words) If Words(X) = UCase(Words(X)) Then GetCapWords = GetCapWords & " " & Words(X) End If Next If Len(GetCapWords) 0 Then GetCapWords = Mid(GetCapWords, 3) End If End Function What surprises me a little, Ron, is that it looks shorter than the RegEx solution... not what I would have expected. Rick, Given this string: The TKRTC value in the 765TW field is not the default Your routine, on my machine, returns: KRTC 765TW (The initial T is missing) Also, given a multiline string: The TKRTC value in the 765TW field is not the default (with no space after TKRTC), your routine returns: 65TW missing TKRTC and also again missing the first character. --ron |
#10
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Thu, 29 Nov 2007 17:39:14 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: As for the missing first letter... that's what happens when you rush. My brother is due any minute for dinner, so I had to get ready... I made a last minute change from my original delimiter of a comma-space to just the space that you used and I forgot to adjust the code for the length difference. I change my code to include a Delimiter constant (set to equal a space; but changeable to any combination of characters) and put code in to account for it. As for the multi-line problem... I never even gave it a thought originally. Below is a revised function that handles both problems (thanks for catching them)... Function GetCapWords(R As Range) As String Dim X As Long Dim Words() As String Const Delimiter = " " Words = Split(Replace(Replace(R.Value, vbLf, " "), vbCr, " ")) For X = 0 To UBound(Words) If Words(X) = UCase(Words(X)) Then GetCapWords = GetCapWords & Delimiter & Words(X) End If Next If Len(GetCapWords) 0 Then GetCapWords = Mid(GetCapWords, 1 + Len(Delimiter)) End If End Function Rick You're getting there. Still a problem with returning extra spaces -- if there are multiple spaces after the CapWord, your routine returns them, also. It seems that what is happening is that if there are multiple, sequential spaces, the Split function returns a null string ("") for the spaces after the first. Since "" = Ucase(""), the null string gets concatenated, followed by your Delimiter. Enjoy your dinner! We just had a wonderful one. My mother-in-law and niece are visiting from the Azores, and they go home tomorrow. My wife and niece made a great "going away" meal, and some of our kids were over, too. --ron |
#11
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
You're getting there. Still a problem with returning extra spaces -- if
there are multiple spaces after the CapWord, your routine returns them, also. It seems that what is happening is that if there are multiple, sequential spaces, the Split function returns a null string ("") for the spaces after the first. Since "" = Ucase(""), the null string gets concatenated, followed by your Delimiter. Yes, the blanks do cause a problem because the test I was doing was too simple. All I checked for was that the word equaled its capitalization. Well, as it turns out, an empty string equals the empty string with the UCase function applied and so it passed the test. Actually, in examining this problem, another problem was highlighted... if there is stand-alone non-letters, these too passed my simple test; hence, something like "(*)" was added to the return value. This part is easy enough to fix and the code to do that is shown after my signature below. However, there is another difference between our codes and I am not sure which of our treatments should be considered the "correct" one. If the string of text is this, for example... "One (TWO) Three" my routine returns (TWO) with the parentheses and your routine returns just TWO. The OP said in his Subject line as well as the body of his first posting that the items he wanted to find were "words"... so, should the surrounding parentheses be retained as I do (because they are attached to the upper case letters and/or digits) or removed as you do (even though they are attached)? Or are we both wrong and the "word" should be rejected altogether because it is not composed of just upper case letters and/or digits? We will need input from the OP on this issue. Now, while testing the above, I came across a problem with your function. If there are any internal non-letters/non-digits, your regular expression appears to treat them as blanks and separates the word into parts, using a blank, for each such character. So, for this sentence... "One TWENTY-ONE Three" your routine returns TWENTY ONE with a space in the middle. Your code does a similar thing with something like... "Model #AB(12)CD" returning AB CD EF with two internal blanks. Enjoy your dinner! Thank you... I did. We just had a wonderful one. My mother-in-law and niece are visiting from the Azores, and they go home tomorrow. My wife and niece made a great "going away" meal, and some of our kids were over, too. Yes, that sounds like it was nice too. I glad you had such a nice time. Rick 'The repaired function '============================== Function GetCapWords(R As Range) As String Dim X As Long Dim Words() As String Const Delimiter = " " Words = Split(Replace(Replace(R.Value, vbLf, " "), vbCr, " ")) For X = 0 To UBound(Words) If Words(X) = UCase(Words(X)) And Words(X) Like "*[A-Z]*" Then GetCapWords = GetCapWords & Delimiter & Words(X) End If Next If Len(GetCapWords) 0 Then GetCapWords = Mid(GetCapWords, 1 + Len(Delimiter)) End If End Function |
#12
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Nov 30, 8:39 am, "Rick Rothstein \(MVP - VB\)"
wrote: You're getting there. Still a problem with returning extra spaces -- if there are multiple spaces after the CapWord, your routine returns them, also. It seems that what is happening is that if there are multiple, sequential spaces, the Split function returns a null string ("") for the spaces after the first. Since "" = Ucase(""), the null string gets concatenated, followed by your Delimiter. Yes, the blanks do cause a problem because the test I was doing was too simple. All I checked for was that the word equaled its capitalization. Well, as it turns out, an empty string equals the empty string with the UCase function applied and so it passed the test. Actually, in examining this problem, another problem was highlighted... if there is stand-alone non-letters, these too passed my simple test; hence, something like "(*)" was added to the return value. This part is easy enough to fix and the code to do that is shown after my signature below. However, there is another difference between our codes and I am not sure which of our treatments should be considered the "correct" one. If the string of text is this, for example... "One (TWO) Three" my routine returns (TWO) with the parentheses and your routine returns just TWO. The OP said in his Subject line as well as the body of his first posting that the items he wanted to find were "words"... so, should the surrounding parentheses be retained as I do (because they are attached to the upper case letters and/or digits) or removed as you do (even though they are attached)? Or are we both wrong and the "word" should be rejected altogether because it is not composed of just upper case letters and/or digits? We will need input from the OP on this issue. Now, while testing the above, I came across a problem with your function. If there are any internal non-letters/non-digits, your regular expression appears to treat them as blanks and separates the word into parts, using a blank, for each such character. So, for this sentence... "One TWENTY-ONE Three" your routine returns TWENTY ONE with a space in the middle. Your code does a similar thing with something like... "Model #AB(12)CD" returning AB CD EF with two internal blanks. Enjoy your dinner! Thank you... I did. We just had a wonderful one. My mother-in-law and niece are visiting from the Azores, and they go home tomorrow. My wife and niece made a great "going away" meal, and some of our kids were over, too. Yes, that sounds like it was nice too. I glad you had such a nice time. Rick 'The repaired function '============================== Function GetCapWords(R As Range) As String Dim X As Long Dim Words() As String Const Delimiter = " " Words = Split(Replace(Replace(R.Value, vbLf, " "), vbCr, " ")) For X = 0 To UBound(Words) If Words(X) = UCase(Words(X)) And Words(X) Like "*[A-Z]*" Then GetCapWords = GetCapWords & Delimiter & Words(X) End If Next If Len(GetCapWords) 0 Then GetCapWords = Mid(GetCapWords, 1 + Len(Delimiter)) End If End Function Thanks Rick and Ron for all the effort. In column C is the sentence: " His code is CND8599, and pin is 2588." In column D the function GetCapWords is extracting "CND8599," but not extracting "2588" In column E the function reSpecial is extracting "CND8599 2588" Observations: reSpecial meets my requirements as it extracts Words Capitals and/or Numbers. An additional bonus(meeting my unstated requirement) is that it ignores the punctuation marks ie "," at the end of "CND8599". I would still love to see Rick's non regexp solution with necessary changes for displaying words containing only numbers and also ignoring punctuations at the end of words (if possible) Thanks once again Raj |
#13
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I would still love to see Rick's non regexp solution with necessary
changes for displaying words containing only numbers and also ignoring punctuations at the end of words (if possible) I think this will work... Function GetCapWords(R As Range) As String Dim X As Long Dim Words() As String Const Delimiter = " " Words = Split(Replace(Replace(R.Value, vbLf, " "), vbCr, " ")) For X = 0 To UBound(Words) If Words(X) = UCase(Words(X)) And Words(X) Like "*[A-Z0-9]*" Then GetCapWords = GetCapWords & Delimiter & Words(X) End If Next If Len(GetCapWords) 0 Then GetCapWords = Mid$(GetCapWords, 1 + Len(Delimiter)) End If For X = 1 To Len(GetCapWords) If Mid$(GetCapWords, X, 1) Like "[!A-Z0-9 ]" Then Mid$(GetCapWords, X, 1) = "`" End If Next GetCapWords = Replace(GetCapWords, "`", "") End Function Rick |
#14
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Thu, 29 Nov 2007 22:39:42 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: You're getting there. Still a problem with returning extra spaces -- if there are multiple spaces after the CapWord, your routine returns them, also. It seems that what is happening is that if there are multiple, sequential spaces, the Split function returns a null string ("") for the spaces after the first. Since "" = Ucase(""), the null string gets concatenated, followed by your Delimiter. Yes, the blanks do cause a problem because the test I was doing was too simple. All I checked for was that the word equaled its capitalization. Well, as it turns out, an empty string equals the empty string with the UCase function applied and so it passed the test. Actually, in examining this problem, another problem was highlighted... if there is stand-alone non-letters, these too passed my simple test; hence, something like "(*)" was added to the return value. This part is easy enough to fix and the code to do that is shown after my signature below. However, there is another difference between our codes and I am not sure which of our treatments should be considered the "correct" one. If the string of text is this, for example... "One (TWO) Three" my routine returns (TWO) with the parentheses and your routine returns just TWO. The OP said in his Subject line as well as the body of his first posting that the items he wanted to find were "words"... so, should the surrounding parentheses be retained as I do (because they are attached to the upper case letters and/or digits) or removed as you do (even though they are attached)? Or are we both wrong and the "word" should be rejected altogether because it is not composed of just upper case letters and/or digits? We will need input from the OP on this issue. Now, while testing the above, I came across a problem with your function. If there are any internal non-letters/non-digits, your regular expression appears to treat them as blanks and separates the word into parts, using a blank, for each such character. So, for this sentence... "One TWENTY-ONE Three" your routine returns TWENTY ONE with a space in the middle. Your code does a similar thing with something like... "Model #AB(12)CD" returning AB CD EF with two internal blanks. Those results depend on the definition of a "word containing caps and letters" My interpretation was that special characters are not in the class of caps and letters. But if the OP wanted to include them, it would be fairly simple to modify the regex to do so. Also, for word boundaries, I just used the regex definition which is the point between a word character (letter, digit, underscore) and a non-word character. Again, this is something that can be modified merely by changing the regex, if required by the OP. The rest of the VBA code does not need to be changed at all. --ron |
#15
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Thu, 29 Nov 2007 20:34:30 -0800 (PST), Raj wrote:
reSpecial meets my requirements as it extracts Words Capitals and/or Numbers. An additional bonus(meeting my unstated requirement) is that it ignores the punctuation marks ie "," at the end of "CND8599". Raj, I'm glad it meets your requirements. And I would not consider that omitting the punctuation was an "unstated" requirement. You did state you wanted "words" returned and that they were defined as consisting only of Capital letters and Numbers. So punctuation would not have been in the class. What you did leave unstated was what kind of delimiter you wanted between the words. I assumed a <space -- that turned out to be correct. In any event, I was happy to help, and I'm glad it meets your requirements. IT also gave me an opportunity to clean up some of the surrounding VBA code. --ron |
#16
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Also, for word boundaries, I just used the regex definition which is the
point between a word character (letter, digit, underscore) and a non-word character. I just looked back at your code and realized that the regex "language" that I knew must have changed over the years or, more likely, having been a UNIX based implementation, simply differs in construction. I pulled some old notes I had laying around from my "old days" and the \b which you used (for a blank space I am guessing) used to me the backspace character to me "back in the day". We had a \s character which stood for "white space" which were the blank space (obviously<g), horizontal tabs, vertical tabs (line feeds) and carriage returns. I think, but am not sure, that other characters stood for themselves (unless they were a meta-character like the backslash in which they were "escaped" by preceding them with a backslash) and, again I think, that the words were separated by white space only. As I said earlier, I've forgotten almost everything I knew about regular expressions because I haven't touched them for 20+ years; however, I'm getting the impression they have evolved over the years and "work" differently than how they did back in the mid-1980s when I worked with them. Rick |
#17
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Fri, 30 Nov 2007 12:43:40 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: Also, for word boundaries, I just used the regex definition which is the point between a word character (letter, digit, underscore) and a non-word character. I just looked back at your code and realized that the regex "language" that I knew must have changed over the years or, more likely, having been a UNIX based implementation, simply differs in construction. I pulled some old notes I had laying around from my "old days" and the \b which you used (for a blank space I am guessing) used to me the backspace character to me "back in the day". We had a \s character which stood for "white space" which were the blank space (obviously<g), horizontal tabs, vertical tabs (line feeds) and carriage returns. I think, but am not sure, that other characters stood for themselves (unless they were a meta-character like the backslash in which they were "escaped" by preceding them with a backslash) and, again I think, that the words were separated by white space only. As I said earlier, I've forgotten almost everything I knew about regular expressions because I haven't touched them for 20+ years; however, I'm getting the impression they have evolved over the years and "work" differently than how they did back in the mid-1980s when I worked with them. Rick I didn't work with regular expressions back then so can't comment. But in the usage with which I am familiar, \b only stands for a backspace if it is within a character class e.g. [\b] would be a backspace. \b does NOT stand for a blank space but rather for a "word boundary" which is defined as the junction of \w and \W as well as at the start and/or end of the string if the first and/or last characters in the string are word characters. \w stands for any word character, which is further defined as in the class: [A-Za-z0-9_] (note the underscore) \W is the negation of \w (any character NOT in the class) Also, \d stands for any digit, and \D stands for a non-digit. \s is the same as your recollection. And \S is any character not in the class of \s. And also there are a number of different flavors, with subtle (and not so subtle) rule differences. I restrict myself to VBScript (which is the same as Javascript). So the regex I used could also be written: "\b[A-Z\d]+\b" and explained as: "\b" & ' Assert position at a word boundary "[A-Z\d]" & ' Match a single character present in the list below ' A character in the range between “A” and “Z” ' A single digit 0..9 "+" & ' Between one and unlimited times, as many times as possible, giving back as needed (greedy) "\b" ' Assert position at a word boundary Best, --ron |
#18
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]() "Ron Rosenfeld" wrote in message ... On Fri, 30 Nov 2007 12:43:40 -0500, "Rick Rothstein \(MVP - VB\)" wrote: Also, for word boundaries, I just used the regex definition which is the point between a word character (letter, digit, underscore) and a non-word character. I just looked back at your code and realized that the regex "language" that I knew must have changed over the years or, more likely, having been a UNIX based implementation, simply differs in construction. I pulled some old notes I had laying around from my "old days" and the \b which you used (for a blank space I am guessing) used to me the backspace character to me "back in the day". We had a \s character which stood for "white space" which were the blank space (obviously<g), horizontal tabs, vertical tabs (line feeds) and carriage returns. I think, but am not sure, that other characters stood for themselves (unless they were a meta-character like the backslash in which they were "escaped" by preceding them with a backslash) and, again I think, that the words were separated by white space only. As I said earlier, I've forgotten almost everything I knew about regular expressions because I haven't touched them for 20+ years; however, I'm getting the impression they have evolved over the years and "work" differently than how they did back in the mid-1980s when I worked with them. Rick I didn't work with regular expressions back then so can't comment. But in the usage with which I am familiar, \b only stands for a backspace if it is within a character class e.g. [\b] would be a backspace. \b does NOT stand for a blank space but rather for a "word boundary" which is defined as the junction of \w and \W as well as at the start and/or end of the string if the first and/or last characters in the string are word characters. \w stands for any word character, which is further defined as in the class: [A-Za-z0-9_] (note the underscore) \W is the negation of \w (any character NOT in the class) Also, \d stands for any digit, and \D stands for a non-digit. \s is the same as your recollection. And \S is any character not in the class of \s. And also there are a number of different flavors, with subtle (and not so subtle) rule differences. I restrict myself to VBScript (which is the same as Javascript). So the regex I used could also be written: "\b[A-Z\d]+\b" and explained as: "\b" & ' Assert position at a word boundary "[A-Z\d]" & ' Match a single character present in the list below ' A character in the range between "A" and "Z" ' A single digit 0..9 "+" & ' Between one and unlimited times, as many times as possible, giving back as needed (greedy) "\b" ' Assert position at a word boundary Best, Thanks for the mini-lesson. Some of what you posted looks familiar, but I am afraid I've been away from it for far too long to be sure of anything any more. I do remember our UNIX clone was named CLIX and a company named Intergraph (which made our CADD workstations and software) was the vendor. I also remember doing most of my work in something called the Bourne Shell and the scripting language I used mostly was called awk (but I am afraid I remember very little about it either). Too many years have passed, I'm getting old, brain cells are dying...<g Rick |
#19
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Sat, 1 Dec 2007 00:32:31 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: Thanks for the mini-lesson. Some of what you posted looks familiar, but I am afraid I've been away from it for far too long to be sure of anything any more. I do remember our UNIX clone was named CLIX and a company named Intergraph (which made our CADD workstations and software) was the vendor. I also remember doing most of my work in something called the Bourne Shell and the scripting language I used mostly was called awk (but I am afraid I remember very little about it either). Too many years have passed, I'm getting old, brain cells are dying...<g Well, the ones that remain seem to be well tuned in to writing concise VBA code! I'm really a novice with regular expressions. Harlan Grove got me interested in them within the past year, and I've found them to be a much easier method of dealing with all kinds of text manipulations, expecially after getting used to the vagaries of VB Script. --ron |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Changing grey column headings from numbers back to capital letters | Excel Discussion (Misc queries) | |||
how to change small letters to capital letters | Excel Discussion (Misc queries) | |||
how do i turn all letters into capital letters? | Excel Discussion (Misc queries) | |||
Finding most common occurence of values in cells containing letters and numbers | Excel Worksheet Functions | |||
Finding Numbers with Cells that also contain letters | Excel Discussion (Misc queries) |