Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
I just had some really great help with some Excel files from this group where
the formatting was done by various people over the last 15-20 years. These are names for genealogy, and I am loading many Excel files into Access so people can search by name and soundex. I don't know if the other problem I am dealing with can be handled programatically, but some of the people in the past entered extra spaces between the first and middle names. I am going thru the files manually now and removing the extra space. I have more files to do and thought it was worth asking if a macro could do this. Here are some examples: John William (I change to John William by using delete key) Doris Doe Smith (I change to Doris Doe Smith) John W. ( change to John W.) I found many posts on this site that talks about Trim, but could not find anything that specifically addressed just removing extra space within text. Thanks in advance if there is any help for me, Sherry |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Given that you indicated you used Ron Rosenfeld's routine in your earlier
thread, I modified that to add the additional functionality... Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" txt = re.Replace(c.Text, "$1.") re.Pattern = "\s{2,}" c.Value = re.Replace(txt, " ") Next c End Sub Now a word of caution... this is my first regular expression construction in some 15 years, so I am rusty. What I posted works, but I can't guarantee it is the most efficient construction. So, check back here to see if Ron posts a better constructed regular expression solution than this one. Rick "SherryScrapDog" wrote in message ... I just had some really great help with some Excel files from this group where the formatting was done by various people over the last 15-20 years. These are names for genealogy, and I am loading many Excel files into Access so people can search by name and soundex. I don't know if the other problem I am dealing with can be handled programatically, but some of the people in the past entered extra spaces between the first and middle names. I am going thru the files manually now and removing the extra space. I have more files to do and thought it was worth asking if a macro could do this. Here are some examples: John William (I change to John William by using delete key) Doris Doe Smith (I change to Doris Doe Smith) John W. ( change to John W.) I found many posts on this site that talks about Trim, but could not find anything that specifically addressed just removing extra space within text. Thanks in advance if there is any help for me, Sherry |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
This might actually be better... it also removes all leading and trailing
spaces, if any... Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" txt = re.Replace(c.Text, "$1.") re.Pattern = "\s{2,}" c.Value = re.Replace(Trim(txt), " ") Next c End Sub Rick "Rick Rothstein (MVP - VB)" wrote in message ... Given that you indicated you used Ron Rosenfeld's routine in your earlier thread, I modified that to add the additional functionality... Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" txt = re.Replace(c.Text, "$1.") re.Pattern = "\s{2,}" c.Value = re.Replace(txt, " ") Next c End Sub Now a word of caution... this is my first regular expression construction in some 15 years, so I am rusty. What I posted works, but I can't guarantee it is the most efficient construction. So, check back here to see if Ron posts a better constructed regular expression solution than this one. Rick "SherryScrapDog" wrote in message ... I just had some really great help with some Excel files from this group where the formatting was done by various people over the last 15-20 years. These are names for genealogy, and I am loading many Excel files into Access so people can search by name and soundex. I don't know if the other problem I am dealing with can be handled programatically, but some of the people in the past entered extra spaces between the first and middle names. I am going thru the files manually now and removing the extra space. I have more files to do and thought it was worth asking if a macro could do this. Here are some examples: John William (I change to John William by using delete key) Doris Doe Smith (I change to Doris Doe Smith) John W. ( change to John W.) I found many posts on this site that talks about Trim, but could not find anything that specifically addressed just removing extra space within text. Thanks in advance if there is any help for me, Sherry |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Rick, Thanks tons! It works so well I wish I had asked this before. You
have saved me hours and hours of work. This is a volunteer project for me and I just love working on it, but as you can probably imagine, it gets boring going thru these files cell by cell and deleting spaces and adding periods. My goal is to give the Society a database with data as clean as possible. Thanks, thanks and thanks again, Sherry (I will check Ron's post if he does respond.) "Rick Rothstein (MVP - VB)" wrote: Given that you indicated you used Ron Rosenfeld's routine in your earlier thread, I modified that to add the additional functionality... Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" txt = re.Replace(c.Text, "$1.") re.Pattern = "\s{2,}" c.Value = re.Replace(txt, " ") Next c End Sub Now a word of caution... this is my first regular expression construction in some 15 years, so I am rusty. What I posted works, but I can't guarantee it is the most efficient construction. So, check back here to see if Ron posts a better constructed regular expression solution than this one. Rick "SherryScrapDog" wrote in message ... I just had some really great help with some Excel files from this group where the formatting was done by various people over the last 15-20 years. These are names for genealogy, and I am loading many Excel files into Access so people can search by name and soundex. I don't know if the other problem I am dealing with can be handled programatically, but some of the people in the past entered extra spaces between the first and middle names. I am going thru the files manually now and removing the extra space. I have more files to do and thought it was worth asking if a macro could do this. Here are some examples: John William (I change to John William by using delete key) Doris Doe Smith (I change to Doris Doe Smith) John W. ( change to John W.) I found many posts on this site that talks about Trim, but could not find anything that specifically addressed just removing extra space within text. Thanks in advance if there is any help for me, Sherry |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Worksheet & VB/VBA Trim functions are slightly different, inasmuch the
Worksheet function replaces any multiple spaces in the middle of text with a single space. Your sample text as posted threw me at first (when pasted directly into a cell), but try this sTmp = Replace(sOrig, Chr(160), " ") sOut = Application.WorksheetFunction.Trim(sTmp) where sOrig contains your text. As a cell formula - =TRIM(SUBSTITUTE(A1,CHAR(160)," ")) Regards, Peter T "SherryScrapDog" wrote in message ... I just had some really great help with some Excel files from this group where the formatting was done by various people over the last 15-20 years. These are names for genealogy, and I am loading many Excel files into Access so people can search by name and soundex. I don't know if the other problem I am dealing with can be handled programatically, but some of the people in the past entered extra spaces between the first and middle names. I am going thru the files manually now and removing the extra space. I have more files to do and thought it was worth asking if a macro could do this. Here are some examples: John William (I change to John William by using delete key) Doris Doe Smith (I change to Doris Doe Smith) John W. ( change to John W.) I found many posts on this site that talks about Trim, but could not find anything that specifically addressed just removing extra space within text. Thanks in advance if there is any help for me, Sherry |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
On Sun, 23 Dec 2007 22:31:49 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: This might actually be better... it also removes all leading and trailing spaces, if any... Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" txt = re.Replace(c.Text, "$1.") re.Pattern = "\s{2,}" c.Value = re.Replace(Trim(txt), " ") Next c End Sub A bit shorter: Option Explicit Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" c.Value = Application.WorksheetFunction.Trim _ (re.Replace(c.Text, "$1.")) Next c End Sub --ron |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Rick,
Thanks for this suggestion too. This is a problem I have in these files too, and I have a query in the Access database where I load the files to take the spaces out. I might as well do it in the Excel files though and eliminate one of the queries I do to load the files! Many thanks! Sherry "Rick Rothstein (MVP - VB)" wrote: This might actually be better... it also removes all leading and trailing spaces, if any... Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" txt = re.Replace(c.Text, "$1.") re.Pattern = "\s{2,}" c.Value = re.Replace(Trim(txt), " ") Next c End Sub Rick |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
On Sun, 23 Dec 2007 22:20:59 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: Now a word of caution... this is my first regular expression construction in some 15 years, so I am rusty. What I posted works, but I can't guarantee it is the most efficient construction. So, check back here to see if Ron posts a better constructed regular expression solution than this one. It works fine. And it is certainly possible to devise one regular expression that will also remove leading and trailing spaces. But I think it is simpler to just use the TRIM "worksheet" function (not the VBA function) to remove leading, trailing and doubled spaces in one swoop. But, you could run a third regex with pattern: ^\s+|\s+$ to remove leading and trailing white space characters. Or, if you only wanted to remove <space and <tab characters: "^[ \t]+|[ \t]+$" (note the <space at the start of the character class.) --ron |
#9
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Another idea might be to move the Pattern outside the loop since it's a
constant. Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("VbScript.RegExp") re.IgnoreCase = True re.Global = True re.Pattern = "\b([A-Z])\b(?!\.)" With WorksheetFunction For Each c In Selection.Cells c.Value = .Trim(re.Replace(c.Text, "$1.")) Next c End With End Sub -- Dana DeLouis "Ron Rosenfeld" wrote in message ... On Sun, 23 Dec 2007 22:31:49 -0500, "Rick Rothstein \(MVP - VB\)" wrote: This might actually be better... it also removes all leading and trailing spaces, if any... Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" txt = re.Replace(c.Text, "$1.") re.Pattern = "\s{2,}" c.Value = re.Replace(Trim(txt), " ") Next c End Sub A bit shorter: Option Explicit Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" c.Value = Application.WorksheetFunction.Trim _ (re.Replace(c.Text, "$1.")) Next c End Sub --ron |
#10
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Hi Peter,
Thanks for this response! I can't believe how helpful everyone is in this group and I appreciate it so much. I'm not exactly sure what you mean when you say replace sOrig with my text and wonder if this would be the cell column. However, the previous posts macro is working wonderful, and I'm not sure what this would do different than what I have now. I can see why you said my examples threw you because as I look at them now, they do not look like they have the 2 spaces in them as I had typed them. Please let me know if there is something this does that the other macro is not doing. And, I assume this is a macro that you are giving me; let me know if it is something else. This is my first attempt at doing things programmatically in Excel, so I'm very ignorant and appreciate all the help I have received. Again, Thanks! Sherry "Peter T" wrote: Worksheet & VB/VBA Trim functions are slightly different, inasmuch the Worksheet function replaces any multiple spaces in the middle of text with a single space. Your sample text as posted threw me at first (when pasted directly into a cell), but try this sTmp = Replace(sOrig, Chr(160), " ") sOut = Application.WorksheetFunction.Trim(sTmp) where sOrig contains your text. As a cell formula - =TRIM(SUBSTITUTE(A1,CHAR(160)," ")) Regards, Peter T "SherryScrapDog" wrote in message ... I just had some really great help with some Excel files from this group where the formatting was done by various people over the last 15-20 years. These are names for genealogy, and I am loading many Excel files into Access so people can search by name and soundex. I don't know if the other problem I am dealing with can be handled programatically, but some of the people in the past entered extra spaces between the first and middle names. I am going thru the files manually now and removing the extra space. I have more files to do and thought it was worth asking if a macro could do this. Here are some examples: John William (I change to John William by using delete key) Doris Doe Smith (I change to Doris Doe Smith) John W. ( change to John W.) I found many posts on this site that talks about Trim, but could not find anything that specifically addressed just removing extra space within text. Thanks in advance if there is any help for me, Sherry |
#11
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
A bit shorter:
Option Explicit Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" c.Value = Application.WorksheetFunction.Trim _ (re.Replace(c.Text, "$1.")) Next c End Sub I thought about that solution, but I was not sure what was more efficient in the end... continually running out to the spreadsheet level to get one of its functions within the loop or simply resetting the Pattern each time. Can you, or anyone reading this thread, give me an idea of the "penalty" incurred, if any, by a macro when it reaches out to the spreadsheet world in order to execute one of its functions via the Application.WorksheetFunction property? Rick |
#12
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Hi Sherry,
I can't believe how helpful everyone is in this group Just normal service ! I'm not exactly sure what you mean when you say replace sOrig with my text and wonder if this would be the cell column. For use in the example as posted I meant first assign the text you want to process to the variable sOrig. The variable would have been declared like this Dim sOrig as String or it might be an argument in a function purely for testing copy "Doris Doe Smith" without the quotes into a cell, select the cell and run this macro Sub test1() Dim s As String, sReturn As String s = ActiveCell sReturn = TrimSpaces(s) MsgBox s & vbNewLine & sReturn End Sub Function TrimSpaces(sOrig As String) As String Dim sTmp As String Dim sOut As String sTmp = Replace(sOrig, Chr(160), " ") sOut = Application.WorksheetFunction.Trim(sTmp) TrimSpaces = sOut End Function The above may not be efficient and not necessary to use the additional sTmp variable. It depends on where your data (text to be processed) is coming from and what you want to do with it. You might, for example, pass an array to a function and loop each element. If it's a one off type of thing, copy and paste all data to cells in a column, copy down the cell formula as posted previously. If you are sure the pseudo space chr(160) only crept into the examples in your post, and never exists in your data, you'd only need the worksheet Trim function. If you want to filter or make further changes RegExp may will provide much greater flexibility, otherwise go with the simplest and/or fastest method you are sure will work for your needs. Regards, Peter T "SherryScrapDog" wrote in message ... Hi Peter, Thanks for this response! I can't believe how helpful everyone is in this group and I appreciate it so much. I'm not exactly sure what you mean when you say replace sOrig with my text and wonder if this would be the cell column. However, the previous posts macro is working wonderful, and I'm not sure what this would do different than what I have now. I can see why you said my examples threw you because as I look at them now, they do not look like they have the 2 spaces in them as I had typed them. Please let me know if there is something this does that the other macro is not doing. And, I assume this is a macro that you are giving me; let me know if it is something else. This is my first attempt at doing things programmatically in Excel, so I'm very ignorant and appreciate all the help I have received. Again, Thanks! Sherry "Peter T" wrote: Worksheet & VB/VBA Trim functions are slightly different, inasmuch the Worksheet function replaces any multiple spaces in the middle of text with a single space. Your sample text as posted threw me at first (when pasted directly into a cell), but try this sTmp = Replace(sOrig, Chr(160), " ") sOut = Application.WorksheetFunction.Trim(sTmp) where sOrig contains your text. As a cell formula - =TRIM(SUBSTITUTE(A1,CHAR(160)," ")) Regards, Peter T "SherryScrapDog" wrote in message ... I just had some really great help with some Excel files from this group where the formatting was done by various people over the last 15-20 years. These are names for genealogy, and I am loading many Excel files into Access so people can search by name and soundex. I don't know if the other problem I am dealing with can be handled programatically, but some of the people in the past entered extra spaces between the first and middle names. I am going thru the files manually now and removing the extra space. I have more files to do and thought it was worth asking if a macro could do this. Here are some examples: John William (I change to John William by using delete key) Doris Doe Smith (I change to Doris Doe Smith) John W. ( change to John W.) I found many posts on this site that talks about Trim, but could not find anything that specifically addressed just removing extra space within text. Thanks in advance if there is any help for me, Sherry |
#13
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Thanks much Peter for clarification, and for teaching me more. I very much
enjoy leaning all I can! Sherry "Peter T" wrote: Hi Sherry, I can't believe how helpful everyone is in this group Just normal service ! I'm not exactly sure what you mean when you say replace sOrig with my text and wonder if this would be the cell column. For use in the example as posted I meant first assign the text you want to process to the variable sOrig. The variable would have been declared like this Dim sOrig as String or it might be an argument in a function purely for testing copy "Doris Doe Smith" without the quotes into a cell, select the cell and run this macro Sub test1() Dim s As String, sReturn As String s = ActiveCell sReturn = TrimSpaces(s) MsgBox s & vbNewLine & sReturn End Sub Function TrimSpaces(sOrig As String) As String Dim sTmp As String Dim sOut As String sTmp = Replace(sOrig, Chr(160), " ") sOut = Application.WorksheetFunction.Trim(sTmp) TrimSpaces = sOut End Function The above may not be efficient and not necessary to use the additional sTmp variable. It depends on where your data (text to be processed) is coming from and what you want to do with it. You might, for example, pass an array to a function and loop each element. If it's a one off type of thing, copy and paste all data to cells in a column, copy down the cell formula as posted previously. If you are sure the pseudo space chr(160) only crept into the examples in your post, and never exists in your data, you'd only need the worksheet Trim function. If you want to filter or make further changes RegExp may will provide much greater flexibility, otherwise go with the simplest and/or fastest method you are sure will work for your needs. Regards, Peter T |
#14
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Can you, or anyone reading this thread, give me an idea of the "penalty"
incurred, if any, by a macro when it reaches out to the spreadsheet world in order to execute one of its functions via the Application.WorksheetFunction property? Rick Worksheet functions, whilst extremely efficient in cells, are slow when called in VBA. It's often quicker to recreate the worksheet function. Although in another part of this thread I suggested Application.Worksheetfunction.Trim( personally I wouldn't use it in in anything time sensitive, say as a UDF in a large number of cells or as part of a long loop. Other ways to replace multiple spaces with singles besides RegExp. If using multiple worksheet functions in the same loop use 'wf' as in the following Sub foo() Dim wf As WorksheetFunction Set wf = Application.WorksheetFunction s = wf.Trim(" a b c ") MsgBox s End Sub Even setting a ref to WorksheetFunction object only speeds up a little. Also I wouldn't use RegExp in a UDF and probably not in a function that was called in a much larger loop. Although RegExp is extremely efficient, creating the object is (relatively) slow. IOW, great in macro or function that receives an array to process, but don't send each element to a separate function that does CreateObject("vbscript.regexp"). I suppose could store the object at module level. Regards, Peter T |
#15
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
On Mon, 24 Dec 2007 12:52:47 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: A bit shorter: Option Explicit Sub AddDot() Dim c As Range Dim txt As String Dim re As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True For Each c In Selection re.Pattern = "\b([A-Z])\b(?!\.)" c.Value = Application.WorksheetFunction.Trim _ (re.Replace(c.Text, "$1.")) Next c End Sub I thought about that solution, but I was not sure what was more efficient in the end... continually running out to the spreadsheet level to get one of its functions within the loop or simply resetting the Pattern each time. Can you, or anyone reading this thread, give me an idea of the "penalty" incurred, if any, by a macro when it reaches out to the spreadsheet world in order to execute one of its functions via the Application.WorksheetFunction property? Rick I'm not sure which is more efficient. I agree with Dana about moving pattern outside the loop if you only are using a single pattern. I was still thinking about perhaps using multiple patterns. I suppose you could do something like: ===================================== Sub AddDot() Dim c As Range Dim re As Object Dim wsf As WorksheetFunction Set wsf = Application.WorksheetFunction Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True re.Pattern = "\b([A-Z])\b(?!\.)" For Each c In Selection c.Value = wsf.Proper(wsf.Trim(re.Replace(c.Text, "$1."))) Next c End Sub =========================================== to speed it up a bit, or even set a reference (tools/references) to Microsoft VBSCript 5.5 and then use: ============================================= Option Explicit Sub AddDot() 'requires reference set to Microsoft _ ' VBScript Regular Expressions 5.5 Dim c As Range Dim re As RegExp Dim wsf As WorksheetFunction Set wsf = Application.WorksheetFunction Set re = New RegExp re.IgnoreCase = True re.Global = True re.Pattern = "\b([A-Z])\b(?!\.)" For Each c In Selection c.Value = wsf.Proper(wsf.Trim(re.Replace(c.Text, "$1."))) Next c End Sub ========================================== And I don't know whether, with this structure, it makes sense to explicitly release the objects at the end of the Sub. ================================================== = Sub AddDot() 'requires reference set to Microsoft _ ' VBScript Regular Expressions 5.5 Dim c As Range Dim re As RegExp Dim wsf As WorksheetFunction Set wsf = Application.WorksheetFunction Set re = New RegExp re.IgnoreCase = True re.Global = True re.Pattern = "\b([A-Z])\b(?!\.)" For Each c In Selection c.Value = wsf.Proper(wsf.Trim(re.Replace(c.Text, "$1."))) Next c Set re = Nothing Set wsf = Nothing End Sub ================================================== = --ron |
#16
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Can you, or anyone reading this thread, give me an idea of the
"penalty" incurred, if any, by a macro when it reaches out to the spreadsheet world in order to execute one of its functions via the Application.WorksheetFunction property? Worksheet functions, whilst extremely efficient in cells, are slow when called in VBA. That was my gut feeling. I figured if calling a macro from the spreadsheet is slow (I've seen several references to that fact), then it must also be slow going the other way. Hence, I opted for the pattern toggling code that I posted. It's often quicker to recreate the worksheet function. Now that is interesting. I wouldn't use it in in anything time sensitive, say as a UDF in a large number of cells or as part of a long loop. I would agree with that. Other ways to replace multiple spaces with singles besides RegExp. The way I this over in the compiled VB world (which works here in the Excel VBA world as well) is like this... Do While InStr(SomeText, " ") 0 SomeText = Replace(SomeText, " ", " ") Loop Also I wouldn't use RegExp in a UDF and probably not in a function that was called in a much larger loop. Although RegExp is extremely efficient, creating the object is (relatively) slow. IOW, great in macro or function that receives an array to process, but don't send each element to a separate function that does CreateObject("vbscript.regexp"). In the compiled VB world, our tendency is to avoid calling out to any kind of scripting type object as this is very slow to do from a compiled executable and, very often, businesses have scripting completely turned off for security reasons. However, since VBA macros or UDFs are not compiled executable, I have no feeling for the efficiency calling a scripting object from within one. Rick |
#17
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
On Mon, 24 Dec 2007 14:47:43 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: In the compiled VB world, our tendency is to avoid calling out to any kind of scripting type object as this is very slow to do from a compiled executable and, very often, businesses have scripting completely turned off for security reasons. However, since VBA macros or UDFs are not compiled executable, I have no feeling for the efficiency calling a scripting object from within one. One could certainly write a VBA routine that did not use any external calls to do this. And it might be faster, but maybe not worth the work. Another option, which uses Regular Expressions but should be faster than VBScript since it is written in C++ and compiled as an xll add-in, would be to download and install Laurent Longre's free morefunc.xll add-in from http://xcell05.free.fr/english/index.html and then use the following formula: =TRIM(REGEX.SUBSTITUTE(A1,"(\b([A-Z])\b(?!\.))","[1].",,,FALSE)) or, if you wanted to use it as a Sub, then: ===================================== Option Explicit Sub AddDot() Dim c As Range Const sPat As String = "(^\s*)|(\s*$)|(\s{2,})|(\b([A-Z])\b(?!\.))" Const sRes As String = "[1=,2=,3= ,4=[4].]" For Each c In Selection c.Value = Run([Regex.substitute], c.Text, sPat, sRes, , , False) Next c End Sub ======================================= This add-in has a number of other useful functions. I had avoided recommending it for a while since he had not updated it to be compatible with XL2007, but it now is, for the most part. The other problem with the add-in, which is related to a limitation in XLL, is that you cannot pass strings 255 characters. But that should not be an issue for this application. --ron |
#18
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
Hi Rick,
Worksheet functions, whilst extremely efficient in cells, are slow when called in VBA. That was my gut feeling. I figured if calling a macro from the spreadsheet is slow (I've seen several references to that fact), then it must also be slow going the other way. Hence, I opted for the pattern toggling code that I posted. By 'macro' I assume you mean a UDF but not quite sure what you mean by "slow going the other way". It's often quicker to recreate the worksheet function. Now that is interesting. I wouldn't use it in in anything time sensitive, say as a UDF in a large number of cells or as part of a long loop. I would agree with that. Other ways to replace multiple spaces with singles besides RegExp. The way I this over in the compiled VB world (which works here in the Excel VBA world as well) is like this... Do While InStr(SomeText, " ") 0 SomeText = Replace(SomeText, " ", " ") Loop That's what I had in mind. It might be worth comparing re-calc time with a UDF like that vs Worksheetfunction.trim in a large number of cells Also I wouldn't use RegExp in a UDF and probably not in a function that was called in a much larger loop. Although RegExp is extremely efficient, creating the object is (relatively) slow. IOW, great in macro or function that receives an array to process, but don't send each element to a separate function that does CreateObject("vbscript.regexp"). In the compiled VB world, our tendency is to avoid calling out to any kind of scripting type object as this is very slow to do from a compiled executable and, very often, businesses have scripting completely turned off for security reasons. However, since VBA macros or UDFs are not compiled executable, I have no feeling for the efficiency calling a scripting object from within one. Even VBA compiles albeit in not the same way as VB. Unlike VB6, a workbook and code can be saved with code totally un-compiled, partially compiled or fully compiled. Considerations for re-distribution are file size vs speed the first time code is run in a session. Ie, the code compiles as it's run (if it had been saved un-compiled) and remains compiled thereafter in the current session (unless edited). There was a long thread a while back testing the merits of string manipulation with RegExp and looping through a byte array, with proponents on either side with me somewhere in the middle. In the testing I did I found RegExp to work well (ie fast) when processing a large array in one procedure but very slow in a UDF or even if called multiple times in a normal VBA function. The reason was due to setting the reference to the scripting object multiple times, ie each time the function was called. In a UDF the byte array method won hands down. Other issues, such as "very often, businesses have scripting completely turned off" would equally affect VBA. Regards, Pete T |
#19
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
On Mon, 24 Dec 2007 14:47:43 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: Can you, or anyone reading this thread, give me an idea of the "penalty" incurred, if any, by a macro when it reaches out to the spreadsheet world in order to execute one of its functions via the Application.WorksheetFunction property? Worksheet functions, whilst extremely efficient in cells, are slow when called in VBA. That was my gut feeling. I figured if calling a macro from the spreadsheet is slow (I've seen several references to that fact), then it must also be slow going the other way. Hence, I opted for the pattern toggling code that I posted. It's often quicker to recreate the worksheet function. Now that is interesting. I wouldn't use it in in anything time sensitive, say as a UDF in a large number of cells or as part of a long loop. I would agree with that. Other ways to replace multiple spaces with singles besides RegExp. The way I this over in the compiled VB world (which works here in the Excel VBA world as well) is like this... Do While InStr(SomeText, " ") 0 SomeText = Replace(SomeText, " ", " ") Loop Also I wouldn't use RegExp in a UDF and probably not in a function that was called in a much larger loop. Although RegExp is extremely efficient, creating the object is (relatively) slow. IOW, great in macro or function that receives an array to process, but don't send each element to a separate function that does CreateObject("vbscript.regexp"). In the compiled VB world, our tendency is to avoid calling out to any kind of scripting type object as this is very slow to do from a compiled executable and, very often, businesses have scripting completely turned off for security reasons. However, since VBA macros or UDFs are not compiled executable, I have no feeling for the efficiency calling a scripting object from within one. Rick OK, after reading about all these delays, I decided to do some timing tests and I find that the differences in time related to the procedure calls are far outweighed by the time involved in reading or writing to/from a cell. Here is the first run just picking out if a word is present in a string and writing the results to a cell: ======================================== Option Explicit Sub foo() Range("a:d").Clear Range("a:a").Value = "Now is the time for all" Dim t As Long t = Timer test1 Debug.Print "InStr test", Format(Timer - t, "#.00") & " sec" t = Timer test2 Debug.Print "RegExp early binding Test", Format(Timer - t, "#.00") & " sec" t = Timer test3 Debug.Print "RegExp late binding test", Format(Timer - t, "#.00") & " sec" End Sub Sub test1() Dim c As Range For Each c In Range("a:a") If InStr(1, c.Text, "time") 0 Then c.Offset(0, 1).Value = "time" End If Next c End Sub Sub test2() Dim c As Range Dim re As New RegExp Dim mc As MatchCollection re.Pattern = "time" For Each c In Range("a:a") Set mc = re.Execute(c.Text) c.Offset(0, 3) = mc(0) Next c End Sub Sub test3() Dim c As Range Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Pattern = "time" For Each c In Range("a:a") Set mc = re.Execute(c.Text) c.Offset(0, 3) = mc(0) Next c End Sub ============================================ InStr test 15.53 sec RegExp early binding Test 15.45 sec RegExp late binding test 16.59 sec ============================================== And again, just testing your double space removal routine: ================================================ Option Explicit Sub foo() Range("a:d").Clear Range("a:a").Value = " Now is the time for all " Dim t As Long t = Timer test1 Debug.Print "InStr test", Format(Timer - t, "#.00") & " sec" t = Timer test2 Debug.Print "RegExp early binding Test", Format(Timer - t, "#.00") & " sec" t = Timer test3 Debug.Print "RegExp late binding test", Format(Timer - t, "#.00") & " sec" End Sub Sub test1() Dim c As Range Dim SomeText As String For Each c In Range("a:a") SomeText = c.Text Do While InStr(SomeText, " ") 0 SomeText = Replace(SomeText, " ", " ") Loop c.Offset(0, 1) = SomeText Next c End Sub Sub test2() Dim c As Range Dim re As New RegExp Dim mc As MatchCollection re.Pattern = "\s{2,}" re.Global = True For Each c In Range("a:a") c.Offset(0, 3) = re.Replace(c.Text, " ") Next c End Sub Sub test3() Dim c As Range Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Pattern = "\s{2,}" re.Global = True For Each c In Range("a:a") c.Offset(0, 3) = re.Replace(c.Text, " ") Next c End Sub ================================================== ====== InStr test 16.72 sec RegExp early binding Test 15.95 sec RegExp late binding test 16.47 sec ====================================== Removing the read/write part, setting a reference to Regular Expressions, and doing 1,000,000 operations, some differences show up: ================================================== =========== Option Explicit Sub foo() Range("a:d").Clear Range("a:a").Value = " Now is the time for all " Dim t As Long t = Timer test1 Debug.Print "Instr Test", Format(Timer - t, "#.00") & " sec" t = Timer test3 Debug.Print "RegExp test", Format(Timer - t, "#.00") & " sec" End Sub Sub test1() Dim sTxt As String Dim i As Long sTxt = Range("a1").Text For i = 1 To 2 ^ 20 Do While InStr(sTxt, " ") 0 sTxt = Replace(sTxt, " ", " ") Loop Next i Debug.Print sTxt End Sub Sub test3() Dim re As New RegExp Dim sTxt As String Dim i As Long sTxt = Range("a1").Text re.Global = True re.Pattern = "\s{2,}" For i = 1 To 2 ^ 20 sTxt = re.Replace(sTxt, " ") Next i Debug.Print sTxt End Sub ================================================== ============ Now is the time for all Instr Test .25 sec Now is the time for all RegExp test 2.05 sec ================================================== =========== I also ran a test which shot down my expectation that the morefunc.xll addin would run faster: ======================================== Option Explicit Sub foo() Range("a:d").Clear Range("a:a").Value = " Now is the time for all " Dim t As Long t = Timer test1 Debug.Print "Instr Test", Format(Timer - t, "#.00") & " sec" t = Timer test2 Debug.Print "morefunc test", Format(Timer - t, "#.00") & " sec" t = Timer test3 Debug.Print "RegExp test", Format(Timer - t, "#.00") & " sec" End Sub Sub test1() Dim sTxt As String Dim i As Long sTxt = Range("a1").Text For i = 1 To 2 ^ 20 Do While InStr(sTxt, " ") 0 sTxt = Replace(sTxt, " ", " ") Loop Next i Debug.Print sTxt End Sub Sub test2() Dim sTxt As String, sPat As String Dim i As Long sTxt = Range("a1").Text sPat = "\s{2,}" For i = 1 To 2 ^ 20 sTxt = Run([regex.substitute], sTxt, sPat, " ") Next i Debug.Print sTxt End Sub Sub test3() Dim re As New RegExp Dim sTxt As String Dim i As Long sTxt = Range("a1").Text re.Global = True re.Pattern = "\s{2,}" For i = 1 To 2 ^ 20 sTxt = re.Replace(sTxt, " ") Next i Debug.Print sTxt End Sub ================================================== == Now is the time for all Instr Test .75 sec Now is the time for all morefunc test 173.48 sec Now is the time for all RegExp test 2.28 sec =================================== So really, the question comes up as to whether or not the timing differences make a difference in the usual VBA applications, or will the timing differences be irrelevant given the timings of the rest of the routines. --ron |
#20
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
If I read your results correctly, it would seem for a macro executing wholly
within the VBA world, using the regular expression scripting engine is slower than using native, built-in VBA functions; however, once your macro interacts with the spreadsheet world, any coding time differences get buried by the time required for VBA to communicate with the spreadsheet. Rick "Ron Rosenfeld" wrote in message ... On Mon, 24 Dec 2007 14:47:43 -0500, "Rick Rothstein \(MVP - VB\)" wrote: Can you, or anyone reading this thread, give me an idea of the "penalty" incurred, if any, by a macro when it reaches out to the spreadsheet world in order to execute one of its functions via the Application.WorksheetFunction property? Worksheet functions, whilst extremely efficient in cells, are slow when called in VBA. That was my gut feeling. I figured if calling a macro from the spreadsheet is slow (I've seen several references to that fact), then it must also be slow going the other way. Hence, I opted for the pattern toggling code that I posted. It's often quicker to recreate the worksheet function. Now that is interesting. I wouldn't use it in in anything time sensitive, say as a UDF in a large number of cells or as part of a long loop. I would agree with that. Other ways to replace multiple spaces with singles besides RegExp. The way I this over in the compiled VB world (which works here in the Excel VBA world as well) is like this... Do While InStr(SomeText, " ") 0 SomeText = Replace(SomeText, " ", " ") Loop Also I wouldn't use RegExp in a UDF and probably not in a function that was called in a much larger loop. Although RegExp is extremely efficient, creating the object is (relatively) slow. IOW, great in macro or function that receives an array to process, but don't send each element to a separate function that does CreateObject("vbscript.regexp"). In the compiled VB world, our tendency is to avoid calling out to any kind of scripting type object as this is very slow to do from a compiled executable and, very often, businesses have scripting completely turned off for security reasons. However, since VBA macros or UDFs are not compiled executable, I have no feeling for the efficiency calling a scripting object from within one. Rick OK, after reading about all these delays, I decided to do some timing tests and I find that the differences in time related to the procedure calls are far outweighed by the time involved in reading or writing to/from a cell. Here is the first run just picking out if a word is present in a string and writing the results to a cell: ======================================== Option Explicit Sub foo() Range("a:d").Clear Range("a:a").Value = "Now is the time for all" Dim t As Long t = Timer test1 Debug.Print "InStr test", Format(Timer - t, "#.00") & " sec" t = Timer test2 Debug.Print "RegExp early binding Test", Format(Timer - t, "#.00") & " sec" t = Timer test3 Debug.Print "RegExp late binding test", Format(Timer - t, "#.00") & " sec" End Sub Sub test1() Dim c As Range For Each c In Range("a:a") If InStr(1, c.Text, "time") 0 Then c.Offset(0, 1).Value = "time" End If Next c End Sub Sub test2() Dim c As Range Dim re As New RegExp Dim mc As MatchCollection re.Pattern = "time" For Each c In Range("a:a") Set mc = re.Execute(c.Text) c.Offset(0, 3) = mc(0) Next c End Sub Sub test3() Dim c As Range Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Pattern = "time" For Each c In Range("a:a") Set mc = re.Execute(c.Text) c.Offset(0, 3) = mc(0) Next c End Sub ============================================ InStr test 15.53 sec RegExp early binding Test 15.45 sec RegExp late binding test 16.59 sec ============================================== And again, just testing your double space removal routine: ================================================ Option Explicit Sub foo() Range("a:d").Clear Range("a:a").Value = " Now is the time for all " Dim t As Long t = Timer test1 Debug.Print "InStr test", Format(Timer - t, "#.00") & " sec" t = Timer test2 Debug.Print "RegExp early binding Test", Format(Timer - t, "#.00") & " sec" t = Timer test3 Debug.Print "RegExp late binding test", Format(Timer - t, "#.00") & " sec" End Sub Sub test1() Dim c As Range Dim SomeText As String For Each c In Range("a:a") SomeText = c.Text Do While InStr(SomeText, " ") 0 SomeText = Replace(SomeText, " ", " ") Loop c.Offset(0, 1) = SomeText Next c End Sub Sub test2() Dim c As Range Dim re As New RegExp Dim mc As MatchCollection re.Pattern = "\s{2,}" re.Global = True For Each c In Range("a:a") c.Offset(0, 3) = re.Replace(c.Text, " ") Next c End Sub Sub test3() Dim c As Range Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Pattern = "\s{2,}" re.Global = True For Each c In Range("a:a") c.Offset(0, 3) = re.Replace(c.Text, " ") Next c End Sub ================================================== ====== InStr test 16.72 sec RegExp early binding Test 15.95 sec RegExp late binding test 16.47 sec ====================================== Removing the read/write part, setting a reference to Regular Expressions, and doing 1,000,000 operations, some differences show up: ================================================== =========== Option Explicit Sub foo() Range("a:d").Clear Range("a:a").Value = " Now is the time for all " Dim t As Long t = Timer test1 Debug.Print "Instr Test", Format(Timer - t, "#.00") & " sec" t = Timer test3 Debug.Print "RegExp test", Format(Timer - t, "#.00") & " sec" End Sub Sub test1() Dim sTxt As String Dim i As Long sTxt = Range("a1").Text For i = 1 To 2 ^ 20 Do While InStr(sTxt, " ") 0 sTxt = Replace(sTxt, " ", " ") Loop Next i Debug.Print sTxt End Sub Sub test3() Dim re As New RegExp Dim sTxt As String Dim i As Long sTxt = Range("a1").Text re.Global = True re.Pattern = "\s{2,}" For i = 1 To 2 ^ 20 sTxt = re.Replace(sTxt, " ") Next i Debug.Print sTxt End Sub ================================================== ============ Now is the time for all Instr Test .25 sec Now is the time for all RegExp test 2.05 sec ================================================== =========== I also ran a test which shot down my expectation that the morefunc.xll addin would run faster: ======================================== Option Explicit Sub foo() Range("a:d").Clear Range("a:a").Value = " Now is the time for all " Dim t As Long t = Timer test1 Debug.Print "Instr Test", Format(Timer - t, "#.00") & " sec" t = Timer test2 Debug.Print "morefunc test", Format(Timer - t, "#.00") & " sec" t = Timer test3 Debug.Print "RegExp test", Format(Timer - t, "#.00") & " sec" End Sub Sub test1() Dim sTxt As String Dim i As Long sTxt = Range("a1").Text For i = 1 To 2 ^ 20 Do While InStr(sTxt, " ") 0 sTxt = Replace(sTxt, " ", " ") Loop Next i Debug.Print sTxt End Sub Sub test2() Dim sTxt As String, sPat As String Dim i As Long sTxt = Range("a1").Text sPat = "\s{2,}" For i = 1 To 2 ^ 20 sTxt = Run([regex.substitute], sTxt, sPat, " ") Next i Debug.Print sTxt End Sub Sub test3() Dim re As New RegExp Dim sTxt As String Dim i As Long sTxt = Range("a1").Text re.Global = True re.Pattern = "\s{2,}" For i = 1 To 2 ^ 20 sTxt = re.Replace(sTxt, " ") Next i Debug.Print sTxt End Sub ================================================== == Now is the time for all Instr Test .75 sec Now is the time for all morefunc test 173.48 sec Now is the time for all RegExp test 2.28 sec =================================== So really, the question comes up as to whether or not the timing differences make a difference in the usual VBA applications, or will the timing differences be irrelevant given the timings of the rest of the routines. --ron |
#21
Posted to microsoft.public.excel.programming
|
|||
|
|||
Remove extra space within text
On Wed, 26 Dec 2007 16:01:03 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: If I read your results correctly, it would seem for a macro executing wholly within the VBA world, using the regular expression scripting engine is slower than using native, built-in VBA functions; however, once your macro interacts with the spreadsheet world, any coding time differences get buried by the time required for VBA to communicate with the spreadsheet. Rick That would be my conclusion, also. It does give rise to some coding considerations (i.e. keep as much of the scripting engine outside of the read/write loops as possible), but that just makes common sense, anyway. --ron |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
How do I remove extra space when text wrapping? | Excel Worksheet Functions | |||
How can I remove extra space when using autofit in Excel 2003? | Excel Discussion (Misc queries) | |||
How do I remove extra space when text wrapping? | Excel Discussion (Misc queries) | |||
Remove an extra space after a number. | Excel Worksheet Functions | |||
Remove Space in Text | Excel Discussion (Misc queries) |