Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming,microsoft.public.word.vba.beginners,microsoft.public.word.vba.general
|
|||
|
|||
Regular expressions for replacements in Excel?
Doesn't seem like many people are using the regular expression capability in
Excel. Some reason? Apart from the fact that it's kind of hard to figure out? Actually, I'm trying to port some JavaScript regular expressions into Excel... I think I've enabled the required library linkage, but I can't seem to figure out the syntax required to refer to all of the lines in the spreadsheet (which is really just a list of text strings). Don't seem to be any good examples anywhere I can find them on the Web, and the Microsoft documentation has been typically unenlightening. Are the examples and useful instructions more liable to be found on the Word side? Some of the later manipulations would be better with the spreadsheet's capabilities, however. |
#2
Posted to microsoft.public.excel.programming,microsoft.public.word.vba.beginners,microsoft.public.word.vba.general
|
|||
|
|||
Regular expressions for replacements in Excel?
Hi
post your current code which does not work -- Regards Frank Kabel Frankfurt, Germany "Shannon Jacobs" schrieb im Newsbeitrag ... Doesn't seem like many people are using the regular expression capability in Excel. Some reason? Apart from the fact that it's kind of hard to figure out? Actually, I'm trying to port some JavaScript regular expressions into Excel... I think I've enabled the required library linkage, but I can't seem to figure out the syntax required to refer to all of the lines in the spreadsheet (which is really just a list of text strings). Don't seem to be any good examples anywhere I can find them on the Web, and the Microsoft documentation has been typically unenlightening. Are the examples and useful instructions more liable to be found on the Word side? Some of the later manipulations would be better with the spreadsheet's capabilities, however. |
#3
Posted to microsoft.public.excel.programming,microsoft.public.word.vba.beginners,microsoft.public.word.vba.general
|
|||
|
|||
Regular expressions for replacements in Excel?
There's a tutorial on the MVPS site. Word's regular expressions are pretty
well the same as any other, except that it uses minimal matching where unix regex uses maximal. "Shannon Jacobs" wrote in message ... Doesn't seem like many people are using the regular expression capability in Excel. Some reason? Apart from the fact that it's kind of hard to figure out? Actually, I'm trying to port some JavaScript regular expressions into Excel... I think I've enabled the required library linkage, but I can't seem to figure out the syntax required to refer to all of the lines in the spreadsheet (which is really just a list of text strings). Don't seem to be any good examples anywhere I can find them on the Web, and the Microsoft documentation has been typically unenlightening. Are the examples and useful instructions more liable to be found on the Word side? Some of the later manipulations would be better with the spreadsheet's capabilities, however. |
#4
Posted to microsoft.public.excel.programming,microsoft.public.word.vba.beginners,microsoft.public.word.vba.general
|
|||
|
|||
Regular expressions for replacements in Excel?
Frank Kabel wrote:
Hi post your current code which does not work <old context snip Well, I'd be willing to consider pasting in the code, but right now I can't do it. I can't even tell for sure whether or not it is working. If it is working, then Microsoft's implementation is incredibly bad and slow, which would also explain why it seems that very few people are using the feature. The current status is that I can't even get Excel's attention to ask it to look at the code, but it's using 95% of the machine cycles and seems happy, so I'm waiting to see what results come out, if any. From memory I can say that I just defined a simple regexp, basically two options before a short trailer string in the Pattern. Then I tried to invoke a Replace on the Cells (just over 2000 in a single column), which did nothing yesterday. Today I created an outer loop and tried to pass each cell in individually--but I'm still waiting for it to finish or crash or do something. About 10 minutes now, and still no results, and CPU still around 95%. In yesterday's experiments I just zapped Excel at this point, and upon restarting it would attempt to recover the file, but whether I used the original saved file or the recovered version, in either case after that Excel would not allow me to use any macros. The error message complained about the security settings, but no matter what I did to the security settings I could not get back into the file. Had to zot the file completely each time and start from scratch. Like they say: Microsoft's way or the highway. |
#5
Posted to microsoft.public.excel.programming,microsoft.public.word.vba.beginners,microsoft.public.word.vba.general
|
|||
|
|||
Regular expressions for replacements in Excel?
If you are dealing with just one column of the spreadsheet, it might be
easier to copy it into a Word document, make the changes there, and copy it back. Word's Find and Replace has the regular expression functions built in (check the 'Use Wildcards' checkbox), so you can see what's going on and you don't need to screw around with coding at all. "Shannon Jacobs" wrote in message ... Jezebel wrote: You could try -- Sub clean() Dim aRegExp as object Dim I As Long Set aRegExp = CreateObject("vbscript.regexp") With aRegExp .Pattern = "(height|width)=""12"" " .Global = True .IgnoreCase = True ' .MultiLine = True End With For I = 65 To ActiveSheet.UsedRange.Rows.Count Cells(I, 1) = aRegExp.Replace(Cells(I, 1), "") Next End Sub Are there more search strings you're dealing with? Otherwise, it would be heaps quicker to use: with Range("A65:A" & ActiveSheet.UsedRange.Rows.Count) .Replace What:="Height=""12""", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False .Replace What:="Width=""12""", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False end with This sounds very much like what I've been trying to figure out, though I won't be able to pursue it farther until later, probably Friday at earliest. However, right now I'm struggling with the following problem. I want to drop the last part of the image tags, and all of the first part including the directory information. All I want to have left is the stem name of the file. The first triplet works properly, but the second one is overmatching somehow, and the stem name is also dropped. Even worse, in this latest form I tried to use the parenthetic grouping for a clause and then the \1 to return the stem, but now it just sticks a "\1" into the file. ' remove ends of image tags aRegExp.Pattern = "\.gif[\w+ ?= ?\w+]*" theLine = aRegExp.Replace(theLine, "") ' remove fronts of image tags, but save the last word aRegExp.Pattern = "<img src ?= ?\./[\w+/]*(\w+)" theLine = aRegExp.Replace(theLine, "\1") I haven't even gotten to the really nasty part of the file transformation yet, though so far I'm getting a moderate feel for how they work here... Not sure if it will help, but my current primary reference is the following URL, and I'm working with an English version of Excel XP right now. http://msdn.microsoft.com/library/de...ting051099.asp |
#6
Posted to microsoft.public.excel.programming,microsoft.public.word.vba.beginners,microsoft.public.word.vba.general
|
|||
|
|||
Regular expressions for replacements in Excel?
Jezebel wrote:
If you are dealing with just one column of the spreadsheet, it might be easier to copy it into a Word document, make the changes there, and copy it back. Word's Find and Replace has the regular expression functions built in (check the 'Use Wildcards' checkbox), so you can see what's going on and you don't need to screw around with coding at all. The Word wildcards seem to be much more limited than real regular expressions, so I had already eliminated that option. I haven't listed all of my transformations here, but a number of them are somewhat trickier, so that won't work. Your suggestion with the loop simplification is good, but I don't think the Call itself is really causing much of the overhead, but you are right that I should move it to one level. The other idea about using the range.replace doesn't seem to apply here. (In addition, I admit that part of my motivation is to learn about macro programming specifically within Excel, since my coworkers already use it a lot (though none of them are into regular expressions).) For now, I'll just note the mistake and fix for the problem I commented on last time: This is the incorrect code: ' remove fronts of image tags, but save the last word aRegExp.Pattern = "<img src ?= ?\./[\w+/]*(\w+)" theLine = aRegExp.Replace(theLine, "\1") This is correct: ' remove fronts of image tags, but save the last word aRegExp.Pattern = "<img src ?= ?\./(\w+/)*" theLine = aRegExp.Replace(theLine, "") Typically dumb typo with typical fatal results in a RegExp. Within a bracketed class of course the \w picked up the rest of the string, whether or not it ended in the slash that I intended to require. By ending the match after the last slash, I didn't have to worry about explicitly preserving the last word. I still don't understand why the \1 didn't work as expected, but just for grins, I tried ye olde $1, which produced really weird results... This may be a future problem, since I think that one of the later transformations is not just a simple deletion, but will require submatch reference in the replacement... Right now I'm just massaging the line breaks, which is kind of a nuisance since I still can't get the MultiLine parameter to work as I had hoped. If that would only work, I should be able to slurp the next major transformation in one step or two... |
#7
Posted to microsoft.public.excel.programming,microsoft.public.word.vba.beginners,microsoft.public.word.vba.general
|
|||
|
|||
Regular expressions for replacements in Excel?
Check some of the web tutorials (eg
http://word.mvps.org/FAQs/General/UsingWildcards.htm) on Word's Find and Replace -- it really does support full regular expressions. BTW, why do your posts use JIS (Japanese) encoding? "Shannon Jacobs" wrote in message ... Jezebel wrote: If you are dealing with just one column of the spreadsheet, it might be easier to copy it into a Word document, make the changes there, and copy it back. Word's Find and Replace has the regular expression functions built in (check the 'Use Wildcards' checkbox), so you can see what's going on and you don't need to screw around with coding at all. The Word wildcards seem to be much more limited than real regular expressions, so I had already eliminated that option. I haven't listed all of my transformations here, but a number of them are somewhat trickier, so that won't work. Your suggestion with the loop simplification is good, but I don't think the Call itself is really causing much of the overhead, but you are right that I should move it to one level. The other idea about using the range.replace doesn't seem to apply here. (In addition, I admit that part of my motivation is to learn about macro programming specifically within Excel, since my coworkers already use it a lot (though none of them are into regular expressions).) For now, I'll just note the mistake and fix for the problem I commented on last time: This is the incorrect code: ' remove fronts of image tags, but save the last word aRegExp.Pattern = "<img src ?= ?\./[\w+/]*(\w+)" theLine = aRegExp.Replace(theLine, "\1") This is correct: ' remove fronts of image tags, but save the last word aRegExp.Pattern = "<img src ?= ?\./(\w+/)*" theLine = aRegExp.Replace(theLine, "") Typically dumb typo with typical fatal results in a RegExp. Within a bracketed class of course the \w picked up the rest of the string, whether or not it ended in the slash that I intended to require. By ending the match after the last slash, I didn't have to worry about explicitly preserving the last word. I still don't understand why the \1 didn't work as expected, but just for grins, I tried ye olde $1, which produced really weird results... This may be a future problem, since I think that one of the later transformations is not just a simple deletion, but will require submatch reference in the replacement... Right now I'm just massaging the line breaks, which is kind of a nuisance since I still can't get the MultiLine parameter to work as I had hoped. If that would only work, I should be able to slurp the next major transformation in one step or two... |
#8
Posted to microsoft.public.excel.programming,microsoft.public.word.vba.beginners,microsoft.public.word.vba.general
|
|||
|
|||
Regular expressions for replacements in Excel?
As regards the Word regular expressions, I'm afraid the answer is "No, they
are not 'full' regular expressions." For example, my very first simple transformation (expressed as "(height|width) ?=? ?""?\d+%?""? ?" using VB) cannot be expressed in that system. At least I could find no way to do so, and the URL you provided also shows no way to do the simple string alternative. Here is another important bit of evidence from the URL you provided: "If you are using VBA, you might want to look at the RegExp object (from VBScript, which you can include in your macro projects, and which offers some features not included in Word wildcards)." I'm never absolutely sure in dealing with Microsoft, but at this point I'm pretty well convinced that the situation is that there are four "levels" of search and replace available within Word: (1) literal, (2) very basic wildcards with a few special notations, (3) extended wildcards with some regexp notation, and (4) full regular expressions (which can only be accessed via VBScript). (Actually, there also appear to be two levels of the "full" regular expressions.) The advantage of being Microsoft is that you can announce that all of these are now de facto standards. Within Excel, I haven't found access to levels (2) or (3), but that's moot, since they can't handle some of the transformations I'm already doing from level (4). (Or is that really level (4)b?) If I can just figure out some way to make the MultiLine option work, I think I'll basically be done with the project. I feel like I need some minor trick on the "Cells" object, but so far I haven't been able to find it. Hmm... Have you ever made a *really* long string by just appending all of the fields together? And sorry about the Japanese encoding. Sometimes OE respects my "standing request" to use a Western encoding, and sometimes it does not. It's pretty annoying to have to check all the time trying to catch it when it slips back. I have not been able to find the pattern of whatever is controlling the encoding (apart from OE trying to use the original encoding when making a reply, but that's actually a reasonable behavior and works properly (most of the time)). Jezebel wrote: Check some of the web tutorials (eg http://word.mvps.org/FAQs/General/UsingWildcards.htm) on Word's Find and Replace -- it really does support full regular expressions. BTW, why do your posts use JIS (Japanese) encoding? <older stuff snipped |
#9
Posted to microsoft.public.excel.programming,microsoft.public.word.vba.beginners,microsoft.public.word.vba.general
|
|||
|
|||
Regular expressions for replacements in Excel?
Well, below is the current version of the code, but unless someone is
willing to help and can clarify how to make the MultiLine thing work, I'm probably going to let it stand. This is about 5 times faster than the original version, but basically it's just inlining the subroutine call and reducing the overhead for object creation. This will process the 2000 lines in around 4 seconds. By using a "Dim theText() as String" statement the system seemed to accept the MultiLine assignment, but in reality only the first string from Cells was copied into my array, and all of my experiments failed. I wasn't able to find any directly applicable source code examples, and I couldn't twist any of the examples I did find into working as expected... My current conclusion is that there are very few people using the MultiLine capability, and none of them are monitoring any of these newsgroups. The other approach of using large strings seems implausible since there's a 32 KB size limit on the strings, and the total file is something over 200 KB. However, I may still be able to pursue that approach if I can devise some criteria for lumping things into reasonably small packages... Basically you can take the following as an example of how to use true regular expressions within Excel via a VB macro: Sub FastClean() Dim startTime startTime = Now Dim I As Long Dim lastI As Long Dim S As String Set aRegExp = CreateObject("vbscript.regexp") With aRegExp .Global = True .IgnoreCase = True ' .MultiLine = True End With ActiveCell.SpecialCells(xlLastCell).Select lastI = ActiveCell.Row For I = 1 To lastI S = Cells(I, 1) 'kill first three parameters aRegExp.Pattern = "(height|width|border)=""?\d+%?""? ?" S = aRegExp.Replace(S, "") ' Eight more patterns and replacements Cells(I, 1) = S Next MsgBox "Finished in " + Format(Now - startTime, "ss.s") + " seconds." End Sub |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Get rid of with regular expressions | Excel Discussion (Misc queries) | |||
Regular expressions in VB | Excel Discussion (Misc queries) | |||
Substring in excel? How about regular expressions? | Excel Discussion (Misc queries) | |||
Regular expressions in Excel | Excel Discussion (Misc queries) | |||
Regular Expressions in VBA & Excel including an interactive tool | Excel Programming |