![]() |
Extract number from cell when specific format is found
Is there a formula that will extract only the numbers that immediately follow
"E" in a string? For example, the formula would get "1787" from this string: http://www.glassdoor.com/Jobs/First-...10_KO11,28.htm |
Extract number from cell when specific format is found
If the number is always after "_IE" then the below formula would work
=LEFT(MID(A1,FIND("_IE",A1)+3,99),FIND(".",MID(A1, FIND("_IE",A1)+3,99))-1) If the number is the first numeric in the string the below should work =LEFT(MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&" 0123456789")),LEN(A1)),FIND(".",MID(A1,MIN(SEARCH( {0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),LEN(A1)))-1) If this post helps click Yes --------------- Jacob Skaria "paul c" wrote: Is there a formula that will extract only the numbers that immediately follow "E" in a string? For example, the formula would get "1787" from this string: http://www.glassdoor.com/Jobs/First-...10_KO11,28.htm |
Extract number from cell when specific format is found
Ron Rosenfeld wrote...
paul c <paul wrote... Is there a formula that will extract only the numbers that immediately follow "E" in a string? .... In either event, this can be done easily with a User Defined Function. . . |
Extract number from cell when specific format is found
Ron Rosenfeld wrote...
Harlan Grove wrote: .... Back to the point, it's more efficient to replace the unwanted contents with nothing than to use the match collection. What do you mean by "efficient"? Easier to code? Faster execution? Something else? .... Regex engine does less, so entire regex use takes less time and uses less memory. Regex .Replace method simply returns a string, but it's .Execute method create a Matches collection object. When is the "set re = Nothing" required? I said paranoia. I don't entirely trust CreateObject. It does no harm if it's not needed. So far as your regex is concerned, I believe the lazy qualifier following the initial "." is only required if there is more than one Ennnn substring AND the OP wants to return the first one. . . . Correct. It matches only up to the first E\d... pattern. . . . With it absent, the function will return the last Ennn; and I believe the regex engine will come to the solution in fewer steps with much less back-tracking. Unclear. You mean that greedy * would find the last match faster than non-greedy * would find the first? Unlikely. Also irrelevant if the goal is the first match. Maybe there's more backtracking in my function (picky: unlikely there's any true backtracking, but there's A LOT of look ahead checking), but it seems to run faster than your function. Maybe the Matches collection overhead is that much of a drag. Also, rewriting the regex to avoid non-greedy quantifiers requires something along the lines of re.Pattern = "^([^E]*(E\D)?)*E(\d+(\.\d+)?).*$" t = re.Replace(s, "$3") If t < vbNullString Then hg = CDbl(t) Else hg = CVErr(xlErrNum) which involves back-to-back-to-back 0-or-more quantifiers, so probably even more backtracking. FWLIW, it's a pity VBScript regexs lack a simple .Find method that returns the position of the first match in the string as a Long, or even better something like the awk approach result = match(s, /E[0-9]*\.?[0-9]+/) ? substr(s, RSTART, RLENGTH) : "" Constructing and dereferencing the Matches collection takes a lot of time. |
Extract number from cell when specific format is found
On Sat, 1 Aug 2009 16:18:11 -0700 (PDT), Harlan Grove
wrote: . . . With it absent, the function will return the last Ennn; and I believe the regex engine will come to the solution in fewer steps with much less back-tracking. Unclear. You mean that greedy * would find the last match faster than non-greedy * would find the first? Unlikely. Also irrelevant if the goal is the first match. Maybe there's more backtracking in my function (picky: unlikely there's any true backtracking, but there's A LOT of look ahead checking), but it seems to run faster than your function. Maybe the Matches collection overhead is that much of a drag. Also, rewriting the regex to avoid non-greedy quantifiers requires something along the lines of re.Pattern = "^([^E]*(E\D)?)*E(\d+(\.\d+)?).*$" t = re.Replace(s, "$3") If t < vbNullString Then hg = CDbl(t) Else hg = CVErr(xlErrNum) which involves back-to-back-to-back 0-or-more quantifiers, so probably even more backtracking. FWLIW, it's a pity VBScript regexs lack a simple .Find method that returns the position of the first match in the string as a Long, or even better something like the awk approach result = match(s, /E[0-9]*\.?[0-9]+/) ? substr(s, RSTART, RLENGTH) : "" Constructing and dereferencing the Matches collection takes a lot of time. As I wrote before, I can certainly accept that the .Replace function will work faster. However, with regard to your regex, testing it in another program, against the OP's test string, revealed that it took significantly more steps to match than if you omit the lazy qualifier. However, upon closer examination, I discovered that the number of steps was more dependent on the location of the matched substring than on the use of a lazy vs greedy ".". Oh well, live and learn. Now being curious, I ran some procedure timings. I did a run of 100 calls and measured the elapsed time for four different routines. I repeated this several times and averaged the results. I then changed the position of the Ennn substring from near the end to near the beginning of the string and did more runs. The string was in A1; and the four function calls were in B1:E1 The four routines we foo: Your original with the lazy ".?" foobar: Yours with the greedy "." NumAfte Mine with the match collection DoItInVBA: A VBA routine that I cobbled together (see below) If my method of obtaining timing is accurate, the differences in the timing of the various regex approaches seems trivial. The VBA routine is MUCH faster. On my machine, the hi resolution counter frequency is 266,680 and these were the results with the four different routines: Ennn near Start End foo 1.914E-02 1.980E-02 foobar 1.832E-02 1.881E-02 NumAfterE 1.851E-02 1.888E-02 DoItInVBA 1.912E-04 1.825E-04 This is unexpected as I would have expected my regex routine to take significantly longer. As a matter of fact, if I place the timer within the function, it does seem to. However, the execution is so quick that we are talking about differences of only a few counts of the timer. One of these days, I'll do a bit more extensive testing. ================================================== ==== Option Explicit Sub CheckTiming() Dim oTime As New CHiResTimer Const s As String = "1http://www.glassdoor.com/Jobs/First-Data-Account-Executive-Jobs-EI_IE1783.0,10_KO11,28.htm" Const i As Long = 100 Dim j As Long Dim res oTime.StartTimer For j = 1 To i res = foo(s) Next j oTime.StopTimer Debug.Print "foo took " & Format(oTime.Elapsed, "#.0000000") & " seconds." oTime.StartTimer For j = 1 To i res = foobar(s) Next j oTime.StopTimer Debug.Print "foobar took " & Format(oTime.Elapsed, "#.0000000") & " seconds." oTime.StartTimer For j = 1 To i res = NumAfterE(s) Next j oTime.StopTimer Debug.Print "NumAfterE took " & Format(oTime.Elapsed, "#.0000000") & " seconds." oTime.StartTimer For j = 1 To i res = DoItInVBA(s) Next j oTime.StopTimer Debug.Print "DoItInVBA took " & Format(oTime.Elapsed, "#.0000000") & " seconds." End Sub '---------------------------------------- Option Explicit Function foo(s As String) As Variant Dim re As Object Set re = CreateObject("VBScript.Regexp") re.Pattern = ".*?E([-+]?\d*\.?\d+).*" If re.test(s) Then foo = CDbl(re.Replace(s, "$1")) _ Else foo = CVErr(xlErrNum) Set re = Nothing 'paranoia End Function '------------------------------------------- Function foobar(s As String) As Variant Dim re As Object Set re = CreateObject("VBScript.Regexp") re.Pattern = ".*E([-+]?\d*\.?\d+).*" If re.test(s) Then foobar = CDbl(re.Replace(s, "$1")) _ Else foobar = CVErr(xlErrNum) Set re = Nothing 'paranoia End Function '---------------------------------------- Function NumAfterE(s As String) As Variant Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Pattern = "E(\d+(\.\d+)?)" If re.test(s) = True Then Set mc = re.Execute(s) NumAfterE = CDbl(mc(0).submatches(0)) Else: NumAfterE = CVErr(xlErrNum) End If Set re = Nothing End Function '------------------------------------------- Function DoItInVBA(s As String) As Variant Dim lStart As Long lStart = 1 Do lStart = InStr(lStart, s, "E") + 1 Loop Until Mid(s, lStart, 1) Like "#" DoItInVBA = Val(Mid(s, lStart)) End Function '------------------------------ The following is in a Class Module: ================================== Option Explicit 'How many times per second is the counter updated? Private Declare Function QueryFrequency Lib "kernel32" _ Alias "QueryPerformanceFrequency" ( _ lpFrequency As Currency) As Long 'What is the counter's value Private Declare Function QueryCounter Lib "kernel32" _ Alias "QueryPerformanceCounter" ( _ lpPerformanceCount As Currency) As Long 'Variables to store the counter information Dim cFrequency As Currency Dim cOverhead As Currency Dim cStarted As Currency Dim cStopped As Currency Private Sub Class_Initialize() Dim cCount1 As Currency, cCount2 As Currency 'Get the counter frequency QueryFrequency cFrequency 'Call the hi-res counter twice, to check how long it takes QueryCounter cCount1 QueryCounter cCount2 'Store the call overhead cOverhead = cCount2 - cCount1 End Sub Public Sub StartTimer() 'Get the time that we started QueryCounter cStarted End Sub Public Sub StopTimer() 'Get the time that we stopped QueryCounter cStopped End Sub Public Property Get Elapsed() As Double Dim cTimer As Currency 'Have we stopped or not? If cStopped = 0 Then QueryCounter cTimer Else cTimer = cStopped End If 'If we have a frequency, return the duration, in seconds If cFrequency 0 Then Elapsed = (cTimer - cStarted - cOverhead) / cFrequency End If End Property ================================= --ron |
Extract number from cell when specific format is found
On Sat, 01 Aug 2009 22:39:32 -0400, Ron Rosenfeld
wrote: On Sat, 1 Aug 2009 16:18:11 -0700 (PDT), Harlan Grove wrote: As I wrote before, I can certainly accept that the .Replace function will work faster. However, with regard to your regex, testing it in another program, against the OP's test string, revealed that it took significantly more steps to match than if you omit the lazy qualifier. However, upon closer examination, I discovered that the number of steps was more dependent on the location of the matched substring than on the use of a lazy vs greedy ".". Oh well, live and learn. Now being curious, I ran some procedure timings. I did a run of 100 calls and measured the elapsed time for four different routines. I repeated this several times and averaged the results. I then changed the position of the Ennn substring from near the end to near the beginning of the string and did more runs. The string was in A1; and the four function calls were in B1:E1 Ooops. I hit Send too soon. I first tried the timing with the string and function calls as noted above. But the timings were too quick -- only a few counts of the high resolution timer. So I changed to calling the functions from a Sub, as you see in the routine I posted previously. Sorry about the confusion. --ron |
All times are GMT +1. The time now is 09:37 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com