ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Worksheet Functions (https://www.excelbanter.com/excel-worksheet-functions/)
-   -   Extract number from cell when specific format is found (https://www.excelbanter.com/excel-worksheet-functions/238623-extract-number-cell-when-specific-format-found.html)

paul c

Extract number from cell when specific format is found
 
Is there a formula that will extract only the numbers that immediately follow
"E" in a string?

For example, the formula would get "1787" from this string:
http://www.glassdoor.com/Jobs/First-...10_KO11,28.htm



Jacob Skaria

Extract number from cell when specific format is found
 
If the number is always after "_IE" then the below formula would work

=LEFT(MID(A1,FIND("_IE",A1)+3,99),FIND(".",MID(A1, FIND("_IE",A1)+3,99))-1)


If the number is the first numeric in the string the below should work

=LEFT(MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&" 0123456789")),LEN(A1)),FIND(".",MID(A1,MIN(SEARCH( {0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),LEN(A1)))-1)

If this post helps click Yes
---------------
Jacob Skaria


"paul c" wrote:

Is there a formula that will extract only the numbers that immediately follow
"E" in a string?

For example, the formula would get "1787" from this string:
http://www.glassdoor.com/Jobs/First-...10_KO11,28.htm



Ron Rosenfeld

Extract number from cell when specific format is found
 
On Fri, 31 Jul 2009 11:22:01 -0700, paul c <paul
wrote:

Is there a formula that will extract only the numbers that immediately follow
"E" in a string?

For example, the formula would get "1787" from this string:
http://www.glassdoor.com/Jobs/First-...10_KO11,28.htm


Do you want only the digits? Or could the number include the decimal -- e.g.
should it be 1787.0 in this instance?

In either event, this can be done easily with a User Defined Function. The one
below is annotated so you can set it up for either option. And it will return
a #NUM! error if there are no digits after any "E".


To enter this User Defined Function (UDF), <alt-F11 opens the Visual Basic
Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like

=NumAfterE(cell_ref

in some cell.

==================================
Option Explicit
Function NumAfterE(s As String) As Variant
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "E(\d+(\.\d+)?)"
'for only digits and no decimals, change to
're.Pattern = "E(\d+)"
If re.test(s) = True Then
Set mc = re.Execute(s)
NumAfterE = CDbl(mc(0).submatches(0))
Else: NumAfterE = CVErr(xlErrNum)
End If
End Function
==================================
--ron

Harlan Grove[_2_]

Extract number from cell when specific format is found
 
Ron Rosenfeld wrote...
paul c <paul wrote...
Is there a formula that will extract only the numbers that immediately follow
"E" in a string?

....
In either event, this can be done easily with a User Defined Function. . .

Harlan Grove[_2_]

Extract number from cell when specific format is found
 
Ron Rosenfeld wrote...
Harlan Grove wrote:

....
Back to the point, it's more efficient to replace the unwanted
contents with nothing than to use the match collection.


What do you mean by "efficient"? Easier to code? Faster execution? Something
else?

....

Regex engine does less, so entire regex use takes less time and uses
less memory. Regex .Replace method simply returns a string, but
it's .Execute method create a Matches collection object.

When is the "set re = Nothing" required?


I said paranoia. I don't entirely trust CreateObject. It does no harm
if it's not needed.

So far as your regex is concerned, I believe the lazy qualifier following the
initial "." is only required if there is more than one Ennnn substring AND the
OP wants to return the first one. . . .


Correct. It matches only up to the first E\d... pattern.

. . . With it absent, the function will return the last Ennn; and I believe the regex
engine will come to the solution in fewer steps with much less back-tracking.


Unclear. You mean that greedy * would find the last match faster than
non-greedy * would find the first? Unlikely. Also irrelevant if the
goal is the first match.

Maybe there's more backtracking in my function (picky: unlikely
there's any true backtracking, but there's A LOT of look ahead
checking), but it seems to run faster than your function. Maybe the
Matches collection overhead is that much of a drag. Also, rewriting
the regex to avoid non-greedy quantifiers requires something along the
lines of

re.Pattern = "^([^E]*(E\D)?)*E(\d+(\.\d+)?).*$"
t = re.Replace(s, "$3")
If t < vbNullString Then hg = CDbl(t) Else hg = CVErr(xlErrNum)

which involves back-to-back-to-back 0-or-more quantifiers, so probably
even more backtracking.

FWLIW, it's a pity VBScript regexs lack a simple .Find method that
returns the position of the first match in the string as a Long, or
even better something like the awk approach

result = match(s, /E[0-9]*\.?[0-9]+/) ? substr(s, RSTART, RLENGTH) :
""

Constructing and dereferencing the Matches collection takes a lot of
time.

Ron Rosenfeld

Extract number from cell when specific format is found
 
On Sat, 1 Aug 2009 16:18:11 -0700 (PDT), Harlan Grove
wrote:

. . . With it absent, the function will return the last Ennn; and I believe the regex
engine will come to the solution in fewer steps with much less back-tracking.


Unclear. You mean that greedy * would find the last match faster than
non-greedy * would find the first? Unlikely. Also irrelevant if the
goal is the first match.

Maybe there's more backtracking in my function (picky: unlikely
there's any true backtracking, but there's A LOT of look ahead
checking), but it seems to run faster than your function. Maybe the
Matches collection overhead is that much of a drag. Also, rewriting
the regex to avoid non-greedy quantifiers requires something along the
lines of

re.Pattern = "^([^E]*(E\D)?)*E(\d+(\.\d+)?).*$"
t = re.Replace(s, "$3")
If t < vbNullString Then hg = CDbl(t) Else hg = CVErr(xlErrNum)

which involves back-to-back-to-back 0-or-more quantifiers, so probably
even more backtracking.

FWLIW, it's a pity VBScript regexs lack a simple .Find method that
returns the position of the first match in the string as a Long, or
even better something like the awk approach

result = match(s, /E[0-9]*\.?[0-9]+/) ? substr(s, RSTART, RLENGTH) :
""

Constructing and dereferencing the Matches collection takes a lot of
time.


As I wrote before, I can certainly accept that the .Replace function will work
faster.

However, with regard to your regex, testing it in another program, against the
OP's test string, revealed that it took significantly more steps to match than
if you omit the lazy qualifier. However, upon closer examination, I discovered
that the number of steps was more dependent on the location of the matched
substring than on the use of a lazy vs greedy ".". Oh well, live and learn.

Now being curious, I ran some procedure timings. I did a run of 100 calls and
measured the elapsed time for four different routines. I repeated this several
times and averaged the results.

I then changed the position of the Ennn substring from near the end to near the
beginning of the string and did more runs.

The string was in A1; and the four function calls were in B1:E1

The four routines we

foo: Your original with the lazy ".?"
foobar: Yours with the greedy "."
NumAfte Mine with the match collection
DoItInVBA: A VBA routine that I cobbled together (see below)

If my method of obtaining timing is accurate, the differences in the timing of
the various regex approaches seems trivial. The VBA routine is MUCH faster.

On my machine, the hi resolution counter frequency is 266,680 and these were
the results with the four different routines:

Ennn near
Start End
foo 1.914E-02 1.980E-02
foobar 1.832E-02 1.881E-02
NumAfterE 1.851E-02 1.888E-02
DoItInVBA 1.912E-04 1.825E-04

This is unexpected as I would have expected my regex routine to take
significantly longer. As a matter of fact, if I place the timer within the
function, it does seem to. However, the execution is so quick that we are
talking about differences of only a few counts of the timer.

One of these days, I'll do a bit more extensive testing.

================================================== ====
Option Explicit

Sub CheckTiming()
Dim oTime As New CHiResTimer
Const s As String =
"1http://www.glassdoor.com/Jobs/First-Data-Account-Executive-Jobs-EI_IE1783.0,10_KO11,28.htm"
Const i As Long = 100
Dim j As Long
Dim res

oTime.StartTimer
For j = 1 To i
res = foo(s)
Next j
oTime.StopTimer
Debug.Print "foo took " & Format(oTime.Elapsed, "#.0000000") & " seconds."

oTime.StartTimer
For j = 1 To i
res = foobar(s)
Next j
oTime.StopTimer
Debug.Print "foobar took " & Format(oTime.Elapsed, "#.0000000") & " seconds."

oTime.StartTimer
For j = 1 To i
res = NumAfterE(s)
Next j
oTime.StopTimer
Debug.Print "NumAfterE took " & Format(oTime.Elapsed, "#.0000000") & "
seconds."


oTime.StartTimer
For j = 1 To i
res = DoItInVBA(s)
Next j
oTime.StopTimer
Debug.Print "DoItInVBA took " & Format(oTime.Elapsed, "#.0000000") & "
seconds."

End Sub
'----------------------------------------
Option Explicit
Function foo(s As String) As Variant
Dim re As Object
Set re = CreateObject("VBScript.Regexp")

re.Pattern = ".*?E([-+]?\d*\.?\d+).*"
If re.test(s) Then foo = CDbl(re.Replace(s, "$1")) _
Else foo = CVErr(xlErrNum)
Set re = Nothing 'paranoia
End Function
'-------------------------------------------
Function foobar(s As String) As Variant
Dim re As Object
Set re = CreateObject("VBScript.Regexp")
re.Pattern = ".*E([-+]?\d*\.?\d+).*"
If re.test(s) Then foobar = CDbl(re.Replace(s, "$1")) _
Else foobar = CVErr(xlErrNum)
Set re = Nothing 'paranoia
End Function
'----------------------------------------
Function NumAfterE(s As String) As Variant
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "E(\d+(\.\d+)?)"
If re.test(s) = True Then
Set mc = re.Execute(s)
NumAfterE = CDbl(mc(0).submatches(0))
Else: NumAfterE = CVErr(xlErrNum)
End If
Set re = Nothing
End Function
'-------------------------------------------
Function DoItInVBA(s As String) As Variant
Dim lStart As Long
lStart = 1
Do
lStart = InStr(lStart, s, "E") + 1
Loop Until Mid(s, lStart, 1) Like "#"
DoItInVBA = Val(Mid(s, lStart))
End Function
'------------------------------

The following is in a Class Module:

==================================
Option Explicit

'How many times per second is the counter updated?
Private Declare Function QueryFrequency Lib "kernel32" _
Alias "QueryPerformanceFrequency" ( _
lpFrequency As Currency) As Long

'What is the counter's value
Private Declare Function QueryCounter Lib "kernel32" _
Alias "QueryPerformanceCounter" ( _
lpPerformanceCount As Currency) As Long

'Variables to store the counter information
Dim cFrequency As Currency
Dim cOverhead As Currency
Dim cStarted As Currency
Dim cStopped As Currency


Private Sub Class_Initialize()
Dim cCount1 As Currency, cCount2 As Currency

'Get the counter frequency
QueryFrequency cFrequency

'Call the hi-res counter twice, to check how long it takes
QueryCounter cCount1
QueryCounter cCount2

'Store the call overhead
cOverhead = cCount2 - cCount1

End Sub

Public Sub StartTimer()
'Get the time that we started
QueryCounter cStarted
End Sub

Public Sub StopTimer()
'Get the time that we stopped
QueryCounter cStopped
End Sub

Public Property Get Elapsed() As Double

Dim cTimer As Currency

'Have we stopped or not?
If cStopped = 0 Then
QueryCounter cTimer
Else
cTimer = cStopped
End If

'If we have a frequency, return the duration, in seconds
If cFrequency 0 Then
Elapsed = (cTimer - cStarted - cOverhead) / cFrequency
End If

End Property
=================================
--ron

Ron Rosenfeld

Extract number from cell when specific format is found
 
On Sat, 01 Aug 2009 22:39:32 -0400, Ron Rosenfeld
wrote:

On Sat, 1 Aug 2009 16:18:11 -0700 (PDT), Harlan Grove
wrote:


As I wrote before, I can certainly accept that the .Replace function will work
faster.

However, with regard to your regex, testing it in another program, against the
OP's test string, revealed that it took significantly more steps to match than
if you omit the lazy qualifier. However, upon closer examination, I discovered
that the number of steps was more dependent on the location of the matched
substring than on the use of a lazy vs greedy ".". Oh well, live and learn.

Now being curious, I ran some procedure timings. I did a run of 100 calls and
measured the elapsed time for four different routines. I repeated this several
times and averaged the results.

I then changed the position of the Ennn substring from near the end to near the
beginning of the string and did more runs.

The string was in A1; and the four function calls were in B1:E1


Ooops. I hit Send too soon.

I first tried the timing with the string and function calls as noted above. But
the timings were too quick -- only a few counts of the high resolution timer.
So I changed to calling the functions from a Sub, as you see in the routine I
posted previously.

Sorry about the confusion.
--ron


All times are GMT +1. The time now is 09:37 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com