View Single Post
  #1   Report Post  
Posted to microsoft.public.excel.misc
Ron Rosenfeld Ron Rosenfeld is offline
external usenet poster
 
Posts: 5,651
Default Parsing Number from Text

On 5 Apr 2007 18:31:29 -0700, "Harlan Grove" wrote:

Ron Rosenfeld wrote...
...
This might be more foolproof:

=SUM(--REGEX.MID(rng&"GBP0","(?<=GBP)[\d\.]+"))


This fails with Ron Coderre's pathological samples,

These work....
GBP 100.00 REFNO
GBP 00100.00 REFNO
GBP100.00 REFNO

These don't....
GBP a100.00 REFNO........"a"
GBP 100.00REFNO...........no space before REFNO
GBP 10.0.0.0 REFNO...........extra decimal points

IMO, you need to use REGEX.SUBSTITUTE to do this.

REGEX.SUBSTITUTE(x,".*GBP[^-0-9.]*(-?\d*\.?\d*)[^0-9]?.*","[1]")


You're correct.

As written, the REGEX.MID (my 2nd one) would handle the issue of no space
before REFNO.

However, I did make the assumption that there was no space between GPB and the
value to be extracted; and that that value did not contain multiple decimals,
and that it was a positive value.

It would be easy to add in the option for an optional space between GBP and the
value:

=SUM(--REGEX.MID(rng&"GBP0","(?<=GBP)[\s\d\.]+"))

and one could even handle the possibility of signed numbers and multiple
decimals by using a more general notation for floating point numbers:

=SUM(--REGEX.MID(rng&"GBP0","(?<=GBP)\s*[-+]?\d*\.?\d+"))

But to handle the possibility of extraneous characters between GBP and the
number, like the "a" above, I would absolutely agree that the REGEX.SUBSTITUTE
would be better.


--ron