Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Richard
 
Posts: n/a
Default String splitting for inconsistent strings

I receive a series of daily share prices in an html email which I save
as a file and then have a macro load this file into a data sheet.

Amongst all the html stuff there are recognisable share price strings
like this in Column A

ARM Holdings 116.00p -0.03p (-2.11)
Amino Technologies 98.50p 0.00p (-0.00)
BAA 872.50p -0.01p (-0.06)
McCarthey and Stone 766.00p 0.04p (0.52)


I'd like to slice these up so that in columns B:C I end up with the
following

B1: Arm Holdings, C1: 116.00
B2: Amino Technologies, C2: 98.50
B3: BAA, C3: 872.50
B4: McCarthy and Stone, C4: 766.00

Unfortunately some names are two or more words long, and others a
single word, which makes identifying the split between the name and the
price difficult.

Until recently, Digital Look.com who send the emails had a slightly
different layout where there was a unique identifier company ID code
surrounded with ( ) which made it reasonably easy to identify the split
between the name and the price. However their layout has now changed to
the above, and I'm strruggling to find an elegant solution to slicing
the string.

I'd be grateful for any help or suggestions

Usual TIA,

Richard Buttrey

  #2   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Ron Rosenfeld
 
Posts: n/a
Default String splitting for inconsistent strings

On 20 Jun 2006 09:12:09 -0700, "Richard" wrote:

I receive a series of daily share prices in an html email which I save
as a file and then have a macro load this file into a data sheet.

Amongst all the html stuff there are recognisable share price strings
like this in Column A

ARM Holdings 116.00p -0.03p (-2.11)
Amino Technologies 98.50p 0.00p (-0.00)
BAA 872.50p -0.01p (-0.06)
McCarthey and Stone 766.00p 0.04p (0.52)


I'd like to slice these up so that in columns B:C I end up with the
following

B1: Arm Holdings, C1: 116.00
B2: Amino Technologies, C2: 98.50
B3: BAA, C3: 872.50
B4: McCarthy and Stone, C4: 766.00

Unfortunately some names are two or more words long, and others a
single word, which makes identifying the split between the name and the
price difficult.

Until recently, Digital Look.com who send the emails had a slightly
different layout where there was a unique identifier company ID code
surrounded with ( ) which made it reasonably easy to identify the split
between the name and the price. However their layout has now changed to
the above, and I'm strruggling to find an elegant solution to slicing
the string.

I'd be grateful for any help or suggestions

Usual TIA,

Richard Buttrey


It seems as if this can be done with "Regular Expressions". I will make use of
the fact that the value which you want to pull out is the first floating point
number that ends with a "p " (thats "p" followed by <space). If there are
any issues where the stock name includes a string of the form <numberp<space
you might want to change the formula to take that into account.

Download and install Longre's free morefunc.xll add-in from
http://xcell05.free.fr

To extract the company name:

=REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s))")

To extract the price:

=REGEX.MID(A1,"\d+(\.\d*)?(?=p\s)")

To be a little more bullet-proof one can require that the price number be
identified as the first positive floating point number followed by the sequence
of p<space then optionally a "-" and another digit. So the formula would
then be:

Company Name:
=REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s-?\d))")

Price:
=REGEX.MID(A1,"\d+(\.\d*)?(?=p\s-?\d)")


--ron
  #3   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Richard Buttrey
 
Posts: n/a
Default String splitting for inconsistent strings

On Tue, 20 Jun 2006 13:29:40 -0400, Ron Rosenfeld
wrote:


It seems as if this can be done with "Regular Expressions". I will make use of
the fact that the value which you want to pull out is the first floating point
number that ends with a "p " (thats "p" followed by <space). If there are
any issues where the stock name includes a string of the form <numberp<space
you might want to change the formula to take that into account.

Download and install Longre's free morefunc.xll add-in from
http://xcell05.free.fr

To extract the company name:

=REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s))" )

To extract the price:

=REGEX.MID(A1,"\d+(\.\d*)?(?=p\s)")

To be a little more bullet-proof one can require that the price number be
identified as the first positive floating point number followed by the sequence
of p<space then optionally a "-" and another digit. So the formula would
then be:

Company Name:
=REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s-?\d))")

Price:
=REGEX.MID(A1,"\d+(\.\d*)?(?=p\s-?\d)")


--ron


Ron,

Many thanks for the pointer to this Add In which I've just downloaded.
The REGEX function looks very interesting - I just need to get my mind
around the syntax and terminology.

One supplementary if I may. I've just used your example on my data and
it works a treat.
However there's one other small variation I've just noticed, and
that's where the integer part of the share price is four characters
with a comma separating the thousands.

e.g. the following string

GlaxoSmithKline 1,480.00p 0.16p (1.09)

returns nothing for the name in B1, and in C1 480.00, not 1480.00

Can you suggest a modification that will overcome this please - it'll
probably take me a couple of weeks to understand the syntax!

Kind regards,

Richard Buttrey



__
Richard Buttrey
Grappenhall, Cheshire, UK
__________________________
  #4   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Ron Rosenfeld
 
Posts: n/a
Default String splitting for inconsistent strings

On Thu, 22 Jun 2006 00:04:37 +0100, Richard Buttrey
wrote:

On Tue, 20 Jun 2006 13:29:40 -0400, Ron Rosenfeld
wrote:


It seems as if this can be done with "Regular Expressions". I will make use of
the fact that the value which you want to pull out is the first floating point
number that ends with a "p " (thats "p" followed by <space). If there are
any issues where the stock name includes a string of the form <numberp<space
you might want to change the formula to take that into account.

Download and install Longre's free morefunc.xll add-in from
http://xcell05.free.fr

To extract the company name:

=REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s)) ")

To extract the price:

=REGEX.MID(A1,"\d+(\.\d*)?(?=p\s)")

To be a little more bullet-proof one can require that the price number be
identified as the first positive floating point number followed by the sequence
of p<space then optionally a "-" and another digit. So the formula would
then be:

Company Name:
=REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s-?\d))")

Price:
=REGEX.MID(A1,"\d+(\.\d*)?(?=p\s-?\d)")


--ron


Ron,

Many thanks for the pointer to this Add In which I've just downloaded.
The REGEX function looks very interesting - I just need to get my mind
around the syntax and terminology.

One supplementary if I may. I've just used your example on my data and
it works a treat.
However there's one other small variation I've just noticed, and
that's where the integer part of the share price is four characters
with a comma separating the thousands.

e.g. the following string

GlaxoSmithKline 1,480.00p 0.16p (1.09)

returns nothing for the name in B1, and in C1 480.00, not 1480.00

Can you suggest a modification that will overcome this please - it'll
probably take me a couple of weeks to understand the syntax!

Kind regards,

Richard Buttrey



__
Richard Buttrey
Grappenhall, Cheshire, UK
__________________________


What you need to do, I believe, is optionally look for zero or more sequences,
within the number, of a comma followed by three digits. The syntax for that
would be: (,\d{3})*



I think this modification will do that:

\d+(,\d{3})*(\.\d*)(?=p\s)

So the modified "more bulletproof" formulas would be:

Company Name:

=REGEX.MID(A1,"(\w+\s?)+(?=\s\d+(,\d{3})*(\.\d*)?( ?=p\s-?\d))")

Price:

=REGEX.MID(A1,"\d+(,\d{3})*(\.\d*)?(?=p\s-?\d)")

If you look at the Company Name formula, you will note I made a small change as
the previous formula would include a trailing <space after each name. This
one does not.

The simpler formulas would be

Name:

=REGEX.MID(A1,"(\w+\s?)+(?=\s\d+(,\d{3})*(\.\d*)?( ?=p\s))")

Price:

=REGEX.MID(A1,"\d+(,\d{3})*(\.\d*)(?=p\s)")

=========================================

http://www.regular-expressions.info/reference.html

is a place on the net to start learning a bit about syntax.

I would also recommend the book "Mastering Regular Expressions" by Jeffrey
Friedl. I got mine for just a few dollars, used, at amazon.com

Finally, be aware that a limitation of Longre's functions is that they cannot
process strings longer than 255 characters. I believe this limitation is due
to the data types used in the XLL, and also applies to the other functions in
the add-in.

If you need to process longer strings, it can be done in VBA by setting a
reference to Microsoft VBScript Regular Expressions 5.5 and writing some code.
I've got a little of that done, but nothing as flexible as what Longre
provides. And there's also some syntax that won't work in the VBScript
implementation, so I rarely bother with it.

Have fun
--ron
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Splitting a text string into string and number mcambrose Excel Discussion (Misc queries) 4 February 21st 06 03:47 PM
Splitting text strings along commas. Andrew Excel Discussion (Misc queries) 3 January 10th 06 04:01 PM
Colomn of strings, how to count each string with "abc" OR "def" Tommy T Excel Worksheet Functions 8 September 5th 05 04:57 PM
splitting number strings in excel Bob Excel Worksheet Functions 1 February 3rd 05 03:33 AM
Read Text File into Excel Using VBA Willie T Excel Discussion (Misc queries) 13 January 8th 05 12:37 AM


All times are GMT +1. The time now is 11:41 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"