Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
![]()
I receive a series of daily share prices in an html email which I save
as a file and then have a macro load this file into a data sheet. Amongst all the html stuff there are recognisable share price strings like this in Column A ARM Holdings 116.00p -0.03p (-2.11) Amino Technologies 98.50p 0.00p (-0.00) BAA 872.50p -0.01p (-0.06) McCarthey and Stone 766.00p 0.04p (0.52) I'd like to slice these up so that in columns B:C I end up with the following B1: Arm Holdings, C1: 116.00 B2: Amino Technologies, C2: 98.50 B3: BAA, C3: 872.50 B4: McCarthy and Stone, C4: 766.00 Unfortunately some names are two or more words long, and others a single word, which makes identifying the split between the name and the price difficult. Until recently, Digital Look.com who send the emails had a slightly different layout where there was a unique identifier company ID code surrounded with ( ) which made it reasonably easy to identify the split between the name and the price. However their layout has now changed to the above, and I'm strruggling to find an elegant solution to slicing the string. I'd be grateful for any help or suggestions Usual TIA, Richard Buttrey |
#2
![]()
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
![]()
On 20 Jun 2006 09:12:09 -0700, "Richard" wrote:
I receive a series of daily share prices in an html email which I save as a file and then have a macro load this file into a data sheet. Amongst all the html stuff there are recognisable share price strings like this in Column A ARM Holdings 116.00p -0.03p (-2.11) Amino Technologies 98.50p 0.00p (-0.00) BAA 872.50p -0.01p (-0.06) McCarthey and Stone 766.00p 0.04p (0.52) I'd like to slice these up so that in columns B:C I end up with the following B1: Arm Holdings, C1: 116.00 B2: Amino Technologies, C2: 98.50 B3: BAA, C3: 872.50 B4: McCarthy and Stone, C4: 766.00 Unfortunately some names are two or more words long, and others a single word, which makes identifying the split between the name and the price difficult. Until recently, Digital Look.com who send the emails had a slightly different layout where there was a unique identifier company ID code surrounded with ( ) which made it reasonably easy to identify the split between the name and the price. However their layout has now changed to the above, and I'm strruggling to find an elegant solution to slicing the string. I'd be grateful for any help or suggestions Usual TIA, Richard Buttrey It seems as if this can be done with "Regular Expressions". I will make use of the fact that the value which you want to pull out is the first floating point number that ends with a "p " (thats "p" followed by <space). If there are any issues where the stock name includes a string of the form <numberp<space you might want to change the formula to take that into account. Download and install Longre's free morefunc.xll add-in from http://xcell05.free.fr To extract the company name: =REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s))") To extract the price: =REGEX.MID(A1,"\d+(\.\d*)?(?=p\s)") To be a little more bullet-proof one can require that the price number be identified as the first positive floating point number followed by the sequence of p<space then optionally a "-" and another digit. So the formula would then be: Company Name: =REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s-?\d))") Price: =REGEX.MID(A1,"\d+(\.\d*)?(?=p\s-?\d)") --ron |
#3
![]()
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
![]()
On Tue, 20 Jun 2006 13:29:40 -0400, Ron Rosenfeld
wrote: It seems as if this can be done with "Regular Expressions". I will make use of the fact that the value which you want to pull out is the first floating point number that ends with a "p " (thats "p" followed by <space). If there are any issues where the stock name includes a string of the form <numberp<space you might want to change the formula to take that into account. Download and install Longre's free morefunc.xll add-in from http://xcell05.free.fr To extract the company name: =REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s))" ) To extract the price: =REGEX.MID(A1,"\d+(\.\d*)?(?=p\s)") To be a little more bullet-proof one can require that the price number be identified as the first positive floating point number followed by the sequence of p<space then optionally a "-" and another digit. So the formula would then be: Company Name: =REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s-?\d))") Price: =REGEX.MID(A1,"\d+(\.\d*)?(?=p\s-?\d)") --ron Ron, Many thanks for the pointer to this Add In which I've just downloaded. The REGEX function looks very interesting - I just need to get my mind around the syntax and terminology. One supplementary if I may. I've just used your example on my data and it works a treat. However there's one other small variation I've just noticed, and that's where the integer part of the share price is four characters with a comma separating the thousands. e.g. the following string GlaxoSmithKline 1,480.00p 0.16p (1.09) returns nothing for the name in B1, and in C1 480.00, not 1480.00 Can you suggest a modification that will overcome this please - it'll probably take me a couple of weeks to understand the syntax! Kind regards, Richard Buttrey __ Richard Buttrey Grappenhall, Cheshire, UK __________________________ |
#4
![]()
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
![]()
On Thu, 22 Jun 2006 00:04:37 +0100, Richard Buttrey
wrote: On Tue, 20 Jun 2006 13:29:40 -0400, Ron Rosenfeld wrote: It seems as if this can be done with "Regular Expressions". I will make use of the fact that the value which you want to pull out is the first floating point number that ends with a "p " (thats "p" followed by <space). If there are any issues where the stock name includes a string of the form <numberp<space you might want to change the formula to take that into account. Download and install Longre's free morefunc.xll add-in from http://xcell05.free.fr To extract the company name: =REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s)) ") To extract the price: =REGEX.MID(A1,"\d+(\.\d*)?(?=p\s)") To be a little more bullet-proof one can require that the price number be identified as the first positive floating point number followed by the sequence of p<space then optionally a "-" and another digit. So the formula would then be: Company Name: =REGEX.MID(A1,"(\w+\s+)+(?=\d+(\.\d*)?(?=p\s-?\d))") Price: =REGEX.MID(A1,"\d+(\.\d*)?(?=p\s-?\d)") --ron Ron, Many thanks for the pointer to this Add In which I've just downloaded. The REGEX function looks very interesting - I just need to get my mind around the syntax and terminology. One supplementary if I may. I've just used your example on my data and it works a treat. However there's one other small variation I've just noticed, and that's where the integer part of the share price is four characters with a comma separating the thousands. e.g. the following string GlaxoSmithKline 1,480.00p 0.16p (1.09) returns nothing for the name in B1, and in C1 480.00, not 1480.00 Can you suggest a modification that will overcome this please - it'll probably take me a couple of weeks to understand the syntax! Kind regards, Richard Buttrey __ Richard Buttrey Grappenhall, Cheshire, UK __________________________ What you need to do, I believe, is optionally look for zero or more sequences, within the number, of a comma followed by three digits. The syntax for that would be: (,\d{3})* I think this modification will do that: \d+(,\d{3})*(\.\d*)(?=p\s) So the modified "more bulletproof" formulas would be: Company Name: =REGEX.MID(A1,"(\w+\s?)+(?=\s\d+(,\d{3})*(\.\d*)?( ?=p\s-?\d))") Price: =REGEX.MID(A1,"\d+(,\d{3})*(\.\d*)?(?=p\s-?\d)") If you look at the Company Name formula, you will note I made a small change as the previous formula would include a trailing <space after each name. This one does not. The simpler formulas would be Name: =REGEX.MID(A1,"(\w+\s?)+(?=\s\d+(,\d{3})*(\.\d*)?( ?=p\s))") Price: =REGEX.MID(A1,"\d+(,\d{3})*(\.\d*)(?=p\s)") ========================================= http://www.regular-expressions.info/reference.html is a place on the net to start learning a bit about syntax. I would also recommend the book "Mastering Regular Expressions" by Jeffrey Friedl. I got mine for just a few dollars, used, at amazon.com Finally, be aware that a limitation of Longre's functions is that they cannot process strings longer than 255 characters. I believe this limitation is due to the data types used in the XLL, and also applies to the other functions in the add-in. If you need to process longer strings, it can be done in VBA by setting a reference to Microsoft VBScript Regular Expressions 5.5 and writing some code. I've got a little of that done, but nothing as flexible as what Longre provides. And there's also some syntax that won't work in the VBScript implementation, so I rarely bother with it. Have fun --ron |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Splitting a text string into string and number | Excel Discussion (Misc queries) | |||
Splitting text strings along commas. | Excel Discussion (Misc queries) | |||
Colomn of strings, how to count each string with "abc" OR "def" | Excel Worksheet Functions | |||
splitting number strings in excel | Excel Worksheet Functions | |||
Read Text File into Excel Using VBA | Excel Discussion (Misc queries) |