View Single Post
  #5   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Harlan Grove Harlan Grove is offline
external usenet poster
 
Posts: 733
Default Extracting Info From Within A Text String

nospaminlich wrote...
....
Within an imported file I have data like this:

26/07/2006 MT.:004367 CAPITAL CITY COUNCIL 225200
Ref:200060

29/04/2006 West Country Waterworks Co Inv:7097932959 O/No:2252005207
Trans. ID.:00000505

06/11/2006 Conway Cleaning Inv:StM002L Non Order Trans. ID.:00000619

22/06/2006 E A Gangle & Partners 00000527 Crd. Nt. No.:4823a
Trans. ID.:00000524

12/06/2006 Wessex Purchasing O/No.:2252005217 Trans. ID.:00000218
where the date is in Col C and all the rest is in Col D

For the purposes of this we'll assume that the 5 rows of data are in rows 1
to 5 (if only it were that simple!)

I want to split the text in Col D into Cols E to I whe

Col E has the formula =if(Left(D2,3)="MT.",Left(D2,10),"") so it returns
MT.:004367 for the first row in this example and "" for the rest


Note: looks like your first record is in row 2 rather than row 1.

Regular expressions would be the best tool for this. If you download
and install Laurent Longre's MOREFUNC.XLL add-in, freely available at

http://xcell05.free.fr/english/

you could get the same result with the formula

=REGEX.MID(D2,"MT\.:\S+")

and now it gets complicated....

Col F has a formula which returns the supplier so in the examples above it
would bring back CAPITAL CITY COUNCIL in row 1 and then West Country
Waterworks Co, Conway Cleaning, E A Gangle & Partners and Wessex Purchasing
in subsequent rows. So the formula needs to return everything to the left of
the first occurence of either Inv or 0000 or O/No unless the string starts
with MT. where it returns text from the 12th character in the string until
there are multiple spaces


=REGEX.MID(D2,"[A-Za-z][^ \t:]*(\s[^ \t:]+)*?(?=\s(\s|[^ \t:]+:|
\d{2}).*$)")

Col G has a formula which returns Inv:and the following numbers until there
is a string of multiple spaces so rows 1,4 and 5 would be "", row 2 would
have Inv:7097932959 and row 3 would have Inv:StM002L


=REGEX.MID(D2,"Inv:\S+")

Col H has a formula which returns Crd. Nt. No.:and the following numbers
until there is a space so rows 1,2,3 and 5 would be "", row 4 would have Crd.
Nt. No.:4823a


=REGEX.MID(D2,"Crd\. Nt\. No\.:\S+")

Col I has a formula which returns "Non-Order" if "Non-Order" appears in the
string or O/No:and the following numbers until there is a space


=IF(COUNTIF(D2,"*Non Order*"),"Non-Order",REGEX.MID(D2,"O/No:\S+"))

Col J Returns Trans ID.:and the following digits

....

=REGEX.MID(D2,"Trans\. ID\.:\S+")

This leaves unspecified what to do with Ref:# and substrings of
numerals not preceded by a tag and colon, e.g., 225200 in the first
record.