Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.worksheet.functions
frosterrj
 
Posts: n/a
Default String parsing with variable lenght strings

I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

I have found this formula:
=LEFT(A1,FIND("-",A1,1)-1)

but it assumes some level of consistency, the "-" in the cell.

Any ideas appreciated.

Robert
  #2   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Ron Coderre
 
Posts: n/a
Default String parsing with variable lenght strings

Perhaps something like this:

For a value in A1

The numeric (left) part:
B1:
=--LEFT(A1,MATCH(FALSE,ISNUMBER(--MID(A1,ROW($A$1:INDEX(A:A,LEN(A1))),1)),0)-1)

The unit of measure (right) part:
C1:
=TRIM(MID(A1,MATCH(FALSE,ISNUMBER(--MID(A1,ROW($A$1:INDEX(A:A,LEN(A1))),1)),0),255))

Note: Commit those array formulas by holding down the [Ctrl][Shift] keys
when you press [Enter].

Copy those formulas down as far as you need.

Does that help?

***********
Regards,
Ron

XL2002, WinXP-Pro


"frosterrj" wrote:

I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

I have found this formula:
=LEFT(A1,FIND("-",A1,1)-1)

but it assumes some level of consistency, the "-" in the cell.

Any ideas appreciated.

Robert

  #3   Report Post  
Posted to microsoft.public.excel.worksheet.functions
az94
 
Posts: n/a
Default String parsing with variable lenght strings

It's a bit ugly, but it seems to work. Assuming your values start in
A1 and go down, the following formula in cell B1 will break out the
number.

=IF(ISERR(VALUE(LEFT(A1,2))),VALUE(LEFT(A1,1)),IF( ISERR(VALUE(LEFT(A1,3))),VALUE(LEFT(A1,2)),IF(ISER R(VALUE(LEFT(A1,4))),VALUE(LEFT(A1,3)),IF(ISERR(VA LUE(LEFT(A1,5))),VALUE(LEFT(A1,4)),"Check
num length"))))

This will test the first two characters, then the first 3, then the
first 4 etc. to see if it is a number, and when it hits a length that
includes an alpha, it will drop down a character and return the value.
I stopped at 4 digits, but obviously you can extend it as much as you
need.

Then you can use the following formula in cell C1 to give you the text.

=RIGHT(A1,LEN(A1)-LEN(B1))

Note: The second formula will include a space if there is one. You
may want to remove all spaces before you begin to avoid that.

Hope that helps...

  #4   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Harlan Grove
 
Posts: n/a
Default String parsing with variable lenght strings

frosterrj wrote...
Actually, I am having the most trouble because the numbers can contain
decimals:
7.63 lb, 66.5 oz etc.
Seems to be stopping at the '.' .


What is? Rather, which formulas are? The one I proposed below allow for
decimal fractions.

"Harlan Grove" wrote:
frosterrj wrote...

....
For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

.....

And with the following modified string in cell A1

5.5#, 10Tins, 4 lb, 100.23Pcs

and the following defined name

If so, then define a name like seq referring to

=ROW(INDEX(Sheet1!$A:$A,1):INDEX(Sheet1!$A:$A,25 6))

....

And the following intermediate formula

A2:
=SUBSTITUTE(A1," ","")


The following formulas

A3:
=--LEFT(TRIM(A2),LOOKUP(2,1/ISNUMBER(-MID(A2,1,seq)),seq))

B3:
=MID($A2,SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3),F IND(",",$A2&",",
SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3))-SUMPRODUCT(LEN($A3:A3))
-COUNT($A3:A3))

C3:
=--MID($A2,SUMPRODUCT(LEN($A3:B3))+COUNT($A3:B3,1),
LOOKUP(2,1/ISNUMBER(-MID($A2,SUMPRODUCT(LEN($A3:B3))
+COUNT($A3:B3,1),seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc.

....

produce the following results (A3:H3).

5.5 # 10 Tins 4 lb 100.23 Pcs

And the following formulas

A3:
=--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2)

B3:
=REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)&
"}\d*\.?\d+ *([^,]+).*","[2]")

Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc.


produce the same,

5.5 # 10 Tins 4 lb 100.23 Pcs

Doesn't appear to stop at decimal fractions. Or are you referring to
some of the other responses that took your one sample record as
exhaustively indicative of your data?

  #5   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Ron Rosenfeld
 
Posts: n/a
Default String parsing with variable lenght strings

On 28 Mar 2006 15:04:06 -0800, "Harlan Grove" wrote:

Ron Rosenfeld wrote...
...
Then use the regular expression formulas:

For the number:
=REGEX.MID(A1,"[0-9]+")


It works, but \d+ is more compact.


I phrased it this way for clarity, especially when comparing with the second
formula.


(means return the first series of characters that are in the class [0-9].

For the units:
=REGEX.MID(A1,"[^0-9 ]+")


It picks up the commas as well. Those could be excluded with a positive
lookahead assertion and a comma appended to the string.


For some reason, I thought each combination of value/units was in a separate
cell. If so, commas would not be an issue.
--ron


  #6   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Ron Rosenfeld
 
Posts: n/a
Default String parsing with variable lenght strings

On Tue, 28 Mar 2006 17:47:10 -0500, Ron Rosenfeld
wrote:

On Tue, 28 Mar 2006 12:04:02 -0800, frosterrj
wrote:

I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

I have found this formula:
=LEFT(A1,FIND("-",A1,1)-1)

but it assumes some level of consistency, the "-" in the cell.

Any ideas appreciated.

Robert


Ron C.'s formulas should work.

For a different approach, and to gain access to a number of other useful
functions, you could download and install Longre's free morefunc.xll add-in
from http://xcell05.free.fr/

Then use the regular expression formulas:

For the number:
=REGEX.MID(A1,"[0-9]+")

(means return the first series of characters that are in the class [0-9].

For the units:
=REGEX.MID(A1,"[^0-9 ]+")

(means return the first series of characters that are not in the class [0-9 ]
-- note that there is a <space included in the expression, so that any space
in between the number and the units will not be returned).


--ron


Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.


--ron
  #7   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Harlan Grove
 
Posts: n/a
Default String parsing with variable lenght strings

Ron Rosenfeld wrote...
....
Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.


It gets more difficult if the numbers could include 0. and .0 and if
the units could include periods and numerals, e.g., 20 reams A4, 5
boxes .22 shells.

With just built-in formulas referencing the original record in A1 plus
the name seq as I defined it earlier,

A3:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq))

B3:
=MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2,
FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS($ A3:B3)/2)
-SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2)

C3:
=MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3)/2,
LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1
+COLUMNS($A3:B3)/2,seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc. Note that the number portions are text in the formulas above.

With MOREFUNC.XLL, divide and conquer.

A3:
=--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+")

B3:
=REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2),
" *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","")

Copy A3:B3 and paste into C3:D3, E3:F3, etc.

Note: I'm assuming the OP's records are in one cell given the OP's
statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm
interpretting the 'etc.' to mean there could be more fields in the
record, not that each of these be in a separate column.

  #8   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Ron Rosenfeld
 
Posts: n/a
Default String parsing with variable lenght strings

On 28 Mar 2006 22:40:41 -0800, "Harlan Grove" wrote:

Ron Rosenfeld wrote...
...
Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.


It gets more difficult if the numbers could include 0. and .0 and if
the units could include periods and numerals, e.g., 20 reams A4, 5
boxes .22 shells.

With just built-in formulas referencing the original record in A1 plus
the name seq as I defined it earlier,

A3:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq))

B3:
=MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3 )/2,
FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS( $A3:B3)/2)
-SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2)

C3:
=MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3 )/2,
LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1
+COLUMNS($A3:B3)/2,seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc. Note that the number portions are text in the formulas above.

With MOREFUNC.XLL, divide and conquer.

A3:
=--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+")

B3:
=REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2),
" *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","")

Copy A3:B3 and paste into C3:D3, E3:F3, etc.

Note: I'm assuming the OP's records are in one cell given the OP's
statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm
interpretting the 'etc.' to mean there could be more fields in the
record, not that each of these be in a separate column.


Yes, the "best" solution often depends critically on the nature of the data
being analyzed. You've got a lot more experience than I on generalizing from
incomplete data specifications. That's very useful in these NG's.


--ron
  #9   Report Post  
Posted to microsoft.public.excel.worksheet.functions
frosterrj
 
Posts: n/a
Default String parsing with variable lenght strings

I was using this suggestion from above:
=IF(ISERR(VALUE(LEFT(A1,2))),VALUE(LEFT(A1,1)),IF( ISERR(VALUE(LEFT(A1,3))),VALUE(LEFT(A1,2)),IF(ISER R(VALUE(LEFT(A1,4))),VALUE(LEFT(A1,3)),IF(ISERR(VA LUE(LEFT(A1,5))),VALUE(LEFT(A1,4)),"Check
num length"))))

I think I misunderstood what you were telling me below (too late in the day
for really complex formulas!).

Robert

"Harlan Grove" wrote:

frosterrj wrote...
Actually, I am having the most trouble because the numbers can contain
decimals:
7.63 lb, 66.5 oz etc.
Seems to be stopping at the '.' .


What is? Rather, which formulas are? The one I proposed below allow for
decimal fractions.

"Harlan Grove" wrote:
frosterrj wrote...

....
For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

.....

And with the following modified string in cell A1

5.5#, 10Tins, 4 lb, 100.23Pcs

and the following defined name

If so, then define a name like seq referring to

=ROW(INDEX(Sheet1!$A:$A,1):INDEX(Sheet1!$A:$A,25 6))

....

And the following intermediate formula

A2:
=SUBSTITUTE(A1," ","")


The following formulas

A3:
=--LEFT(TRIM(A2),LOOKUP(2,1/ISNUMBER(-MID(A2,1,seq)),seq))

B3:
=MID($A2,SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3),F IND(",",$A2&",",
SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3))-SUMPRODUCT(LEN($A3:A3))
-COUNT($A3:A3))

C3:
=--MID($A2,SUMPRODUCT(LEN($A3:B3))+COUNT($A3:B3,1),
LOOKUP(2,1/ISNUMBER(-MID($A2,SUMPRODUCT(LEN($A3:B3))
+COUNT($A3:B3,1),seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc.

....

produce the following results (A3:H3).

5.5 # 10 Tins 4 lb 100.23 Pcs

And the following formulas

A3:
=--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2)

B3:
=REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)&
"}\d*\.?\d+ *([^,]+).*","[2]")

Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc.


produce the same,

5.5 # 10 Tins 4 lb 100.23 Pcs

Doesn't appear to stop at decimal fractions. Or are you referring to
some of the other responses that took your one sample record as
exhaustively indicative of your data?


  #10   Report Post  
Posted to microsoft.public.excel.worksheet.functions
frosterrj
 
Posts: n/a
Default String parsing with variable lenght strings

Nope, my data is just one type per cell:
9.5 OZ
30#
#10 TIN
1 GAL
38#
the basic regex.mid works.

Thanks for the pointers to the morefunc.xls. Got it an am using now.

Robert

"Harlan Grove" wrote:

Ron Rosenfeld wrote...
....
Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.


It gets more difficult if the numbers could include 0. and .0 and if
the units could include periods and numerals, e.g., 20 reams A4, 5
boxes .22 shells.

With just built-in formulas referencing the original record in A1 plus
the name seq as I defined it earlier,

A3:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq))

B3:
=MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2,
FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS($ A3:B3)/2)
-SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2)

C3:
=MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3)/2,
LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1
+COLUMNS($A3:B3)/2,seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc. Note that the number portions are text in the formulas above.

With MOREFUNC.XLL, divide and conquer.

A3:
=--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+")

B3:
=REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2),
" *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","")

Copy A3:B3 and paste into C3:D3, E3:F3, etc.

Note: I'm assuming the OP's records are in one cell given the OP's
statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm
interpretting the 'etc.' to mean there could be more fields in the
record, not that each of these be in a separate column.




  #11   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Ron Rosenfeld
 
Posts: n/a
Default String parsing with variable lenght strings

On Fri, 31 Mar 2006 13:01:03 -0800, frosterrj
wrote:

Nope, my data is just one type per cell:
9.5 OZ
30#
#10 TIN
1 GAL
38#
the basic regex.mid works.

Thanks for the pointers to the morefunc.xls. Got it an am using now.

Robert


Glad you've got it working.

By the way, if you need to distribute your workbook, and if you installed
morefunc using the suggested defaults, there is an option on the Tools Menu to
save morefunc with the workbook.


--ron
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Aling multiple sets of data by header column MarkusO Excel Discussion (Misc queries) 2 April 12th 06 07:29 PM
Variable in string DevinC Excel Discussion (Misc queries) 5 January 26th 06 08:59 PM
Colomn of strings, how to count each string with "abc" OR "def" Tommy T Excel Worksheet Functions 8 September 5th 05 04:57 PM
Converting variable text strings to numeric Richgard53 Excel Discussion (Misc queries) 1 July 13th 05 06:22 AM
Read Text File into Excel Using VBA Willie T Excel Discussion (Misc queries) 13 January 8th 05 12:37 AM


All times are GMT +1. The time now is 01:53 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"