ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Worksheet Functions (https://www.excelbanter.com/excel-worksheet-functions/)
-   -   String parsing with variable lenght strings (https://www.excelbanter.com/excel-worksheet-functions/80139-string-parsing-variable-lenght-strings.html)

frosterrj

String parsing with variable lenght strings
 
I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

I have found this formula:
=LEFT(A1,FIND("-",A1,1)-1)

but it assumes some level of consistency, the "-" in the cell.

Any ideas appreciated.

Robert

Ron Coderre

String parsing with variable lenght strings
 
Perhaps something like this:

For a value in A1

The numeric (left) part:
B1:
=--LEFT(A1,MATCH(FALSE,ISNUMBER(--MID(A1,ROW($A$1:INDEX(A:A,LEN(A1))),1)),0)-1)

The unit of measure (right) part:
C1:
=TRIM(MID(A1,MATCH(FALSE,ISNUMBER(--MID(A1,ROW($A$1:INDEX(A:A,LEN(A1))),1)),0),255))

Note: Commit those array formulas by holding down the [Ctrl][Shift] keys
when you press [Enter].

Copy those formulas down as far as you need.

Does that help?

***********
Regards,
Ron

XL2002, WinXP-Pro


"frosterrj" wrote:

I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

I have found this formula:
=LEFT(A1,FIND("-",A1,1)-1)

but it assumes some level of consistency, the "-" in the cell.

Any ideas appreciated.

Robert


az94

String parsing with variable lenght strings
 
It's a bit ugly, but it seems to work. Assuming your values start in
A1 and go down, the following formula in cell B1 will break out the
number.

=IF(ISERR(VALUE(LEFT(A1,2))),VALUE(LEFT(A1,1)),IF( ISERR(VALUE(LEFT(A1,3))),VALUE(LEFT(A1,2)),IF(ISER R(VALUE(LEFT(A1,4))),VALUE(LEFT(A1,3)),IF(ISERR(VA LUE(LEFT(A1,5))),VALUE(LEFT(A1,4)),"Check
num length"))))

This will test the first two characters, then the first 3, then the
first 4 etc. to see if it is a number, and when it hits a length that
includes an alpha, it will drop down a character and return the value.
I stopped at 4 digits, but obviously you can extend it as much as you
need.

Then you can use the following formula in cell C1 to give you the text.

=RIGHT(A1,LEN(A1)-LEN(B1))

Note: The second formula will include a space if there is one. You
may want to remove all spaces before you begin to avoid that.

Hope that helps...


Harlan Grove

String parsing with variable lenght strings
 
frosterrj wrote...
Actually, I am having the most trouble because the numbers can contain
decimals:
7.63 lb, 66.5 oz etc.
Seems to be stopping at the '.' .


What is? Rather, which formulas are? The one I proposed below allow for
decimal fractions.

"Harlan Grove" wrote:
frosterrj wrote...

....
For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

.....

And with the following modified string in cell A1

5.5#, 10Tins, 4 lb, 100.23Pcs

and the following defined name

If so, then define a name like seq referring to

=ROW(INDEX(Sheet1!$A:$A,1):INDEX(Sheet1!$A:$A,25 6))

....

And the following intermediate formula

A2:
=SUBSTITUTE(A1," ","")


The following formulas

A3:
=--LEFT(TRIM(A2),LOOKUP(2,1/ISNUMBER(-MID(A2,1,seq)),seq))

B3:
=MID($A2,SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3),F IND(",",$A2&",",
SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3))-SUMPRODUCT(LEN($A3:A3))
-COUNT($A3:A3))

C3:
=--MID($A2,SUMPRODUCT(LEN($A3:B3))+COUNT($A3:B3,1),
LOOKUP(2,1/ISNUMBER(-MID($A2,SUMPRODUCT(LEN($A3:B3))
+COUNT($A3:B3,1),seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc.

....

produce the following results (A3:H3).

5.5 # 10 Tins 4 lb 100.23 Pcs

And the following formulas

A3:
=--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2)

B3:
=REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)&
"}\d*\.?\d+ *([^,]+).*","[2]")

Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc.


produce the same,

5.5 # 10 Tins 4 lb 100.23 Pcs

Doesn't appear to stop at decimal fractions. Or are you referring to
some of the other responses that took your one sample record as
exhaustively indicative of your data?


Ron Rosenfeld

String parsing with variable lenght strings
 
On 28 Mar 2006 15:04:06 -0800, "Harlan Grove" wrote:

Ron Rosenfeld wrote...
...
Then use the regular expression formulas:

For the number:
=REGEX.MID(A1,"[0-9]+")


It works, but \d+ is more compact.


I phrased it this way for clarity, especially when comparing with the second
formula.


(means return the first series of characters that are in the class [0-9].

For the units:
=REGEX.MID(A1,"[^0-9 ]+")


It picks up the commas as well. Those could be excluded with a positive
lookahead assertion and a comma appended to the string.


For some reason, I thought each combination of value/units was in a separate
cell. If so, commas would not be an issue.
--ron

Ron Rosenfeld

String parsing with variable lenght strings
 
On Tue, 28 Mar 2006 17:47:10 -0500, Ron Rosenfeld
wrote:

On Tue, 28 Mar 2006 12:04:02 -0800, frosterrj
wrote:

I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the
content of the character part.

For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

I have found this formula:
=LEFT(A1,FIND("-",A1,1)-1)

but it assumes some level of consistency, the "-" in the cell.

Any ideas appreciated.

Robert


Ron C.'s formulas should work.

For a different approach, and to gain access to a number of other useful
functions, you could download and install Longre's free morefunc.xll add-in
from http://xcell05.free.fr/

Then use the regular expression formulas:

For the number:
=REGEX.MID(A1,"[0-9]+")

(means return the first series of characters that are in the class [0-9].

For the units:
=REGEX.MID(A1,"[^0-9 ]+")

(means return the first series of characters that are not in the class [0-9 ]
-- note that there is a <space included in the expression, so that any space
in between the number and the units will not be returned).


--ron


Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.


--ron

Harlan Grove

String parsing with variable lenght strings
 
Ron Rosenfeld wrote...
....
Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.


It gets more difficult if the numbers could include 0. and .0 and if
the units could include periods and numerals, e.g., 20 reams A4, 5
boxes .22 shells.

With just built-in formulas referencing the original record in A1 plus
the name seq as I defined it earlier,

A3:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq))

B3:
=MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2,
FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS($ A3:B3)/2)
-SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2)

C3:
=MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3)/2,
LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1
+COLUMNS($A3:B3)/2,seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc. Note that the number portions are text in the formulas above.

With MOREFUNC.XLL, divide and conquer.

A3:
=--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+")

B3:
=REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2),
" *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","")

Copy A3:B3 and paste into C3:D3, E3:F3, etc.

Note: I'm assuming the OP's records are in one cell given the OP's
statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm
interpretting the 'etc.' to mean there could be more fields in the
record, not that each of these be in a separate column.


Ron Rosenfeld

String parsing with variable lenght strings
 
On 28 Mar 2006 22:40:41 -0800, "Harlan Grove" wrote:

Ron Rosenfeld wrote...
...
Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.


It gets more difficult if the numbers could include 0. and .0 and if
the units could include periods and numerals, e.g., 20 reams A4, 5
boxes .22 shells.

With just built-in formulas referencing the original record in A1 plus
the name seq as I defined it earlier,

A3:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq))

B3:
=MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3 )/2,
FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS( $A3:B3)/2)
-SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2)

C3:
=MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3 )/2,
LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1
+COLUMNS($A3:B3)/2,seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc. Note that the number portions are text in the formulas above.

With MOREFUNC.XLL, divide and conquer.

A3:
=--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+")

B3:
=REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2),
" *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","")

Copy A3:B3 and paste into C3:D3, E3:F3, etc.

Note: I'm assuming the OP's records are in one cell given the OP's
statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm
interpretting the 'etc.' to mean there could be more fields in the
record, not that each of these be in a separate column.


Yes, the "best" solution often depends critically on the nature of the data
being analyzed. You've got a lot more experience than I on generalizing from
incomplete data specifications. That's very useful in these NG's.


--ron

frosterrj

String parsing with variable lenght strings
 
I was using this suggestion from above:
=IF(ISERR(VALUE(LEFT(A1,2))),VALUE(LEFT(A1,1)),IF( ISERR(VALUE(LEFT(A1,3))),VALUE(LEFT(A1,2)),IF(ISER R(VALUE(LEFT(A1,4))),VALUE(LEFT(A1,3)),IF(ISERR(VA LUE(LEFT(A1,5))),VALUE(LEFT(A1,4)),"Check
num length"))))

I think I misunderstood what you were telling me below (too late in the day
for really complex formulas!).

Robert

"Harlan Grove" wrote:

frosterrj wrote...
Actually, I am having the most trouble because the numbers can contain
decimals:
7.63 lb, 66.5 oz etc.
Seems to be stopping at the '.' .


What is? Rather, which formulas are? The one I proposed below allow for
decimal fractions.

"Harlan Grove" wrote:
frosterrj wrote...

....
For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc.
I would like to be able to pull out the #, Tins, lb, and Pcs in to their
own column.

.....

And with the following modified string in cell A1

5.5#, 10Tins, 4 lb, 100.23Pcs

and the following defined name

If so, then define a name like seq referring to

=ROW(INDEX(Sheet1!$A:$A,1):INDEX(Sheet1!$A:$A,25 6))

....

And the following intermediate formula

A2:
=SUBSTITUTE(A1," ","")


The following formulas

A3:
=--LEFT(TRIM(A2),LOOKUP(2,1/ISNUMBER(-MID(A2,1,seq)),seq))

B3:
=MID($A2,SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3),F IND(",",$A2&",",
SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3))-SUMPRODUCT(LEN($A3:A3))
-COUNT($A3:A3))

C3:
=--MID($A2,SUMPRODUCT(LEN($A3:B3))+COUNT($A3:B3,1),
LOOKUP(2,1/ISNUMBER(-MID($A2,SUMPRODUCT(LEN($A3:B3))
+COUNT($A3:B3,1),seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc.

....

produce the following results (A3:H3).

5.5 # 10 Tins 4 lb 100.23 Pcs

And the following formulas

A3:
=--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2)

B3:
=REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)&
"}\d*\.?\d+ *([^,]+).*","[2]")

Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc.


produce the same,

5.5 # 10 Tins 4 lb 100.23 Pcs

Doesn't appear to stop at decimal fractions. Or are you referring to
some of the other responses that took your one sample record as
exhaustively indicative of your data?



frosterrj

String parsing with variable lenght strings
 
Nope, my data is just one type per cell:
9.5 OZ
30#
#10 TIN
1 GAL
38#
the basic regex.mid works.

Thanks for the pointers to the morefunc.xls. Got it an am using now.

Robert

"Harlan Grove" wrote:

Ron Rosenfeld wrote...
....
Having read that some of your data may optionally include decimal numbers, I
would change my previous recommendation to:

Number:

=REGEX.MID(A1,"[0-9]+(\.[0-9]+)?")

This assumes that all decimal digits have a number before the decimal point.
If that is not the case, it can be easily modified.

Units:

=REGEX.MID(A1,"[^0-9. ]+")

This assumes there are not "dots" in the units, as in your examples. Again, if
there are, the expression can be modified to include them.


It gets more difficult if the numbers could include 0. and .0 and if
the units could include periods and numerals, e.g., 20 reams A4, 5
boxes .22 shells.

With just built-in formulas referencing the original record in A1 plus
the name seq as I defined it earlier,

A3:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq))

B3:
=MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2,
FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS($ A3:B3)/2)
-SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2)

C3:
=MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3)/2,
LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1
+COLUMNS($A3:B3)/2,seq)),seq))

Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3,
etc. Note that the number portions are text in the formulas above.

With MOREFUNC.XLL, divide and conquer.

A3:
=--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+")

B3:
=REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2),
" *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","")

Copy A3:B3 and paste into C3:D3, E3:F3, etc.

Note: I'm assuming the OP's records are in one cell given the OP's
statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm
interpretting the 'etc.' to mean there could be more fields in the
record, not that each of these be in a separate column.



Ron Rosenfeld

String parsing with variable lenght strings
 
On Fri, 31 Mar 2006 13:01:03 -0800, frosterrj
wrote:

Nope, my data is just one type per cell:
9.5 OZ
30#
#10 TIN
1 GAL
38#
the basic regex.mid works.

Thanks for the pointers to the morefunc.xls. Got it an am using now.

Robert


Glad you've got it working.

By the way, if you need to distribute your workbook, and if you installed
morefunc using the suggested defaults, there is an option on the Tools Menu to
save morefunc with the workbook.


--ron


All times are GMT +1. The time now is 09:44 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com