![]() |
String parsing with variable lenght strings
I am trying to split up a cell into numbers and charachters and place them in
separate columns, but the lenght of the number part varies as does the content of the character part. For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc. I would like to be able to pull out the #, Tins, lb, and Pcs in to their own column. I have found this formula: =LEFT(A1,FIND("-",A1,1)-1) but it assumes some level of consistency, the "-" in the cell. Any ideas appreciated. Robert |
String parsing with variable lenght strings
Perhaps something like this:
For a value in A1 The numeric (left) part: B1: =--LEFT(A1,MATCH(FALSE,ISNUMBER(--MID(A1,ROW($A$1:INDEX(A:A,LEN(A1))),1)),0)-1) The unit of measure (right) part: C1: =TRIM(MID(A1,MATCH(FALSE,ISNUMBER(--MID(A1,ROW($A$1:INDEX(A:A,LEN(A1))),1)),0),255)) Note: Commit those array formulas by holding down the [Ctrl][Shift] keys when you press [Enter]. Copy those formulas down as far as you need. Does that help? *********** Regards, Ron XL2002, WinXP-Pro "frosterrj" wrote: I am trying to split up a cell into numbers and charachters and place them in separate columns, but the lenght of the number part varies as does the content of the character part. For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc. I would like to be able to pull out the #, Tins, lb, and Pcs in to their own column. I have found this formula: =LEFT(A1,FIND("-",A1,1)-1) but it assumes some level of consistency, the "-" in the cell. Any ideas appreciated. Robert |
String parsing with variable lenght strings
It's a bit ugly, but it seems to work. Assuming your values start in
A1 and go down, the following formula in cell B1 will break out the number. =IF(ISERR(VALUE(LEFT(A1,2))),VALUE(LEFT(A1,1)),IF( ISERR(VALUE(LEFT(A1,3))),VALUE(LEFT(A1,2)),IF(ISER R(VALUE(LEFT(A1,4))),VALUE(LEFT(A1,3)),IF(ISERR(VA LUE(LEFT(A1,5))),VALUE(LEFT(A1,4)),"Check num length")))) This will test the first two characters, then the first 3, then the first 4 etc. to see if it is a number, and when it hits a length that includes an alpha, it will drop down a character and return the value. I stopped at 4 digits, but obviously you can extend it as much as you need. Then you can use the following formula in cell C1 to give you the text. =RIGHT(A1,LEN(A1)-LEN(B1)) Note: The second formula will include a space if there is one. You may want to remove all spaces before you begin to avoid that. Hope that helps... |
String parsing with variable lenght strings
frosterrj wrote...
Actually, I am having the most trouble because the numbers can contain decimals: 7.63 lb, 66.5 oz etc. Seems to be stopping at the '.' . What is? Rather, which formulas are? The one I proposed below allow for decimal fractions. "Harlan Grove" wrote: frosterrj wrote... .... For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc. I would like to be able to pull out the #, Tins, lb, and Pcs in to their own column. ..... And with the following modified string in cell A1 5.5#, 10Tins, 4 lb, 100.23Pcs and the following defined name If so, then define a name like seq referring to =ROW(INDEX(Sheet1!$A:$A,1):INDEX(Sheet1!$A:$A,25 6)) .... And the following intermediate formula A2: =SUBSTITUTE(A1," ","") The following formulas A3: =--LEFT(TRIM(A2),LOOKUP(2,1/ISNUMBER(-MID(A2,1,seq)),seq)) B3: =MID($A2,SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3),F IND(",",$A2&",", SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3))-SUMPRODUCT(LEN($A3:A3)) -COUNT($A3:A3)) C3: =--MID($A2,SUMPRODUCT(LEN($A3:B3))+COUNT($A3:B3,1), LOOKUP(2,1/ISNUMBER(-MID($A2,SUMPRODUCT(LEN($A3:B3)) +COUNT($A3:B3,1),seq)),seq)) Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3, etc. .... produce the following results (A3:H3). 5.5 # 10 Tins 4 lb 100.23 Pcs And the following formulas A3: =--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2) B3: =REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)& "}\d*\.?\d+ *([^,]+).*","[2]") Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc. produce the same, 5.5 # 10 Tins 4 lb 100.23 Pcs Doesn't appear to stop at decimal fractions. Or are you referring to some of the other responses that took your one sample record as exhaustively indicative of your data? |
String parsing with variable lenght strings
On 28 Mar 2006 15:04:06 -0800, "Harlan Grove" wrote:
Ron Rosenfeld wrote... ... Then use the regular expression formulas: For the number: =REGEX.MID(A1,"[0-9]+") It works, but \d+ is more compact. I phrased it this way for clarity, especially when comparing with the second formula. (means return the first series of characters that are in the class [0-9]. For the units: =REGEX.MID(A1,"[^0-9 ]+") It picks up the commas as well. Those could be excluded with a positive lookahead assertion and a comma appended to the string. For some reason, I thought each combination of value/units was in a separate cell. If so, commas would not be an issue. --ron |
String parsing with variable lenght strings
On Tue, 28 Mar 2006 17:47:10 -0500, Ron Rosenfeld
wrote: On Tue, 28 Mar 2006 12:04:02 -0800, frosterrj wrote: I am trying to split up a cell into numbers and charachters and place them in separate columns, but the lenght of the number part varies as does the content of the character part. For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc. I would like to be able to pull out the #, Tins, lb, and Pcs in to their own column. I have found this formula: =LEFT(A1,FIND("-",A1,1)-1) but it assumes some level of consistency, the "-" in the cell. Any ideas appreciated. Robert Ron C.'s formulas should work. For a different approach, and to gain access to a number of other useful functions, you could download and install Longre's free morefunc.xll add-in from http://xcell05.free.fr/ Then use the regular expression formulas: For the number: =REGEX.MID(A1,"[0-9]+") (means return the first series of characters that are in the class [0-9]. For the units: =REGEX.MID(A1,"[^0-9 ]+") (means return the first series of characters that are not in the class [0-9 ] -- note that there is a <space included in the expression, so that any space in between the number and the units will not be returned). --ron Having read that some of your data may optionally include decimal numbers, I would change my previous recommendation to: Number: =REGEX.MID(A1,"[0-9]+(\.[0-9]+)?") This assumes that all decimal digits have a number before the decimal point. If that is not the case, it can be easily modified. Units: =REGEX.MID(A1,"[^0-9. ]+") This assumes there are not "dots" in the units, as in your examples. Again, if there are, the expression can be modified to include them. --ron |
String parsing with variable lenght strings
Ron Rosenfeld wrote...
.... Having read that some of your data may optionally include decimal numbers, I would change my previous recommendation to: Number: =REGEX.MID(A1,"[0-9]+(\.[0-9]+)?") This assumes that all decimal digits have a number before the decimal point. If that is not the case, it can be easily modified. Units: =REGEX.MID(A1,"[^0-9. ]+") This assumes there are not "dots" in the units, as in your examples. Again, if there are, the expression can be modified to include them. It gets more difficult if the numbers could include 0. and .0 and if the units could include periods and numerals, e.g., 20 reams A4, 5 boxes .22 shells. With just built-in formulas referencing the original record in A1 plus the name seq as I defined it earlier, A3: =LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq)) B3: =MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2, FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS($ A3:B3)/2) -SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2) C3: =MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3)/2, LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1 +COLUMNS($A3:B3)/2,seq)),seq)) Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3, etc. Note that the number portions are text in the formulas above. With MOREFUNC.XLL, divide and conquer. A3: =--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+") B3: =REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2), " *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","") Copy A3:B3 and paste into C3:D3, E3:F3, etc. Note: I'm assuming the OP's records are in one cell given the OP's statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm interpretting the 'etc.' to mean there could be more fields in the record, not that each of these be in a separate column. |
String parsing with variable lenght strings
On 28 Mar 2006 22:40:41 -0800, "Harlan Grove" wrote:
Ron Rosenfeld wrote... ... Having read that some of your data may optionally include decimal numbers, I would change my previous recommendation to: Number: =REGEX.MID(A1,"[0-9]+(\.[0-9]+)?") This assumes that all decimal digits have a number before the decimal point. If that is not the case, it can be easily modified. Units: =REGEX.MID(A1,"[^0-9. ]+") This assumes there are not "dots" in the units, as in your examples. Again, if there are, the expression can be modified to include them. It gets more difficult if the numbers could include 0. and .0 and if the units could include periods and numerals, e.g., 20 reams A4, 5 boxes .22 shells. With just built-in formulas referencing the original record in A1 plus the name seq as I defined it earlier, A3: =LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq)) B3: =MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3 )/2, FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS( $A3:B3)/2) -SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2) C3: =MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3 )/2, LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1 +COLUMNS($A3:B3)/2,seq)),seq)) Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3, etc. Note that the number portions are text in the formulas above. With MOREFUNC.XLL, divide and conquer. A3: =--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+") B3: =REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2), " *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","") Copy A3:B3 and paste into C3:D3, E3:F3, etc. Note: I'm assuming the OP's records are in one cell given the OP's statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm interpretting the 'etc.' to mean there could be more fields in the record, not that each of these be in a separate column. Yes, the "best" solution often depends critically on the nature of the data being analyzed. You've got a lot more experience than I on generalizing from incomplete data specifications. That's very useful in these NG's. --ron |
String parsing with variable lenght strings
I was using this suggestion from above:
=IF(ISERR(VALUE(LEFT(A1,2))),VALUE(LEFT(A1,1)),IF( ISERR(VALUE(LEFT(A1,3))),VALUE(LEFT(A1,2)),IF(ISER R(VALUE(LEFT(A1,4))),VALUE(LEFT(A1,3)),IF(ISERR(VA LUE(LEFT(A1,5))),VALUE(LEFT(A1,4)),"Check num length")))) I think I misunderstood what you were telling me below (too late in the day for really complex formulas!). Robert "Harlan Grove" wrote: frosterrj wrote... Actually, I am having the most trouble because the numbers can contain decimals: 7.63 lb, 66.5 oz etc. Seems to be stopping at the '.' . What is? Rather, which formulas are? The one I proposed below allow for decimal fractions. "Harlan Grove" wrote: frosterrj wrote... .... For example, one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc. I would like to be able to pull out the #, Tins, lb, and Pcs in to their own column. ..... And with the following modified string in cell A1 5.5#, 10Tins, 4 lb, 100.23Pcs and the following defined name If so, then define a name like seq referring to =ROW(INDEX(Sheet1!$A:$A,1):INDEX(Sheet1!$A:$A,25 6)) .... And the following intermediate formula A2: =SUBSTITUTE(A1," ","") The following formulas A3: =--LEFT(TRIM(A2),LOOKUP(2,1/ISNUMBER(-MID(A2,1,seq)),seq)) B3: =MID($A2,SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3),F IND(",",$A2&",", SUMPRODUCT(LEN($A3:A3))+COUNT($A3:A3))-SUMPRODUCT(LEN($A3:A3)) -COUNT($A3:A3)) C3: =--MID($A2,SUMPRODUCT(LEN($A3:B3))+COUNT($A3:B3,1), LOOKUP(2,1/ISNUMBER(-MID($A2,SUMPRODUCT(LEN($A3:B3)) +COUNT($A3:B3,1),seq)),seq)) Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3, etc. .... produce the following results (A3:H3). 5.5 # 10 Tins 4 lb 100.23 Pcs And the following formulas A3: =--REGEX.MID($A1,"\d*\.?\d+",(COLUMNS($A3:A3)+1)/2) B3: =REGEX.SUBSTITUTE($A1,"([^,]+, *){"&(COLUMNS($A4:B4)/2-1)& "}\d*\.?\d+ *([^,]+).*","[2]") Copy A3:B3 and paste into C3:D3, E3:F3, G3:H3, etc. produce the same, 5.5 # 10 Tins 4 lb 100.23 Pcs Doesn't appear to stop at decimal fractions. Or are you referring to some of the other responses that took your one sample record as exhaustively indicative of your data? |
String parsing with variable lenght strings
Nope, my data is just one type per cell:
9.5 OZ 30# #10 TIN 1 GAL 38# the basic regex.mid works. Thanks for the pointers to the morefunc.xls. Got it an am using now. Robert "Harlan Grove" wrote: Ron Rosenfeld wrote... .... Having read that some of your data may optionally include decimal numbers, I would change my previous recommendation to: Number: =REGEX.MID(A1,"[0-9]+(\.[0-9]+)?") This assumes that all decimal digits have a number before the decimal point. If that is not the case, it can be easily modified. Units: =REGEX.MID(A1,"[^0-9. ]+") This assumes there are not "dots" in the units, as in your examples. Again, if there are, the expression can be modified to include them. It gets more difficult if the numbers could include 0. and .0 and if the units could include periods and numerals, e.g., 20 reams A4, 5 boxes .22 shells. With just built-in formulas referencing the original record in A1 plus the name seq as I defined it earlier, A3: =LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq)) B3: =MID($A1,SUMPRODUCT(LEN($A3:A3))+COLUMNS($A3:B3)/2, FIND(",",$A1&",",SUMPRODUCT(LEN($A3:A3))+COLUMNS($ A3:B3)/2) -SUMPRODUCT(LEN($A3:A3))-COLUMNS($A3:B3)/2) C3: =MID($A1,SUMPRODUCT(LEN($A3:B3))+1+COLUMNS($A3:B3)/2, LOOKUP(2,1/ISNUMBER(-MID($A1,SUMPRODUCT(LEN($A3:B3))+1 +COLUMNS($A3:B3)/2,seq)),seq)) Copy B3 and paste into D3, then copy C3:D3 and paste into E3:F3, G3:H3, etc. Note that the number portions are text in the formulas above. With MOREFUNC.XLL, divide and conquer. A3: =--REGEX.MID(REGEX.MID($A1,"[^,]+",(COLUMNS($A3:A3)+1)/2),"\d+(\.\d*)?|\.\d+") B3: =REGEX.SUBSTITUTE(REGEX.MID($A1,"[^,]+",COLUMNS($A3:B3)/2), " *"&REPLACE(A3,2,0,IF(LEFT(A3,1)="0","?",""))&" *","") Copy A3:B3 and paste into C3:D3, E3:F3, etc. Note: I'm assuming the OP's records are in one cell given the OP's statement: "one cell could be 5#, 10Tins, 4 lb, 100Pcs, etc." I'm interpretting the 'etc.' to mean there could be more fields in the record, not that each of these be in a separate column. |
String parsing with variable lenght strings
On Fri, 31 Mar 2006 13:01:03 -0800, frosterrj
wrote: Nope, my data is just one type per cell: 9.5 OZ 30# #10 TIN 1 GAL 38# the basic regex.mid works. Thanks for the pointers to the morefunc.xls. Got it an am using now. Robert Glad you've got it working. By the way, if you need to distribute your workbook, and if you installed morefunc using the suggested defaults, there is an option on the Tools Menu to save morefunc with the workbook. --ron |
All times are GMT +1. The time now is 09:44 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com