![]() |
last number array from string
I have a address string (variable length) with ID numbers at the end and want
to be able to identify just the array at the end which could be any number in length example Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands, ,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268 as I want to use the number for a lookup, please can anyone help with a formula as there are 12,000 records. Thanks |
If ID numbers at end are always 6 characters and let's say data is in Column
A with string beginning in A1 then in B1 you could type =right(A1,6). This returns 202268 in your example. "Michael" wrote: I have a address string (variable length) with ID numbers at the end and want to be able to identify just the array at the end which could be any number in length example Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands, ,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268 as I want to use the number for a lookup, please can anyone help with a formula as there are 12,000 records. Thanks |
Assuming your records are housed in A1:A1200
A bit cumbersome but try - =MID(A1,MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:" &LEN(A1))))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN( A1))))))+1,LEN(A1)-MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1) )))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1)))))) ) Hit Ctrl+Shft+Enter -sk Michael wrote: I have a address string (variable length) with ID numbers at the end and want to be able to identify just the array at the end which could be any number in length example Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands, ,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268 as I want to use the number for a lookup, please can anyone help with a formula as there are 12,000 records. Thanks |
Absolutely fantastic, thanks very much
Thanks Michael "sk" wrote: Assuming your records are housed in A1:A1200 A bit cumbersome but try - =MID(A1,MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:" &LEN(A1))))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN( A1))))))+1,LEN(A1)-MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1) )))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1)))))) ) Hit Ctrl+Shft+Enter -sk Michael wrote: I have a address string (variable length) with ID numbers at the end and want to be able to identify just the array at the end which could be any number in length example Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands, ,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268 as I want to use the number for a lookup, please can anyone help with a formula as there are 12,000 records. Thanks |
Michael wrote...
I have a address string (variable length) with ID numbers at the end and want to be able to identify just the array at the end which could be any number in length example Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands, ,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268 as I want to use the number for a lookup, please can anyone help with a formula as there are 12,000 records. If all these records have the same number of commas in them, copy the range and paste it in another range, select the entire range of records and run Data Text to Columns, choose Delimited, then choose Comma as the delimiter, choose to skip all fields except the last one, then parse the records (click OK). If the number of commas differs between records, but you always want the field after the final comma, define a name like seq referring to' =ROW(INDIRECT("1:1024")) and if your first records were in cell A2, use the following formula in B2 to pull the final field. =MID(A2,LOOKUP(2,1/(MID(A2,seq,1)=","),seq)+1,1024) |
Harlan
That is awesome and works as described. I have been trying to work out how it works, but can't quite figure it out. Could you explain? -- Regards Roger Govier "Harlan Grove" wrote in message oups.com... Michael wrote... I have a address string (variable length) with ID numbers at the end and want to be able to identify just the array at the end which could be any number in length example Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands, ,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268 as I want to use the number for a lookup, please can anyone help with a formula as there are 12,000 records. If all these records have the same number of commas in them, copy the range and paste it in another range, select the entire range of records and run Data Text to Columns, choose Delimited, then choose Comma as the delimiter, choose to skip all fields except the last one, then parse the records (click OK). If the number of commas differs between records, but you always want the field after the final comma, define a name like seq referring to' =ROW(INDIRECT("1:1024")) and if your first records were in cell A2, use the following formula in B2 to pull the final field. =MID(A2,LOOKUP(2,1/(MID(A2,seq,1)=","),seq)+1,1024) |
Roger Govier wrote...
That is awesome and works as described. I have been trying to work out how it works, but can't quite figure it out. Could you explain? .... the field after the final comma, define a name like seq referring to' =ROW(INDIRECT("1:1024")) and if your first records were in cell A2, use the following formula in B2 to pull the final field. =MID(A2,LOOKUP(2,1/(MID(A2,seq,1)=","),seq)+1,1024) This relies on an aspect of Excel's ordered lookup/matching functionality that isn't explicitly documented but has always worked this way. Ordered lookup or matching means looking for nonexact matches in sorted ranges. The 'matching' value would be the largest value in the range searched less than or equal to the value sought, e.g., MATCH with no 3rd argument or 3rd argument positive, VLOOKUP with no 4th argument or 4th argument TRUE or 1. This sort of lookup or matching can take advantage of binary search. The first step in binary search is to bracket the value sought. If that value is larger than any value in the range searched, Excel seems to 'short circuit' the process by returning a reference to the last number entry in the range searched. In the formula above, the 2nd argument to LOOKUP is an array of 1 divided by TRUE or FALSE values. The arithmetic operation coerces the boolean values to 1 and 0, respectively. 1/1 = 1 while 1/0 = #DIV/0!, so the last number entry in the array would correspond to the last comma in the string (up to the 1024th character). |
Roger Govier wrote...
.... However, I am still puzzled as to why you are looking up "2". If I substitute "0" for "2" I get #N/A If I substitute "1" for "2" I get the string from GY1 to the end Any value from 2 onward returns the desired result. Why is it 2? It could be 1.0000000000001, but I'm lazy and prefer 2 because it's shorter. The point I seem to have failed to make is that when the value sought is *STRICTLY* *LARGER* than (**NOT** equal to or larger than) the largest value in the range (or array) searched, Excel immediately returns a reference to the last number entry in the range (array) searched. If the value sought occurs in the range or array (as 1 does multiple times in the array 1/(MID(...)=",")), then Excel could return a reference to *ANY* of those matches, usually the middle such match. I didn't want a middle match, so I ensured my value sought would be larger than any number value in the array I searched. |
Harlan, That was a brilliant exposition on the facetiously dubbed "BIGNUMBER" theory. I do have a query though. Isn't the effectiveness of the search (by the formula) critically dependent upon the data being "ordered or srorted"? In the orignial sample by Roger, the string was *Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands,,GY1 1AN ,(014816 ), ,Pharmacy, , ,202268*and this is by no means ordred or sorted. If the erratic nature of LOOKUP with unsorted data is anything to go by, is there not a danger that uncritical application of the formula may (at times) lead to errorneous results? Albeilt,for some of the time we might yet get correct results, the rub is zero tolerance in striving for accuracy with no room for the hit and miss Any enlightenment will be appreciated -- davidm ------------------------------------------------------------------------ davidm's Profile: http://www.excelforum.com/member.php...o&userid=20645 View this thread: http://www.excelforum.com/showthread...hreadid=391015 |
Harlan,
Thankyou. I now understand. I can't fault the laziness <vbg -- Regards Roger Govier "Harlan Grove" wrote in message oups.com... Roger Govier wrote... ... However, I am still puzzled as to why you are looking up "2". If I substitute "0" for "2" I get #N/A If I substitute "1" for "2" I get the string from GY1 to the end Any value from 2 onward returns the desired result. Why is it 2? It could be 1.0000000000001, but I'm lazy and prefer 2 because it's shorter. The point I seem to have failed to make is that when the value sought is *STRICTLY* *LARGER* than (**NOT** equal to or larger than) the largest value in the range (or array) searched, Excel immediately returns a reference to the last number entry in the range (array) searched. If the value sought occurs in the range or array (as 1 does multiple times in the array 1/(MID(...)=",")), then Excel could return a reference to *ANY* of those matches, usually the middle such match. I didn't want a middle match, so I ensured my value sought would be larger than any number value in the array I searched. |
Consider the simpler array formula
=MID(A1,MAX(ROW(INDIRECT("1:"&LEN(A1)))*N(MID(A1,R OW(INDIRECT("1:"&LEN (A1))),1)=","))+1,1024) -- An array formula is entered with CTRL-SHIFT-ENTER rather than just ENTER. If done correctly, XL will display curly brackets { and } around the formula Regards, Tushar Mehta www.tushar-mehta.com Excel, PowerPoint, and VBA add-ins, tutorials Custom MS Office productivity solutions In article , says... Absolutely fantastic, thanks very much Thanks Michael "sk" wrote: Assuming your records are housed in A1:A1200 A bit cumbersome but try - =MID(A1,MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:" &LEN(A1))))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN( A1))))))+1,LEN(A1)-MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1) )))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1)))))) ) Hit Ctrl+Shft+Enter -sk Michael wrote: I have a address string (variable length) with ID numbers at the end and want to be able to identify just the array at the end which could be any number in length example Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands, ,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268 as I want to use the number for a lookup, please can anyone help with a formula as there are 12,000 records. Thanks |
Hi Tushar
Thank you also for your explanation. If MS ever delivers on its stated claim of trustworthy computing all these 'tricks' will break. Well, we live on a farm and all the pigs here are still on the ground <VBG -- Regards Roger Govier "Tushar Mehta" wrote in message om... In article , says... Harlan That is awesome and works as described. I have been trying to work out how it works, but can't quite figure it out. Could you explain? It relies on the fact that XL doesn't validate the arguments to a bunch of functions, all of the lookup type. If MS ever delivers on its stated claim of trustworthy computing all these 'tricks' will break. In this case the garbage spewed by MS's reliance on GIGO happens to be someone else's treasure. {grin} -- Regards, Tushar Mehta www.tushar-mehta.com Excel, PowerPoint, and VBA add-ins, tutorials Custom MS Office productivity solutions |
Tushar Mehta wrote...
It relies on the fact that XL doesn't validate the arguments to a bunch of functions, all of the lookup type. If MS ever delivers on its stated claim of trustworthy computing all these 'tricks' will break. Validate arguments? Excel provides no semantic validation in any function. Excel returns errors or wrong results when fed faulty arguments. That's how all spreadsheets work. That's how most functional and procedural languages work too. Does trustworthy computing extend to any derivative programs produced by Microsoft's users? Strictly with respect to lookup and match functions, FWLIW 123 works the same way as Excel. Approximate matching uses binary search. Binary search performs bracketting as it's first step. If the lookup value is larger than the last (valid) value in the lookup range, these functions short circuit and return references to the last (valid) value. Maybe this functionality will be 'fixed' at some later date, in which case I can only hope Microsoft adds reverse lookup and match functions that search from end to start. In this case the garbage spewed by MS's reliance on GIGO happens to be someone else's treasure. {grin} GIGO? Purely on practical grounds, it'd REALLY slow Excel down if it had to validate that the first column or row of lookup ranges or arrays were in sorted order. Maybe you'd like the wait, but I prefer Excel to behave as it does now. |
davidm wrote...
.... . . . . Isn't the effectiveness of the search (by the formula) critically dependent upon the data being "ordered or srorted"? Only when searching for values equal to or less than the largest value in the range or array searched. If the value sought is larger than the largest value in the range or array searched, then the initial bracketting step of the lookup or match function returns immediately with a reference to the last value in the range or array searched. That is, these functions work as follows. If lookupvalue < FirstValueInLookupRangeOrArray Then return #N/A ElseIf lookupvalue LastValueInLookupRangeOrArray Then return reference to LastValueInLookupRangeOrArray Else search between first and last entries . . . End If In the orignial sample by Roger, the string was *Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands,,GY1 1AN ,(014816 ), ,Pharmacy, , ,202268*and this is by no means ordred or sorted. If the erratic nature of LOOKUP with unsorted data is anything to go by, is there not a danger that uncritical application of the formula may (at times) lead to errorneous results? Albeilt,for some of the time we might yet get correct results, the rub is zero tolerance in striving for accuracy with no room for the hit and miss Any time you pass an unsorted lookup table to {V|H|}LOOKUP with 4th argument missing, TRUE or nonzero numeric, you could get inaccurate results. The sole exception, given how Excel works now (all versions at least from Excel 97 through Excel 2003, but likely all versions back to Excel 1.0 for Macs), is when the lookup value is strictly greater than the largest value of the same type in the range or array searched. In that situation, Excel *consistently* returns a reference to that last match. I can offer no proof that this will always work because I don't have Excel's source code handy to see exactly how Excel does this. However, I'm comfortable with the empirical basis for this behavior. This has become a widely used idiom. If it didn't work, I'm very confident someone would have posted an example where it failed by now. No one has, so I'm willing to rely on this functionality myself. You're free not to do so if that'd make you more comfortable. |
"Tushar Mehta" wrote...
Consider the simpler array formula =MID(A1,MAX(ROW(INDIRECT("1:"&LEN(A1))) *N(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)=","))+1, 1024) .... Gosh, why not consider doing it manually? Where to begin deconstructing this? Let's start with the inconsistency of recommending against using undocumented lookup/match functionality but recmmending using N()'s undocumented behavior when fed arrays of booleans. Next there's explicit use of ROW(INDIRECT("1:"&LEN(A1))) rather than using a defined name with a fixed upper bound, but (what the heck) using an arbitrarily large 3rd argment to the outer MID call. If you want to use only explicitly documented functionality of built-in functions only, shouldn't you use =RIGHT(A1,LEN(A1)-MAX(IF(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)=",", ROW(INDIRECT("1:"&LEN(A1)))))) ? Otherwise it's your preferred undocumented behavior against someone else's undocumented behavior. |
In article ,
says... ? Otherwise it's your preferred undocumented behavior against someone else's undocumented behavior. Oh, really? HLOOKUP, VLOOKUP, LOOKUP return incorrect values in Excel http://support.microsoft.com/default...b;en-us;181201 There's a similar article about MATCH. MS has also documented a list of XL functions that accept arrays as arguments. Of course, given how incompatible MS's search algorithms and my way of thinking are I cannot find it after 30 minutes of searching. But, I believe N() is on that list. The bottom line is this. All of you who enjoy exploiting *documented* bugs in XL are welcome to do so. If -- and yes I know all about pigs not yet becoming airbore -- MS fixes those holes in its software you will have no one to blame but yourself. However, hoisting those "solutions" onto others *without* warning about what you are doing is what I object to. -- Regards, Tushar Mehta www.tushar-mehta.com Excel, PowerPoint, and VBA add-ins, tutorials Custom MS Office productivity solutions In article , says... "Tushar Mehta" wrote... Consider the simpler array formula =MID(A1,MAX(ROW(INDIRECT("1:"&LEN(A1))) *N(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)=","))+1, 1024) ... Gosh, why not consider doing it manually? Where to begin deconstructing this? Let's start with the inconsistency of recommending against using undocumented lookup/match functionality but recmmending using N()'s undocumented behavior when fed arrays of booleans. Next there's explicit use of ROW(INDIRECT("1:"&LEN(A1))) rather than using a defined name with a fixed upper bound, but (what the heck) using an arbitrarily large 3rd argment to the outer MID call. If you want to use only explicitly documented functionality of built-in functions only, shouldn't you use =RIGHT(A1,LEN(A1)-MAX(IF(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)=",", ROW(INDIRECT("1:"&LEN(A1)))))) ? Otherwise it's your preferred undocumented behavior against someone else's undocumented behavior. |
Tushar Mehta wrote...
In article , says... ? Otherwise it's your preferred undocumented behavior against someone els= e's undocumented behavior. Oh, really? HLOOKUP, VLOOKUP, LOOKUP return incorrect values in Excel http://support.microsoft.com/default...b;en-us;181201 Which says: "CAUSE This behavior occurs when either of the following conditions is true: =B7 The range specified for the "table_array" argument (LOOKUP) or the range specified for the "lookup_vector" argument (VLOOKUP and HLOOKUP) is not sorted in ascending order. -or- =B7 Number formatting is applied to the range that is hiding the underlying values." With respect to the first point: The semantic issue here is that the table range isn't sorted. This is a problem in all languages that provide binary search against arbitrary arrays. For binary search to work, the arrays must be sorted, usually in ascending order. If you feed binary search an unsorted array, you do have GIGO. One very slow way to fix this is to have the search procedure sort its input array. I suppose there's a case to be made for that, but it'd really slow things down, and it'd more than eliminate the benefits of binary search vs linear search. With respect to the second point: At what point do users become responsible for what they feed to functions? This particular KB article seems clearly intended for fairly naive users who wouldn't consider reading online help, to wit for VLOOKUP: "Table_array is the table of information in which data is looked up. Use a reference to a range or a range name, such as Database or List. If range_lookup is TRUE, the values in the first column of table_array must be placed in ascending order: ..., -2, -1, 0, 1, 2, ..., A-Z, FALSE, TRUE; otherwise VLOOKUP may not give the correct value. If range_lookup is FALSE, table_array does not need to be sorted. [...]" MS has also documented a list of XL functions that accept arrays as arguments. Of course, given how incompatible MS's search algorithms and my way of thinking are I cannot find it after 30 minutes of searching. But, I believe N() is on that list. Unlike you, I believe it should be sufficient to refer to online help. That is, 'documented' functionality should be what online help says. So for the N function: "N Returns a value converted to a number. Syntax N(value) Value is the value you want converted. N converts values listed in the following table. If value is or refers to N returns A number That number A date, in one of the built-in date formats available in Microsoft Excel The serial number of that date TRUE 1 FALSE 0 An error value, such as #DIV/0! The error value Anything else 0" It does mention TRUE and FALSE, but not arrays. N (and T) work VERY DIFFERENTLY when fed range vs array arguments. N({1,2,3,4}<2.5) returns {1,1,0,0}, but with C1:C4 containing {TRUE;TRUE;FALSE;FALSE}, N(C1:C4) returns 1. Self-evident to anyone who used to use 123, but likely surprising to everyone else. The bottom line is this. All of you who enjoy exploiting *documented* bugs in XL are welcome to do so. If -- and yes I know all about pigs not yet becoming airbore -- MS fixes those holes in its software you will have no one to blame but yourself. However, hoisting those "solutions" onto others *without* warning about what you are doing is what I object to. First find that url to a Microsoft source that explicitly confirms that N and T functions are *intended* to handle arrays, then I'll consider that you're not abusing the N function. In other words, no I won't just take your word for it. Second, Microsoft may change worksheet function behavior, but would it abandon compatibility with other spreadsheets entirely? Excel's lookup and match functions behave the *same* as Lotus 123 when the lookup value is larger than any value in the lookup range and the lookup or match mode is nonexact. I suppose I should overlook your ignorance of compatability issues. There may be a chance that Microsoft would deprecate this particular bit of functionality. I wonder whether I'd be surprised or not if they didn't provide prior warning. I know I wouldn't be surprised if many IT departments upgraded without checking for semantic changes like this. Anyway, if Microsoft is going to go to the uncharacteristic effort of changing core functionality, maybe they'll add some as well as break other. If Excel only supported regular expressions, as OpenOffice Calc and Gnumeric do, one could just have used a formula like the following OpenOffice Calc one to find the substring following the final comma. =3DMID(A1;SEARCH("[^,]+$";A1);256) But my money would be on no changes even 'real soon now'. |
All times are GMT +1. The time now is 03:26 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com