ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Worksheet Functions (https://www.excelbanter.com/excel-worksheet-functions/)
-   -   last number array from string (https://www.excelbanter.com/excel-worksheet-functions/37581-last-number-array-string.html)

Michael

last number array from string
 
I have a address string (variable length) with ID numbers at the end and want
to be able to identify just the array at the end which could be any number in
length example
Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands,
,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268

as I want to use the number for a lookup, please can anyone help with a
formula as there are 12,000 records.

Thanks


ducktape

If ID numbers at end are always 6 characters and let's say data is in Column
A with string beginning in A1 then in B1 you could type =right(A1,6). This
returns 202268 in your example.

"Michael" wrote:

I have a address string (variable length) with ID numbers at the end and want
to be able to identify just the array at the end which could be any number in
length example
Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands,
,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268

as I want to use the number for a lookup, please can anyone help with a
formula as there are 12,000 records.

Thanks


sk

Assuming your records are housed in A1:A1200

A bit cumbersome but try -

=MID(A1,MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:" &LEN(A1))))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN( A1))))))+1,LEN(A1)-MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1) )))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1)))))) )

Hit Ctrl+Shft+Enter

-sk

Michael wrote:
I have a address string (variable length) with ID numbers at the end and want
to be able to identify just the array at the end which could be any number in
length example
Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands,
,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268

as I want to use the number for a lookup, please can anyone help with a
formula as there are 12,000 records.

Thanks



Michael

Absolutely fantastic, thanks very much

Thanks
Michael

"sk" wrote:

Assuming your records are housed in A1:A1200

A bit cumbersome but try -

=MID(A1,MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:" &LEN(A1))))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN( A1))))))+1,LEN(A1)-MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1) )))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1)))))) )

Hit Ctrl+Shft+Enter

-sk

Michael wrote:
I have a address string (variable length) with ID numbers at the end and want
to be able to identify just the array at the end which could be any number in
length example
Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands,
,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268

as I want to use the number for a lookup, please can anyone help with a
formula as there are 12,000 records.

Thanks




Harlan Grove

Michael wrote...
I have a address string (variable length) with ID numbers at the end and want
to be able to identify just the array at the end which could be any number in
length example
Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands,
,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268

as I want to use the number for a lookup, please can anyone help with a
formula as there are 12,000 records.


If all these records have the same number of commas in them, copy the
range and paste it in another range, select the entire range of records
and run Data Text to Columns, choose Delimited, then choose Comma as
the delimiter, choose to skip all fields except the last one, then
parse the records (click OK).

If the number of commas differs between records, but you always want
the field after the final comma, define a name like seq referring to'

=ROW(INDIRECT("1:1024"))

and if your first records were in cell A2, use the following formula in
B2 to pull the final field.

=MID(A2,LOOKUP(2,1/(MID(A2,seq,1)=","),seq)+1,1024)


Roger Govier

Harlan
That is awesome and works as described.
I have been trying to work out how it works, but can't quite figure it out.
Could you explain?

--
Regards
Roger Govier
"Harlan Grove" wrote in message
oups.com...
Michael wrote...
I have a address string (variable length) with ID numbers at the end and
want
to be able to identify just the array at the end which could be any number
in
length example
Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands,
,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268

as I want to use the number for a lookup, please can anyone help with a
formula as there are 12,000 records.


If all these records have the same number of commas in them, copy the
range and paste it in another range, select the entire range of records
and run Data Text to Columns, choose Delimited, then choose Comma as
the delimiter, choose to skip all fields except the last one, then
parse the records (click OK).

If the number of commas differs between records, but you always want
the field after the final comma, define a name like seq referring to'

=ROW(INDIRECT("1:1024"))

and if your first records were in cell A2, use the following formula in
B2 to pull the final field.

=MID(A2,LOOKUP(2,1/(MID(A2,seq,1)=","),seq)+1,1024)




Harlan Grove

Roger Govier wrote...
That is awesome and works as described.
I have been trying to work out how it works, but can't quite figure it out.
Could you explain?

....
the field after the final comma, define a name like seq referring to'

=ROW(INDIRECT("1:1024"))

and if your first records were in cell A2, use the following formula in
B2 to pull the final field.

=MID(A2,LOOKUP(2,1/(MID(A2,seq,1)=","),seq)+1,1024)


This relies on an aspect of Excel's ordered lookup/matching
functionality that isn't explicitly documented but has always worked
this way. Ordered lookup or matching means looking for nonexact matches
in sorted ranges. The 'matching' value would be the largest value in
the range searched less than or equal to the value sought, e.g., MATCH
with no 3rd argument or 3rd argument positive, VLOOKUP with no 4th
argument or 4th argument TRUE or 1.

This sort of lookup or matching can take advantage of binary search.
The first step in binary search is to bracket the value sought. If that
value is larger than any value in the range searched, Excel seems to
'short circuit' the process by returning a reference to the last number
entry in the range searched.

In the formula above, the 2nd argument to LOOKUP is an array of 1
divided by TRUE or FALSE values. The arithmetic operation coerces the
boolean values to 1 and 0, respectively. 1/1 = 1 while 1/0 = #DIV/0!,
so the last number entry in the array would correspond to the last
comma in the string (up to the 1024th character).


Harlan Grove

Roger Govier wrote...
....
However, I am still puzzled as to why you are looking up "2".
If I substitute "0" for "2" I get #N/A
If I substitute "1" for "2" I get the string from GY1 to the end
Any value from 2 onward returns the desired result.
Why is it 2?


It could be 1.0000000000001, but I'm lazy and prefer 2 because it's
shorter.

The point I seem to have failed to make is that when the value sought
is *STRICTLY* *LARGER* than (**NOT** equal to or larger than) the
largest value in the range (or array) searched, Excel immediately
returns a reference to the last number entry in the range (array)
searched. If the value sought occurs in the range or array (as 1 does
multiple times in the array 1/(MID(...)=",")), then Excel could return
a reference to *ANY* of those matches, usually the middle such match. I
didn't want a middle match, so I ensured my value sought would be
larger than any number value in the array I searched.


davidm


Harlan,

That was a brilliant exposition on the facetiously dubbed "BIGNUMBER"
theory. I do have a query though. Isn't the effectiveness of the search
(by the formula) critically dependent upon the data being "ordered or
srorted"?

In the orignial sample by Roger, the string was *Pharma-e,, ,Albert
House, South Esplanade, Guernsey, Channel Islands,,GY1 1AN ,(014816 ),
,Pharmacy, , ,202268*and this is by no means ordred or sorted. If the
erratic nature of LOOKUP with unsorted data is anything to go by, is
there not a danger that uncritical application of the formula may (at
times) lead to errorneous results? Albeilt,for some of the time we
might yet get correct results, the rub is zero tolerance in striving
for accuracy with no room for the hit and miss

Any enlightenment will be appreciated


--
davidm
------------------------------------------------------------------------
davidm's Profile: http://www.excelforum.com/member.php...o&userid=20645
View this thread: http://www.excelforum.com/showthread...hreadid=391015


Roger Govier

Harlan,
Thankyou. I now understand. I can't fault the laziness <vbg

--
Regards
Roger Govier
"Harlan Grove" wrote in message
oups.com...
Roger Govier wrote...
...
However, I am still puzzled as to why you are looking up "2".
If I substitute "0" for "2" I get #N/A
If I substitute "1" for "2" I get the string from GY1 to the end
Any value from 2 onward returns the desired result.
Why is it 2?


It could be 1.0000000000001, but I'm lazy and prefer 2 because it's
shorter.

The point I seem to have failed to make is that when the value sought
is *STRICTLY* *LARGER* than (**NOT** equal to or larger than) the
largest value in the range (or array) searched, Excel immediately
returns a reference to the last number entry in the range (array)
searched. If the value sought occurs in the range or array (as 1 does
multiple times in the array 1/(MID(...)=",")), then Excel could return
a reference to *ANY* of those matches, usually the middle such match. I
didn't want a middle match, so I ensured my value sought would be
larger than any number value in the array I searched.




Tushar Mehta

In article ,
says...
Harlan
That is awesome and works as described.
I have been trying to work out how it works, but can't quite figure it out.
Could you explain?


It relies on the fact that XL doesn't validate the arguments to a bunch
of functions, all of the lookup type. If MS ever delivers on its
stated claim of trustworthy computing all these 'tricks' will break.

In this case the garbage spewed by MS's reliance on GIGO happens to be
someone else's treasure. {grin}

--
Regards,

Tushar Mehta
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

Tushar Mehta

Consider the simpler array formula
=MID(A1,MAX(ROW(INDIRECT("1:"&LEN(A1)))*N(MID(A1,R OW(INDIRECT("1:"&LEN
(A1))),1)=","))+1,1024)

--
An array formula is entered with CTRL-SHIFT-ENTER rather than just
ENTER. If done correctly, XL will display curly brackets { and }
around the formula

Regards,

Tushar Mehta
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

In article ,
says...
Absolutely fantastic, thanks very much

Thanks
Michael

"sk" wrote:

Assuming your records are housed in A1:A1200

A bit cumbersome but try -

=MID(A1,MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:" &LEN(A1))))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN( A1))))))+1,LEN(A1)-MAX(IF(ISERR(FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1) )))),"",FIND(",",A1,ROW(INDIRECT("1:"&LEN(A1)))))) )

Hit Ctrl+Shft+Enter

-sk

Michael wrote:
I have a address string (variable length) with ID numbers at the end and want
to be able to identify just the array at the end which could be any number in
length example
Pharma-e,, ,Albert House, South Esplanade, Guernsey, Channel Islands,
,GY1 1AN ,(01481 736 736), ,Pharmacy, , ,202268

as I want to use the number for a lookup, please can anyone help with a
formula as there are 12,000 records.

Thanks





Roger Govier

Hi Tushar

Thank you also for your explanation.
If MS ever delivers on its
stated claim of trustworthy computing all these 'tricks' will break.

Well, we live on a farm and all the pigs here are still on the ground <VBG
--
Regards
Roger Govier
"Tushar Mehta" wrote in message
om...
In article ,
says...
Harlan
That is awesome and works as described.
I have been trying to work out how it works, but can't quite figure it
out.
Could you explain?


It relies on the fact that XL doesn't validate the arguments to a bunch
of functions, all of the lookup type. If MS ever delivers on its
stated claim of trustworthy computing all these 'tricks' will break.

In this case the garbage spewed by MS's reliance on GIGO happens to be
someone else's treasure. {grin}

--
Regards,

Tushar Mehta
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions




Harlan Grove

Tushar Mehta wrote...
It relies on the fact that XL doesn't validate the arguments to a bunch
of functions, all of the lookup type. If MS ever delivers on its
stated claim of trustworthy computing all these 'tricks' will break.


Validate arguments? Excel provides no semantic validation in any
function. Excel returns errors or wrong results when fed faulty
arguments. That's how all spreadsheets work. That's how most functional
and procedural languages work too.

Does trustworthy computing extend to any derivative programs produced
by Microsoft's users?

Strictly with respect to lookup and match functions, FWLIW 123 works
the same way as Excel. Approximate matching uses binary search. Binary
search performs bracketting as it's first step. If the lookup value is
larger than the last (valid) value in the lookup range, these functions
short circuit and return references to the last (valid) value. Maybe
this functionality will be 'fixed' at some later date, in which case I
can only hope Microsoft adds reverse lookup and match functions that
search from end to start.

In this case the garbage spewed by MS's reliance on GIGO happens to be
someone else's treasure. {grin}


GIGO? Purely on practical grounds, it'd REALLY slow Excel down if it
had to validate that the first column or row of lookup ranges or arrays
were in sorted order. Maybe you'd like the wait, but I prefer Excel to
behave as it does now.


Harlan Grove

davidm wrote...
....
. . . . Isn't the effectiveness of the search
(by the formula) critically dependent upon the data being "ordered or
srorted"?


Only when searching for values equal to or less than the largest value
in the range or array searched. If the value sought is larger than the
largest value in the range or array searched, then the initial
bracketting step of the lookup or match function returns immediately
with a reference to the last value in the range or array searched. That
is, these functions work as follows.

If lookupvalue < FirstValueInLookupRangeOrArray Then
return #N/A
ElseIf lookupvalue LastValueInLookupRangeOrArray Then
return reference to LastValueInLookupRangeOrArray
Else
search between first and last entries . . .
End If

In the orignial sample by Roger, the string was *Pharma-e,, ,Albert
House, South Esplanade, Guernsey, Channel Islands,,GY1 1AN ,(014816 ),
,Pharmacy, , ,202268*and this is by no means ordred or sorted. If the
erratic nature of LOOKUP with unsorted data is anything to go by, is
there not a danger that uncritical application of the formula may (at
times) lead to errorneous results? Albeilt,for some of the time we
might yet get correct results, the rub is zero tolerance in striving
for accuracy with no room for the hit and miss


Any time you pass an unsorted lookup table to {V|H|}LOOKUP with 4th
argument missing, TRUE or nonzero numeric, you could get inaccurate
results. The sole exception, given how Excel works now (all versions at
least from Excel 97 through Excel 2003, but likely all versions back to
Excel 1.0 for Macs), is when the lookup value is strictly greater than
the largest value of the same type in the range or array searched. In
that situation, Excel *consistently* returns a reference to that last
match.

I can offer no proof that this will always work because I don't have
Excel's source code handy to see exactly how Excel does this. However,
I'm comfortable with the empirical basis for this behavior. This has
become a widely used idiom. If it didn't work, I'm very confident
someone would have posted an example where it failed by now. No one
has, so I'm willing to rely on this functionality myself. You're free
not to do so if that'd make you more comfortable.


Harlan Grove

"Tushar Mehta" wrote...
Consider the simpler array formula
=MID(A1,MAX(ROW(INDIRECT("1:"&LEN(A1)))
*N(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)=","))+1, 1024)

....

Gosh, why not consider doing it manually?

Where to begin deconstructing this? Let's start with the inconsistency of
recommending against using undocumented lookup/match functionality but
recmmending using N()'s undocumented behavior when fed arrays of booleans.
Next there's explicit use of ROW(INDIRECT("1:"&LEN(A1))) rather than using a
defined name with a fixed upper bound, but (what the heck) using an
arbitrarily large 3rd argment to the outer MID call.

If you want to use only explicitly documented functionality of built-in
functions only, shouldn't you use

=RIGHT(A1,LEN(A1)-MAX(IF(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)=",",
ROW(INDIRECT("1:"&LEN(A1))))))

? Otherwise it's your preferred undocumented behavior against someone else's
undocumented behavior.



Tushar Mehta

In article ,
says...
? Otherwise it's your preferred undocumented behavior against someone else's
undocumented behavior.


Oh, really?

HLOOKUP, VLOOKUP, LOOKUP return incorrect values in Excel
http://support.microsoft.com/default...b;en-us;181201

There's a similar article about MATCH.

MS has also documented a list of XL functions that accept arrays as
arguments. Of course, given how incompatible MS's search algorithms
and my way of thinking are I cannot find it after 30 minutes of
searching. But, I believe N() is on that list.

The bottom line is this. All of you who enjoy exploiting *documented*
bugs in XL are welcome to do so. If -- and yes I know all about pigs
not yet becoming airbore -- MS fixes those holes in its software you
will have no one to blame but yourself. However, hoisting those
"solutions" onto others *without* warning about what you are doing is
what I object to.

--
Regards,

Tushar Mehta
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

In article ,
says...
"Tushar Mehta" wrote...
Consider the simpler array formula
=MID(A1,MAX(ROW(INDIRECT("1:"&LEN(A1)))
*N(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)=","))+1, 1024)

...

Gosh, why not consider doing it manually?

Where to begin deconstructing this? Let's start with the inconsistency of
recommending against using undocumented lookup/match functionality but
recmmending using N()'s undocumented behavior when fed arrays of booleans.
Next there's explicit use of ROW(INDIRECT("1:"&LEN(A1))) rather than using a
defined name with a fixed upper bound, but (what the heck) using an
arbitrarily large 3rd argment to the outer MID call.

If you want to use only explicitly documented functionality of built-in
functions only, shouldn't you use

=RIGHT(A1,LEN(A1)-MAX(IF(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)=",",
ROW(INDIRECT("1:"&LEN(A1))))))

? Otherwise it's your preferred undocumented behavior against someone else's
undocumented behavior.




Harlan Grove

Tushar Mehta wrote...
In article ,
says...
? Otherwise it's your preferred undocumented behavior against someone els=

e's
undocumented behavior.


Oh, really?

HLOOKUP, VLOOKUP, LOOKUP return incorrect values in Excel
http://support.microsoft.com/default...b;en-us;181201

Which says:

"CAUSE
This behavior occurs when either of the following conditions is true:

=B7 The range specified for the "table_array" argument (LOOKUP) or the
range specified for the "lookup_vector" argument (VLOOKUP and HLOOKUP)
is not sorted in ascending order.

-or-

=B7 Number formatting is applied to the range that is hiding the
underlying values."

With respect to the first point:

The semantic issue here is that the table range isn't sorted. This is a
problem in all languages that provide binary search against arbitrary
arrays. For binary search to work, the arrays must be sorted, usually
in ascending order. If you feed binary search an unsorted array, you do
have GIGO.

One very slow way to fix this is to have the search procedure sort its
input array. I suppose there's a case to be made for that, but it'd
really slow things down, and it'd more than eliminate the benefits of
binary search vs linear search.

With respect to the second point:

At what point do users become responsible for what they feed to
functions?

This particular KB article seems clearly intended for fairly naive
users who wouldn't consider reading online help, to wit for VLOOKUP:

"Table_array is the table of information in which data is looked up.
Use a reference to a range or a range name, such as Database or List.

If range_lookup is TRUE, the values in the first column of table_array
must be placed in ascending order: ..., -2, -1, 0, 1, 2, ..., A-Z,
FALSE, TRUE; otherwise VLOOKUP may not give the correct value. If
range_lookup is FALSE, table_array does not need to be sorted. [...]"

MS has also documented a list of XL functions that accept arrays as
arguments. Of course, given how incompatible MS's search algorithms
and my way of thinking are I cannot find it after 30 minutes of
searching. But, I believe N() is on that list.


Unlike you, I believe it should be sufficient to refer to online help.
That is, 'documented' functionality should be what online help says. So
for the N function:

"N

Returns a value converted to a number.

Syntax

N(value)

Value is the value you want converted. N converts values listed in
the following table.

If value is or refers to N returns

A number That number

A date, in one of the built-in date formats available in Microsoft
Excel The serial number of that date

TRUE 1

FALSE 0

An error value, such as #DIV/0! The error value
Anything else 0"

It does mention TRUE and FALSE, but not arrays. N (and T) work VERY
DIFFERENTLY when fed range vs array arguments. N({1,2,3,4}<2.5) returns
{1,1,0,0}, but with C1:C4 containing {TRUE;TRUE;FALSE;FALSE}, N(C1:C4)
returns 1. Self-evident to anyone who used to use 123, but likely
surprising to everyone else.

The bottom line is this. All of you who enjoy exploiting *documented*
bugs in XL are welcome to do so. If -- and yes I know all about pigs
not yet becoming airbore -- MS fixes those holes in its software you
will have no one to blame but yourself. However, hoisting those
"solutions" onto others *without* warning about what you are doing is
what I object to.


First find that url to a Microsoft source that explicitly confirms that
N and T functions are *intended* to handle arrays, then I'll consider
that you're not abusing the N function. In other words, no I won't just
take your word for it.

Second, Microsoft may change worksheet function behavior, but would it
abandon compatibility with other spreadsheets entirely? Excel's lookup
and match functions behave the *same* as Lotus 123 when the lookup
value is larger than any value in the lookup range and the lookup or
match mode is nonexact. I suppose I should overlook your ignorance of
compatability issues.

There may be a chance that Microsoft would deprecate this particular
bit of functionality. I wonder whether I'd be surprised or not if they
didn't provide prior warning. I know I wouldn't be surprised if many IT
departments upgraded without checking for semantic changes like this.

Anyway, if Microsoft is going to go to the uncharacteristic effort of
changing core functionality, maybe they'll add some as well as break
other. If Excel only supported regular expressions, as OpenOffice Calc
and Gnumeric do, one could just have used a formula like the following
OpenOffice Calc one to find the substring following the final comma.

=3DMID(A1;SEARCH("[^,]+$";A1);256)

But my money would be on no changes even 'real soon now'.



All times are GMT +1. The time now is 03:26 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com