![]() |
Extracting the last set of words from a text string
What formula would I use (preferably using a morefunc REGEX.MID function) to
extract the last series of words/acronyms from a text string? The key to identifiying where that last set of words begins, is that they will always follow either an "N" or a number that may or may not have a decimal. See the examples below: 2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC 102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE ....here I am trying to extract "DICKINSON COLLEGE" 2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74 3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO ....here I am trying to extract "CUMBERLAND CNTY HO" -- iperlovsky |
Extracting the last set of words from a text string
On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky
wrote: What formula would I use (preferably using a morefunc REGEX.MID function) to extract the last series of words/acronyms from a text string? The key to identifiying where that last set of words begins, is that they will always follow either an "N" or a number that may or may not have a decimal. See the examples below: 2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC 102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE ...here I am trying to extract "DICKINSON COLLEGE" 2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74 3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO ...here I am trying to extract "CUMBERLAND CNTY HO" Try this: =REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1) It does presume that the last set of words contains only capital letters. That can be broadened, if necessary, I think. --ron |
Extracting the last set of words from a text string
Hey Ron,
That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ....here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ....and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky "Ron Rosenfeld" wrote: On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky wrote: What formula would I use (preferably using a morefunc REGEX.MID function) to extract the last series of words/acronyms from a text string? The key to identifiying where that last set of words begins, is that they will always follow either an "N" or a number that may or may not have a decimal. See the examples below: 2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC 102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE ...here I am trying to extract "DICKINSON COLLEGE" 2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74 3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO ...here I am trying to extract "CUMBERLAND CNTY HO" Try this: =REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1) It does presume that the last set of words contains only capital letters. That can be broadened, if necessary, I think. --ron |
Extracting the last set of words from a text string
Try this:
Function lastpart(r As Range) As String lastpast = "" s = Split(r.Value, " ") fnd = False For i = UBound(s) To LBound(s) Step -1 If s(i) = "N" Or IsNumeric(s(i)) Then fnd = True Exit For End If Next If fnd = False Or i = UBound(s) Then Exit Function End If lastpart = s(i + 1) For j = i + 2 To UBound(s) lastpart = lastpart & " " & s(j) Next End Function -- Gary''s Student gsnu200708 "IPerlovsky" wrote: Hey Ron, That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ...and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky "Ron Rosenfeld" wrote: On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky wrote: What formula would I use (preferably using a morefunc REGEX.MID function) to extract the last series of words/acronyms from a text string? The key to identifiying where that last set of words begins, is that they will always follow either an "N" or a number that may or may not have a decimal. See the examples below: 2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC 102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE ...here I am trying to extract "DICKINSON COLLEGE" 2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74 3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO ...here I am trying to extract "CUMBERLAND CNTY HO" Try this: =REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1) It does presume that the last set of words contains only capital letters. That can be broadened, if necessary, I think. --ron |
Extracting the last set of words from a text string
anyway we can work this into a formula w/o VB?
-- iperlovsky "Gary''s Student" wrote: Try this: Function lastpart(r As Range) As String lastpast = "" s = Split(r.Value, " ") fnd = False For i = UBound(s) To LBound(s) Step -1 If s(i) = "N" Or IsNumeric(s(i)) Then fnd = True Exit For End If Next If fnd = False Or i = UBound(s) Then Exit Function End If lastpart = s(i + 1) For j = i + 2 To UBound(s) lastpart = lastpart & " " & s(j) Next End Function -- Gary''s Student gsnu200708 "IPerlovsky" wrote: Hey Ron, That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ...and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky "Ron Rosenfeld" wrote: On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky wrote: What formula would I use (preferably using a morefunc REGEX.MID function) to extract the last series of words/acronyms from a text string? The key to identifiying where that last set of words begins, is that they will always follow either an "N" or a number that may or may not have a decimal. See the examples below: 2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC 102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE ...here I am trying to extract "DICKINSON COLLEGE" 2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74 3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO ...here I am trying to extract "CUMBERLAND CNTY HO" Try this: =REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1) It does presume that the last set of words contains only capital letters. That can be broadened, if necessary, I think. --ron |
Extracting the last set of words from a text string
You can manually perform on the worksheet what the VBA does:
1. Text to Columns to split the string up using the space as the separator 2. Work backwards from column IV towards column A looking for N or a number 3. Re-concatenate everything to the right of the find. Problem is that I don't know how to make s simple function do this. I only offered the VBA in case you have not received a worksheet solution. Perhaps you could try the VBA out. If you make a copy of your workbook and experiment with the copy, you have nothing to lose: Macros are very easy to install and use: 1. CNTRL-F11 brings up the VBE window 2. ALT-I ALT-M opens a fresh module 3. paste the stuff in and close the VBE window If you save the workbook, the macro will be saved with it. To remove the macro: 1. bring up the VBE window as above 2. clear the code out 3. close the VBE window To learn more about macros in general, see: http://www.mvps.org/dmcritchie/excel/getstarted.htm To use the VBA function: 1. put a sample string in cell A1 2. in another cell enter: =lastpart(A1) -- Gary's Student gsnu200708 "IPerlovsky" wrote: anyway we can work this into a formula w/o VB? -- iperlovsky "Gary''s Student" wrote: Try this: Function lastpart(r As Range) As String lastpast = "" s = Split(r.Value, " ") fnd = False For i = UBound(s) To LBound(s) Step -1 If s(i) = "N" Or IsNumeric(s(i)) Then fnd = True Exit For End If Next If fnd = False Or i = UBound(s) Then Exit Function End If lastpart = s(i + 1) For j = i + 2 To UBound(s) lastpart = lastpart & " " & s(j) Next End Function -- Gary''s Student gsnu200708 "IPerlovsky" wrote: Hey Ron, That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ...and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky "Ron Rosenfeld" wrote: On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky wrote: What formula would I use (preferably using a morefunc REGEX.MID function) to extract the last series of words/acronyms from a text string? The key to identifiying where that last set of words begins, is that they will always follow either an "N" or a number that may or may not have a decimal. See the examples below: 2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC 102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE ...here I am trying to extract "DICKINSON COLLEGE" 2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74 3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO ...here I am trying to extract "CUMBERLAND CNTY HO" Try this: =REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1) It does presume that the last set of words contains only capital letters. That can be broadened, if necessary, I think. --ron |
Extracting the last set of words from a text string
On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky
wrote: Hey Ron, That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ...and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky Well, that does make things a bit more complicated. And I'll try a somewhat different approach, using REGEX.SUBSTITUTE =REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]") It works on all of your samples, as well as on a sample that ends with a number. It assumes that your substring to be extracted consists only of capital letters, digits and spaces. However, if the company name ends with a digit, the function will fail. It's possible to handle that issue, but it's getting late. --ron |
Extracting the last set of words from a text string
On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld
wrote: On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky wrote: Hey Ron, That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ...and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky Well, that does make things a bit more complicated. And I'll try a somewhat different approach, using REGEX.SUBSTITUTE =REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]") It works on all of your samples, as well as on a sample that ends with a number. It assumes that your substring to be extracted consists only of capital letters, digits and spaces. However, if the company name ends with a digit, the function will fail. It's possible to handle that issue, but it's getting late. --ron OK, this should work even if the Company name ends with a digit: =REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|(\s[-+]?\d*\.?\d+(\s|$)))([A-Z\d\s]*$)","[7]") --ron |
Extracting the last set of words from a text string
On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld
wrote: On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky wrote: Hey Ron, That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ...and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky Well, that does make things a bit more complicated. And I'll try a somewhat different approach, using REGEX.SUBSTITUTE =REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]") It works on all of your samples, as well as on a sample that ends with a number. It assumes that your substring to be extracted consists only of capital letters, digits and spaces. However, if the company name ends with a digit, the function will fail. It's possible to handle that issue, but it's getting late. --ron A little simpler, and will handle terminal company digits: =REGEX.SUBSTITUTE(A1,"([\s\S]+)\s((N)|([-+]?\d*\.?\d+))(\s|$)([A-Z\d\s]*$)","[6]") --ron |
Extracting the last set of words from a text string
On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld
wrote: On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky wrote: Hey Ron, That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ...and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky Well, that does make things a bit more complicated. And I'll try a somewhat different approach, using REGEX.SUBSTITUTE =REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]") It works on all of your samples, as well as on a sample that ends with a number. It assumes that your substring to be extracted consists only of capital letters, digits and spaces. However, if the company name ends with a digit, the function will fail. It's possible to handle that issue, but it's getting late. --ron And even simpler: =REGEX.SUBSTITUTE(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)","[4]") --ron |
Extracting the last set of words from a text string
The formula is not always applicable. For example:
2/26/2007 MN 094797-D7 BLOOMINGTON MINN IND 1000000 5.25 2/1/2010 104.24 3.70 3.59 11 Y 2/1/2010 Aa2 NR Y 2/1/2010 ....here it returns the whole line? "2/26/2007 MN 094797-D7 BLOOMINGTON MINN IND 1000000 5.25 2/1/2010 104.24 3.70 3.59 11 Y 2/1/2010 Aa2 NR Y 2/1/2010" .... and in this example as well "2/26/2007 NM 827513-EW SILVER CITY N M CONS 1000000 3.8 8/1/2016 MBIA ST AI 100.316 3.75 3.69 6 Y Aaa NR Y 8/1/2014" -- iperlovsky "Ron Rosenfeld" wrote: On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld wrote: On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky wrote: Hey Ron, That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ...and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky Well, that does make things a bit more complicated. And I'll try a somewhat different approach, using REGEX.SUBSTITUTE =REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]") It works on all of your samples, as well as on a sample that ends with a number. It assumes that your substring to be extracted consists only of capital letters, digits and spaces. However, if the company name ends with a digit, the function will fail. It's possible to handle that issue, but it's getting late. --ron And even simpler: =REGEX.SUBSTITUTE(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)","[4]") --ron |
Extracting the last set of words from a text string
I figured out one that works - a combination of logical statements and
morefunc. I do, however, have a couple of strings that are not complying with the formula: The formula is: =IF(OR(RIGHT(A1,2)=" Y",RIGHT(A1,2)=" N",ISNUMBER(RIGHT(A1,1)*1)=TRUE),"",IF(REGEX.MID(A 1,"(?<=\sN\s)[A-Z\W]+",-1)="",REGEX.MID(A1,"(?<=\d\s)[A-Z\W]+",-1),REGEX.MID(A1,"(?<=\sN\s)[A-Z\W]+",-1))) It works on a nearly complete data set of 5000 rows except for these example: 2/22/2007 NY 64971S-AA NEW YORK N Y CITY IND1MM+ 5.4 7/1/2019 99.712 5.43 3.80 163 Y Y Caa2 CCC+ Y 8/4/2009 8/4/2007 102 AMR CORPORATION ....where it returns "Y CITY IND" 2/23/2007 NY 531127-AC LIBERTY N Y DEV CORP 1MM+ 5.25 10/1/2035 116.639 4.24 4.06 18 Y Aa3 AA- Y GOLDMAN SACHS GROU ....where it returns "Y DEV CORP" 2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74 3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101 CUMBERLAND CNTY HO ....where it returns "C" it is clear that the formula in these cases is returning the words following a spaced "N". Any thoughts on this one? -- iperlovsky |
Extracting the last set of words from a text string
Okay. I got one that works for all strings in the set:
=TRIM(IF(OR(RIGHT(B2,2)=" Y",RIGHT(B2,2)=" N",ISNUMBER(RIGHT(B2,1)*1)=TRUE),"",TEXTREVERSE(LE FT(TEXTREVERSE(B2),REGEX.FIND(TEXTREVERSE(B2),"\sN \s|\sY\s|\d",1)-1)))) Ah, TEXTREVERSE................... -- iperlovsky "Ron Rosenfeld" wrote: On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld wrote: On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky wrote: Hey Ron, That works for most of the strings, but in the following examples it pulls a little more info than we need: 2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20 3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY ...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY COMPANY" 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N ...and here "Y AAA N", but I want nothing to pull I guess, if the last character is an "N" or a number, then it should not pull anything. -- iperlovsky Well, that does make things a bit more complicated. And I'll try a somewhat different approach, using REGEX.SUBSTITUTE =REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]") It works on all of your samples, as well as on a sample that ends with a number. It assumes that your substring to be extracted consists only of capital letters, digits and spaces. However, if the company name ends with a digit, the function will fail. It's possible to handle that issue, but it's getting late. --ron And even simpler: =REGEX.SUBSTITUTE(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)","[4]") --ron |
Extracting the last set of words from a text string
IPerlovsky wrote...
Okay. I got one that works for all strings in the set: =TRIM(IF(OR(RIGHT(B2,2)=" Y",RIGHT(B2,2)=" N", ISNUMBER(RIGHT(B2,1)*1)=TRUE),"", TEXTREVERSE(LEFT(TEXTREVERSE(B2),REGEX.FIND(TEXTR EVERSE(B2), "\sN\s|\sY\s|\d",1)-1)))) .... OK, but this should work too. =IF(REGEX.COMP(B2,"( [YN]|\d) +$"),"", REGEX.SUBSTITUTE(B2,".+( [YN]|\d) ","")) Quibble: don't use \s unless you have tabs in your whitespace. A simple space, " " within the double quotes, is more efficient when you only have spaces in your whitespace. |
Extracting the last set of words from a text string
On Thu, 1 Mar 2007 13:43:00 -0800, IPerlovsky
wrote: Okay. I got one that works for all strings in the set: =TRIM(IF(OR(RIGHT(B2,2)=" Y",RIGHT(B2,2)=" N",ISNUMBER(RIGHT(B2,1)*1)=TRUE),"",TEXTREVERSE(L EFT(TEXTREVERSE(B2),REGEX.FIND(TEXTREVERSE(B2),"\s N\s|\sY\s|\d",1)-1)))) Ah, TEXTREVERSE................... 2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311 3.64 3.59 5 Y AAA N with your formula above gives: AAA N On the other hand, this modification of my previous seems to work on all the examples you've posted: =IF(REGEX.COMP(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)"), REGEX.SUBSTITUTE(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)","[4]"),"") --ron |
All times are GMT +1. The time now is 02:20 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com