ExcelBanter - Extracting the last set of words from a text string

ExcelBanter

ExcelBanter (https://www.excelbanter.com/)

- Excel Worksheet Functions (https://www.excelbanter.com/excel-worksheet-functions/)

- - Extracting the last set of words from a text string (https://www.excelbanter.com/excel-worksheet-functions/132755-extracting-last-set-words-text-string.html)

IPerlovsky

Extracting the last set of words from a text string

What formula would I use (preferably using a morefunc REGEX.MID function) to
extract the last series of words/acronyms from a text string? The key to
identifiying where that last set of words begins, is that they will always
follow either an "N" or a number that may or may not have a decimal. See the
examples below:

2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC
102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE

....here I am trying to extract "DICKINSON COLLEGE"

2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74
3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO

....here I am trying to extract "CUMBERLAND CNTY HO"

--
iperlovsky

Ron Rosenfeld

Extracting the last set of words from a text string

On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky
wrote:

What formula would I use (preferably using a morefunc REGEX.MID function) to
extract the last series of words/acronyms from a text string? The key to
identifiying where that last set of words begins, is that they will always
follow either an "N" or a number that may or may not have a decimal. See the
examples below:

2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC
102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE

...here I am trying to extract "DICKINSON COLLEGE"

2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74
3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO

...here I am trying to extract "CUMBERLAND CNTY HO"

Try this:

=REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1)

It does presume that the last set of words contains only capital letters. That
can be broadened, if necessary, I think.
--ron

IPerlovsky

Extracting the last set of words from a text string

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

....here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

....and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

"Ron Rosenfeld" wrote:

On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky
wrote:

What formula would I use (preferably using a morefunc REGEX.MID function) to
extract the last series of words/acronyms from a text string? The key to
identifiying where that last set of words begins, is that they will always
follow either an "N" or a number that may or may not have a decimal. See the
examples below:

2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC
102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE

...here I am trying to extract "DICKINSON COLLEGE"

2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74
3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO

...here I am trying to extract "CUMBERLAND CNTY HO"

Try this:

=REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1)

It does presume that the last set of words contains only capital letters. That
can be broadened, if necessary, I think.
--ron

Gary''s Student

Extracting the last set of words from a text string

Try this:

Function lastpart(r As Range) As String
lastpast = ""
s = Split(r.Value, " ")
fnd = False
For i = UBound(s) To LBound(s) Step -1
If s(i) = "N" Or IsNumeric(s(i)) Then
fnd = True
Exit For
End If
Next

If fnd = False Or i = UBound(s) Then
Exit Function
End If

lastpart = s(i + 1)
For j = i + 2 To UBound(s)
lastpart = lastpart & " " & s(j)
Next
End Function
--
Gary''s Student
gsnu200708

"IPerlovsky" wrote:

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

...and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

"Ron Rosenfeld" wrote:

On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky
wrote:

What formula would I use (preferably using a morefunc REGEX.MID function) to
extract the last series of words/acronyms from a text string? The key to
identifiying where that last set of words begins, is that they will always
follow either an "N" or a number that may or may not have a decimal. See the
examples below:

2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC
102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE

...here I am trying to extract "DICKINSON COLLEGE"

2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74
3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO

...here I am trying to extract "CUMBERLAND CNTY HO"

Try this:

=REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1)

It does presume that the last set of words contains only capital letters. That
can be broadened, if necessary, I think.
--ron

IPerlovsky

Extracting the last set of words from a text string

anyway we can work this into a formula w/o VB?

--
iperlovsky

"Gary''s Student" wrote:

Try this:

Function lastpart(r As Range) As String
lastpast = ""
s = Split(r.Value, " ")
fnd = False
For i = UBound(s) To LBound(s) Step -1
If s(i) = "N" Or IsNumeric(s(i)) Then
fnd = True
Exit For
End If
Next

If fnd = False Or i = UBound(s) Then
Exit Function
End If

lastpart = s(i + 1)
For j = i + 2 To UBound(s)
lastpart = lastpart & " " & s(j)
Next
End Function
--
Gary''s Student
gsnu200708

"IPerlovsky" wrote:

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

...and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

"Ron Rosenfeld" wrote:

On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky
wrote:

What formula would I use (preferably using a morefunc REGEX.MID function) to
extract the last series of words/acronyms from a text string? The key to
identifiying where that last set of words begins, is that they will always
follow either an "N" or a number that may or may not have a decimal. See the
examples below:

2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC
102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE

...here I am trying to extract "DICKINSON COLLEGE"

2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74
3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO

...here I am trying to extract "CUMBERLAND CNTY HO"

Try this:

=REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1)

It does presume that the last set of words contains only capital letters. That
can be broadened, if necessary, I think.
--ron

Gary''s Student

Extracting the last set of words from a text string

You can manually perform on the worksheet what the VBA does:

1. Text to Columns to split the string up using the space as the separator
2. Work backwards from column IV towards column A looking for N or a number
3. Re-concatenate everything to the right of the find.

Problem is that I don't know how to make s simple function do this.

I only offered the VBA in case you have not received a worksheet solution.

Perhaps you could try the VBA out. If you make a copy of your workbook and
experiment with the copy, you have nothing to lose:

Macros are very easy to install and use:

1. CNTRL-F11 brings up the VBE window
2. ALT-I
ALT-M opens a fresh module
3. paste the stuff in and close the VBE window

If you save the workbook, the macro will be saved with it.

To remove the macro:

1. bring up the VBE window as above
2. clear the code out
3. close the VBE window

To learn more about macros in general, see:

http://www.mvps.org/dmcritchie/excel/getstarted.htm

To use the VBA function:
1. put a sample string in cell A1
2. in another cell enter:
=lastpart(A1)

--
Gary's Student
gsnu200708

"IPerlovsky" wrote:

anyway we can work this into a formula w/o VB?

--
iperlovsky

"Gary''s Student" wrote:

Try this:

Function lastpart(r As Range) As String
lastpast = ""
s = Split(r.Value, " ")
fnd = False
For i = UBound(s) To LBound(s) Step -1
If s(i) = "N" Or IsNumeric(s(i)) Then
fnd = True
Exit For
End If
Next

If fnd = False Or i = UBound(s) Then
Exit Function
End If

lastpart = s(i + 1)
For j = i + 2 To UBound(s)
lastpart = lastpart & " " & s(j)
Next
End Function
--
Gary''s Student
gsnu200708

"IPerlovsky" wrote:

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

...and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

"Ron Rosenfeld" wrote:

On Wed, 28 Feb 2007 10:10:33 -0800, IPerlovsky
wrote:

What formula would I use (preferably using a morefunc REGEX.MID function) to
extract the last series of words/acronyms from a text string? The key to
identifiying where that last set of words begins, is that they will always
follow either an "N" or a number that may or may not have a decimal. See the
examples below:

2/26/2007 PA 23061N-BE CUMBERLAND CNTY PA 1000000 5.25 11/1/2008 AMBAC
102.575 3.64 3.59 5 P Aaa AAA N DICKINSON COLLEGE

...here I am trying to extract "DICKINSON COLLEGE"

2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74
3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101.1 CUMBERLAND CNTY HO

...here I am trying to extract "CUMBERLAND CNTY HO"

Try this:

=REGEX.MID(A1,"((?<=\sN\s)|(?<=\d\s))[A-Z\s]+",-1)

It does presume that the last set of words contains only capital letters. That
can be broadened, if necessary, I think.
--ron

Ron Rosenfeld

Extracting the last set of words from a text string

On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky
wrote:

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

...and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

Well, that does make things a bit more complicated.

And I'll try a somewhat different approach, using REGEX.SUBSTITUTE

=REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]")

It works on all of your samples, as well as on a sample that ends with a
number. It assumes that your substring to be extracted consists only of
capital letters, digits and spaces.

However, if the company name ends with a digit, the function will fail. It's
possible to handle that issue, but it's getting late.

--ron

Ron Rosenfeld

Extracting the last set of words from a text string

On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld
wrote:

On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky
wrote:

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

...and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

Well, that does make things a bit more complicated.

And I'll try a somewhat different approach, using REGEX.SUBSTITUTE

=REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]")

It works on all of your samples, as well as on a sample that ends with a
number. It assumes that your substring to be extracted consists only of
capital letters, digits and spaces.

However, if the company name ends with a digit, the function will fail. It's
possible to handle that issue, but it's getting late.

--ron

OK, this should work even if the Company name ends with a digit:

=REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|(\s[-+]?\d*\.?\d+(\s|$)))([A-Z\d\s]*$)","[7]")

--ron

Ron Rosenfeld

Extracting the last set of words from a text string

On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld
wrote:

On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky
wrote:

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

...and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

Well, that does make things a bit more complicated.

And I'll try a somewhat different approach, using REGEX.SUBSTITUTE

=REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]")

It works on all of your samples, as well as on a sample that ends with a
number. It assumes that your substring to be extracted consists only of
capital letters, digits and spaces.

However, if the company name ends with a digit, the function will fail. It's
possible to handle that issue, but it's getting late.

--ron

A little simpler, and will handle terminal company digits:

=REGEX.SUBSTITUTE(A1,"([\s\S]+)\s((N)|([-+]?\d*\.?\d+))(\s|$)([A-Z\d\s]*$)","[6]")

--ron

Ron Rosenfeld

Extracting the last set of words from a text string

On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld
wrote:

On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky
wrote:

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

...and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

Well, that does make things a bit more complicated.

And I'll try a somewhat different approach, using REGEX.SUBSTITUTE

=REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]")

It works on all of your samples, as well as on a sample that ends with a
number. It assumes that your substring to be extracted consists only of
capital letters, digits and spaces.

However, if the company name ends with a digit, the function will fail. It's
possible to handle that issue, but it's getting late.

--ron

And even simpler:

=REGEX.SUBSTITUTE(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)","[4]")

--ron

IPerlovsky

Extracting the last set of words from a text string

The formula is not always applicable. For example:

2/26/2007 MN 094797-D7 BLOOMINGTON MINN IND 1000000 5.25 2/1/2010 104.24
3.70 3.59 11 Y 2/1/2010 Aa2 NR Y 2/1/2010

....here it returns the whole line?

"2/26/2007 MN 094797-D7 BLOOMINGTON MINN IND 1000000 5.25 2/1/2010 104.24
3.70 3.59 11 Y 2/1/2010 Aa2 NR Y 2/1/2010"

.... and in this example as well

"2/26/2007 NM 827513-EW SILVER CITY N M CONS 1000000 3.8 8/1/2016 MBIA ST AI
100.316 3.75 3.69 6 Y Aaa NR Y 8/1/2014"

--
iperlovsky

"Ron Rosenfeld" wrote:

On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld
wrote:

On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky
wrote:

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

...and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

Well, that does make things a bit more complicated.

And I'll try a somewhat different approach, using REGEX.SUBSTITUTE

=REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]")

It works on all of your samples, as well as on a sample that ends with a
number. It assumes that your substring to be extracted consists only of
capital letters, digits and spaces.

However, if the company name ends with a digit, the function will fail. It's
possible to handle that issue, but it's getting late.

--ron

And even simpler:

=REGEX.SUBSTITUTE(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)","[4]")

--ron

IPerlovsky

Extracting the last set of words from a text string

I figured out one that works - a combination of logical statements and
morefunc. I do, however, have a couple of strings that are not complying
with the formula:

The formula is:

=IF(OR(RIGHT(A1,2)=" Y",RIGHT(A1,2)="
N",ISNUMBER(RIGHT(A1,1)*1)=TRUE),"",IF(REGEX.MID(A 1,"(?<=\sN\s)[A-Z\W]+",-1)="",REGEX.MID(A1,"(?<=\d\s)[A-Z\W]+",-1),REGEX.MID(A1,"(?<=\sN\s)[A-Z\W]+",-1)))

It works on a nearly complete data set of 5000 rows except for these example:

2/22/2007 NY 64971S-AA NEW YORK N Y CITY IND1MM+ 5.4 7/1/2019 99.712 5.43
3.80 163 Y Y Caa2 CCC+ Y 8/4/2009 8/4/2007 102 AMR CORPORATION

....where it returns "Y CITY IND"

2/23/2007 NY 531127-AC LIBERTY N Y DEV CORP 1MM+ 5.25 10/1/2035 116.639 4.24
4.06 18 Y Aa3 AA- Y GOLDMAN SACHS GROU

....where it returns "Y DEV CORP"

2/26/2007 NC 230584-CS CUMBERLAND CNTY N C1MM+ 5.25 10/1/2009 104.603 3.74
3.59 15 P 10/1/2009 A3 A- Y 10/1/2009 101 CUMBERLAND CNTY HO

....where it returns "C"

it is clear that the formula in these cases is returning the words following
a spaced "N". Any thoughts on this one?

--
iperlovsky

IPerlovsky

Extracting the last set of words from a text string

Okay. I got one that works for all strings in the set:

=TRIM(IF(OR(RIGHT(B2,2)=" Y",RIGHT(B2,2)="
N",ISNUMBER(RIGHT(B2,1)*1)=TRUE),"",TEXTREVERSE(LE FT(TEXTREVERSE(B2),REGEX.FIND(TEXTREVERSE(B2),"\sN \s|\sY\s|\d",1)-1))))

Ah, TEXTREVERSE...................

--
iperlovsky

"Ron Rosenfeld" wrote:

On Wed, 28 Feb 2007 20:21:04 -0500, Ron Rosenfeld
wrote:

On Wed, 28 Feb 2007 11:26:25 -0800, IPerlovsky
wrote:

Hey Ron,

That works for most of the strings, but in the following examples it pulls a
little more info than we need:

2/26/2007 TX 106213-FH BRAZOS RIVER AUTH TE1MM+ 5.75 11/1/2011 102.244 5.20
3.59 161 Y Baa2 BBB N TXU ENERGY COMPANY

...here it pulls "BBB N TXU ENERGY COMPANY" but I want just "TXU ENERGY
COMPANY"

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

...and here "Y AAA N", but I want nothing to pull

I guess, if the last character is an "N" or a number, then it should not
pull anything.
--
iperlovsky

Well, that does make things a bit more complicated.

And I'll try a somewhat different approach, using REGEX.SUBSTITUTE

=REGEX.SUBSTITUTE(A1,"([\s\S]+)((\sN(\s|$))|([\s\.\d]\d(\s|$)))([A-Z\d\s]*$)","[7]")

It works on all of your samples, as well as on a sample that ends with a
number. It assumes that your substring to be extracted consists only of
capital letters, digits and spaces.

However, if the company name ends with a digit, the function will fail. It's
possible to handle that issue, but it's getting late.

--ron

And even simpler:

=REGEX.SUBSTITUTE(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)","[4]")

--ron

Harlan Grove

Extracting the last set of words from a text string

IPerlovsky wrote...
Okay. I got one that works for all strings in the set:

=TRIM(IF(OR(RIGHT(B2,2)=" Y",RIGHT(B2,2)=" N",
ISNUMBER(RIGHT(B2,1)*1)=TRUE),"",
TEXTREVERSE(LEFT(TEXTREVERSE(B2),REGEX.FIND(TEXTR EVERSE(B2),
"\sN\s|\sY\s|\d",1)-1))))
....

OK, but this should work too.

=IF(REGEX.COMP(B2,"( [YN]|\d) +$"),"",
REGEX.SUBSTITUTE(B2,".+( [YN]|\d) ",""))

Quibble: don't use \s unless you have tabs in your whitespace. A
simple space, " " within the double quotes, is more efficient when you
only have spaces in your whitespace.

Ron Rosenfeld

Extracting the last set of words from a text string

On Thu, 1 Mar 2007 13:43:00 -0800, IPerlovsky
wrote:

Okay. I got one that works for all strings in the set:

=TRIM(IF(OR(RIGHT(B2,2)=" Y",RIGHT(B2,2)="
N",ISNUMBER(RIGHT(B2,1)*1)=TRUE),"",TEXTREVERSE(L EFT(TEXTREVERSE(B2),REGEX.FIND(TEXTREVERSE(B2),"\s N\s|\sY\s|\d",1)-1))))

Ah, TEXTREVERSE...................

2/26/2007 LA 988844-CN ZACHARY LA CMNTY SC 1000000 4 3/1/2008 CIFG 100.311
3.64 3.59 5 Y AAA N

with your formula above gives:

AAA N

On the other hand, this modification of my previous seems to work on all the
examples you've posted:

=IF(REGEX.COMP(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)"),
REGEX.SUBSTITUTE(A1,"([\s\S]+)\s(N|[-+]?\d*\.?\d+)(\s|$)([A-Z\d\s]*$)","[4]"),"")
--ron

All times are GMT +1. The time now is 02:20 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com