Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
I have 11,000 long text strings in Excel... I had used LEFT operation to
limit the strings to 500 charaters, now I need to trim the strings to the last available "." so that I can delete those uncompleted sentence after the last "." I tried to use "Text to Column", but it does not give me good result... Please HELP!! Thank you! |
#2
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
With the original text in A1
This formula (shown in segments for readability) shortens that string to 500 characters, then truncates it after the last period: B1: =LEFT(LEFT(A1,500),FIND(CHAR(7), SUBSTITUTE(LEFT(A1,500),".",CHAR(7),LEN(LEFT(A1,50 0))- LEN(SUBSTITUTE(LEFT(A1,500),".",""))))+1) Copy that formula down as far as you need. Is that something you can work with? -------------------------- Regards, Ron Microsoft MVP (Excel) (XL2003, Win XP) "VCKW" wrote in message ... I have 11,000 long text strings in Excel... I had used LEFT operation to limit the strings to 500 charaters, now I need to trim the strings to the last available "." so that I can delete those uncompleted sentence after the last "." I tried to use "Text to Column", but it does not give me good result... Please HELP!! Thank you! |
#3
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
I realize that, using a maximum of 500 characters, you probably will never
run into the situation where there are no periods in the (initially) truncated string of text; however, if you should ever want to modify this formula for use in a situation where that possibility exists (that is, if you ever need to change the 500 to something much, much less), then the following function will return the string of text, truncated to 500 characters, if there is no periods within the first 500 characters; also, if A1 is empty, the formula returns the empty string. =LEFT(A1,FIND(CHAR(7),SUBSTITUTE(LEFT(A1,500)&".", ".",CHAR(7),NOT(ISNUMBER(FIND(".",LEFT(A1,500))))+ LEN(LEFT(A1,500))-LEN(SUBSTITUTE(LEFT(A1,500),".",""))))) Although the formula is similar to the one Ron posted, it is built slightly differently to account for the functionality I outlined above; however, I did use the same CHAR(7) substitution character that Ron used. Rick "VCKW" wrote in message ... I have 11,000 long text strings in Excel... I had used LEFT operation to limit the strings to 500 charaters, now I need to trim the strings to the last available "." so that I can delete those uncompleted sentence after the last "." I tried to use "Text to Column", but it does not give me good result... Please HELP!! Thank you! |
#4
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
On Sun, 25 Nov 2007 00:13:31 -0500, "Ron Coderre"
wrote: This formula (shown in segments for readability) shortens that string to 500 characters, then truncates it after the last period: B1: =LEFT(LEFT(A1,500),FIND(CHAR(7), SUBSTITUTE(LEFT(A1,500),".",CHAR(7),LEN(LEFT(A1,5 00))- LEN(SUBSTITUTE(LEFT(A1,500),".",""))))+1) Your formula also returns the character following the ".". Suggest you omit the last "+1" --ron |
#5
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
Thanks everyone!!! :) It works the way I want and this save me hundred of
hours!! |
#6
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
Thanks, Ron. Good catch.
-------------------------- Best Regards, Ron Microsoft MVP (Excel) (XL2003, Win XP) "Ron Rosenfeld" wrote in message ... On Sun, 25 Nov 2007 00:13:31 -0500, "Ron Coderre" wrote: This formula (shown in segments for readability) shortens that string to 500 characters, then truncates it after the last period: B1: =LEFT(LEFT(A1,500),FIND(CHAR(7), SUBSTITUTE(LEFT(A1,500),".",CHAR(7),LEN(LEFT(A1, 500))- LEN(SUBSTITUTE(LEFT(A1,500),".",""))))+1) Your formula also returns the character following the ".". Suggest you omit the last "+1" --ron |
#7
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
Hello,
You already got some solutions, but just for the fun of it: =regexpreplace(A1,"^(.*\.)?.*$","$1") This UDF you can get he http://www.sulprobil.com/html/regexp.html Regards, Bernd |
#8
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
You already got some solutions, but just for the fun of it:
=regexpreplace(A1,"^(.*\.)?.*$","$1") This UDF you can get he http://www.sulprobil.com/html/regexp.html Well, if you are looking for a UDF type solution, this one-liner would do what your code does, as it applies to the OP's request (assuming you added the part to initially truncate the text at 500 characters before looking for the last period)... Function TruncateAtPeriod(ByVal Source As String, _ Optional TrimAt As Long = 30000) As String TruncateAtPeriod = Left(Source, InStrRev(Left(Source, TrimAt), ".")) End Function where the OP would call it like this... =TruncateAtPeriod(A1,500) Both of our code returns the empty string if, in the unlikely event, no periods exist within the text (in my case, prior to the 500th character), so additional code would be needed in order to return the whole string of text (assuming that is what the OP would want to happen when no periods are found within the text). Rick |
#9
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
Hello Rick,
.... additional code would be needed in order to return the whole string of text (assuming that is what the OP would want to happen when no periods are found within the text). .... Less code in my example: just omit the "?" :-) Regards, Bernd |
#10
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
additional code would be needed in order to return the whole string of
text (assuming that is what the OP would want to happen when no periods are found within the text). ... Less code in my example: just omit the "?" :-) Well, I wasn't talking about a *lot* of additional code. Still a one-liner.... Function TruncateAtPeriod(ByVal Source As String, _ Optional TrimAt As Long = 30000) As String TruncateAtPeriod = Left(Source, InStrRev(Left(Source, TrimAt), ".") - _ Len(Source) * (Not Source Like "*.*")) End Function Besides, you still have to include *more* code somewhere to perform the initial truncation (to 500 characters).<g Rick |
#11
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
On Tue, 27 Nov 2007 09:52:56 -0800 (PST), Bernd P wrote:
Hello Rick, ... additional code would be needed in order to return the whole string of text (assuming that is what the OP would want to happen when no periods are found within the text). ... Less code in my example: just omit the "?" :-) Regards, Bernd Bernd, A couple of points. Your "code" is not really shorter if you include the VBA code required for the UDF. But, more importantly, I don't believe your code will work with a multiline text string because of the limitations of the VBScript flavor of regex. In particular, there is no provision, in VBScript to enable <dot matches newline. OK, I just downloaded the UDF and tested it and, indeed, it does not work on multiline text strings. If you want to use regular expressions for this exercise, I would suggest the following UDF with a regex constructed according to the maximum length of the string (500 in this case, but there is an optional argument to change that: ========================================= Option Explicit Function reTruncateAtDot(str As String, _ Optional NumChars As Long = 500) As String Dim re As Object, mc As Object Dim sPat As String sPat = "^[\s\S]{1," & NumChars & "}\." Set re = CreateObject("vbscript.regexp") re.Pattern = sPat Set mc = re.Execute(str) reTruncateAtDot = mc(0) End Function ============================== The OP could use either: =reTruncateAtDot(A1) which would truncate at the last dot prior to the 501st character, or use an optional NumChars to change that. The regex with the 500 characters "hard-coded" would look like: "^[\s\S]{1,500}\." If you wanted to use your formula, and have it work with multiline text strings, then try: =regexpreplace(A1,"^([\s\S]*\.)?[\s\S]*$","$1") But you'd still need the second step to trim to 500 characters. Oh, and in my contribution, since the OP did not specify what he wants to happen if there are no dots in the first 500 characters, a VALUE error will be returned. That, obviously, could be changed depending on the OP's wishes. --ron |
#12
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
additional code would be needed in order to return the whole string of
text (assuming that is what the OP would want to happen when no periods are found within the text). ... Less code in my example: just omit the "?" :-) A couple of points. Your "code" is not really shorter if you include the VBA code required for the UDF. That was going roughly going to be my answer when I first read Bernd's response; but I think, given the trimmed portion of the previous responses in his posting and leaving aside the 500 character maximum truncation requested by the OP, he was simply just addressing my comment about needing *additional* code for my function to be able to return the entire text string whereas to do the same with his code only required removing the question mark. Yes, there is the question of the needed UDF code to make his formula line work as well as the 500 character truncation issue; but I'm pretty sure Bernd's posting was meant to focus in on this much narrower issue without addressing anything else. Rick But, more importantly, I don't believe your code will work with a multiline text string because of the limitations of the VBScript flavor of regex. In particular, there is no provision, in VBScript to enable <dot matches newline. OK, I just downloaded the UDF and tested it and, indeed, it does not work on multiline text strings. If you want to use regular expressions for this exercise, I would suggest the following UDF with a regex constructed according to the maximum length of the string (500 in this case, but there is an optional argument to change that: ========================================= Option Explicit Function reTruncateAtDot(str As String, _ Optional NumChars As Long = 500) As String Dim re As Object, mc As Object Dim sPat As String sPat = "^[\s\S]{1," & NumChars & "}\." Set re = CreateObject("vbscript.regexp") re.Pattern = sPat Set mc = re.Execute(str) reTruncateAtDot = mc(0) End Function ============================== The OP could use either: =reTruncateAtDot(A1) which would truncate at the last dot prior to the 501st character, or use an optional NumChars to change that. The regex with the 500 characters "hard-coded" would look like: "^[\s\S]{1,500}\." If you wanted to use your formula, and have it work with multiline text strings, then try: =regexpreplace(A1,"^([\s\S]*\.)?[\s\S]*$","$1") But you'd still need the second step to trim to 500 characters. Oh, and in my contribution, since the OP did not specify what he wants to happen if there are no dots in the first 500 characters, a VALUE error will be returned. That, obviously, could be changed depending on the OP's wishes. --ron |
#13
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
On Tue, 27 Nov 2007 16:36:10 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: That was going roughly going to be my answer when I first read Bernd's response; but I think, given the trimmed portion of the previous responses in his posting and leaving aside the 500 character maximum truncation requested by the OP, he was simply just addressing my comment about needing *additional* code for my function to be able to return the entire text string whereas to do the same with his code only required removing the question mark. Yes, there is the question of the needed UDF code to make his formula line work as well as the 500 character truncation issue; but I'm pretty sure Bernd's posting was meant to focus in on this much narrower issue without addressing anything else. Oh, I see. Of course, I don't believe his regex will work for multiline text strings, for the reason I pointed out in my response. And it seems likely to me that a 500 character text string in Excel would likely have at least one linefeed. So he should substitute "[\s\S]" for the "." in his regex. --ron |
#14
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
That was going roughly going to be my answer when I first read Bernd's
response; but I think, given the trimmed portion of the previous responses in his posting and leaving aside the 500 character maximum truncation requested by the OP, he was simply just addressing my comment about needing *additional* code for my function to be able to return the entire text string whereas to do the same with his code only required removing the question mark. Yes, there is the question of the needed UDF code to make his formula line work as well as the 500 character truncation issue; but I'm pretty sure Bernd's posting was meant to focus in on this much narrower issue without addressing anything else. Oh, I see. Of course, I don't believe his regex will work for multiline text strings, for the reason I pointed out in my response. And it seems likely to me that a 500 character text string in Excel would likely have at least one linefeed. So he should substitute "[\s\S]" for the "." in his regex. I'll take your word for the regex stuff... the last time I did anything with regular expressions was back in the mid-1980s on a UNIX based mini-computer... I'm afraid I remember next to nothing from back then (except that it was quite easy to write complex regular expressions whose operation was perfectly obvious during their construction but that one could not understand how they worked some 10 minutes or so later on<g). Rick |
#15
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
Hi Ron, hi Rick,
let us agree that regex are quite powerful. Since they are line- orientated (or we are facing line restrictions) we can wait and see whether my suggesion will work for the OP. And now let us wait for the next two dozen questions regarding string manipulation and let's count the overall sum of UDF rows plus calls to them (it)... Regards, Bernd |
#16
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
On Tue, 27 Nov 2007 17:53:14 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: (except that it was quite easy to write complex regular expressions whose operation was perfectly obvious during their construction but that one could not understand how they worked some 10 minutes or so later on<g). That is still the case! --ron |
#17
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
On Tue, 27 Nov 2007 14:55:12 -0800 (PST), Bernd P wrote:
Hi Ron, hi Rick, let us agree that regex are quite powerful. Since they are line- orientated (or we are facing line restrictions) we can wait and see whether my suggesion will work for the OP. And now let us wait for the next two dozen questions regarding string manipulation and let's count the overall sum of UDF rows plus calls to them (it)... Regards, Bernd Why do you write they are "line-oriented"? At least in the few flavors with which I am familiar, multiline implementations are common. The issue here is that with VBA, setting multiline to TRUE (which is allowed in the UDF you recommended) only changes the behavior of "^" and "$", and not the behavior of "." as is true for other flavors. Simple enough to work around, though. --ron |
#18
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
On Tue, 27 Nov 2007 18:50:17 -0500, Ron Rosenfeld
wrote: On Tue, 27 Nov 2007 14:55:12 -0800 (PST), Bernd P wrote: Hi Ron, hi Rick, let us agree that regex are quite powerful. Since they are line- orientated (or we are facing line restrictions) we can wait and see whether my suggesion will work for the OP. And now let us wait for the next two dozen questions regarding string manipulation and let's count the overall sum of UDF rows plus calls to them (it)... Regards, Bernd Why do you write they are "line-oriented"? At least in the few flavors with which I am familiar, multiline implementations are common. The issue here is that with VBA, setting multiline to TRUE (which is allowed in the UDF you recommended) only changes the behavior of "^" and "$", and not the behavior of "." as is true for other flavors. Simple enough to work around, though. --ron I really should have included the qualification that the original Unix editors, and regex engines were, indeed, line oriented. However, when I first worked with Unix (actually I think it was Xenix, if I recall correctly) back in the early '80's, vi (a screen editor) had already been developed and was pretty widely distributed. --ron |
#19
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
On Tue, 27 Nov 2007 18:50:17 -0500, Ron Rosenfeld
wrote: On Tue, 27 Nov 2007 14:55:12 -0800 (PST), Bernd P wrote: Hi Ron, hi Rick, let us agree that regex are quite powerful. Since they are line- orientated (or we are facing line restrictions) we can wait and see whether my suggesion will work for the OP. And now let us wait for the next two dozen questions regarding string manipulation and let's count the overall sum of UDF rows plus calls to them (it)... Regards, Bernd Why do you write they are "line-oriented"? At least in the few flavors with which I am familiar, multiline implementations are common. The issue here is that with VBA, setting multiline to TRUE (which is allowed in the UDF you recommended) only changes the behavior of "^" and "$", and not the behavior of "." as is true for other flavors. Simple enough to work around, though. --ron Just for kicks, if the OP does not want to return an error if there are no dots in the string, here is a regex that will 1. Return a maximum of 500 characters and, in order of priority a. Return up to the last "." if there is one b. If no ".", then split at the last <space c. If no "." or <space, return 500 characters. (note that it does not return the ending <space if that is what it is splitting on. It should work with multiline text strings in VBA: "^([\s\S]{1,500}\.|[\s\S]{1,500}(?= )|[\s\S]{1,500})" --ron |
#20
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
Just for kicks, if the OP does not want to return an error if there are no
dots in the string, here is a regex that will 1. Return a maximum of 500 characters and, in order of priority a. Return up to the last "." if there is one b. If no ".", then split at the last <space c. If no "." or <space, return 500 characters. (note that it does not return the ending <space if that is what it is splitting on. It should work with multiline text strings in VBA: "^([\s\S]{1,500}\.|[\s\S]{1,500}(?= )|[\s\S]{1,500})" I like the "split at last space" condition 'challenge'. A little more "wordy" than your regex solution, but still a one-liner (my own self-imposed restriction) VBA function... Function Truncate(ByVal Source As String, _ Optional TrimAt As Long = 30000) As String Truncate = RTrim$(Left(Source, InStrRev(Left(Source, TrimAt), ".") - _ (InStrRev(Left(Source, TrimAt), " ")) * _ (Not Left(Source, TrimAt) Like "*.*") - _ (Len(Left(Source, TrimAt))) * _ (Not Left(Source, TrimAt) Like "*[. ]*"))) End Function Note that I still left the optional TrimAt argument so the OP can set it to 500 (or any other value desired) in the calling formula. Rick |
#21
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
I like the "split at last space" condition 'challenge'. A little more
"wordy" than your regex solution, but still a one-liner (my own self-imposed restriction) VBA function... Function Truncate(ByVal Source As String, _ Optional TrimAt As Long = 30000) As String Truncate = RTrim$(Left(Source, InStrRev(Left(Source, TrimAt), ".") - _ (InStrRev(Left(Source, TrimAt), " ")) * _ (Not Left(Source, TrimAt) Like "*.*") - _ (Len(Left(Source, TrimAt))) * _ (Not Left(Source, TrimAt) Like "*[. ]*"))) End Function Note that I still left the optional TrimAt argument so the OP can set it to 500 (or any other value desired) in the calling formula. And this is how I would write the function in "real life" (that is, without forcing it to a one-liner)... Function Truncate(ByVal Text As String, _ Optional TrimAt As Long = 30000) As String Text = Left$(Text, TrimAt) Truncate = Left$(Text, InStrRev(Text, ".")) If Len(Truncate) = 0 Then Truncate = RTrim(Left(Text, InStrRev(Text, " "))) If Len(Truncate) = 0 Then Truncate = Text End Function Note: I changed some argument names to avoid using line continuations Rick |
#22
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
I like the "split at last space" condition 'challenge'. A little more
"wordy" than your regex solution, but still a one-liner (my own self-imposed restriction) VBA function... Function Truncate(ByVal Source As String, _ Optional TrimAt As Long = 30000) As String Truncate = RTrim$(Left(Source, InStrRev(Left(Source, TrimAt), ".") - _ (InStrRev(Left(Source, TrimAt), " ")) * _ (Not Left(Source, TrimAt) Like "*.*") - _ (Len(Left(Source, TrimAt))) * _ (Not Left(Source, TrimAt) Like "*[. ]*"))) End Function Note that I still left the optional TrimAt argument so the OP can set it to 500 (or any other value desired) in the calling formula. And this is how I would write the function in "real life" (that is, without forcing it to a one-liner)... Function Truncate(ByVal Text As String, _ Optional TrimAt As Long = 30000) As String Text = Left$(Text, TrimAt) Truncate = Left$(Text, InStrRev(Text, ".")) If Len(Truncate) = 0 Then Truncate = RTrim(Left(Text, InStrRev(Text, " "))) If Len(Truncate) = 0 Then Truncate = Text End Function Note: I changed some argument names to avoid using line continuations Damn! The line wrapped at an 'unfortunate' spot (a blank) anyway. Here is the same function renamed to Chop instead of Truncate; it should have each statement fit on a single line without wrapping... Function Chop(ByVal Text As String, _ Optional TrimAt As Long = 30000) As String Text = Left$(Text, TrimAt) Chop = Left$(Text, InStrRev(Text, ".")) If Len(Chop) = 0 Then Chop = RTrim(Left(Text, InStrRev(Text, " "))) If Len(Chop) = 0 Then Chop = Text End Function Rick |
#23
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
On Wed, 28 Nov 2007 11:07:53 -0500, "Rick Rothstein \(MVP - VB\)"
wrote: Note that I still left the optional TrimAt argument so the OP can set it to 500 (or any other value desired) in the calling formula. That is still a feature of the UDF I wrote to test the regex; it's similar to the previous one I posted with ...Optional NumChars as Long = 500 ... but the regex the UDF constructs substitutes NumChars for the 500 in the one I posted. By the way, there is a program called RegexBuddy which makes designing and debugging regexes quite simple. --ron |
#24
Posted to microsoft.public.excel.misc
|
|||
|
|||
How to trim a string to the last "."
By the way, there is a program called RegexBuddy which makes
designing and debugging regexes quite simple. I looked RegexBuddy up via Google... that looks like an interesting product. I bookmarked it in the event I ever decide to get back into Regular Expression. Thanks for mentioning it. Rick |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
change "true" and "false" to "availble" and "out of stock" | Excel Worksheet Functions | |||
HELP on "left","right","find","len","substitute" functions | Excel Discussion (Misc queries) | |||
Count occurences of "1"/"0" (or"TRUE"/"FALSE") in a row w. conditions in the next | New Users to Excel | |||
Complex if test program possible? If "value" "value", paste "value" in another cell? | Excel Discussion (Misc queries) | |||
Insert "-" in text "1234567890" to have a output like this"123-456-7890" | Excel Discussion (Misc queries) |