Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
I have 11,000 long text strings in Excel... I had used LEFT operation to
limit the strings to 500 charaters, now I need to trim the strings to the last available "." so that I can delete those uncompleted sentence after the last "." I tried to use "Text to Column", but it does not give me good result... Please HELP!! Thank you! |
#2
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
With the original text in A1
This formula (shown in segments for readability) shortens that string to 500 characters, then truncates it after the last period: B1: =LEFT(LEFT(A1,500),FIND(CHAR(7), SUBSTITUTE(LEFT(A1,500),".",CHAR(7),LEN(LEFT(A1,50 0))- LEN(SUBSTITUTE(LEFT(A1,500),".",""))))+1) Copy that formula down as far as you need. Is that something you can work with? -------------------------- Regards, Ron Microsoft MVP (Excel) (XL2003, Win XP) "VCKW" wrote in message ... I have 11,000 long text strings in Excel... I had used LEFT operation to limit the strings to 500 charaters, now I need to trim the strings to the last available "." so that I can delete those uncompleted sentence after the last "." I tried to use "Text to Column", but it does not give me good result... Please HELP!! Thank you! |
#3
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
On Sun, 25 Nov 2007 00:13:31 -0500, "Ron Coderre"
wrote: This formula (shown in segments for readability) shortens that string to 500 characters, then truncates it after the last period: B1: =LEFT(LEFT(A1,500),FIND(CHAR(7), SUBSTITUTE(LEFT(A1,500),".",CHAR(7),LEN(LEFT(A1,5 00))- LEN(SUBSTITUTE(LEFT(A1,500),".",""))))+1) Your formula also returns the character following the ".". Suggest you omit the last "+1" --ron |
#4
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Thanks, Ron. Good catch.
-------------------------- Best Regards, Ron Microsoft MVP (Excel) (XL2003, Win XP) "Ron Rosenfeld" wrote in message ... On Sun, 25 Nov 2007 00:13:31 -0500, "Ron Coderre" wrote: This formula (shown in segments for readability) shortens that string to 500 characters, then truncates it after the last period: B1: =LEFT(LEFT(A1,500),FIND(CHAR(7), SUBSTITUTE(LEFT(A1,500),".",CHAR(7),LEN(LEFT(A1, 500))- LEN(SUBSTITUTE(LEFT(A1,500),".",""))))+1) Your formula also returns the character following the ".". Suggest you omit the last "+1" --ron |
#5
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
I realize that, using a maximum of 500 characters, you probably will never
run into the situation where there are no periods in the (initially) truncated string of text; however, if you should ever want to modify this formula for use in a situation where that possibility exists (that is, if you ever need to change the 500 to something much, much less), then the following function will return the string of text, truncated to 500 characters, if there is no periods within the first 500 characters; also, if A1 is empty, the formula returns the empty string. =LEFT(A1,FIND(CHAR(7),SUBSTITUTE(LEFT(A1,500)&".", ".",CHAR(7),NOT(ISNUMBER(FIND(".",LEFT(A1,500))))+ LEN(LEFT(A1,500))-LEN(SUBSTITUTE(LEFT(A1,500),".",""))))) Although the formula is similar to the one Ron posted, it is built slightly differently to account for the functionality I outlined above; however, I did use the same CHAR(7) substitution character that Ron used. Rick "VCKW" wrote in message ... I have 11,000 long text strings in Excel... I had used LEFT operation to limit the strings to 500 charaters, now I need to trim the strings to the last available "." so that I can delete those uncompleted sentence after the last "." I tried to use "Text to Column", but it does not give me good result... Please HELP!! Thank you! |
#6
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Thanks everyone!!! :) It works the way I want and this save me hundred of
hours!! |
#7
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Hello,
You already got some solutions, but just for the fun of it: =regexpreplace(A1,"^(.*\.)?.*$","$1") This UDF you can get he http://www.sulprobil.com/html/regexp.html Regards, Bernd |
#8
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
You already got some solutions, but just for the fun of it:
=regexpreplace(A1,"^(.*\.)?.*$","$1") This UDF you can get he http://www.sulprobil.com/html/regexp.html Well, if you are looking for a UDF type solution, this one-liner would do what your code does, as it applies to the OP's request (assuming you added the part to initially truncate the text at 500 characters before looking for the last period)... Function TruncateAtPeriod(ByVal Source As String, _ Optional TrimAt As Long = 30000) As String TruncateAtPeriod = Left(Source, InStrRev(Left(Source, TrimAt), ".")) End Function where the OP would call it like this... =TruncateAtPeriod(A1,500) Both of our code returns the empty string if, in the unlikely event, no periods exist within the text (in my case, prior to the 500th character), so additional code would be needed in order to return the whole string of text (assuming that is what the OP would want to happen when no periods are found within the text). Rick |
#9
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Hello Rick,
.... additional code would be needed in order to return the whole string of text (assuming that is what the OP would want to happen when no periods are found within the text). .... Less code in my example: just omit the "?" :-) Regards, Bernd |
#10
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
additional code would be needed in order to return the whole string of
text (assuming that is what the OP would want to happen when no periods are found within the text). ... Less code in my example: just omit the "?" :-) Well, I wasn't talking about a *lot* of additional code. Still a one-liner.... Function TruncateAtPeriod(ByVal Source As String, _ Optional TrimAt As Long = 30000) As String TruncateAtPeriod = Left(Source, InStrRev(Left(Source, TrimAt), ".") - _ Len(Source) * (Not Source Like "*.*")) End Function Besides, you still have to include *more* code somewhere to perform the initial truncation (to 500 characters).<g Rick |
#11
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
On Tue, 27 Nov 2007 09:52:56 -0800 (PST), Bernd P wrote:
Hello Rick, ... additional code would be needed in order to return the whole string of text (assuming that is what the OP would want to happen when no periods are found within the text). ... Less code in my example: just omit the "?" :-) Regards, Bernd Bernd, A couple of points. Your "code" is not really shorter if you include the VBA code required for the UDF. But, more importantly, I don't believe your code will work with a multiline text string because of the limitations of the VBScript flavor of regex. In particular, there is no provision, in VBScript to enable <dot matches newline. OK, I just downloaded the UDF and tested it and, indeed, it does not work on multiline text strings. If you want to use regular expressions for this exercise, I would suggest the following UDF with a regex constructed according to the maximum length of the string (500 in this case, but there is an optional argument to change that: ========================================= Option Explicit Function reTruncateAtDot(str As String, _ Optional NumChars As Long = 500) As String Dim re As Object, mc As Object Dim sPat As String sPat = "^[\s\S]{1," & NumChars & "}\." Set re = CreateObject("vbscript.regexp") re.Pattern = sPat Set mc = re.Execute(str) reTruncateAtDot = mc(0) End Function ============================== The OP could use either: =reTruncateAtDot(A1) which would truncate at the last dot prior to the 501st character, or use an optional NumChars to change that. The regex with the 500 characters "hard-coded" would look like: "^[\s\S]{1,500}\." If you wanted to use your formula, and have it work with multiline text strings, then try: =regexpreplace(A1,"^([\s\S]*\.)?[\s\S]*$","$1") But you'd still need the second step to trim to 500 characters. Oh, and in my contribution, since the OP did not specify what he wants to happen if there are no dots in the first 500 characters, a VALUE error will be returned. That, obviously, could be changed depending on the OP's wishes. --ron |
#12
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
additional code would be needed in order to return the whole string of
text (assuming that is what the OP would want to happen when no periods are found within the text). ... Less code in my example: just omit the "?" :-) A couple of points. Your "code" is not really shorter if you include the VBA code required for the UDF. That was going roughly going to be my answer when I first read Bernd's response; but I think, given the trimmed portion of the previous responses in his posting and leaving aside the 500 character maximum truncation requested by the OP, he was simply just addressing my comment about needing *additional* code for my function to be able to return the entire text string whereas to do the same with his code only required removing the question mark. Yes, there is the question of the needed UDF code to make his formula line work as well as the 500 character truncation issue; but I'm pretty sure Bernd's posting was meant to focus in on this much narrower issue without addressing anything else. Rick But, more importantly, I don't believe your code will work with a multiline text string because of the limitations of the VBScript flavor of regex. In particular, there is no provision, in VBScript to enable <dot matches newline. OK, I just downloaded the UDF and tested it and, indeed, it does not work on multiline text strings. If you want to use regular expressions for this exercise, I would suggest the following UDF with a regex constructed according to the maximum length of the string (500 in this case, but there is an optional argument to change that: ========================================= Option Explicit Function reTruncateAtDot(str As String, _ Optional NumChars As Long = 500) As String Dim re As Object, mc As Object Dim sPat As String sPat = "^[\s\S]{1," & NumChars & "}\." Set re = CreateObject("vbscript.regexp") re.Pattern = sPat Set mc = re.Execute(str) reTruncateAtDot = mc(0) End Function ============================== The OP could use either: =reTruncateAtDot(A1) which would truncate at the last dot prior to the 501st character, or use an optional NumChars to change that. The regex with the 500 characters "hard-coded" would look like: "^[\s\S]{1,500}\." If you wanted to use your formula, and have it work with multiline text strings, then try: =regexpreplace(A1,"^([\s\S]*\.)?[\s\S]*$","$1") But you'd still need the second step to trim to 500 characters. Oh, and in my contribution, since the OP did not specify what he wants to happen if there are no dots in the first 500 characters, a VALUE error will be returned. That, obviously, could be changed depending on the OP's wishes. --ron |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
change "true" and "false" to "availble" and "out of stock" | Excel Worksheet Functions | |||
HELP on "left","right","find","len","substitute" functions | Excel Discussion (Misc queries) | |||
Count occurences of "1"/"0" (or"TRUE"/"FALSE") in a row w. conditions in the next | New Users to Excel | |||
Complex if test program possible? If "value" "value", paste "value" in another cell? | Excel Discussion (Misc queries) | |||
Insert "-" in text "1234567890" to have a output like this"123-456-7890" | Excel Discussion (Misc queries) |