Numbers of repeats of a string
Dear all,
I am looking to detect repeats (sequences of letters or numbers which are repeating at least twice within a string) in some strings containing between 20-5000 letters. Has somebody any idea how could I perform that using an Excel macro? I would like that the string to be evaluated could be on cell "A1" and that the detected repeats could be listed bellow A3; and that the number of times that they appear in the sentence could be listed bellow cells B3, as the following small example: QGAGGAAGGAGQ 4 Repeats detected Number GA 3 AG 3 GAG 2 GG 2 Somebody could help me? Thanks in advance, Luciano |
Numbers of repeats of a string
Hi Luciano,
You could use code like: Sub TestString() Dim BigStr As String Dim MyStr As String Dim TmpStr As String BigStr = "The quick brown fox jumps over the lazy dog." MyStr = InputBox("String to Find") TmpStr = Replace(BigStr, MyStr, "", , , vbBinaryCompare) MsgBox "Repeats of " & MyStr & " in Test String: " & (Len(BigStr) - Len(TmpStr)) / Len(MyStr) End Sub Using vbBinaryCompare forces a case-sensitive test (eg "the" is not counted in a search for "The"). If you don't want the test to be case sensitive, use vbTextCompare instead. -- Cheers macropod [MVP - Microsoft Word] "Luciano Paulino da Silva" wrote in message ... Dear all, I am looking to detect repeats (sequences of letters or numbers which are repeating at least twice within a string) in some strings containing between 20-5000 letters. Has somebody any idea how could I perform that using an Excel macro? I would like that the string to be evaluated could be on cell "A1" and that the detected repeats could be listed bellow A3; and that the number of times that they appear in the sentence could be listed bellow cells B3, as the following small example: QGAGGAAGGAGQ 4 Repeats detected Number GA 3 AG 3 GAG 2 GG 2 Somebody could help me? Thanks in advance, Luciano |
Numbers of repeats of a string
Dear macropod,
I have tested your code, but it did not worked for my need, Thank you anyway, Luciano On 14 abr, 21:32, "macropod" wrote: Hi Luciano, You could use code like: Sub TestString() Dim BigStr As String Dim MyStr As String Dim TmpStr As String BigStr = "The quick brown fox jumps over the lazy dog." MyStr = InputBox("String to Find") TmpStr = Replace(BigStr, MyStr, "", , , vbBinaryCompare) MsgBox "Repeats of " & MyStr & " in Test String: " & (Len(BigStr) - Len(TmpStr)) / Len(MyStr) End Sub Using vbBinaryCompare forces a case-sensitive test (eg "the" is not counted in a search for "The"). If you don't want the test to be case sensitive, use vbTextCompare instead. -- Cheers macropod [MVP - Microsoft Word] "Luciano Paulino da Silva" wrote in ... Dear all, I am looking to detect repeats (sequences of letters or numbers which are repeating at least twice within a string) in some strings containing between 20-5000 letters. Has somebody any idea how could I perform that using an Excel macro? I would like that the string to be evaluated could be on cell "A1" and that the detected repeats could be listed bellow A3; and that the number of times that they appear in the sentence could be listed bellow cells B3, as the following small example: QGAGGAAGGAGQ 4 Repeats detected * * * * Number GA * * * * * * * *3 AG * * * * * * * * * * *3 GAG * * * * * * * * * * 2 GG * * * * * * * * * * * * * * *2 Somebody could help me? Thanks in advance, Luciano |
Numbers of repeats of a string
Hi Luciano,
You can implement the logic fvrom the code I posted earlier via a User-Defined Function: Public Function Repeats(BigStr As Variant, MyStr As Variant, Compare As Boolean) As Integer Dim TmpStr As String TmpStr = Replace(BigStr, MyStr, "", , , Compare ^ 2) Repeats = (Len(BigStr) - Len(TmpStr)) / Len(MyStr) End Function If you place the function in a standard vba module, you can use a formula like: =Repeats(A1,A2,1) or =Repeats(A1,A2,0) to retrieve the number of times the string in A2 is found in A1. The 1/0 parameter specifies a case-sensitive or case-insensitive match, respectively. -- Cheers macropod [MVP - Microsoft Word] "Luciano Paulino da Silva" wrote in message ... Dear macropod, I have tested your code, but it did not worked for my need, Thank you anyway, Luciano On 14 abr, 21:32, "macropod" wrote: Hi Luciano, You could use code like: Sub TestString() Dim BigStr As String Dim MyStr As String Dim TmpStr As String BigStr = "The quick brown fox jumps over the lazy dog." MyStr = InputBox("String to Find") TmpStr = Replace(BigStr, MyStr, "", , , vbBinaryCompare) MsgBox "Repeats of " & MyStr & " in Test String: " & (Len(BigStr) - Len(TmpStr)) / Len(MyStr) End Sub Using vbBinaryCompare forces a case-sensitive test (eg "the" is not counted in a search for "The"). If you don't want the test to be case sensitive, use vbTextCompare instead. -- Cheers macropod [MVP - Microsoft Word] "Luciano Paulino da Silva" wrote in ... Dear all, I am looking to detect repeats (sequences of letters or numbers which are repeating at least twice within a string) in some strings containing between 20-5000 letters. Has somebody any idea how could I perform that using an Excel macro? I would like that the string to be evaluated could be on cell "A1" and that the detected repeats could be listed bellow A3; and that the number of times that they appear in the sentence could be listed bellow cells B3, as the following small example: QGAGGAAGGAGQ 4 Repeats detected Number GA 3 AG 3 GAG 2 GG 2 Somebody could help me? Thanks in advance, Luciano |
Numbers of repeats of a string
Hi macropod,
In my case I will not looking for a specific repeat. My intention is list all subsequences that are present in a full string in A1 (e.g. for the string AADFDGAAV, the string AA appear 2 times and therefore should be counted. At the right side of the subsequences (repeats) listed it will be necessary to figure out the number of times that each one of them appear. And in another cell the number of different repeats that could be detected. Did you understand? Thanks in advance, Luciano On 15 abr, 00:23, "macropod" wrote: Hi Luciano, You can implement the logic fvrom the code I posted earlier via a User-Defined Function: Public Function Repeats(BigStr As Variant, MyStr As Variant, Compare As Boolean) As Integer Dim TmpStr As String TmpStr = Replace(BigStr, MyStr, "", , , Compare ^ 2) Repeats = (Len(BigStr) - Len(TmpStr)) / Len(MyStr) End Function If you place the function in a standard vba module, you can use a formula like: =Repeats(A1,A2,1) or =Repeats(A1,A2,0) to retrieve the number of times the string in A2 is found in A1. The 1/0 parameter specifies a case-sensitive or case-insensitive match, respectively. -- Cheers macropod [MVP - Microsoft Word] "Luciano Paulino da Silva" wrote in ... Dear macropod, I have tested your code, but it did not worked for my need, Thank you anyway, Luciano On 14 abr, 21:32, "macropod" wrote: Hi Luciano, You could use code like: Sub TestString() Dim BigStr As String Dim MyStr As String Dim TmpStr As String BigStr = "The quick brown fox jumps over the lazy dog." MyStr = InputBox("String to Find") TmpStr = Replace(BigStr, MyStr, "", , , vbBinaryCompare) MsgBox "Repeats of " & MyStr & " in Test String: " & (Len(BigStr) - Len(TmpStr)) / Len(MyStr) End Sub Using vbBinaryCompare forces a case-sensitive test (eg "the" is not counted in a search for "The"). If you don't want the test to be case sensitive, use vbTextCompare instead. -- Cheers macropod [MVP - Microsoft Word] "Luciano Paulino da Silva" wrote in ... Dear all, I am looking to detect repeats (sequences of letters or numbers which are repeating at least twice within a string) in some strings containing between 20-5000 letters. Has somebody any idea how could I perform that using an Excel macro? I would like that the string to be evaluated could be on cell "A1" and that the detected repeats could be listed bellow A3; and that the number of times that they appear in the sentence could be listed bellow cells B3, as the following small example: QGAGGAAGGAGQ 4 Repeats detected Number GA 3 AG 3 GAG 2 GG 2 Somebody could help me? Thanks in advance, Luciano |
Numbers of repeats of a string
Hello Luciano,
If I understand you correctly, the correct result for your example is: AG 3 GA 3 AGG 2 AGGA 2 GAG 2 GG 2 GGA 2 You can get this if you array-enter into A3:B9: =GSort(pfreq(TRANSPOSE(GenSubStrings(A1))),"DA","N S","21") GSort you will find he http://www.sulprobil.com/html/sort_vba.html Pfreq is he http://www.sulprobil.com/html/pfreq.html Function GenSubStrings(s As String) As Variant ReDim v(1 To 10000) As Variant Dim i As Long, j As Long, k As Long 'Generate all substrings of s with length 2 to Len(s)-1 For i = 2 To Len(s) - 1 For j = 1 To Len(s) - i + 1 k = k + 1 v(k) = Mid(s, j, i) Next j Next i ReDim Preserve v(1 To k) As Variant GenSubStrings = v End Function Regards, Bernd |
Numbers of repeats of a string
Hi Bernd,
Your suggestion is excellent! However, I would like to list only the subsequences that appear 2 times or more. That ones that were observed only one time should not be listed. Thanks in advance, Luciano On 15 abr, 07:42, Bernd P wrote: Hello Luciano, If I understand you correctly, the correct result for your example is: AG * * *3 GA * * *3 AGG * * 2 AGGA * *2 GAG * * 2 GG * * *2 GGA * * 2 You can get this if you array-enter into A3:B9: =GSort(pfreq(TRANSPOSE(GenSubStrings(A1))),"DA","N S","21") GSort you will find hehttp://www.sulprobil.com/html/sort_vba.html Pfreq is hehttp://www.sulprobil.com/html/pfreq.html Function GenSubStrings(s As String) As Variant ReDim v(1 To 10000) As Variant Dim i As Long, j As Long, k As Long 'Generate all substrings of s with length 2 to Len(s)-1 For i = 2 To Len(s) - 1 * * For j = 1 To Len(s) - i + 1 * * * * k = k + 1 * * * * v(k) = Mid(s, j, i) * * Next j Next i ReDim Preserve v(1 To k) As Variant GenSubStrings = v End Function Regards, Bernd |
Numbers of repeats of a string
Hello Luciano,
That was left as homework exercise :-) Here we a Function ShowRepetitions(v As Variant) As Variant Dim i As Long, j As Long On Error Resume Next Do While v(i, 2) 1 i = i + 1 Loop If v(i, 2) = 1 Then i = i - 1 On Error GoTo 0 ReDim vR(1 To Application.Caller.Rows.Count, 1 To 2) As Variant For j = 1 To Application.Caller.Rows.Count If j i Then vR(j, 1) = "" vR(j, 2) = "" Else vR(j, 1) = v(j, 1) vR(j, 2) = v(j, 2) End If Next j ShowRepetitions = vR End Function Array-enter: =showrepetitions(GSort(pfreq(TRANSPOSE(GenSubStrin gs (A1))),"DA","NS","21")) Regards, Bernd |
Numbers of repeats of a string
Dear Bernd,
Thank you very much for the code. It is working fine, but for some situations the strings size is limiting its use since it takes several minutes, and even hours to perform the calculation. Unfortunately 90% of the cases I have strings bigger than 200 characters. :( Thank you again, Luciano On 15 abr, 11:38, Bernd P wrote: Hello Luciano, That was left as homework exercise :-) Here we a Function ShowRepetitions(v As Variant) As Variant Dim i As Long, j As Long On Error Resume Next Do While v(i, 2) 1 * * i = i + 1 Loop If v(i, 2) = 1 Then i = i - 1 On Error GoTo 0 ReDim vR(1 To Application.Caller.Rows.Count, 1 To 2) As Variant For j = 1 To Application.Caller.Rows.Count * * If j i Then * * * * vR(j, 1) = "" * * * * vR(j, 2) = "" * * Else * * * * vR(j, 1) = v(j, 1) * * * * vR(j, 2) = v(j, 2) * * End If Next j ShowRepetitions = vR End Function Array-enter: =showrepetitions(GSort(pfreq(TRANSPOSE(GenSubStrin gs (A1))),"DA","NS","21")) Regards, Bernd |
Numbers of repeats of a string
Dear Bernd,
I can not understand why, but I have always to put some string on cell "A1" to have strings determined if not the code crashes and appear errors on cells bellow A1. Have you any idea about what is happening? Thanks in advance, Luciano On 15 abr, 13:32, Luciano Paulino da Silva wrote: Dear Bernd, Thank you very much for the code. It is working fine, but for some situations the strings size is limiting its use since it takes several minutes, and even hours to perform the calculation. Unfortunately 90% of the cases I have strings bigger than 200 characters. :( Thank you again, Luciano On 15 abr, 11:38, Bernd P wrote: Hello Luciano, That was left as homework exercise :-) Here we a Function ShowRepetitions(v As Variant) As Variant Dim i As Long, j As Long On Error Resume Next Do While v(i, 2) 1 * * i = i + 1 Loop If v(i, 2) = 1 Then i = i - 1 On Error GoTo 0 ReDim vR(1 To Application.Caller.Rows.Count, 1 To 2) As Variant For j = 1 To Application.Caller.Rows.Count * * If j i Then * * * * vR(j, 1) = "" * * * * vR(j, 2) = "" * * Else * * * * vR(j, 1) = v(j, 1) * * * * vR(j, 2) = v(j, 2) * * End If Next j ShowRepetitions = vR End Function Array-enter: =showrepetitions(GSort(pfreq(TRANSPOSE(GenSubStrin gs (A1))),"DA","NS","21")) Regards, Bernd |
Numbers of repeats of a string
Hello Luciano,
The runtime depends heavily on the length of your strings and the length of the substrings. If you can live with limited substrings (length shorter than a certain threshold) then you can reduce the runtime substantially. Change the line in GenSubStrings For i = 2 To Len(s) - 1 to For i = 2 To worksheetfunction.min(10,Len(s) - 1) for example. The formula I gave you referred to A1. So your input is expected to be in A1. Do you need this to be changed? Regards, Bernd |
Numbers of repeats of a string
Hi Luciano,
Fair enough - I see Bernard P has given you some pointers. Do note, though, that testing strings of 5000 letters for all possible repeat combinations can be extremely time consuming. After all, in a string of 5000 letters, duplicate matches of as many as 4999 characters are possible (consider a string of 5000 'A's, for example). -- Cheers macropod [MVP - Microsoft Word] "Luciano Paulino da Silva" wrote in message ... Hi macropod, In my case I will not looking for a specific repeat. My intention is list all subsequences that are present in a full string in A1 (e.g. for the string AADFDGAAV, the string AA appear 2 times and therefore should be counted. At the right side of the subsequences (repeats) listed it will be necessary to figure out the number of times that each one of them appear. And in another cell the number of different repeats that could be detected. Did you understand? Thanks in advance, Luciano On 15 abr, 00:23, "macropod" wrote: Hi Luciano, You can implement the logic fvrom the code I posted earlier via a User-Defined Function: Public Function Repeats(BigStr As Variant, MyStr As Variant, Compare As Boolean) As Integer Dim TmpStr As String TmpStr = Replace(BigStr, MyStr, "", , , Compare ^ 2) Repeats = (Len(BigStr) - Len(TmpStr)) / Len(MyStr) End Function If you place the function in a standard vba module, you can use a formula like: =Repeats(A1,A2,1) or =Repeats(A1,A2,0) to retrieve the number of times the string in A2 is found in A1. The 1/0 parameter specifies a case-sensitive or case-insensitive match, respectively. -- Cheers macropod [MVP - Microsoft Word] "Luciano Paulino da Silva" wrote in ... Dear macropod, I have tested your code, but it did not worked for my need, Thank you anyway, Luciano On 14 abr, 21:32, "macropod" wrote: Hi Luciano, You could use code like: Sub TestString() Dim BigStr As String Dim MyStr As String Dim TmpStr As String BigStr = "The quick brown fox jumps over the lazy dog." MyStr = InputBox("String to Find") TmpStr = Replace(BigStr, MyStr, "", , , vbBinaryCompare) MsgBox "Repeats of " & MyStr & " in Test String: " & (Len(BigStr) - Len(TmpStr)) / Len(MyStr) End Sub Using vbBinaryCompare forces a case-sensitive test (eg "the" is not counted in a search for "The"). If you don't want the test to be case sensitive, use vbTextCompare instead. -- Cheers macropod [MVP - Microsoft Word] "Luciano Paulino da Silva" wrote in ... Dear all, I am looking to detect repeats (sequences of letters or numbers which are repeating at least twice within a string) in some strings containing between 20-5000 letters. Has somebody any idea how could I perform that using an Excel macro? I would like that the string to be evaluated could be on cell "A1" and that the detected repeats could be listed bellow A3; and that the number of times that they appear in the sentence could be listed bellow cells B3, as the following small example: QGAGGAAGGAGQ 4 Repeats detected Number GA 3 AG 3 GAG 2 GG 2 Somebody could help me? Thanks in advance, Luciano |
All times are GMT +1. The time now is 07:52 AM. |
Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
ExcelBanter.com