View Single Post
  #8   Report Post  
Posted to microsoft.public.excel.programming
Luciano Paulino da Silva Luciano Paulino da Silva is offline
external usenet poster
 
Posts: 77
Default Palindromes and repeats

On 24 out, 10:33, John Coleman wrote:
On Oct 24, 6:16*am,LucianoPaulino da Silva



wrote:
On 24 out, 02:10, John Coleman wrote:


Dear John Coleman,
Thank you very much for your fantastic code. It is almost perfect for
my situation.
However, I need that instead of a same detected palindrome is being
repeated all the time it appear in the string, it should be shown once
a time in A column, and that B column should be shown the frequence of
occcurrences of that specific palindrome. Do you know some way to do
that?
Thank you,
Luciano


DearLuciano,


This should work:


'''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''


Function FindMaxInitPalin(SearchString As String) As String
* * Dim i As Long, j As Long
* * Dim init As String
* * Dim reflection As String
* * For i = 1 To Len(SearchString)
* * * * init = Mid(SearchString, 1, i)
* * * * reflection = Mid(init, i) & reflection
* * * * If init = reflection Then j = i
* * Next i
* * FindMaxInitPalin = Mid(SearchString, 1, j)
End Function


Function FindMaxPalins(SearchString As String) As Variant
* * Dim Palindromes As Object
* * Dim InputString As String
* * Dim chomp As String
* * Dim n As Long


* * Set Palindromes = CreateObject("Scripting.Dictionary")
* * InputString = SearchString
* * Do While Len(InputString) 0
* * * * chomp = FindMaxInitPalin(InputString)
* * * * InputString = Mid(InputString, Len(chomp) + 1)
* * * * If Len(chomp) 1 Then
* * * * * * If Not Palindromes.Exists(chomp) Then
* * * * * * * * Palindromes.Add chomp, 1
* * * * * * Else
* * * * * * * * Palindromes.Item(chomp) = Palindromes.Item(chomp) + 1
* * * * * * End If
* * * * End If
* * Loop
* * If Palindromes.count 0 Then Set FindMaxPalins = Palindromes
End Function


Sub Main()
* * Dim SearchString As String
* * Dim Results As Variant
* * Dim i As Long, n As Long
* * Dim palindrome As Variant


* * On Error GoTo no_palindromes
* * SearchString = InputBox("Enter a string")
* * Set Results = FindMaxPalins(SearchString)
* * n = Results.count
* * Range("A1").Value = n & " distinct palindromes found"
* * Range("A2").Value = "Palindrome"
* * Range("B2").Value = "Frequency"
* * For Each palindrome In Results.Keys
* * * * i = i + 1
* * * * Range("A2").Offset(i).Value = palindrome
* * * * Range("B2").Offset(i).Value = Results.Item(palindrome)
* * Next palindrome
* * Exit Sub
no_palindromes:
* * Range("A1").Value = "No palindromes found"
End Sub


'''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''*'''''''''''''''''
Also, there is no need to stick to an inputbox for entering the
string. It is easy enough to modify main() so that it gets its input
from a cell


Out of curiosity, what sort of things are you doing with this code?
Seems interesting.


-John Coleman


Dear John Coleman,
Thank you very much for your help. Let me try explain the aim of the
present study. Let me first introduce myself. My name isLuciano
Paulino da Silva, I`m a scientific researcher in the area of
bionanotechnology. In some of our studies we have characterized the
sequences of proteins and peptides that are formed by strings of 20
possible aminoacid residues commonly represented by letters. We always
needs a lot of statistical and mathematical comparisons among strings.
In this way, we have for some situations even thousands of these
strings and we are interested in analyze them from databanks. Some
time ago, I realized that some of these protein sequences have inside
them the presence of palindromes and repeats and such structures are
very important for their biological functions. In this way, this code
and others are important to test some situations in order to in some
days perform these analyses in large scale. In this way, if it would
be your interest I would be pleased if you accept collaborate with us.
Concerning your code, at present we need some additional things like:
1-The string shoud be typed in A1 cell instead use an inputbox;
2-In addition to palindromes, an identitical approach will have to be
performed to repeats in the strings;


Thanks in advance,
Luciano- Hide quoted text -


- Show quoted text -


DearLuciano,

I suspected that your interests were biological. I am a mathematician
who likes to program as a hobby (and sometimes as an aid to research
or teaching). I have a friend in the biology department who sought my
aid a while back in writing a program which generated random protein
strings with certain properties, and I recognized G,A,Q from your
original post as possibly being abbreviations for glycine, etc. I am
on sabbatical this semester, so have some time on my hands and would
be willing to help you out some (but am currently writing a paper - so
I don't have unlimitted time), though for more serious analysis you
should probably recruit a specialist in bioinformatics who understands
string processing and pattern matching much more than I do. Feel free
to email me if you want. My address is jcoleman followed by @ followed
by franciscan dot edu (don't want to make life too easy for robot
spammers).

What do you mean by repeats? Do you mean strings like GAQGAQ (where
the repeated sequence is adjacent) or GAQCGAQ (where the repeated
sequence can be separated by intervening letters)? And how would you
handle GAQGAQGAQ? Also - do you want all repeats, or do you want
repeats which are in some sense maximal (similar to the palindrome
code I wrote)? Please clarify.

I wrote a new main() to take its input from A1 and report lengths in
column C. The other two functions in the code were unchanged , so just
delete main() and replace it with:

Sub Main()
* * Dim SearchString As String
* * Dim Results As Variant
* * Dim i As Long, n As Long
* * Dim palindrome As Variant
* * Dim R As Range 'output anchor

* * On Error GoTo no_palindromes
* * Set Results = FindMaxPalins(Range("A1").Value)
* * Set R = Range("A3")
* * n = Results.count
* * R.Value = n & " distinct palindromes found"
* * R.Offset(1).Value = "Palindrome"
* * R.Offset(1, 1).Value = "Frequency"
* * R.Offset(1, 2).Value = "Length"
* * For Each palindrome In Results.Keys
* * * * i = i + 1
* * * * R.Offset(i + 1).Value = palindrome
* * * * R.Offset(i + 1, 1).Value = Results.Item(palindrome)
* * * * R.Offset(i + 1, 2).Value = Len(palindrome)
* * Next palindrome
* * Exit Sub
no_palindromes:
* * R.Value = "No palindromes found"
End Sub

Hope that helps

-John Coleman


Dear John Coleman,
Thak you very much for your code. It is perfect for this investigation
approach.
That`s nice, I`m a biologist but among my graduate students there are
a mathematician because your area is very important to our studies.
That`s interesting. I have also a smal VBA-based program to generate
random polypeptides and this is one of our interests.
I will prepare a small text telling you our interests in some string
processing and you will be able to tell us if could help us.

Yes, the repeat sequence could be adjacent or separated by intervening
letters. Both situations are possible on protein sequences. In the
case of GAQGAQGAQ it would be 3 GAQ repeats, using the same logic of
the palindromes. Yes, it is similar to the palindromes, and even some
palindromes are repeats but not all repeats are palindromes...
Thanks in advance,
Luciano