Text manipulation
paulinoluciano wrote...
....
AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADA OEKOQPPDAOPSKAEPQ
This sequence must be put in cell A2.
....
Rules:
a) Fragment the sequence before K but not always (you could have lost cut).
b) Sequence is not cut if K is found before FP
Results:
ASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOKSPADAO EKOQPPDAOPSKAEPQ
0 lost cut = Cutting the sequence all the time in which K is present
(The subsequences of this process should be put in B column:
AASSASDK
ASASDASFAFSASASADK
ASASAFPKQREWEAQEOK
SPADAOEK
OQPPDAOPSK
AEPQ
You could use formulas.
B2:
=LEFT($A$2,FIND("K",SUBSTITUTE($A$2,"FPK","###")))
B3:
=REPLACE(LEFT($A$2,FIND("K",SUBSTITUTE($A$2,"FPK", "###")&"K",
SUMPRODUCT(LEN(B$2:B2))+1)),1,SUMPRODUCT(LEN(B$2:B 2)),"")
Fill B3 down as needed. Filling it into B4:B8 (one cell more than
needed) gives B2:B8
AASSASDK
ASASDASFAFSASASADK
ASASAFPKQREWEAQEOK
SPADAOEK
OQPPDAOPSK
AEPQ
<blank
1 lost cut = Cutting the sequence after the first K present in the
sequence (The subsequences of this process should be put in C column::
AASSASDKASASDASFAFSASASADK
ASASAFPKQREWEAQEOKSPADAOEK
OQPPDAOPSKAEPQ
AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
SPADAOEKOQPPDAOPSKAEPQ
You have all the information you need for this in column B.
C2:
=INDEX(B$2:B$99,2*ROWS(C$2:C2)-1)&INDEX(B$2:B$99,2*ROWS(C$2:C2))
Fill C2 down as needed. Filling it into C3:C5 (one more than needed)
gives C2:C5
AASSASDKASASDASFAFSASASADK
ASASAFPKQREWEAQEOKSPADAOEK
OQPPDAOPSKAEPQ
<blank
2 lost cut = = Cutting the sequence after the second K (just for the
third and following) present in the sequence (The subsequences of this
process should be put in D column:
AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
SPADAOEKOQPPDAOPSKAEPQ
D2:
=INDEX(B$2:B$99,3*ROWS(D$2:D2)-2)&INDEX(B$2:B$99,3*ROWS(D$2:D2)-1)
&INDEX(B$2:B$99,3*ROWS(D$2:D2))
Fill D2 down as needed. Filling it into D3:D4 (one more than needed)
gives D2:D4
AASSASDKASASDASFAFSASASADKASASAFPKQREWEAQEOK
SPADAOEKOQPPDAOPSKAEPQ
<blank
Repair that in some cases I need lost cuts in which you cut after 1, 2,
3, 4,... specific characters.
I have to specify such rules in some place of the sheet containing the
precursor text.
The rules a
Cut after "XXX" (In this example I have put K but the some cell in the
sheet must contain what is the character in which the sequence will be
fragmented). In some cases it could be more than only one character
(e.g. K and R; nor necessarily together)
Cut before "XXX" (The cut may be after like previous example or before
the character)
Never before "XXX" (In some cases I have prohibitive situations; e.g.
It must not cut a sequence in K if K is preceeded by P or by RP)
The RP preceding K is redundant if P alone preceding K indicates a
prohibited situation. You'd only need to check for PK.
Never after "XXX" (Same for after)
Number of times that the character could be missed prior cut "XXX" (In
some place of the sheet I must explicit how many characters could be
"lost" prior cut (see example).
Generalizing the formulas above with the character to match in a cell
named CC and the prohibited sequence (in this case FPK) in a cell named
PS,
B2:
=LEFT($A$2,FIND(CC,SUBSTITUTE($A$2,PS,REPT("#",LEN (PS)))))
B3:
=REPLACE(LEFT($A$2,FIND(CC,SUBSTITUTE($A$2,PS,REPT ("#",LEN(PS)))&CC,
SUMPRODUCT(LEN(B$2:B2))+1)),1,SUMPRODUCT(LEN(B$2:B 2)),"")
The column C, D, etc. formulas wouldn't need to change.
If you have multiple prohibited sequences, then regular expressions
would be MUCH BETTER tools for doing this. Symbolic processing like
this is reducible to text processing, but Excel provides poor built-in
tools for text processing, but since it was meant to calculate numbers
this shouldn't be surprising.
|