ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Worksheet Functions (https://www.excelbanter.com/excel-worksheet-functions/)
-   -   Isolating string combinations (https://www.excelbanter.com/excel-worksheet-functions/240969-isolating-string-combinations.html)

Demosthenes

Isolating string combinations
 
This is a further iteration of a problem I've been working on and posted
earlier under the subject: "Question for Mr. T. Valko." The question is now
up for general consideration.

Say you have the following data:

AB
CD
CD
CE
CD
CF
AG
CD
CD
AB
CD
AB
CD
AB
CD
CF
CG
CE
CW

Which can be in one or two columns. I'm trying to analyze the makeup of the
"CX" occurrences (i.e., whether they are CD, CF, CG, etc), but ONLY IN CHAINS
longer than 4 occurrences (i.e., 4 C's in a row or longer). I could do this
if only I could figure out a way to create a helper column that is composed
of just these instances that I'm considering. So my question is: How do I
create a helper column that is composed of only the chains of 4 or longer?
Output for the above would be:

AB
CD
CD
CE
CD
CF

AB
CD
CF
CG
CE
CW

Does anyone have any thoughts on an easy way to do this?

Sam Wilson

Isolating string combinations
 
Hi,

You can do it with two helper columns. I'm assuming your column starts from
A2:

In B2 use the following and copy down
=IF(LEFT(A2,1) = LEFT(A1,1),B1,ROW())

In C2 use the following and copy down
=IF(COUNTIF(B:B,B2)3,A2,IF(COUNTIF(B:B,B3)4,A2," "))

Column C is then your helper column.

Sam


"Demosthenes" wrote:

This is a further iteration of a problem I've been working on and posted
earlier under the subject: "Question for Mr. T. Valko." The question is now
up for general consideration.

Say you have the following data:

AB
CD
CD
CE
CD
CF
AG
CD
CD
AB
CD
AB
CD
AB
CD
CF
CG
CE
CW

Which can be in one or two columns. I'm trying to analyze the makeup of the
"CX" occurrences (i.e., whether they are CD, CF, CG, etc), but ONLY IN CHAINS
longer than 4 occurrences (i.e., 4 C's in a row or longer). I could do this
if only I could figure out a way to create a helper column that is composed
of just these instances that I'm considering. So my question is: How do I
create a helper column that is composed of only the chains of 4 or longer?
Output for the above would be:

AB
CD
CD
CE
CD
CF

AB
CD
CF
CG
CE
CW

Does anyone have any thoughts on an easy way to do this?


Bernd P

Isolating string combinations
 
Hello Sam,

Your formula is not working. Copy down the AG value into A9 and A10,
for example.

My suggestion consists of two helper columns and one resulting column:
I assume data starts from row 2 and row 1 contains titles in all
columns (text). [If row 1 contains empty cells output in column D
starts from row 3]
Enter into B2:
=IF(LEFT(A2,1)="C",(B10)*(B1+1),--(SUMPRODUCT(--(LEFT(A3:A6,1)="C"))
=4))
Enter into C2:
=(B20)*(MAX(C$1:C1)+1+(C1=0))
Enter into D2:
=IF(ISERROR(MATCH(ROW()-1,C:C,0)),"",INDEX(A:A,MATCH(ROW()-1,C:C,0)))
and now copy C2:D2 down as far as necessary.

Regards,
Bernd

Sam Wilson

Isolating string combinations
 
Yep, I assumed each series of CX was preceded by a single AY for the pattern
posted. If that's not true it'll show series of As.


"Bernd P" wrote:

Hello Sam,

Your formula is not working. Copy down the AG value into A9 and A10,
for example.

My suggestion consists of two helper columns and one resulting column:
I assume data starts from row 2 and row 1 contains titles in all
columns (text). [If row 1 contains empty cells output in column D
starts from row 3]
Enter into B2:
=IF(LEFT(A2,1)="C",(B10)*(B1+1),--(SUMPRODUCT(--(LEFT(A3:A6,1)="C"))
=4))
Enter into C2:
=(B20)*(MAX(C$1:C1)+1+(C1=0))
Enter into D2:
=IF(ISERROR(MATCH(ROW()-1,C:C,0)),"",INDEX(A:A,MATCH(ROW()-1,C:C,0)))
and now copy C2:D2 down as far as necessary.

Regards,
Bernd


Demosthenes

Isolating string combinations
 
Bernd,

seems to work! thanks for your help. i think i was trying to make it too
complicated.

"Bernd P" wrote:

Hello Sam,

Your formula is not working. Copy down the AG value into A9 and A10,
for example.

My suggestion consists of two helper columns and one resulting column:
I assume data starts from row 2 and row 1 contains titles in all
columns (text). [If row 1 contains empty cells output in column D
starts from row 3]
Enter into B2:
=IF(LEFT(A2,1)="C",(B10)*(B1+1),--(SUMPRODUCT(--(LEFT(A3:A6,1)="C"))
=4))
Enter into C2:
=(B20)*(MAX(C$1:C1)+1+(C1=0))
Enter into D2:
=IF(ISERROR(MATCH(ROW()-1,C:C,0)),"",INDEX(A:A,MATCH(ROW()-1,C:C,0)))
and now copy C2:D2 down as far as necessary.

Regards,
Bernd



All times are GMT +1. The time now is 06:08 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
ExcelBanter.com