ExcelBanter


cupcakeluv_333

partial match help
 
Hello,
I am in desperate need of help! Here are my details:
I have one column with different gene identities, such as "gi|351702631|gb|EHB05550.1|".
I have another column with the identities matched with gene descriptions, for example "351702631gb|EHB05550.1|EHB05550.1cGMP-gated cation channel alpha-1 [Heterocephalus glaber]".
I need to find the match for column 1 in column 2; however, as you can see, they are not exact matches. I need to put the match from column 2 into a 3rd column. The lengths of column 1 and 2 do not match.
Please help!
Thanks :)

Spencer101

Any chance you could post a sample workbook with a manually input example of what you want the result to look like?

Ron Rosenfeld[_2_]

On Thu, 28 Jun 2012 03:35:08 +0000, cupcakeluv_333 wrote:


Hello,
I am in desperate need of help! Here are my details:
I have one column with different gene identities, such as
"gi|351702631|gb|EHB05550.1|".
I have another column with the identities matched with gene
descriptions, for example "351702631gb|EHB05550.1|EHB05550.1cGMP-gated
cation channel alpha-1 [Heterocephalus glaber]".
I need to find the match for column 1 in column 2; however, as you can
see, they are not exact matches. I need to put the match from column 2
into a 3rd column. The lengths of column 1 and 2 do not match.
Please help!
Thanks :)


This is not straightforward, as it would require multiple substitutions in one string or the other to produce a match. In the example you present, one would have to remove the leading "gi|" and the second "|" from the gene identity in order to obtain a partial match. Without knowing how the gene identity strings and the gene description strings are constructed, it would be very difficult to develop an accurate algorithm for deciding which matches are proper and which are improper.
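
For the single example quoted above, the two substitutions can be sketched as follows. Python is used purely for illustration; the function names are hypothetical, and the rule itself (drop a leading "gi|", drop the next pipe, then compare against the start of each description) is an assumption taken from this one example, not a confirmed rule for how these strings are built.

```python
from typing import Iterable, Optional


def normalize_identity(identity: str) -> str:
    """Drop a leading "gi|" (if present) and then the first remaining pipe.

    Assumption: every identity follows the pattern of the one example
    in the thread.
    """
    if identity.startswith("gi|"):
        identity = identity[len("gi|"):]
    # Remove only the first pipe that is left; later pipes are kept.
    return identity.replace("|", "", 1)


def find_description(identity: str, descriptions: Iterable[str]) -> Optional[str]:
    """Return the first description that starts with the normalized identity."""
    key = normalize_identity(identity)
    for description in descriptions:
        if description.startswith(key):
            return description
    return None  # no description matched


# The two strings from the original post.
identity = "gi|351702631|gb|EHB05550.1|"
description = (
    "351702631gb|EHB05550.1|EHB05550.1cGMP-gated cation channel "
    "alpha-1 [Heterocephalus glaber]"
)

match = find_description(identity, [description])
```

If other identities place their pipes differently, this rule will miss them or match the wrong row, which is why the questions below matter.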

Some questions that come to mind have to do with the location of the pipes, especially since they are placed differently in the two strings:
the leading "gi|" in the gene identity string -- is there something at the beginning that can always be ignored?
the significance of the second EHB05550.1 in the gene description string
how much of the gene identity has to match the gene description in order to constitute a proper match
etc.


