#1
Hello,
I am in desperate need of help! Here are my details: I have one column with different gene identities, such as "gi|351702631|gb|EHB05550.1|". I have another column with the identities matched with gene descriptions, for example "351702631gb|EHB05550.1|EHB05550.1cGMP-gated cation channel alpha-1 [Heterocephalus glaber]". I need to find the match for column 1 in column 2; however, as you can see, they are not exact matches. I need to put the match from column 2 into a 3rd column. The lengths of column 1 and 2 do not match. Please help! Thanks :)
#3
Posted to microsoft.public.excel.worksheet.functions
On Thu, 28 Jun 2012 03:35:08 +0000, cupcakeluv_333 wrote:
> Hello, I am in desperate need of help! Here are my details: I have one column with different gene identities, such as "gi|351702631|gb|EHB05550.1|". I have another column with the identities matched with gene descriptions, for example "351702631gb|EHB05550.1|EHB05550.1cGMP-gated cation channel alpha-1 [Heterocephalus glaber]". I need to find the match for column 1 in column 2; however, as you can see, they are not exact matches. I need to put the match from column 2 into a 3rd column. The lengths of column 1 and 2 do not match. Please help! Thanks :)

This is not straightforward, as it would require multiple substitutions in one string or the other to develop a match. In the example you present, one would have to remove the leading "gi|" and the second "|" from the gene identity in order to develop a partial match.

Without knowing how these gene identity strings and gene definition strings are constructed, it would be very difficult to develop an accurate algorithm to determine which kinds of matches are proper and which are improper.

Some questions that come to mind:

- The location of the pipes, especially since they differ between the two strings.
- The leading "gi|" in the gene identity string: is there something at the beginning that can always be ignored?
- The significance of the second "EHB05550.1" in the gene description string.
- How much of the gene identity has to match the gene description in order to constitute a proper match.
- etc.
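For illustration only, here is a minimal Python sketch of the substitution described above, outside Excel. It assumes every gene identity follows the single example in the question (a leading "gi|" followed by a number and a pipe) and that a proper match means the description starts with the normalized identity. Neither assumption is confirmed in the thread, and the function names `normalize_gene_id` and `match_descriptions` are hypothetical.

```python
def normalize_gene_id(gene_id):
    """Drop a leading "gi|" and the pipe after the GI number.

    Assumption: all identities look like the one example given,
    e.g. "gi|351702631|gb|EHB05550.1|".
    """
    key = gene_id.strip()
    if key.startswith("gi|"):
        key = key[len("gi|"):]
    # Remove only the first remaining pipe, which is the second
    # pipe of the original string.
    return key.replace("|", "", 1)


def match_descriptions(gene_ids, descriptions):
    """For each gene identity (column 1), return the first description
    (column 2) that starts with its normalized form, or None if there
    is no such description. The result plays the role of column 3.
    """
    results = []
    for gene_id in gene_ids:
        key = normalize_gene_id(gene_id)
        match = None
        for description in descriptions:
            # Assumption: a "proper" match is a prefix match.
            if key and description.startswith(key):
                match = description
                break
        results.append(match)
    return results


if __name__ == "__main__":
    column1 = ["gi|351702631|gb|EHB05550.1|"]
    column2 = [
        "351702631gb|EHB05550.1|EHB05550.1cGMP-gated cation channel "
        "alpha-1 [Heterocephalus glaber]"
    ]
    for gene_id, found in zip(column1, match_descriptions(column1, column2)):
        print(gene_id, "->", found)
```

Because the two columns may have different lengths, each identity is checked against every description rather than row by row. If the open questions above turn out to have different answers (for example, the pipes are placed differently in other rows), the normalization step would need to change accordingly.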