![]() |
find duplicates between rows, keep or delete entries based on ranked relevance
Hello all,
I am a biologist, I use Excel often but am new to VBA. Essentially, I am looking to create a macro which gleans certain data from one worksheet and refines and deposits it into an adjacent worksheet. If the answer to my problem is already out there, sorry, I searched and did not find it--probably I don't know enough on the topic to search efficiently. I have data classifying behavior of marked organisms. For a given date, we may have recorded behavioral data on a unique animal more than once. On some days I have 6 rows of data for a particular animal (identified with an alphanumeric code, such as T332), and on some days I have only one observation of that individual. Some behaviors are more "important" than others, so if animal #T332 was recorded 6 times on a given day I only need to keep the one row with the behavior deemed most useful to our research question. Ultimately, I want one row of data per animal per day. There are many hundreds of uniquely identified organisms. I'm trying to create a macro that will: 1. Search the worksheet, finding occurrences (rows) where the same animal was observed more than once on the same date. E.g. if the fields under columns: 'Name', 'Month', 'Day' and 'Year' are = between 2 or more rows, I want to go to step 2. 2. Consider the group of rows which met the above criteria, the row having the most "important" behavior (column 'rbi') observed on that day is retained. Other rows are discarded. rbi order of importance (high priority to low) is: S, NP, WP, WF, OM, A, U. 3. The "winning" row is sent to an adjacent results worksheet. When there aren't multiple rows per animal per date, the program sends the lone entry to the results worksheet, and the program goes on to search the entire worksheet. I'm turning to this forum because I haven't found much discussion or information (that I understand at least) online or in the books which I could use to do this. Any ideas would be great, if possible, please include as many translations from code language to English as you can. Even just help on coding out the first step would be incredibly helpful-i.e. how to find and select rows that have the same values under certain columns. Any Ideas? Thanks for taking the time to think about this! |
find duplicates between rows, keep or delete entries based on ranked relevance
Hi Mark,
Thanks for the note... it's a good idea (and essentially what we have been doing), but I need to automate the process because there are many hundreds of unique animals named, and many thousands of rows of observations in the database. Do you know, is there a way to attach an example of my file to the group? otherwise I could send an example worksheet to anyone who is interested. Thanks again, Elias Mark Driscol wrote: Perhaps you could use an AutoFilter? Filter on a particular animal, look for which behavior among the filtered rows is most important, and copy that to another worksheet. That solution doesn't use VBA or automate the process, but without knowing more about your data and how it is organized in your worksheet it might be difficult to craft a suitable reply. Mark wrote: Hello all, I am a biologist, I use Excel often but am new to VBA. Essentially, I am looking to create a macro which gleans certain data from one worksheet and refines and deposits it into an adjacent worksheet. If the answer to my problem is already out there, sorry, I searched and did not find it--probably I don't know enough on the topic to search efficiently. I have data classifying behavior of marked organisms. For a given date, we may have recorded behavioral data on a unique animal more than once. On some days I have 6 rows of data for a particular animal (identified with an alphanumeric code, such as T332), and on some days I have only one observation of that individual. Some behaviors are more "important" than others, so if animal #T332 was recorded 6 times on a given day I only need to keep the one row with the behavior deemed most useful to our research question. Ultimately, I want one row of data per animal per day. There are many hundreds of uniquely identified organisms. I'm trying to create a macro that will: 1. Search the worksheet, finding occurrences (rows) where the same animal was observed more than once on the same date. E.g. if the fields under columns: 'Name', 'Month', 'Day' and 'Year' are = between 2 or more rows, I want to go to step 2. 2. Consider the group of rows which met the above criteria, the row having the most "important" behavior (column 'rbi') observed on that day is retained. Other rows are discarded. rbi order of importance (high priority to low) is: S, NP, WP, WF, OM, A, U. 3. The "winning" row is sent to an adjacent results worksheet. When there aren't multiple rows per animal per date, the program sends the lone entry to the results worksheet, and the program goes on to search the entire worksheet. I'm turning to this forum because I haven't found much discussion or information (that I understand at least) online or in the books which I could use to do this. Any ideas would be great, if possible, please include as many translations from code language to English as you can. Even just help on coding out the first step would be incredibly helpful-i.e. how to find and select rows that have the same values under certain columns. Any Ideas? Thanks for taking the time to think about this! |
All times are GMT +1. The time now is 09:19 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com