Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Remove Duplicates logic
I have a spreadsheet with duplicate rows. When I run the remove duplicates
task from the data tab, I get inconsistent results. For example, the spreadsheet contains the following lines Line 1: a 123 1 Line 2: a 123 2 Line 3: b 123 1 Line 4 : b 123 2 I check the columns to compare the first and second columns to determine duplicates, then run the task to remove duplicates The result is New Line 1: a 123 1 New Line 2: b 123 2 What logic is being used to select the first duplicate of a pair to be saved (line 1) and then to select the second duplicate of a pair (line 4)? The actual data that I am using is financial data extracted to a spreadsheet and then merged into my master sheet. I want to be able to save the latest version of the data (the last column is used as the identifier), and remove the previous version of the data (eg. in the example, save the values with the 2 in the last column and eliminate the duplicates which happen to contain the 1 in the previous column). In the actual data the column with the version information is actually a date. |
#2
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Remove Duplicates logic
Only the first instance of a set of values is kept, so it will remove line 2
and line 4. To test this, color the cells. To do what I think you are looking to do, first sort by the date column (newest to oldest), then highlight all three columns and choose 'remove duplicates'. Uncheck the date column and hit OK. Is that right? "Bob" wrote: I have a spreadsheet with duplicate rows. When I run the remove duplicates task from the data tab, I get inconsistent results. For example, the spreadsheet contains the following lines Line 1: a 123 1 Line 2: a 123 2 Line 3: b 123 1 Line 4 : b 123 2 I check the columns to compare the first and second columns to determine duplicates, then run the task to remove duplicates The result is New Line 1: a 123 1 New Line 2: b 123 2 What logic is being used to select the first duplicate of a pair to be saved (line 1) and then to select the second duplicate of a pair (line 4)? The actual data that I am using is financial data extracted to a spreadsheet and then merged into my master sheet. I want to be able to save the latest version of the data (the last column is used as the identifier), and remove the previous version of the data (eg. in the example, save the values with the 2 in the last column and eliminate the duplicates which happen to contain the 1 in the previous column). In the actual data the column with the version information is actually a date. |
#3
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Remove Duplicates logic
Thanks, the process that you outline and the logic you describe is how I
thought that it would work. Unfortunately that is not the result that I am getting. The first of each duplicated row is not necessarily the row maintained. I have sorted the data in date order, but have not used the date field in my duplicate identification criteria. Any other suggestions? "~L" wrote: Only the first instance of a set of values is kept, so it will remove line 2 and line 4. To test this, color the cells. To do what I think you are looking to do, first sort by the date column (newest to oldest), then highlight all three columns and choose 'remove duplicates'. Uncheck the date column and hit OK. Is that right? "Bob" wrote: I have a spreadsheet with duplicate rows. When I run the remove duplicates task from the data tab, I get inconsistent results. For example, the spreadsheet contains the following lines Line 1: a 123 1 Line 2: a 123 2 Line 3: b 123 1 Line 4 : b 123 2 I check the columns to compare the first and second columns to determine duplicates, then run the task to remove duplicates The result is New Line 1: a 123 1 New Line 2: b 123 2 What logic is being used to select the first duplicate of a pair to be saved (line 1) and then to select the second duplicate of a pair (line 4)? The actual data that I am using is financial data extracted to a spreadsheet and then merged into my master sheet. I want to be able to save the latest version of the data (the last column is used as the identifier), and remove the previous version of the data (eg. in the example, save the values with the 2 in the last column and eliminate the duplicates which happen to contain the 1 in the previous column). In the actual data the column with the version information is actually a date. |
#4
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Remove Duplicates logic
Include the date in your duplicate identification criteria selection, but
uncheck the box for the date column when the menu comes up. The duplicate finder will evaluate every checked item against the other items within the column. With the column sorted by date newest to oldest,, only older duplicates will be eliminated. Unless... If date isn't included, the a/b 123 data will be deleted and shifted but the dates will not (and your data will be incorrect because the row alignment has changed). If teh date column is included but not unchecked, nothing will be deleted because the different dates make the rows not duplicates. If it still isn't working, check to make sure a doesn't equal b (for example, if a cell displays 1 but is formatted a certain way, it might actually be equal to 1.234). Extra spaces are often the culprit when comparing text (" Hi there" "Hi there " and Hi there" are not equal to "Hi there"). In the case of missing rows, the duplicate finder can eliminate items so that if a row is hidden, an item can shift from a non-hidden row to a hidden one. If a hidden row falls within the selected range, it will be evaluated. And if none of that works, I must not be clear on the problem. "Bob" wrote: Thanks, the process that you outline and the logic you describe is how I thought that it would work. Unfortunately that is not the result that I am getting. The first of each duplicated row is not necessarily the row maintained. I have sorted the data in date order, but have not used the date field in my duplicate identification criteria. Any other suggestions? "~L" wrote: Only the first instance of a set of values is kept, so it will remove line 2 and line 4. To test this, color the cells. To do what I think you are looking to do, first sort by the date column (newest to oldest), then highlight all three columns and choose 'remove duplicates'. Uncheck the date column and hit OK. Is that right? "Bob" wrote: I have a spreadsheet with duplicate rows. When I run the remove duplicates task from the data tab, I get inconsistent results. For example, the spreadsheet contains the following lines Line 1: a 123 1 Line 2: a 123 2 Line 3: b 123 1 Line 4 : b 123 2 I check the columns to compare the first and second columns to determine duplicates, then run the task to remove duplicates The result is New Line 1: a 123 1 New Line 2: b 123 2 What logic is being used to select the first duplicate of a pair to be saved (line 1) and then to select the second duplicate of a pair (line 4)? The actual data that I am using is financial data extracted to a spreadsheet and then merged into my master sheet. I want to be able to save the latest version of the data (the last column is used as the identifier), and remove the previous version of the data (eg. in the example, save the values with the 2 in the last column and eliminate the duplicates which happen to contain the 1 in the previous column). In the actual data the column with the version information is actually a date. |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
How to remove row duplicates | Excel Discussion (Misc queries) | |||
Remove Rows That Are NOT Duplicates | Excel Worksheet Functions | |||
Remove Duplicates | New Users to Excel | |||
How to remove duplicates? | Excel Discussion (Misc queries) | |||
Remove duplicates | Excel Discussion (Misc queries) |