Posted to microsoft.public.excel.worksheet.functions
Bob

Remove Duplicates logic

Thanks. The process you outline and the logic you describe are how I
thought it would work. Unfortunately, that is not the result I am
getting: the first row of each set of duplicates is not necessarily the row
that is kept. I have sorted the data in date order, but I have not used the date
field in my duplicate-identification criteria.

Any other suggestions?


"~L" wrote:

Only the first instance of a set of values is kept, so it will remove line 2
and line 4.
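A minimal sketch of the keep-first-occurrence rule described above, applied to the four example lines from the original question. It is written in Python only to model the rule; the function name `remove_duplicates` and the tuple representation of rows are hypothetical and are not Excel's implementation.

```python
def remove_duplicates(rows, key_cols):
    """Keep the first row for each combination of values in key_cols.

    Hypothetical model of the keep-first rule, not Excel's own code.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[i] for i in key_cols)  # values in the checked columns
        if key not in seen:  # first time this combination appears: keep it
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    ("a", 123, 1),  # Line 1
    ("a", 123, 2),  # Line 2
    ("b", 123, 1),  # Line 3
    ("b", 123, 2),  # Line 4
]

# Compare on the first and second columns only (indexes 0 and 1).
print(remove_duplicates(rows, key_cols=(0, 1)))
# [('a', 123, 1), ('b', 123, 1)]
```

Under this rule lines 2 and 4 are the ones removed, so for "b 123" it is line 3, not line 4, that survives.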

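To test this, give each row a different fill color before running Remove
Duplicates; the colors show which of the original rows were kept.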

To do what I think you are looking to do, first sort by the date column
(newest to oldest), then highlight all three columns and choose 'remove
duplicates'. Uncheck the date column and hit OK.

Is that right?
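The sort-then-remove procedure suggested above can be sketched as follows, under the assumption that the date is the last column and that only the first two columns are checked. The function name `keep_latest_by_sort` and the sample dates are hypothetical; the sketch just combines a newest-to-oldest sort with the keep-first rule.

```python
from datetime import date


def keep_latest_by_sort(rows, key_cols, date_col):
    """Sort newest-to-oldest on date_col, then keep the first row per key.

    Hypothetical sketch of "sort by date, then Remove Duplicates with the
    date column unchecked"; the date column is not part of the key.
    """
    # Newest first, so the first occurrence of each key is its latest version.
    ordered = sorted(rows, key=lambda row: row[date_col], reverse=True)
    seen = set()
    kept = []
    for row in ordered:
        key = tuple(row[i] for i in key_cols)  # date column excluded
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample rows; the last column plays the role of the date.
rows = [
    ("a", 123, date(2009, 1, 15)),
    ("a", 123, date(2009, 2, 15)),
    ("b", 123, date(2009, 1, 15)),
    ("b", 123, date(2009, 2, 15)),
]

print(keep_latest_by_sort(rows, key_cols=(0, 1), date_col=2))
```

The outcome depends entirely on the sort: whichever row comes first for a given key is the one kept.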

"Bob" wrote:

I have a spreadsheet with duplicate rows. When I run the Remove Duplicates
command from the Data tab, I get inconsistent results. For example, the
spreadsheet contains the following lines:
Line 1: a 123 1
Line 2: a 123 2
Line 3: b 123 1
Line 4: b 123 2

I check only the first and second columns as the columns used to identify
duplicates, then run Remove Duplicates.

The result is:

New Line 1: a 123 1
New Line 2: b 123 2

What logic is used to keep the first duplicate of one pair (line 1) but the
second duplicate of the other pair (line 4)?

The actual data I am using is financial data extracted to a spreadsheet
and then merged into my master sheet. I want to keep the latest
version of the data (the last column identifies the version) and remove
the previous versions (e.g., in the example, keep the rows with
the 2 in the last column and eliminate the duplicates that contain
the 1 in the last column). In the actual data, the column with the
version information is a date.
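If the goal is strictly "keep the row with the latest date for each key", one hypothetical alternative (a different technique from Remove Duplicates, shown here in Python) is to compare the dates directly instead of relying on row order. The function name `keep_latest_per_key` and the sample rows are assumptions for illustration only.

```python
from datetime import date


def keep_latest_per_key(rows, key_cols, date_col):
    """Keep, for each key, the row with the greatest value in date_col.

    Hypothetical order-independent sketch: dates are compared directly,
    so the rows do not need to be sorted first.
    """
    latest = {}  # key -> row with the newest date seen so far
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        if key not in latest or row[date_col] > latest[key][date_col]:
            latest[key] = row
    return list(latest.values())


# Hypothetical sample rows, deliberately not sorted by date.
rows = [
    ("a", 123, date(2009, 1, 15)),
    ("b", 123, date(2009, 2, 15)),
    ("a", 123, date(2009, 2, 15)),
    ("b", 123, date(2009, 1, 15)),
]

for row in keep_latest_per_key(rows, key_cols=(0, 1), date_col=2):
    print(row)
```

Because the comparison is strict (`>`), two rows with the same key and the same date keep whichever appeared first.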