View Single Post
  #6   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Max Max is offline
external usenet poster
 
Posts: 9,221
Default Extract duplicate rows to another worksheet

Just another option to play with ..

Assume source data is in a sheet named: X
within cols A to G, data from row2 down to row 22000
where cols A to C are the key cols: LastName, FirstName & Address

In a new sheet named: U (for "Uniques", say)
Paste the 6 col headers from X's A1:F1 into B1:G1

Put in A2:
=IF(COUNTBLANK(X!A2:C2)=3,"",IF(SUMPRODUCT((X!$A$2 :A2=X!A2)*(X!$B$2:B2=X!B2)*(X!$C$2:C2=X!C2))<2,ROW (),""))
Copy A2 down to cover the max expected extent of source data, say down to
A22200. (Leave A1 blank)

Then put in B2:
=IF(ROW(A1)COUNT($A:$A),"",INDEX(X!A:A,MATCH(SMAL L($A:$A,ROW(A1)),$A:$A,0)))
Copy B2 to G2. Select B2:G2, fill down by the smallest extent sufficient to
cover the max expected number of unique lines from X, say down to G5000. When
the entire uniques extract is exhausted you'd see only "blank" rows. Cols B
to G will extract only the unique* lines from X, with all results neatly
bunched at the top. Hide away col A if desired.
*ie lines in X with unique: LastName - FirstName - Address

Then dress up U nicely, make a copy of it, rename the copy as say: D (for
"Duplicates").

In sheet: D,

Replace the formula in A2 by:
=IF(COUNTBLANK(X!A2:C2)=3,"",IF(U!A2="",ROW(),""))
then copy A2 down to A22200 as before. (Leave A1 blank)

Select B2:G2 (same formulas - no change), fill down by the smallest extent
sufficient to cover the max expected "balance" duplicate lines from X, say
down to G17200 (assuming 5,000 max unique lines, balance duplicates expected
is 22200 - 5000 = 17200 roughly). When the entire duplicates extract is
exhausted you'd see only "blank" rows, as in the uniques sheet U. Cols B to G
will extract only the duplicate lines from X, with all results neatly bunched
at the top. Hide away col A if desired.

Easiest way to try out the above is to simply rename a copy of your source
sheet as: X, ensure the data structure is as assumed, then plug-in the
formulas "as-is", and power away. You can always restore the source sheet's
name later and leave it to Excel to auto-change the formulas to suit.
--
Max
Singapore
http://savefile.com/projects/236895
xdemechanik
---
"Northwoods" wrote:
I have a client history that includes similar (Name, address) and dissimilar
(date) information. I want to extract duplicate records to another worksheet
and leave only 1 record per client on the other. Problem is querying only the
first 3 columns, LastName, FirstName & Address and ignoring the remaining 3
columns that have dissimilar information and the system would consider unique.
There are 22,000 records.
Any ideas most appreciated