ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Worksheet Functions (https://www.excelbanter.com/excel-worksheet-functions/)
-   -   Extract duplicate rows to another worksheet (https://www.excelbanter.com/excel-worksheet-functions/115062-extract-duplicate-rows-another-worksheet.html)

Northwoods

Extract duplicate rows to another worksheet
 
I have a client history that includes similar (Name, address) and dissimilar
(date) information. I want to extract duplicate records to another worksheet
and leave only 1 record per client on the other. Problem is querying only the
first 3 columns, LastName, FirstName & Address and ignoring the remaining 3
columns that have dissimilar information and the system would consider unique.
There are 22,000 records.
Any ideas most appreciated


Ron Coderre

Extract duplicate rows to another worksheet
 
Here's an approach pulls the unique records onto another sheet:

Assumptions:
Sheet1 contains your data in cells A1:C1000
Sheet2 is where you want the extracted unique data to be displayed

Using Sheet2:
A1: LastName
B1: FirstName
C1: Address

InsertNameDefine
Names in workbook: Sheet2!Extract
Refers to: =Sheet2!$A$1:$C$1

Still using Sheet2:
InsertNameDefine
Names in workbook: Sheet2!Database
Refers to: =Sheet1!$A$1:$C$1000

(Notice: you are on Sheet2, and creating a Sheet2-level range name, but
the referenced range is on Sheet1)

The reason: An advanced filter cannot SEND data to another sheet, but
it can PULL data FROM another sheet.

Now...set up the Advanced Data Filter:
<Data<Filter<Advanced Filter
Select: Copy to another location
List Range: (press F3 and select Database)
Criteria Range: (none for this exercise)
Copy To: (press F3 and select Extract)
Check: Unique records only
Click [OK]

Note: if you want to run that Advanced Data Filter again,
you'll need to re-select the range "Database" each time

Is that something you can work with?
***********
Regards,
Ron

XL2002, WinXP


"Northwoods" wrote:

I have a client history that includes similar (Name, address) and dissimilar
(date) information. I want to extract duplicate records to another worksheet
and leave only 1 record per client on the other. Problem is querying only the
first 3 columns, LastName, FirstName & Address and ignoring the remaining 3
columns that have dissimilar information and the system would consider unique.
There are 22,000 records.
Any ideas most appreciated


Max

Extract duplicate rows to another worksheet
 
Just another option to play with ..

Assume source data is in a sheet named: X
within cols A to G, data from row2 down to row 22000
where cols A to C are the key cols: LastName, FirstName & Address

In a new sheet named: U (for "Uniques", say)
Paste the 6 col headers from X's A1:F1 into B1:G1

Put in A2:
=IF(COUNTBLANK(X!A2:C2)=3,"",IF(SUMPRODUCT((X!$A$2 :A2=X!A2)*(X!$B$2:B2=X!B2)*(X!$C$2:C2=X!C2))<2,ROW (),""))
Copy A2 down to cover the max expected extent of source data, say down to
A22200. (Leave A1 blank)

Then put in B2:
=IF(ROW(A1)COUNT($A:$A),"",INDEX(X!A:A,MATCH(SMAL L($A:$A,ROW(A1)),$A:$A,0)))
Copy B2 to G2. Select B2:G2, fill down by the smallest extent sufficient to
cover the max expected number of unique lines from X, say down to G5000. When
the entire uniques extract is exhausted you'd see only "blank" rows. Cols B
to G will extract only the unique* lines from X, with all results neatly
bunched at the top. Hide away col A if desired.
*ie lines in X with unique: LastName - FirstName - Address

Then dress up U nicely, make a copy of it, rename the copy as say: D (for
"Duplicates").

In sheet: D,

Replace the formula in A2 by:
=IF(COUNTBLANK(X!A2:C2)=3,"",IF(U!A2="",ROW(),""))
then copy A2 down to A22200 as before. (Leave A1 blank)

Select B2:G2 (same formulas - no change), fill down by the smallest extent
sufficient to cover the max expected "balance" duplicate lines from X, say
down to G17200 (assuming 5,000 max unique lines, balance duplicates expected
is 22200 - 5000 = 17200 roughly). When the entire duplicates extract is
exhausted you'd see only "blank" rows, as in the uniques sheet U. Cols B to G
will extract only the duplicate lines from X, with all results neatly bunched
at the top. Hide away col A if desired.

Easiest way to try out the above is to simply rename a copy of your source
sheet as: X, ensure the data structure is as assumed, then plug-in the
formulas "as-is", and power away. You can always restore the source sheet's
name later and leave it to Excel to auto-change the formulas to suit.
--
Max
Singapore
http://savefile.com/projects/236895
xdemechanik
---
"Northwoods" wrote:
I have a client history that includes similar (Name, address) and dissimilar
(date) information. I want to extract duplicate records to another worksheet
and leave only 1 record per client on the other. Problem is querying only the
first 3 columns, LastName, FirstName & Address and ignoring the remaining 3
columns that have dissimilar information and the system would consider unique.
There are 22,000 records.
Any ideas most appreciated


Northwoods

Extract duplicate rows to another worksheet
 
Thanks everyone,
I am going to get to work and post the results.
Northwoods

"Max" wrote:

Just another option to play with ..

Assume source data is in a sheet named: X
within cols A to G, data from row2 down to row 22000
where cols A to C are the key cols: LastName, FirstName & Address

In a new sheet named: U (for "Uniques", say)
Paste the 6 col headers from X's A1:F1 into B1:G1

Put in A2:
=IF(COUNTBLANK(X!A2:C2)=3,"",IF(SUMPRODUCT((X!$A$2 :A2=X!A2)*(X!$B$2:B2=X!B2)*(X!$C$2:C2=X!C2))<2,ROW (),""))
Copy A2 down to cover the max expected extent of source data, say down to
A22200. (Leave A1 blank)

Then put in B2:
=IF(ROW(A1)COUNT($A:$A),"",INDEX(X!A:A,MATCH(SMAL L($A:$A,ROW(A1)),$A:$A,0)))
Copy B2 to G2. Select B2:G2, fill down by the smallest extent sufficient to
cover the max expected number of unique lines from X, say down to G5000. When
the entire uniques extract is exhausted you'd see only "blank" rows. Cols B
to G will extract only the unique* lines from X, with all results neatly
bunched at the top. Hide away col A if desired.
*ie lines in X with unique: LastName - FirstName - Address

Then dress up U nicely, make a copy of it, rename the copy as say: D (for
"Duplicates").

In sheet: D,

Replace the formula in A2 by:
=IF(COUNTBLANK(X!A2:C2)=3,"",IF(U!A2="",ROW(),""))
then copy A2 down to A22200 as before. (Leave A1 blank)

Select B2:G2 (same formulas - no change), fill down by the smallest extent
sufficient to cover the max expected "balance" duplicate lines from X, say
down to G17200 (assuming 5,000 max unique lines, balance duplicates expected
is 22200 - 5000 = 17200 roughly). When the entire duplicates extract is
exhausted you'd see only "blank" rows, as in the uniques sheet U. Cols B to G
will extract only the duplicate lines from X, with all results neatly bunched
at the top. Hide away col A if desired.

Easiest way to try out the above is to simply rename a copy of your source
sheet as: X, ensure the data structure is as assumed, then plug-in the
formulas "as-is", and power away. You can always restore the source sheet's
name later and leave it to Excel to auto-change the formulas to suit.
--
Max
Singapore
http://savefile.com/projects/236895
xdemechanik
---
"Northwoods" wrote:
I have a client history that includes similar (Name, address) and dissimilar
(date) information. I want to extract duplicate records to another worksheet
and leave only 1 record per client on the other. Problem is querying only the
first 3 columns, LastName, FirstName & Address and ignoring the remaining 3
columns that have dissimilar information and the system would consider unique.
There are 22,000 records.
Any ideas most appreciated


Max

Extract duplicate rows to another worksheet
 
Thanks for posting back !
Let us know how it went for you
--
Max
Singapore
http://savefile.com/projects/236895
xdemechanik
---
"Northwoods" wrote in message
...
Thanks everyone,
I am going to get to work and post the results.
Northwoods




Northwoods

Extract duplicate rows to another worksheet
 
Hi Ron, your solution worked great but I miss the necessary 3 extra columns.
They can't be used to determine unique records because it is dissimular data.
However, when all is said and done, I still need those 3 columns included in
the Unique list.
Max, I need to know a quick way to copy the formula down to the 22,000 row.
I was dragging down and then I thought hey there must be a better way. Any
tips?
Thanks
fands

"Ron Coderre" wrote:

Here's an approach pulls the unique records onto another sheet:

Assumptions:
Sheet1 contains your data in cells A1:C1000
Sheet2 is where you want the extracted unique data to be displayed

Using Sheet2:
A1: LastName
B1: FirstName
C1: Address

InsertNameDefine
Names in workbook: Sheet2!Extract
Refers to: =Sheet2!$A$1:$C$1

Still using Sheet2:
InsertNameDefine
Names in workbook: Sheet2!Database
Refers to: =Sheet1!$A$1:$C$1000

(Notice: you are on Sheet2, and creating a Sheet2-level range name, but
the referenced range is on Sheet1)

The reason: An advanced filter cannot SEND data to another sheet, but
it can PULL data FROM another sheet.

Now...set up the Advanced Data Filter:
<Data<Filter<Advanced Filter
Select: Copy to another location
List Range: (press F3 and select Database)
Criteria Range: (none for this exercise)
Copy To: (press F3 and select Extract)
Check: Unique records only
Click [OK]

Note: if you want to run that Advanced Data Filter again,
you'll need to re-select the range "Database" each time

Is that something you can work with?
***********
Regards,
Ron

XL2002, WinXP


"Northwoods" wrote:

I have a client history that includes similar (Name, address) and dissimilar
(date) information. I want to extract duplicate records to another worksheet
and leave only 1 record per client on the other. Problem is querying only the
first 3 columns, LastName, FirstName & Address and ignoring the remaining 3
columns that have dissimilar information and the system would consider unique.
There are 22,000 records.
Any ideas most appreciated


Ron Coderre

Extract duplicate rows to another worksheet
 
I need a little more information....

A client has 5 records, all with identical LastName, FirstName, and Address
values.
I'm guessing that the other fields contain other information, different for
each record.

Of those 5 records, what deteremines which record contains the data in the
other fields that you want associated with the "unique" record?

***********
Regards,
Ron

XL2002, WinXP


"Northwoods" wrote:

Hi Ron, your solution worked great but I miss the necessary 3 extra columns.
They can't be used to determine unique records because it is dissimular data.
However, when all is said and done, I still need those 3 columns included in
the Unique list.
Max, I need to know a quick way to copy the formula down to the 22,000 row.
I was dragging down and then I thought hey there must be a better way. Any
tips?
Thanks
fands

"Ron Coderre" wrote:

Here's an approach pulls the unique records onto another sheet:

Assumptions:
Sheet1 contains your data in cells A1:C1000
Sheet2 is where you want the extracted unique data to be displayed

Using Sheet2:
A1: LastName
B1: FirstName
C1: Address

InsertNameDefine
Names in workbook: Sheet2!Extract
Refers to: =Sheet2!$A$1:$C$1

Still using Sheet2:
InsertNameDefine
Names in workbook: Sheet2!Database
Refers to: =Sheet1!$A$1:$C$1000

(Notice: you are on Sheet2, and creating a Sheet2-level range name, but
the referenced range is on Sheet1)

The reason: An advanced filter cannot SEND data to another sheet, but
it can PULL data FROM another sheet.

Now...set up the Advanced Data Filter:
<Data<Filter<Advanced Filter
Select: Copy to another location
List Range: (press F3 and select Database)
Criteria Range: (none for this exercise)
Copy To: (press F3 and select Extract)
Check: Unique records only
Click [OK]

Note: if you want to run that Advanced Data Filter again,
you'll need to re-select the range "Database" each time

Is that something you can work with?
***********
Regards,
Ron

XL2002, WinXP


"Northwoods" wrote:

I have a client history that includes similar (Name, address) and dissimilar
(date) information. I want to extract duplicate records to another worksheet
and leave only 1 record per client on the other. Problem is querying only the
first 3 columns, LastName, FirstName & Address and ignoring the remaining 3
columns that have dissimilar information and the system would consider unique.
There are 22,000 records.
Any ideas most appreciated


Max

Extract duplicate rows to another worksheet
 
"Northwoods" wrote
.. Max, I need to know a quick way to copy the formula
down to the 22,000 row. I was dragging down and then
I thought hey there must be a better way. Any tips?


99% of the time, I'd just simply copy the formula/s by dragging down.

Anyway, here's an alternative to dragging down. Since we're dealing with
large ranges and calc intensive formulas here, as a precaution, I'd set the
book's calc mode to "Manual" first, via clicking Tools Options
Calculation tab Check "Manual" OK.

Eg for:

Put in A2:
=IF(COUNTBLANK(X!A2:C2)=3,"",IF(SUMPRODUCT((X!$A$2 :A2=X!A2)*(X!$B$2:B2=X!B2)*(X!$C$2:C2=X!C2))<2,ROW (),""))
Copy A2 down to cover the max expected extent of source data,
say down to A22200.


Type the required range in the namebox, viz.: A2:A22200
Press ENTER. This will select the range with A2 active

Then with range selected, copy & paste the formula directly into the formula
bar (this will be for A2, the active cell) and press CTRL+ENTER (instead of
just pressing ENTER). This will fill the formulas for the entire range
A2:A22200

As for the other fill down for a multi-col range ..

Then put in B2:
=IF(ROW(A1)COUNT($A:$A),"",INDEX(X!A:A,MATCH(SMAL L($A:$A,ROW(A1)),$A:$A,0)))
Copy B2 to G2. Select B2:G2, fill down by the smallest extent sufficient
to
cover the max expected number of unique lines from X, say down to G5000.


After copying B2 to G2, just right-click to copy the range B2:G2
Then type the reqd range in the name box, ie: B3:G5000
press ENTER (B3 will be active). Then right-click on the selected range,
choose Paste special check "Formulas" OK.

And after filling all of the above / whenever required, we should remember
to press F9 to calculate.
--
Max
Singapore
http://savefile.com/projects/236895
xdemechanik
---




All times are GMT +1. The time now is 04:24 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com