About Us

I have a spreadsheet with about 20,000 rows of data and I am trying to
eliminate unnecessary rows. Columns B&C contain numbers, and columns D
through AR contain text. Many rows contain identical information in
columns B, C, D, E, & F, but the information in the remaining columns
may differ. I would like to eliminate rows with duplicate data in
columns B through F, retaining the rows with the greatest number of
entries in columns G through AR.

An example of what I would like to get to is shown below.

BEFORE
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL
Row 2 12 34 DD EE FF GG HH II JJ
Row 3 12 34 DD EE FF GG HH II

AFTER
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL

I am not familiar with Visual Basic, so I would prefer to handle this
with a formula, if possible.

Can anyone help?

Thanks!

Sandi

How about a little work:

First, do this against a copy--just in case!

Insert two new columns A:C in that worksheet.

In A1, put this formula:
=row()

In B1:
=d1&CHAR(1)&e1&CHAR(1)&f1&CHAR(1)&g1&CHAR(10)&h1
(it concatenates the stuff in C:G (formerly A:F)

Then in C1, put this:
=COUNTA(J1:AU1)
(formerly G:AR)

And drag down 20000 rows.
select A:C
edit|copy
edit|paste special|Values
(things will work faster if you have values, not formulas)

Add a new row 1 (for headers).

Now select the whole range and sort by column A in ascending order and a
secondary key of column B (descending order).

The top one of each group will be the one that had the most entries.

Now select column B.
And do Data|Filter|Advanced filter
filter in place and Unique records only.

Now the rows you can see are the ones you want.
Select those visible rows
edit|copy
paste to a new worksheet.

Sort this data by column A to put it back in the original order.
delete columns A:C

foofoo wrote:

I have a spreadsheet with about 20,000 rows of data and I am trying to
eliminate unnecessary rows. Columns B&C contain numbers, and columns D
through AR contain text. Many rows contain identical information in
columns B, C, D, E, & F, but the information in the remaining columns
may differ. I would like to eliminate rows with duplicate data in
columns B through F, retaining the rows with the greatest number of
entries in columns G through AR.

An example of what I would like to get to is shown below.

BEFORE
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL
Row 2 12 34 DD EE FF GG HH II JJ
Row 3 12 34 DD EE FF GG HH II

AFTER
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL

I am not familiar with Visual Basic, so I would prefer to handle this
with a formula, if possible.

Can anyone help?

Thanks!

Sandi

--

Dave Peterson

Thread Tools	Search this Thread
Show Printable Version	Search this Thread: Advanced Search
Display Modes
Switch to Linear Mode Hybrid Mode Switch to Threaded Mode

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Find duplicate rows then deleting them	SSHO_99	Excel Worksheet Functions	4	May 4th 07 11:17 AM
Showing Duplicate Rows	EaglesNest	Excel Discussion (Misc queries)	4	March 26th 05 05:21 PM
Removing Duplicate Data	Jai Cutmore	Excel Worksheet Functions	2	March 7th 05 09:27 PM
How to delete duplicate rows in Excel 2000?	PAL@Emory	Excel Discussion (Misc queries)	1	January 28th 05 03:08 PM
Duplicate Rows	TO	Excel Worksheet Functions	2	November 12th 04 06:55 PM

Menu

About Us