Remember Me?

#1
April 1st 05, 07:59 PM
 foofoo Posts: n/a
Removing Near-Duplicate Rows, Leaving Those w/Most Data in Specific Columns

I have a spreadsheet with about 20,000 rows of data and I am trying to
eliminate unnecessary rows. Columns B&C contain numbers, and columns D
through AR contain text. Many rows contain identical information in
columns B, C, D, E, & F, but the information in the remaining columns
may differ. I would like to eliminate rows with duplicate data in
columns B through F, retaining the rows with the greatest number of
entries in columns G through AR.

An example of what I would like to get to is shown below.

BEFORE
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL
Row 2 12 34 DD EE FF GG HH II JJ
Row 3 12 34 DD EE FF GG HH II

AFTER
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL

I am not familiar with Visual Basic, so I would prefer to handle this
with a formula, if possible.

Can anyone help?

Thanks!

Sandi

#2
April 2nd 05, 12:02 AM
 Dave Peterson Posts: n/a

How about a little work:

First, do this against a copy--just in case!

Insert two new columns A:C in that worksheet.

In A1, put this formula:
=row()

In B1:
=d1&CHAR(1)&e1&CHAR(1)&f1&CHAR(1)&g1&CHAR(10)&h1
(it concatenates the stuff in C:G (formerly A:F)

Then in C1, put this:
=COUNTA(J1:AU1)
(formerly G:AR)

And drag down 20000 rows.
select A:C
edit|copy
edit|paste special|Values
(things will work faster if you have values, not formulas)

Add a new row 1 (for headers).

Now select the whole range and sort by column A in ascending order and a
secondary key of column B (descending order).

The top one of each group will be the one that had the most entries.

Now select column B.
And do Data|Filter|Advanced filter
filter in place and Unique records only.

Now the rows you can see are the ones you want.
Select those visible rows
edit|copy
paste to a new worksheet.

Sort this data by column A to put it back in the original order.
delete columns A:C

foofoo wrote:

I have a spreadsheet with about 20,000 rows of data and I am trying to
eliminate unnecessary rows. Columns B&C contain numbers, and columns D
through AR contain text. Many rows contain identical information in
columns B, C, D, E, & F, but the information in the remaining columns
may differ. I would like to eliminate rows with duplicate data in
columns B through F, retaining the rows with the greatest number of
entries in columns G through AR.

An example of what I would like to get to is shown below.

BEFORE
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL
Row 2 12 34 DD EE FF GG HH II JJ
Row 3 12 34 DD EE FF GG HH II

AFTER
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL

I am not familiar with Visual Basic, so I would prefer to handle this
with a formula, if possible.

Can anyone help?

Thanks!

Sandi

--

Dave Peterson

 Thread Tools Search this Thread Search this Thread: Advanced Search Display Modes Linear Mode

 Posting Rules Smilies are On [IMG] code is On HTML code is OffTrackbacks are On Pingbacks are On Refbacks are On

 Similar Threads Thread Thread Starter Forum Replies Last Post SSHO_99 Excel Worksheet Functions 4 May 4th 07 11:17 AM EaglesNest Excel Discussion (Misc queries) 4 March 26th 05 05:21 PM Jai Cutmore Excel Worksheet Functions 2 March 7th 05 09:27 PM [email protected] Excel Discussion (Misc queries) 1 January 28th 05 03:08 PM TO Excel Worksheet Functions 2 November 12th 04 06:55 PM

All times are GMT +1. The time now is 02:17 AM.