Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
foofoo
 
Posts: n/a
Default Removing Near-Duplicate Rows, Leaving Those w/Most Data in Specific Columns

I have a spreadsheet with about 20,000 rows of data and I am trying to
eliminate unnecessary rows. Columns B&C contain numbers, and columns D
through AR contain text. Many rows contain identical information in
columns B, C, D, E, & F, but the information in the remaining columns
may differ. I would like to eliminate rows with duplicate data in
columns B through F, retaining the rows with the greatest number of
entries in columns G through AR.

An example of what I would like to get to is shown below.

BEFORE
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL
Row 2 12 34 DD EE FF GG HH II JJ
Row 3 12 34 DD EE FF GG HH II

AFTER
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL

I am not familiar with Visual Basic, so I would prefer to handle this
with a formula, if possible.

Can anyone help?




Thanks!




Sandi

  #2   Report Post  
Dave Peterson
 
Posts: n/a
Default

How about a little work:

First, do this against a copy--just in case!

Insert two new columns A:C in that worksheet.

In A1, put this formula:
=row()

In B1:
=d1&CHAR(1)&e1&CHAR(1)&f1&CHAR(1)&g1&CHAR(10)&h1
(it concatenates the stuff in C:G (formerly A:F)

Then in C1, put this:
=COUNTA(J1:AU1)
(formerly G:AR)

And drag down 20000 rows.
select A:C
edit|copy
edit|paste special|Values
(things will work faster if you have values, not formulas)

Add a new row 1 (for headers).

Now select the whole range and sort by column A in ascending order and a
secondary key of column B (descending order).

The top one of each group will be the one that had the most entries.

Now select column B.
And do Data|Filter|Advanced filter
filter in place and Unique records only.

Now the rows you can see are the ones you want.
Select those visible rows
edit|copy
paste to a new worksheet.

Sort this data by column A to put it back in the original order.
delete columns A:C





foofoo wrote:

I have a spreadsheet with about 20,000 rows of data and I am trying to
eliminate unnecessary rows. Columns B&C contain numbers, and columns D
through AR contain text. Many rows contain identical information in
columns B, C, D, E, & F, but the information in the remaining columns
may differ. I would like to eliminate rows with duplicate data in
columns B through F, retaining the rows with the greatest number of
entries in columns G through AR.

An example of what I would like to get to is shown below.

BEFORE
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL
Row 2 12 34 DD EE FF GG HH II JJ
Row 3 12 34 DD EE FF GG HH II

AFTER
B C D E F G H I J K L
Row 1 12 34 DD EE FF GG HH II JJ KK LL

I am not familiar with Visual Basic, so I would prefer to handle this
with a formula, if possible.

Can anyone help?

Thanks!

Sandi


--

Dave Peterson
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Find duplicate rows then deleting them SSHO_99 Excel Worksheet Functions 4 May 4th 07 11:17 AM
Showing Duplicate Rows EaglesNest Excel Discussion (Misc queries) 4 March 26th 05 06:21 PM
Removing Duplicate Data Jai Cutmore Excel Worksheet Functions 2 March 7th 05 10:27 PM
How to delete duplicate rows in Excel 2000? PAL@Emory Excel Discussion (Misc queries) 1 January 28th 05 04:08 PM
Duplicate Rows TO Excel Worksheet Functions 2 November 12th 04 07:55 PM


All times are GMT +1. The time now is 04:36 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"