Home |
Search |
Today's Posts |
|
#1
![]() |
|||
|
|||
![]()
I have a spreadsheet with about 20,000 rows of data and I am trying to
eliminate unnecessary rows. Columns B&C contain numbers, and columns D through AR contain text. Many rows contain identical information in columns B, C, D, E, & F, but the information in the remaining columns may differ. I would like to eliminate rows with duplicate data in columns B through F, retaining the rows with the greatest number of entries in columns G through AR. An example of what I would like to get to is shown below. BEFORE B C D E F G H I J K L Row 1 12 34 DD EE FF GG HH II JJ KK LL Row 2 12 34 DD EE FF GG HH II JJ Row 3 12 34 DD EE FF GG HH II AFTER B C D E F G H I J K L Row 1 12 34 DD EE FF GG HH II JJ KK LL I am not familiar with Visual Basic, so I would prefer to handle this with a formula, if possible. Can anyone help? Thanks! Sandi |
#2
![]() |
|||
|
|||
![]()
How about a little work:
First, do this against a copy--just in case! Insert two new columns A:C in that worksheet. In A1, put this formula: =row() In B1: =d1&CHAR(1)&e1&CHAR(1)&f1&CHAR(1)&g1&CHAR(10)&h1 (it concatenates the stuff in C:G (formerly A:F) Then in C1, put this: =COUNTA(J1:AU1) (formerly G:AR) And drag down 20000 rows. select A:C edit|copy edit|paste special|Values (things will work faster if you have values, not formulas) Add a new row 1 (for headers). Now select the whole range and sort by column A in ascending order and a secondary key of column B (descending order). The top one of each group will be the one that had the most entries. Now select column B. And do Data|Filter|Advanced filter filter in place and Unique records only. Now the rows you can see are the ones you want. Select those visible rows edit|copy paste to a new worksheet. Sort this data by column A to put it back in the original order. delete columns A:C foofoo wrote: I have a spreadsheet with about 20,000 rows of data and I am trying to eliminate unnecessary rows. Columns B&C contain numbers, and columns D through AR contain text. Many rows contain identical information in columns B, C, D, E, & F, but the information in the remaining columns may differ. I would like to eliminate rows with duplicate data in columns B through F, retaining the rows with the greatest number of entries in columns G through AR. An example of what I would like to get to is shown below. BEFORE B C D E F G H I J K L Row 1 12 34 DD EE FF GG HH II JJ KK LL Row 2 12 34 DD EE FF GG HH II JJ Row 3 12 34 DD EE FF GG HH II AFTER B C D E F G H I J K L Row 1 12 34 DD EE FF GG HH II JJ KK LL I am not familiar with Visual Basic, so I would prefer to handle this with a formula, if possible. Can anyone help? Thanks! Sandi -- Dave Peterson |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Find duplicate rows then deleting them | Excel Worksheet Functions | |||
Showing Duplicate Rows | Excel Discussion (Misc queries) | |||
Removing Duplicate Data | Excel Worksheet Functions | |||
How to delete duplicate rows in Excel 2000? | Excel Discussion (Misc queries) | |||
Duplicate Rows | Excel Worksheet Functions |