Thread: Comparing Rows
Posted to microsoft.public.excel.programming
Posted by: joel

One thing that might help speed up the code is to reduce the size of the
worksheet. Once a cell has been used, Excel sometimes still thinks there is
data in the cell, so when you do a .End(xlUp) or .End(xlToLeft) you can get
the wrong results. To solve this problem, manually delete the rows and
columns after the used data.

To delete the unused rows:
1) Select A1
2) Press Shift-Ctrl-Down Arrow to get to the last row
3) Select the row number after the last row of data
4) Press Shift-Ctrl-Down Arrow again; all the rows after your data will be
selected
5) Right click any row number that is selected and choose Delete

Repeat the same for columns, using Right Arrow instead of Down Arrow.
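
The manual steps above can also be done in code. This is only a rough sketch, not something tested against your workbook: the name TrimUnusedCells is made up for illustration, and it assumes the active sheet is the one to clean up and that anything below or to the right of the last cell with content can be thrown away. It uses Find with "*" to locate the last row and column that really contain something.

```vba
Sub TrimUnusedCells()
    'Hypothetical helper (name is an assumption, not from this thread).
    'Deletes every row below and every column to the right of the last
    'cell that actually holds a value or formula on the active sheet.
    Dim ws As Worksheet
    Dim lastCell As Range
    Dim lastRow As Long, lastCol As Long

    Set ws = ActiveSheet

    'search backwards from A1 so the first hit is the last used row
    Set lastCell = ws.Cells.Find(What:="*", LookIn:=xlFormulas, _
        SearchOrder:=xlByRows, SearchDirection:=xlPrevious)
    If lastCell Is Nothing Then Exit Sub   'sheet is empty
    lastRow = lastCell.Row

    'same search by columns gives the last used column
    lastCol = ws.Cells.Find(What:="*", LookIn:=xlFormulas, _
        SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column

    'delete the unused rows
    If lastRow < ws.Rows.Count Then
        ws.Rows(lastRow + 1 & ":" & ws.Rows.Count).Delete
    End If

    'delete the unused columns
    If lastCol < ws.Columns.Count Then
        ws.Range(ws.Cells(1, lastCol + 1), _
            ws.Cells(1, ws.Columns.Count)).EntireColumn.Delete
    End If
End Sub
```

You may need to save the workbook afterwards before Excel shrinks the used range.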

"Monomeeth" wrote:

Hi Joel

Okay, retested the macro on a test worksheet I created. This worksheet had
data in the range A1:AC543. Oh, and yes, the first row was a header row.

Results are as follows:

REMOVE DUPLICATES ALL COLUMNS
Okay, same problem as before. It didn't take long to get the Visual Basic
error message "400". When I click on Help all I get is a blank white window.
I did notice, however, that the worksheet had been sorted - so the error
occurs sometime after the sort is done. However, the duplicate rows had not
been deleted.

I did try to find out what the error 400 was about, but all I found was some
info relating to showing forms which are already visible - but in this case
there is no form.

Anyway, hope this helps to narrow the problem down.

REMOVE DUPLICATES ONE COLUMN
Success! This worked. I deliberately designed the test worksheet to have
duplicate rows - both in terms of every column and in terms of a single
column. The macro went through fairly quickly and everything was A-OK.

Thanks for this Joel - I will just have to bear in mind it may not work on
larger worksheets - or take care to run it when the computer isn't doing
anything else in case it's a memory issue.

Now all I have to do is sort out the first Macro.

Thanks again for your help!

:)

--
If you can measure it, you can improve it!


"Monomeeth" wrote:

Hi Joel

Thank you very much for your continued help. I really appreciate it. I also
like the way you have chosen to approach this.

I decided to test both your macros by doing a straight copy and paste as is.
I chose a worksheet which has data in the range A1:AL29420. The results were
as follows:

REMOVE DUPLICATES ALL COLUMNS
Macro seemed to run fine as it was obviously doing something. However, after
a few minutes I got a strange error message from Visual Basic. All it said
was "400". Nothing else; the only options I had were an OK button and a Help
button, but when I clicked on Help nothing seemed to happen.

REMOVE DUPLICATES ONE COLUMN
Macro started fine - I was able to select the column and get it running.
However, I waited for an hour and the Macro seemed to still be running. There
were no error messages, but Excel said it was not responding, while Visual
Basic Editor said it was still running.

Sorry, I can't give any further clues.

Now that I'm thinking about it, I will try these macros on much smaller
worksheets. I will post back shortly with the results.

Thanks Joel.

:)
--
If you can measure it, you can improve it!





"Joel" wrote:

The code assumes there is a header row in both cases. I asked the questions
because I want to keep the code as simple as possible and to make it run
quicker. Because the column lengths varied I had to check every row for the
last column.

To make the code run faster, instead of deleting rows one at a time I
placed an X in column IV for rows that needed to be deleted, then filtered
the X's using AutoFilter and removed the rows with the X's. See comments in
the code.

Sub RemoveDuplicatesAllColumns()

    'clear column IV in case there is data from the last run
    Columns("IV").Delete

    'find last row
    'use column A to determine last row
    LastRow = Range("A" & Rows.Count).End(xlUp).Row

    'find last column
    'check every row to determine last column
    EndColumn = 0
    For RowCount = 1 To LastRow
        LastColumn = Cells(RowCount, Columns.Count).End(xlToLeft).Column
        If LastColumn > EndColumn Then
            EndColumn = LastColumn
        End If
    Next RowCount

    'sort cells three columns at a time
    For ColCount = 1 To EndColumn Step 3
        Rows("1:" & LastRow).Sort _
            header:=xlYes, _
            key1:=Cells(1, ColCount), _
            order1:=xlAscending, _
            key2:=Cells(1, ColCount + 1), _
            order2:=xlAscending, _
            key3:=Cells(1, ColCount + 2), _
            order3:=xlAscending
    Next ColCount

    'compare rows putting a X in column IV where duplicates exist
    RowCount = 2
    Match = False 'use to indicate that last compared rows
                  'either matched or didn't match
    Do While RowCount < LastRow
        If Match = False Then
            CompareRow = RowCount + 1
        Else
            CompareRow = CompareRow + 1
        End If
        'check if column lengths are equal if not skip
        LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column
        LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column
        Match = True
        If LastCol = LastCompareCol Then
            For ColCount = 1 To LastCol
                If Cells(RowCount, ColCount) <> _
                   Cells(CompareRow, ColCount) Then

                    Match = False
                    Exit For
                End If
            Next ColCount
        Else
            Match = False
        End If
        If Match = True Then
            Range("IV" & CompareRow) = "X"
        Else
            'rows differ, so move on to the next row
            '(without this the loop never ends)
            RowCount = RowCount + 1
        End If
    Loop

    'remove rows with X's in column IV
    Set c = Columns("IV").Find(what:="X", _
        LookIn:=xlValues, lookat:=xlWhole)
    'only filter if at least one X is found
    If Not c Is Nothing Then
        Columns("IV").AutoFilter Field:=1, Criteria1:="X"
        Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
        Columns("IV").Delete
    End If

End Sub

Sub RemoveDuplicatesOneColumn()

    'clear column IV in case there is data from the last run
    Columns("IV").Delete

    'select column
    Set Selected = Application.InputBox(Prompt:="Select one column", _
        Title:="Select one column", Type:=8)

    SelectCol = Selected.Column

    'find last row
    'use the selected column to determine last row
    LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row

    'sort by column selected
    Rows("1:" & LastRow).Sort _
        header:=xlYes, _
        key1:=Cells(1, SelectCol), _
        order1:=xlAscending

    'compare rows putting a X in column IV where duplicates exist
    RowCount = 2
    Match = False 'use to indicate that last compared rows
                  'either matched or didn't match
    Do While RowCount < LastRow
        If Match = False Then
            CompareRow = RowCount + 1
        Else
            CompareRow = CompareRow + 1
        End If
        If Cells(RowCount, SelectCol) = _
           Cells(CompareRow, SelectCol) Then

            Range("IV" & CompareRow) = "X"
            Match = True
        Else
            Match = False
            RowCount = RowCount + 1
        End If
    Loop

    'remove rows with X's in column IV
    Set c = Columns("IV").Find(what:="X", _
        LookIn:=xlValues, lookat:=xlWhole)
    'only filter if at least one X is found
    If Not c Is Nothing Then
        Columns("IV").AutoFilter Field:=1, Criteria1:="X"
        Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
        Columns("IV").Delete
    End If

End Sub
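
If the sort-and-compare macros above are too slow on a large sheet, a different technique may be worth trying. The sketch below is not the method used above: it reads the selected columns into an array in one go and uses a Scripting.Dictionary to remember which rows it has already seen, so nothing is sorted and the first occurrence of each row is the one that is kept. The name RemoveDuplicatesByKey is made up, and the sketch assumes Windows (for Scripting.Dictionary), a header in row 1, and that Chr(30) never appears in the data.

```vba
Sub RemoveDuplicatesByKey()
    'Hypothetical alternative (name and approach are assumptions,
    'not the macros posted in this thread).
    Dim sel As Range, ws As Worksheet, lastCell As Range
    Dim data As Variant, seen As Object, toDelete As Range
    Dim firstCol As Long, lastCol As Long, lastRow As Long
    Dim r As Long, c As Long, key As String

    'select one column or several adjacent columns
    On Error Resume Next    'InputBox raises an error if the user cancels
    Set sel = Application.InputBox( _
        Prompt:="Select the column(s) to check", Type:=8)
    On Error GoTo 0
    If sel Is Nothing Then Exit Sub

    Set ws = sel.Worksheet
    firstCol = sel.Column
    lastCol = firstCol + sel.Columns.Count - 1

    'find the last row that contains anything
    Set lastCell = ws.Cells.Find(What:="*", LookIn:=xlFormulas, _
        SearchOrder:=xlByRows, SearchDirection:=xlPrevious)
    If lastCell Is Nothing Then Exit Sub
    lastRow = lastCell.Row
    If lastRow < 2 Then Exit Sub    'only a header row, nothing to do

    'read the values once instead of touching cells one by one
    data = ws.Range(ws.Cells(1, firstCol), ws.Cells(lastRow, lastCol)).Value

    Set seen = CreateObject("Scripting.Dictionary")
    For r = 2 To lastRow            'row 1 is assumed to be the header
        'build one key from all checked cells in this row
        key = ""
        For c = 1 To UBound(data, 2)
            key = key & CStr(data(r, c)) & Chr(30)
        Next c
        If seen.Exists(key) Then
            'duplicate of an earlier row: collect it for deletion
            If toDelete Is Nothing Then
                Set toDelete = ws.Rows(r)
            Else
                Set toDelete = Union(toDelete, ws.Rows(r))
            End If
        Else
            seen.Add key, True
        End If
    Next r

    'delete all collected rows in one operation
    If Not toDelete Is Nothing Then toDelete.Delete
End Sub
```

Selecting every data column checks the whole row, and selecting one column (for example the employee number column) checks just that column. Building the Union can itself get slow when there are thousands of scattered duplicates, so treat this as a starting point.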




"Monomeeth" wrote:

Hi Joel

Thanks very much for your help. My answers are as follows:

1) Is it acceptable to sort the worksheet in the macro?

Yes.

2) For macro ONE what is the last column?

This will be different as I want to be able to run the macro on various
spreadsheets. Although I suppose you could set the last column as IV (using
Excel 2003) and therefore it wouldn't matter as some cells may be blank.

3) Will the last column always be the same when you run the macro?

No. See answer to 2 above.

4) Does every row have data filled to the last column?

Not necessarily. Some cells could be blank.


At the risk of providing too much info, here is some context for you:

BACKGROUND
Basically I am trying to remove duplicate rows from worksheets which import
data from various online forms submitted by users. Unfortunately, sometimes
users submit the exact same data multiple times (for whatever reason) and
this causes problems when we need to do an analysis of the collated data. So
I want to be able to remove duplicate rows where every cell in the row is
exactly the same as every corresponding cell within another row, and some of
these cells could be blank.

As for Macro TWO, I need this for another reason altogether. Last column is
not constant and yes there may be blank cells within the row.

JUST A THOUGHT
I suppose you could have the one macro do both jobs by designing it so that
the user can specify a range to interrogate - the user could select a single
column or a group of columns - but I don't know whether that would complicate
things as I don't want the user to specify WHAT the macro is actually looking
for, only WHERE to look for duplicates.

EXAMPLE

I may have a worksheet with a data range of A1:AG25000. Some cells within
this range may be blank for whatever reason. Now, I may want to remove any
duplicate rows where data in columns A to AG are an exact match for the
corresponding cells.

That is, A123 may be exactly the same as A237 and A767, B123 may be exactly
the same as B237 and B767, and so on for every column in those rows,
regardless of whether the match happens to contain data or be blank - hence
why you could go to column IV to do your interrogation. In this scenario I
want to only be left with unique rows after the macro is finished, so only
row 123 would remain while rows 237 and 767 are deleted.

Now, I may have another spreadsheet where I want to achieve the same thing,
but only want to check a single column for duplicates. For instance, I may
have 25000 rows in a spreadsheet containing employee data, but there may only
be 18000 employees in total, so I should only have 18000 rows not 25000. So,
if Column E contained a unique identifier such as employee numbers, then I
would want the macro to only interrogate that column and remove rows based
only on the duplicated data in column E regardless of whether the other cells in
the row were duplicated. In this scenario I only want to keep the first row
containing that unique identifier.

Hope this makes sense and that I haven't confused you!

:)

Once again, I appreciate your help and time with this.