Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
Hello
I need two macros to do as follows: ONE I need a macro which will check every row in a worksheet and if it finds two or more duplicate rows (i.e. every cell in the row is duplicated), it will then proceed to delete all duplicate rows except the first duplicate row. TWO I need a macro which will check every row in a worksheet and if it finds two or more rows with the exact same value in a particular column, it will then proceed to delete all duplicate rows except the first duplicate row. I would prefer the user to have the ability to specify the column they want interrogated. Any suggestions would be greatly appreciated! Thanks, Joe. -- If you can measure it, you can improve it! |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
I have Few questions
1) Is it acceptable to sort the worksheet in the macro? 2) For macro ONE what is the last column? 3) Will the last column always be the same when you run the macro? 4) Does every Row have data filed to the last column. What I propose for macro ONE is to sort every column of the worksheet starting with the last column and moving towards the first column. Then comparing adjacent rows for matches. "Monomeeth" wrote: Hello I need two macros to do as follows: ONE I need a macro which will check every row in a worksheet and if it finds two or more duplicate rows (i.e. every cell in the row is duplicated), it will then proceed to delete all duplicate rows except the first duplicate row. TWO I need a macro which will check every row in a worksheet and if it finds two or more rows with the exact same value in a particular column, it will then proceed to delete all duplicate rows except the first duplicate row. I would prefer the user to have the ability to specify the column they want interrogated. Any suggestions would be greatly appreciated! Thanks, Joe. -- If you can measure it, you can improve it! |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
Hi Joel
Thanks very much for your help. My answers are as follows: 1) Is it acceptable to sort the worksheet in the macro? Yes. 2) For macro ONE what is the last column? This will be different as I want to be able to run the macro on various spreadsheets. Although I suppose you could set the last column as IV (using Excel 2003) and therefore it wouldn't matter as some cells may be blank. 3) Will the last column always be the same when you run the macro? No. See answer to 2 above. 4) Does every Row have data filed to the last column. Not necessarily. Some cells could be blank. At the risk of providing too much info, here is some context for you: BACKGROUND Basically I am trying to remove duplicate rows from worksheets which import data from various online forms submitted by users. Unfortunately, sometimes users submit the exact same data multiple times (for whatever reason) and this causes problems when we need to do an analysis of the collated data. So I want to be able to remove duplicate rows where every cell in the row is exactly the same as every corresponding cell within another row, and some of these cells could be blank. As for Macro TWO, I need this for another reason altogether. Last column is not constant and yes there may be blank cells within the row. JUST A THOUGHT I suppose you could have the one macro do both jobs by designing it so that the user can specify a range to interrogate - the user could select a single column or a group of columns - but I don't know whether that would complicate things as I don't want the user to specify WHAT the macro is actually looking for, only WHERE to look for duplicates. EXAMPLE I may have a worksheet with a data range of A1:AG25000. Some cells within this range may be blank for whatever reason. Now, I may want to remove any duplicate rows where data in columns A to AG are an exact match for the corresponding cells. That is, A123 may be exactly the same as A237 and A767, B123 may be exactly the same as B237 and B767, and so on for every column in those rows, regardless of whether the match happens to contain data or be blank - hence why you could go to column IV to do your interrogation. In this scenario I want to only be left with unique rows after the macro is finished, so only row 123 would remain while rows 237 and 767 are deleted. Now, I may have another spreadsheet where I want to achieve the same thing, but only want to check a single column for duplicates. For instance, I may have 25000 rows in a spreadsheet containing employee data, but there may only be 18000 employees in total, so I should only have 18000 rows not 25000. So, if Column E contained a unique identifier such as employee numbers, then I would want the macro to only interrogate that column and remove rows based only on the duplicated data in row E regardless of whether the other cells in the row were duplicated. In this scenario I only want to keep the first row conatining that unique identifier. Hope this makes sense and that I haven't confused you! :) Once again, I appreciate your help and time with this. Joe. -- If you can measure it, you can improve it! "Joel" wrote: I have Few questions 1) Is it acceptable to sort the worksheet in the macro? 2) For macro ONE what is the last column? 3) Will the last column always be the same when you run the macro? 4) Does every Row have data filed to the last column. What I propose for macro ONE is to sort every column of the worksheet starting with the last column and moving towards the first column. Then comparing adjacent rows for matches. "Monomeeth" wrote: Hello I need two macros to do as follows: ONE I need a macro which will check every row in a worksheet and if it finds two or more duplicate rows (i.e. every cell in the row is duplicated), it will then proceed to delete all duplicate rows except the first duplicate row. TWO I need a macro which will check every row in a worksheet and if it finds two or more rows with the exact same value in a particular column, it will then proceed to delete all duplicate rows except the first duplicate row. I would prefer the user to have the ability to specify the column they want interrogated. Any suggestions would be greatly appreciated! Thanks, Joe. -- If you can measure it, you can improve it! |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
The code assumes there is a header row in both cases. I asked the questions
becasue I want to keep the code as simple as possible and to make it run quicker. Becasue the columns lengths varied I had to check every row for the last column. To make the code run faster instead of deleting rows one at a time. I placed an X in column IV for rows that needed to be deleted. Then filtered the X's using Autofilter and removed the rows with the X's. See comments in the code. Sub RemoveDuplicatesAllColumns() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'find last row 'use column A to determine last row LastRow = Range("A" & Rows.Count).End(xlUp).Row 'find last column 'check every row to determine last column EndColumn = 0 For RowCount = 1 To LastRow LastColumn = Cells(RowCount).End(xlUp).Column If LastColumn EndColumn Then EndColumn = LastColumn End If Next RowCount 'sort cells three columns at a time For ColCount = 1 To EndColumn Step 3 Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, ColCount), _ order1:=xlAscending, _ key2:=Cells(1, ColCount + 1), _ order2:=xlAscending, _ key3:=Cells(1, ColCount + 1), _ order3:=xlAscending Next ColCount 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If 'check if column lengths are equal if not skip LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column Match = True If LastCol = LastCompareCol Then For ColCount = 1 To LastCol If Cells(RowCount, ColCount) < _ Cells(CompareRow, ColCount) Then Match = False Exit For End If Next ColCount If Match = True Then Range("IV" & CompareRow) = "X" End If Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub Sub RemoveDuplicatesOneColumn() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'select column Set Selected = Application.InputBox(Prompt:="Select one column", _ Title:="Select one column", Type:=8) SelectCol = Selected.Column 'find last row 'use column A to determine last row LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row 'sort by column selected Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, SelectCol), _ order1:=xlAscending 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If If Cells(RowCount, SelectCol) = _ Cells(CompareRow, SelectCol) Then Range("IV" & CompareRow) = "X" Match = True Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub "Monomeeth" wrote: Hi Joel Thanks very much for your help. My answers are as follows: 1) Is it acceptable to sort the worksheet in the macro? Yes. 2) For macro ONE what is the last column? This will be different as I want to be able to run the macro on various spreadsheets. Although I suppose you could set the last column as IV (using Excel 2003) and therefore it wouldn't matter as some cells may be blank. 3) Will the last column always be the same when you run the macro? No. See answer to 2 above. 4) Does every Row have data filed to the last column. Not necessarily. Some cells could be blank. At the risk of providing too much info, here is some context for you: BACKGROUND Basically I am trying to remove duplicate rows from worksheets which import data from various online forms submitted by users. Unfortunately, sometimes users submit the exact same data multiple times (for whatever reason) and this causes problems when we need to do an analysis of the collated data. So I want to be able to remove duplicate rows where every cell in the row is exactly the same as every corresponding cell within another row, and some of these cells could be blank. As for Macro TWO, I need this for another reason altogether. Last column is not constant and yes there may be blank cells within the row. JUST A THOUGHT I suppose you could have the one macro do both jobs by designing it so that the user can specify a range to interrogate - the user could select a single column or a group of columns - but I don't know whether that would complicate things as I don't want the user to specify WHAT the macro is actually looking for, only WHERE to look for duplicates. EXAMPLE I may have a worksheet with a data range of A1:AG25000. Some cells within this range may be blank for whatever reason. Now, I may want to remove any duplicate rows where data in columns A to AG are an exact match for the corresponding cells. That is, A123 may be exactly the same as A237 and A767, B123 may be exactly the same as B237 and B767, and so on for every column in those rows, regardless of whether the match happens to contain data or be blank - hence why you could go to column IV to do your interrogation. In this scenario I want to only be left with unique rows after the macro is finished, so only row 123 would remain while rows 237 and 767 are deleted. Now, I may have another spreadsheet where I want to achieve the same thing, but only want to check a single column for duplicates. For instance, I may have 25000 rows in a spreadsheet containing employee data, but there may only be 18000 employees in total, so I should only have 18000 rows not 25000. So, if Column E contained a unique identifier such as employee numbers, then I would want the macro to only interrogate that column and remove rows based only on the duplicated data in row E regardless of whether the other cells in the row were duplicated. In this scenario I only want to keep the first row conatining that unique identifier. Hope this makes sense and that I haven't confused you! :) Once again, I appreciate your help and time with this. Joe. -- If you can measure it, you can improve it! "Joel" wrote: I have Few questions 1) Is it acceptable to sort the worksheet in the macro? 2) For macro ONE what is the last column? 3) Will the last column always be the same when you run the macro? 4) Does every Row have data filed to the last column. What I propose for macro ONE is to sort every column of the worksheet starting with the last column and moving towards the first column. Then comparing adjacent rows for matches. "Monomeeth" wrote: Hello I need two macros to do as follows: ONE I need a macro which will check every row in a worksheet and if it finds two or more duplicate rows (i.e. every cell in the row is duplicated), it will then proceed to delete all duplicate rows except the first duplicate row. TWO I need a macro which will check every row in a worksheet and if it finds two or more rows with the exact same value in a particular column, it will then proceed to delete all duplicate rows except the first duplicate row. I would prefer the user to have the ability to specify the column they want interrogated. Any suggestions would be greatly appreciated! Thanks, Joe. -- If you can measure it, you can improve it! |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
Hi Joel
Thank you very much for yoiur continued help. I really appreciate it. I also like the way you have chosen to approach this. I decided to test both your macros by doing a straight copy and paste as is. I chose a worksheet which has data in the range A1:AL29420. The results were as follows: REMOVE DUPLICATES ALL COLUMNS Macro seemed to run fine as it was obviously doing something. However, after a few minutes I got a strange error message from Visual Basic. All it said was "400". Nothing else, the only options I had was an OK button and a Help button, but when I clicked on Help nothing seemed to happen. REMOVE DUPLICATES ONE COLUMN Macro started fine - I was able to select the column and get it running. However, I waited for an hour and the Macro seemed to still be running. There were no error messages, but Excel said it was not responding, while Visual Basic Editor said it was still running. Sorry, I can't give any further clues. Now that I'm thinking about it, I will try these macros on much smaller worksheets. I will post back shortly with the results. Thanks Joel. :) -- If you can measure it, you can improve it! "Joel" wrote: The code assumes there is a header row in both cases. I asked the questions becasue I want to keep the code as simple as possible and to make it run quicker. Becasue the columns lengths varied I had to check every row for the last column. To make the code run faster instead of deleting rows one at a time. I placed an X in column IV for rows that needed to be deleted. Then filtered the X's using Autofilter and removed the rows with the X's. See comments in the code. Sub RemoveDuplicatesAllColumns() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'find last row 'use column A to determine last row LastRow = Range("A" & Rows.Count).End(xlUp).Row 'find last column 'check every row to determine last column EndColumn = 0 For RowCount = 1 To LastRow LastColumn = Cells(RowCount).End(xlUp).Column If LastColumn EndColumn Then EndColumn = LastColumn End If Next RowCount 'sort cells three columns at a time For ColCount = 1 To EndColumn Step 3 Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, ColCount), _ order1:=xlAscending, _ key2:=Cells(1, ColCount + 1), _ order2:=xlAscending, _ key3:=Cells(1, ColCount + 1), _ order3:=xlAscending Next ColCount 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If 'check if column lengths are equal if not skip LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column Match = True If LastCol = LastCompareCol Then For ColCount = 1 To LastCol If Cells(RowCount, ColCount) < _ Cells(CompareRow, ColCount) Then Match = False Exit For End If Next ColCount If Match = True Then Range("IV" & CompareRow) = "X" End If Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub Sub RemoveDuplicatesOneColumn() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'select column Set Selected = Application.InputBox(Prompt:="Select one column", _ Title:="Select one column", Type:=8) SelectCol = Selected.Column 'find last row 'use column A to determine last row LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row 'sort by column selected Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, SelectCol), _ order1:=xlAscending 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If If Cells(RowCount, SelectCol) = _ Cells(CompareRow, SelectCol) Then Range("IV" & CompareRow) = "X" Match = True Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub "Monomeeth" wrote: Hi Joel Thanks very much for your help. My answers are as follows: 1) Is it acceptable to sort the worksheet in the macro? Yes. 2) For macro ONE what is the last column? This will be different as I want to be able to run the macro on various spreadsheets. Although I suppose you could set the last column as IV (using Excel 2003) and therefore it wouldn't matter as some cells may be blank. 3) Will the last column always be the same when you run the macro? No. See answer to 2 above. 4) Does every Row have data filed to the last column. Not necessarily. Some cells could be blank. At the risk of providing too much info, here is some context for you: BACKGROUND Basically I am trying to remove duplicate rows from worksheets which import data from various online forms submitted by users. Unfortunately, sometimes users submit the exact same data multiple times (for whatever reason) and this causes problems when we need to do an analysis of the collated data. So I want to be able to remove duplicate rows where every cell in the row is exactly the same as every corresponding cell within another row, and some of these cells could be blank. As for Macro TWO, I need this for another reason altogether. Last column is not constant and yes there may be blank cells within the row. JUST A THOUGHT I suppose you could have the one macro do both jobs by designing it so that the user can specify a range to interrogate - the user could select a single column or a group of columns - but I don't know whether that would complicate things as I don't want the user to specify WHAT the macro is actually looking for, only WHERE to look for duplicates. EXAMPLE I may have a worksheet with a data range of A1:AG25000. Some cells within this range may be blank for whatever reason. Now, I may want to remove any duplicate rows where data in columns A to AG are an exact match for the corresponding cells. That is, A123 may be exactly the same as A237 and A767, B123 may be exactly the same as B237 and B767, and so on for every column in those rows, regardless of whether the match happens to contain data or be blank - hence why you could go to column IV to do your interrogation. In this scenario I want to only be left with unique rows after the macro is finished, so only row 123 would remain while rows 237 and 767 are deleted. Now, I may have another spreadsheet where I want to achieve the same thing, but only want to check a single column for duplicates. For instance, I may have 25000 rows in a spreadsheet containing employee data, but there may only be 18000 employees in total, so I should only have 18000 rows not 25000. So, if Column E contained a unique identifier such as employee numbers, then I would want the macro to only interrogate that column and remove rows based only on the duplicated data in row E regardless of whether the other cells in the row were duplicated. In this scenario I only want to keep the first row conatining that unique identifier. Hope this makes sense and that I haven't confused you! :) Once again, I appreciate your help and time with this. Joe. -- If you can measure it, you can improve it! "Joel" wrote: I have Few questions 1) Is it acceptable to sort the worksheet in the macro? 2) For macro ONE what is the last column? 3) Will the last column always be the same when you run the macro? 4) Does every Row have data filed to the last column. What I propose for macro ONE is to sort every column of the worksheet starting with the last column and moving towards the first column. Then comparing adjacent rows for matches. "Monomeeth" wrote: Hello I need two macros to do as follows: ONE I need a macro which will check every row in a worksheet and if it finds two or more duplicate rows (i.e. every cell in the row is duplicated), it will then proceed to delete all duplicate rows except the first duplicate row. TWO I need a macro which will check every row in a worksheet and if it finds two or more rows with the exact same value in a particular column, it will then proceed to delete all duplicate rows except the first duplicate row. I would prefer the user to have the ability to specify the column they want interrogated. Any suggestions would be greatly appreciated! Thanks, Joe. -- If you can measure it, you can improve it! |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
Hi Joel
Okay, retested the macro on a test worksheet I created. This worksheet had data in the range A1:AC543. Oh, and yes, the first row was a header row. Results are as follows: REMOVE DUPLICATES ALL COLUMNS Okay, same problem as before. It didn't take long to get the Visual Basic error message "400". When I click on Help all I get is a blank white window. I did notice, however, that the worksheet had been sorted - so the error occurs sometime after the sort is done. However, the duplicate rows had not beed deleted. I did try to find out what the error 400 was about, but all I found was some info relating to showing forms which are already visible - but in this case we there is no form. Anyway, hope this helps to narrow the proble down. REMOVE DUPLICATES ONE COLUMN Success! This worked. I deliberately designed the test worksheet to have duplicate rows - both in terms of every column and in terms of a single column. The macro went through fairly quickly and everything was A-OK. Thanks for this Joel - I will just have to bear in mind it may not work on larger worksheets - or take care to run it when the computer isn't doing anything else in case it's a memory issue. Now all I have to do is sort out the first Macro. Thanks again for your help! :) -- If you can measure it, you can improve it! "Monomeeth" wrote: Hi Joel Thank you very much for yoiur continued help. I really appreciate it. I also like the way you have chosen to approach this. I decided to test both your macros by doing a straight copy and paste as is. I chose a worksheet which has data in the range A1:AL29420. The results were as follows: REMOVE DUPLICATES ALL COLUMNS Macro seemed to run fine as it was obviously doing something. However, after a few minutes I got a strange error message from Visual Basic. All it said was "400". Nothing else, the only options I had was an OK button and a Help button, but when I clicked on Help nothing seemed to happen. REMOVE DUPLICATES ONE COLUMN Macro started fine - I was able to select the column and get it running. However, I waited for an hour and the Macro seemed to still be running. There were no error messages, but Excel said it was not responding, while Visual Basic Editor said it was still running. Sorry, I can't give any further clues. Now that I'm thinking about it, I will try these macros on much smaller worksheets. I will post back shortly with the results. Thanks Joel. :) -- If you can measure it, you can improve it! "Joel" wrote: The code assumes there is a header row in both cases. I asked the questions becasue I want to keep the code as simple as possible and to make it run quicker. Becasue the columns lengths varied I had to check every row for the last column. To make the code run faster instead of deleting rows one at a time. I placed an X in column IV for rows that needed to be deleted. Then filtered the X's using Autofilter and removed the rows with the X's. See comments in the code. Sub RemoveDuplicatesAllColumns() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'find last row 'use column A to determine last row LastRow = Range("A" & Rows.Count).End(xlUp).Row 'find last column 'check every row to determine last column EndColumn = 0 For RowCount = 1 To LastRow LastColumn = Cells(RowCount).End(xlUp).Column If LastColumn EndColumn Then EndColumn = LastColumn End If Next RowCount 'sort cells three columns at a time For ColCount = 1 To EndColumn Step 3 Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, ColCount), _ order1:=xlAscending, _ key2:=Cells(1, ColCount + 1), _ order2:=xlAscending, _ key3:=Cells(1, ColCount + 1), _ order3:=xlAscending Next ColCount 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If 'check if column lengths are equal if not skip LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column Match = True If LastCol = LastCompareCol Then For ColCount = 1 To LastCol If Cells(RowCount, ColCount) < _ Cells(CompareRow, ColCount) Then Match = False Exit For End If Next ColCount If Match = True Then Range("IV" & CompareRow) = "X" End If Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub Sub RemoveDuplicatesOneColumn() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'select column Set Selected = Application.InputBox(Prompt:="Select one column", _ Title:="Select one column", Type:=8) SelectCol = Selected.Column 'find last row 'use column A to determine last row LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row 'sort by column selected Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, SelectCol), _ order1:=xlAscending 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If If Cells(RowCount, SelectCol) = _ Cells(CompareRow, SelectCol) Then Range("IV" & CompareRow) = "X" Match = True Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub "Monomeeth" wrote: Hi Joel Thanks very much for your help. My answers are as follows: 1) Is it acceptable to sort the worksheet in the macro? Yes. 2) For macro ONE what is the last column? This will be different as I want to be able to run the macro on various spreadsheets. Although I suppose you could set the last column as IV (using Excel 2003) and therefore it wouldn't matter as some cells may be blank. 3) Will the last column always be the same when you run the macro? No. See answer to 2 above. 4) Does every Row have data filed to the last column. Not necessarily. Some cells could be blank. At the risk of providing too much info, here is some context for you: BACKGROUND Basically I am trying to remove duplicate rows from worksheets which import data from various online forms submitted by users. Unfortunately, sometimes users submit the exact same data multiple times (for whatever reason) and this causes problems when we need to do an analysis of the collated data. So I want to be able to remove duplicate rows where every cell in the row is exactly the same as every corresponding cell within another row, and some of these cells could be blank. As for Macro TWO, I need this for another reason altogether. Last column is not constant and yes there may be blank cells within the row. JUST A THOUGHT I suppose you could have the one macro do both jobs by designing it so that the user can specify a range to interrogate - the user could select a single column or a group of columns - but I don't know whether that would complicate things as I don't want the user to specify WHAT the macro is actually looking for, only WHERE to look for duplicates. EXAMPLE I may have a worksheet with a data range of A1:AG25000. Some cells within this range may be blank for whatever reason. Now, I may want to remove any duplicate rows where data in columns A to AG are an exact match for the corresponding cells. That is, A123 may be exactly the same as A237 and A767, B123 may be exactly the same as B237 and B767, and so on for every column in those rows, regardless of whether the match happens to contain data or be blank - hence why you could go to column IV to do your interrogation. In this scenario I want to only be left with unique rows after the macro is finished, so only row 123 would remain while rows 237 and 767 are deleted. Now, I may have another spreadsheet where I want to achieve the same thing, but only want to check a single column for duplicates. For instance, I may have 25000 rows in a spreadsheet containing employee data, but there may only be 18000 employees in total, so I should only have 18000 rows not 25000. So, if Column E contained a unique identifier such as employee numbers, then I would want the macro to only interrogate that column and remove rows based only on the duplicated data in row E regardless of whether the other cells in the row were duplicated. In this scenario I only want to keep the first row conatining that unique identifier. Hope this makes sense and that I haven't confused you! :) Once again, I appreciate your help and time with this. Joe. -- If you can measure it, you can improve it! "Joel" wrote: I have Few questions 1) Is it acceptable to sort the worksheet in the macro? 2) For macro ONE what is the last column? 3) Will the last column always be the same when you run the macro? 4) Does every Row have data filed to the last column. What I propose for macro ONE is to sort every column of the worksheet starting with the last column and moving towards the first column. Then comparing adjacent rows for matches. "Monomeeth" wrote: Hello I need two macros to do as follows: ONE I need a macro which will check every row in a worksheet and if it finds two or more duplicate rows (i.e. every cell in the row is duplicated), it will then proceed to delete all duplicate rows except the first duplicate row. TWO I need a macro which will check every row in a worksheet and if it finds two or more rows with the exact same value in a particular column, it will then proceed to delete all duplicate rows except the first duplicate row. I would prefer the user to have the ability to specify the column they want interrogated. Any suggestions would be greatly appreciated! |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
I asked my original questions because I knew these programs could take a long
time to run. The sorting and putting the X'x in column IV probably increased the speed of this program by an order of maginitude. Integer Variables default to 16,536 when not declared. when you have a worksheet larger than 16,536 rows you must declare any variables that are used to count the rows as long. try putting these statements at the top of the two macros Dim LastRow as long Dim RowCount as Long Dim CompareRow as Long "Monomeeth" wrote: Hi Joel Okay, retested the macro on a test worksheet I created. This worksheet had data in the range A1:AC543. Oh, and yes, the first row was a header row. Results are as follows: REMOVE DUPLICATES ALL COLUMNS Okay, same problem as before. It didn't take long to get the Visual Basic error message "400". When I click on Help all I get is a blank white window. I did notice, however, that the worksheet had been sorted - so the error occurs sometime after the sort is done. However, the duplicate rows had not beed deleted. I did try to find out what the error 400 was about, but all I found was some info relating to showing forms which are already visible - but in this case we there is no form. Anyway, hope this helps to narrow the proble down. REMOVE DUPLICATES ONE COLUMN Success! This worked. I deliberately designed the test worksheet to have duplicate rows - both in terms of every column and in terms of a single column. The macro went through fairly quickly and everything was A-OK. Thanks for this Joel - I will just have to bear in mind it may not work on larger worksheets - or take care to run it when the computer isn't doing anything else in case it's a memory issue. Now all I have to do is sort out the first Macro. Thanks again for your help! :) -- If you can measure it, you can improve it! "Monomeeth" wrote: Hi Joel Thank you very much for yoiur continued help. I really appreciate it. I also like the way you have chosen to approach this. I decided to test both your macros by doing a straight copy and paste as is. I chose a worksheet which has data in the range A1:AL29420. The results were as follows: REMOVE DUPLICATES ALL COLUMNS Macro seemed to run fine as it was obviously doing something. However, after a few minutes I got a strange error message from Visual Basic. All it said was "400". Nothing else, the only options I had was an OK button and a Help button, but when I clicked on Help nothing seemed to happen. REMOVE DUPLICATES ONE COLUMN Macro started fine - I was able to select the column and get it running. However, I waited for an hour and the Macro seemed to still be running. There were no error messages, but Excel said it was not responding, while Visual Basic Editor said it was still running. Sorry, I can't give any further clues. Now that I'm thinking about it, I will try these macros on much smaller worksheets. I will post back shortly with the results. Thanks Joel. :) -- If you can measure it, you can improve it! "Joel" wrote: The code assumes there is a header row in both cases. I asked the questions becasue I want to keep the code as simple as possible and to make it run quicker. Becasue the columns lengths varied I had to check every row for the last column. To make the code run faster instead of deleting rows one at a time. I placed an X in column IV for rows that needed to be deleted. Then filtered the X's using Autofilter and removed the rows with the X's. See comments in the code. Sub RemoveDuplicatesAllColumns() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'find last row 'use column A to determine last row LastRow = Range("A" & Rows.Count).End(xlUp).Row 'find last column 'check every row to determine last column EndColumn = 0 For RowCount = 1 To LastRow LastColumn = Cells(RowCount).End(xlUp).Column If LastColumn EndColumn Then EndColumn = LastColumn End If Next RowCount 'sort cells three columns at a time For ColCount = 1 To EndColumn Step 3 Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, ColCount), _ order1:=xlAscending, _ key2:=Cells(1, ColCount + 1), _ order2:=xlAscending, _ key3:=Cells(1, ColCount + 1), _ order3:=xlAscending Next ColCount 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If 'check if column lengths are equal if not skip LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column Match = True If LastCol = LastCompareCol Then For ColCount = 1 To LastCol If Cells(RowCount, ColCount) < _ Cells(CompareRow, ColCount) Then Match = False Exit For End If Next ColCount If Match = True Then Range("IV" & CompareRow) = "X" End If Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub Sub RemoveDuplicatesOneColumn() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'select column Set Selected = Application.InputBox(Prompt:="Select one column", _ Title:="Select one column", Type:=8) SelectCol = Selected.Column 'find last row 'use column A to determine last row LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row 'sort by column selected Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, SelectCol), _ order1:=xlAscending 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If If Cells(RowCount, SelectCol) = _ Cells(CompareRow, SelectCol) Then Range("IV" & CompareRow) = "X" Match = True Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub "Monomeeth" wrote: Hi Joel Thanks very much for your help. My answers are as follows: 1) Is it acceptable to sort the worksheet in the macro? Yes. 2) For macro ONE what is the last column? This will be different as I want to be able to run the macro on various spreadsheets. Although I suppose you could set the last column as IV (using Excel 2003) and therefore it wouldn't matter as some cells may be blank. 3) Will the last column always be the same when you run the macro? No. See answer to 2 above. 4) Does every Row have data filed to the last column. Not necessarily. Some cells could be blank. At the risk of providing too much info, here is some context for you: BACKGROUND Basically I am trying to remove duplicate rows from worksheets which import data from various online forms submitted by users. Unfortunately, sometimes users submit the exact same data multiple times (for whatever reason) and this causes problems when we need to do an analysis of the collated data. So I want to be able to remove duplicate rows where every cell in the row is exactly the same as every corresponding cell within another row, and some of these cells could be blank. As for Macro TWO, I need this for another reason altogether. Last column is not constant and yes there may be blank cells within the row. JUST A THOUGHT I suppose you could have the one macro do both jobs by designing it so that the user can specify a range to interrogate - the user could select a single column or a group of columns - but I don't know whether that would complicate things as I don't want the user to specify WHAT the macro is actually looking for, only WHERE to look for duplicates. EXAMPLE I may have a worksheet with a data range of A1:AG25000. Some cells within this range may be blank for whatever reason. Now, I may want to remove any duplicate rows where data in columns A to AG are an exact match for the corresponding cells. That is, A123 may be exactly the same as A237 and A767, B123 may be exactly the same as B237 and B767, and so on for every column in those rows, regardless of whether the match happens to contain data or be blank - hence why you could go to column IV to do your interrogation. In this scenario I want to only be left with unique rows after the macro is finished, so only row 123 would remain while rows 237 and 767 are deleted. Now, I may have another spreadsheet where I want to achieve the same thing, but only want to check a single column for duplicates. For instance, I may have 25000 rows in a spreadsheet containing employee data, but there may only be 18000 employees in total, so I should only have 18000 rows not 25000. So, if Column E contained a unique identifier such as employee numbers, then I would want the macro to only interrogate that column and remove rows based only on the duplicated data in row E regardless of whether the other cells in the row were duplicated. In this scenario I only want to keep the first row conatining that unique identifier. Hope this makes sense and that I haven't confused you! :) Once again, I appreciate your help and time with this. |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
One thing that might help speed up the code s to reduce the size of the
worksheet. Once a cell is used excel sometimes thinks there is data in the cell. So when you do a .end(xlup) or .end(xltoleft) you get the wrong results. to solve this probelm is you manually delete the rows and columns after the used data this helps to delte the unused rows 1) select A1 2) Press Shift-Cntl-Down Arrow to get to last row 3) Select the row number after the last row of data 4) Press again Shift-Cntl-down Arrow all the rows after your data will be selected 5) Right click any row number that is selected and choose delete Repeat the same for columns instead of down arrow use right arrow. "Monomeeth" wrote: Hi Joel Okay, retested the macro on a test worksheet I created. This worksheet had data in the range A1:AC543. Oh, and yes, the first row was a header row. Results are as follows: REMOVE DUPLICATES ALL COLUMNS Okay, same problem as before. It didn't take long to get the Visual Basic error message "400". When I click on Help all I get is a blank white window. I did notice, however, that the worksheet had been sorted - so the error occurs sometime after the sort is done. However, the duplicate rows had not beed deleted. I did try to find out what the error 400 was about, but all I found was some info relating to showing forms which are already visible - but in this case we there is no form. Anyway, hope this helps to narrow the proble down. REMOVE DUPLICATES ONE COLUMN Success! This worked. I deliberately designed the test worksheet to have duplicate rows - both in terms of every column and in terms of a single column. The macro went through fairly quickly and everything was A-OK. Thanks for this Joel - I will just have to bear in mind it may not work on larger worksheets - or take care to run it when the computer isn't doing anything else in case it's a memory issue. Now all I have to do is sort out the first Macro. Thanks again for your help! :) -- If you can measure it, you can improve it! "Monomeeth" wrote: Hi Joel Thank you very much for yoiur continued help. I really appreciate it. I also like the way you have chosen to approach this. I decided to test both your macros by doing a straight copy and paste as is. I chose a worksheet which has data in the range A1:AL29420. The results were as follows: REMOVE DUPLICATES ALL COLUMNS Macro seemed to run fine as it was obviously doing something. However, after a few minutes I got a strange error message from Visual Basic. All it said was "400". Nothing else, the only options I had was an OK button and a Help button, but when I clicked on Help nothing seemed to happen. REMOVE DUPLICATES ONE COLUMN Macro started fine - I was able to select the column and get it running. However, I waited for an hour and the Macro seemed to still be running. There were no error messages, but Excel said it was not responding, while Visual Basic Editor said it was still running. Sorry, I can't give any further clues. Now that I'm thinking about it, I will try these macros on much smaller worksheets. I will post back shortly with the results. Thanks Joel. :) -- If you can measure it, you can improve it! "Joel" wrote: The code assumes there is a header row in both cases. I asked the questions becasue I want to keep the code as simple as possible and to make it run quicker. Becasue the columns lengths varied I had to check every row for the last column. To make the code run faster instead of deleting rows one at a time. I placed an X in column IV for rows that needed to be deleted. Then filtered the X's using Autofilter and removed the rows with the X's. See comments in the code. Sub RemoveDuplicatesAllColumns() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'find last row 'use column A to determine last row LastRow = Range("A" & Rows.Count).End(xlUp).Row 'find last column 'check every row to determine last column EndColumn = 0 For RowCount = 1 To LastRow LastColumn = Cells(RowCount).End(xlUp).Column If LastColumn EndColumn Then EndColumn = LastColumn End If Next RowCount 'sort cells three columns at a time For ColCount = 1 To EndColumn Step 3 Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, ColCount), _ order1:=xlAscending, _ key2:=Cells(1, ColCount + 1), _ order2:=xlAscending, _ key3:=Cells(1, ColCount + 1), _ order3:=xlAscending Next ColCount 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If 'check if column lengths are equal if not skip LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column Match = True If LastCol = LastCompareCol Then For ColCount = 1 To LastCol If Cells(RowCount, ColCount) < _ Cells(CompareRow, ColCount) Then Match = False Exit For End If Next ColCount If Match = True Then Range("IV" & CompareRow) = "X" End If Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub Sub RemoveDuplicatesOneColumn() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'select column Set Selected = Application.InputBox(Prompt:="Select one column", _ Title:="Select one column", Type:=8) SelectCol = Selected.Column 'find last row 'use column A to determine last row LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row 'sort by column selected Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, SelectCol), _ order1:=xlAscending 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If If Cells(RowCount, SelectCol) = _ Cells(CompareRow, SelectCol) Then Range("IV" & CompareRow) = "X" Match = True Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub "Monomeeth" wrote: Hi Joel Thanks very much for your help. My answers are as follows: 1) Is it acceptable to sort the worksheet in the macro? Yes. 2) For macro ONE what is the last column? This will be different as I want to be able to run the macro on various spreadsheets. Although I suppose you could set the last column as IV (using Excel 2003) and therefore it wouldn't matter as some cells may be blank. 3) Will the last column always be the same when you run the macro? No. See answer to 2 above. 4) Does every Row have data filed to the last column. Not necessarily. Some cells could be blank. At the risk of providing too much info, here is some context for you: BACKGROUND Basically I am trying to remove duplicate rows from worksheets which import data from various online forms submitted by users. Unfortunately, sometimes users submit the exact same data multiple times (for whatever reason) and this causes problems when we need to do an analysis of the collated data. So I want to be able to remove duplicate rows where every cell in the row is exactly the same as every corresponding cell within another row, and some of these cells could be blank. As for Macro TWO, I need this for another reason altogether. Last column is not constant and yes there may be blank cells within the row. JUST A THOUGHT I suppose you could have the one macro do both jobs by designing it so that the user can specify a range to interrogate - the user could select a single column or a group of columns - but I don't know whether that would complicate things as I don't want the user to specify WHAT the macro is actually looking for, only WHERE to look for duplicates. EXAMPLE I may have a worksheet with a data range of A1:AG25000. Some cells within this range may be blank for whatever reason. Now, I may want to remove any duplicate rows where data in columns A to AG are an exact match for the corresponding cells. That is, A123 may be exactly the same as A237 and A767, B123 may be exactly the same as B237 and B767, and so on for every column in those rows, regardless of whether the match happens to contain data or be blank - hence why you could go to column IV to do your interrogation. In this scenario I want to only be left with unique rows after the macro is finished, so only row 123 would remain while rows 237 and 767 are deleted. Now, I may have another spreadsheet where I want to achieve the same thing, but only want to check a single column for duplicates. For instance, I may have 25000 rows in a spreadsheet containing employee data, but there may only be 18000 employees in total, so I should only have 18000 rows not 25000. So, if Column E contained a unique identifier such as employee numbers, then I would want the macro to only interrogate that column and remove rows based only on the duplicated data in row E regardless of whether the other cells in the row were duplicated. In this scenario I only want to keep the first row conatining that unique identifier. Hope this makes sense and that I haven't confused you! :) Once again, I appreciate your help and time with this. |
#9
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
Hi Joel
I'm afraid I've run into problems. I've been experimenting and trying to sort it out myself, not wanting to trouble you anymore than necessary, but I've hit a brick wall. I tried to run the REMOVE DUPLICATES ONE COLUMN macro on a "real" worksheet which had data in the range A1:AL4942, but was met with the Visual Basic "400" error. I decided to reduce the worksheet down, since this macro did work on a much smaller worksheet last week. I progressively reduced the worksheet down from 4942 rows to 2000 rows to 1000 rows to 500 rows and finally to 250 rows. I ran the Macro each time with the same result (i.e. the "400" error). I also tried deleting all unused rows and columns, but to no avail. Adding the extra DIM statements at the top of the code made no difference either. DEBUGGING What I can tell you is that in every instance the first column was filtered, with a custom filter looking for "X", but there is no "X" in the first column. The Xs only appear in column IV as they should, so I checked the code looking for something obvious, but couldn't see anything. I'm now wondering whether the same problem is affecting both macros? I also redid my test from last week when I got the macro to work on my dummy workbook, but cannot replicate the success - so much for my testing. I swear it worked TWICE last week, but now not at all - so I can't explain that as I was certainly methodical in my approach. Sorry to hassle you, but perhaps you might see this as a challenge? I'm using Excel 2003, just in case that's an issue. Regards, Joe. -- If you can measure it, you can improve it! "Joel" wrote: I asked my original questions because I knew these programs could take a long time to run. The sorting and putting the X'x in column IV probably increased the speed of this program by an order of maginitude. Integer Variables default to 16,536 when not declared. when you have a worksheet larger than 16,536 rows you must declare any variables that are used to count the rows as long. try putting these statements at the top of the two macros Dim LastRow as long Dim RowCount as Long Dim CompareRow as Long "Monomeeth" wrote: Hi Joel Okay, retested the macro on a test worksheet I created. This worksheet had data in the range A1:AC543. Oh, and yes, the first row was a header row. Results are as follows: REMOVE DUPLICATES ALL COLUMNS Okay, same problem as before. It didn't take long to get the Visual Basic error message "400". When I click on Help all I get is a blank white window. I did notice, however, that the worksheet had been sorted - so the error occurs sometime after the sort is done. However, the duplicate rows had not beed deleted. I did try to find out what the error 400 was about, but all I found was some info relating to showing forms which are already visible - but in this case we there is no form. Anyway, hope this helps to narrow the proble down. REMOVE DUPLICATES ONE COLUMN Success! This worked. I deliberately designed the test worksheet to have duplicate rows - both in terms of every column and in terms of a single column. The macro went through fairly quickly and everything was A-OK. Thanks for this Joel - I will just have to bear in mind it may not work on larger worksheets - or take care to run it when the computer isn't doing anything else in case it's a memory issue. Now all I have to do is sort out the first Macro. Thanks again for your help! :) -- If you can measure it, you can improve it! "Monomeeth" wrote: Hi Joel Thank you very much for yoiur continued help. I really appreciate it. I also like the way you have chosen to approach this. I decided to test both your macros by doing a straight copy and paste as is. I chose a worksheet which has data in the range A1:AL29420. The results were as follows: REMOVE DUPLICATES ALL COLUMNS Macro seemed to run fine as it was obviously doing something. However, after a few minutes I got a strange error message from Visual Basic. All it said was "400". Nothing else, the only options I had was an OK button and a Help button, but when I clicked on Help nothing seemed to happen. REMOVE DUPLICATES ONE COLUMN Macro started fine - I was able to select the column and get it running. However, I waited for an hour and the Macro seemed to still be running. There were no error messages, but Excel said it was not responding, while Visual Basic Editor said it was still running. Sorry, I can't give any further clues. Now that I'm thinking about it, I will try these macros on much smaller worksheets. I will post back shortly with the results. Thanks Joel. :) -- If you can measure it, you can improve it! "Joel" wrote: The code assumes there is a header row in both cases. I asked the questions becasue I want to keep the code as simple as possible and to make it run quicker. Becasue the columns lengths varied I had to check every row for the last column. To make the code run faster instead of deleting rows one at a time. I placed an X in column IV for rows that needed to be deleted. Then filtered the X's using Autofilter and removed the rows with the X's. See comments in the code. Sub RemoveDuplicatesAllColumns() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'find last row 'use column A to determine last row LastRow = Range("A" & Rows.Count).End(xlUp).Row 'find last column 'check every row to determine last column EndColumn = 0 For RowCount = 1 To LastRow LastColumn = Cells(RowCount).End(xlUp).Column If LastColumn EndColumn Then EndColumn = LastColumn End If Next RowCount 'sort cells three columns at a time For ColCount = 1 To EndColumn Step 3 Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, ColCount), _ order1:=xlAscending, _ key2:=Cells(1, ColCount + 1), _ order2:=xlAscending, _ key3:=Cells(1, ColCount + 1), _ order3:=xlAscending Next ColCount 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If 'check if column lengths are equal if not skip LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column Match = True If LastCol = LastCompareCol Then For ColCount = 1 To LastCol If Cells(RowCount, ColCount) < _ Cells(CompareRow, ColCount) Then Match = False Exit For End If Next ColCount If Match = True Then Range("IV" & CompareRow) = "X" End If Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub Sub RemoveDuplicatesOneColumn() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'select column Set Selected = Application.InputBox(Prompt:="Select one column", _ Title:="Select one column", Type:=8) SelectCol = Selected.Column 'find last row 'use column A to determine last row LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row 'sort by column selected Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, SelectCol), _ order1:=xlAscending 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If If Cells(RowCount, SelectCol) = _ Cells(CompareRow, SelectCol) Then Range("IV" & CompareRow) = "X" Match = True Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub "Monomeeth" wrote: Hi Joel Thanks very much for your help. My answers are as follows: 1) Is it acceptable to sort the worksheet in the macro? Yes. 2) For macro ONE what is the last column? This will be different as I want to be able to run the macro on various spreadsheets. Although I suppose you could set the last column as IV (using Excel 2003) and therefore it wouldn't matter as some cells may be blank. 3) Will the last column always be the same when you run the macro? No. See answer to 2 above. 4) Does every Row have data filed to the last column. Not necessarily. Some cells could be blank. At the risk of providing too much info, here is some context for you: BACKGROUND Basically I am trying to remove duplicate rows from worksheets which import data from various online forms submitted by users. Unfortunately, sometimes users submit the exact same data multiple times (for whatever reason) and this causes problems when we need to do an analysis of the collated data. So I want to be able to remove duplicate rows where every cell in the row is exactly the same as every corresponding cell within another row, and some of these cells could be blank. As for Macro TWO, I need this for another reason altogether. Last column is not constant and yes there may be blank cells within the row. JUST A THOUGHT I suppose you could have the one macro do both jobs by designing it so that the user can specify a range to interrogate - the user could select a single column or a group of columns - but I don't know whether that would complicate things as I don't want the user to specify WHAT the macro is actually looking for, only WHERE to look for duplicates. EXAMPLE I may have a worksheet with a data range of A1:AG25000. Some cells within this range may be blank for whatever reason. Now, I may want to remove any duplicate rows where data in columns A to AG are an exact match for the corresponding cells. That is, A123 may be exactly the same as A237 and A767, B123 may be exactly the same as B237 and B767, and so on for every column in those rows, regardless of whether the match happens to contain data or be blank - hence why you could go to column IV to do your interrogation. In this scenario I want to only be left with unique rows after the macro is finished, so only row 123 would remain while rows 237 and 767 are deleted. Now, I may have another spreadsheet where I want to achieve the same thing, |
#10
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing Rows
Which line is giving the Error 400?
Try removing the autofilter before running the macro. If is still fails check the value of LastRow and let me know what LastRow is being set to. "Monomeeth" wrote: Hi Joel I'm afraid I've run into problems. I've been experimenting and trying to sort it out myself, not wanting to trouble you anymore than necessary, but I've hit a brick wall. I tried to run the REMOVE DUPLICATES ONE COLUMN macro on a "real" worksheet which had data in the range A1:AL4942, but was met with the Visual Basic "400" error. I decided to reduce the worksheet down, since this macro did work on a much smaller worksheet last week. I progressively reduced the worksheet down from 4942 rows to 2000 rows to 1000 rows to 500 rows and finally to 250 rows. I ran the Macro each time with the same result (i.e. the "400" error). I also tried deleting all unused rows and columns, but to no avail. Adding the extra DIM statements at the top of the code made no difference either. DEBUGGING What I can tell you is that in every instance the first column was filtered, with a custom filter looking for "X", but there is no "X" in the first column. The Xs only appear in column IV as they should, so I checked the code looking for something obvious, but couldn't see anything. I'm now wondering whether the same problem is affecting both macros? I also redid my test from last week when I got the macro to work on my dummy workbook, but cannot replicate the success - so much for my testing. I swear it worked TWICE last week, but now not at all - so I can't explain that as I was certainly methodical in my approach. Sorry to hassle you, but perhaps you might see this as a challenge? I'm using Excel 2003, just in case that's an issue. Regards, Joe. -- If you can measure it, you can improve it! "Joel" wrote: I asked my original questions because I knew these programs could take a long time to run. The sorting and putting the X'x in column IV probably increased the speed of this program by an order of maginitude. Integer Variables default to 16,536 when not declared. when you have a worksheet larger than 16,536 rows you must declare any variables that are used to count the rows as long. try putting these statements at the top of the two macros Dim LastRow as long Dim RowCount as Long Dim CompareRow as Long "Monomeeth" wrote: Hi Joel Okay, retested the macro on a test worksheet I created. This worksheet had data in the range A1:AC543. Oh, and yes, the first row was a header row. Results are as follows: REMOVE DUPLICATES ALL COLUMNS Okay, same problem as before. It didn't take long to get the Visual Basic error message "400". When I click on Help all I get is a blank white window. I did notice, however, that the worksheet had been sorted - so the error occurs sometime after the sort is done. However, the duplicate rows had not beed deleted. I did try to find out what the error 400 was about, but all I found was some info relating to showing forms which are already visible - but in this case we there is no form. Anyway, hope this helps to narrow the proble down. REMOVE DUPLICATES ONE COLUMN Success! This worked. I deliberately designed the test worksheet to have duplicate rows - both in terms of every column and in terms of a single column. The macro went through fairly quickly and everything was A-OK. Thanks for this Joel - I will just have to bear in mind it may not work on larger worksheets - or take care to run it when the computer isn't doing anything else in case it's a memory issue. Now all I have to do is sort out the first Macro. Thanks again for your help! :) -- If you can measure it, you can improve it! "Monomeeth" wrote: Hi Joel Thank you very much for yoiur continued help. I really appreciate it. I also like the way you have chosen to approach this. I decided to test both your macros by doing a straight copy and paste as is. I chose a worksheet which has data in the range A1:AL29420. The results were as follows: REMOVE DUPLICATES ALL COLUMNS Macro seemed to run fine as it was obviously doing something. However, after a few minutes I got a strange error message from Visual Basic. All it said was "400". Nothing else, the only options I had was an OK button and a Help button, but when I clicked on Help nothing seemed to happen. REMOVE DUPLICATES ONE COLUMN Macro started fine - I was able to select the column and get it running. However, I waited for an hour and the Macro seemed to still be running. There were no error messages, but Excel said it was not responding, while Visual Basic Editor said it was still running. Sorry, I can't give any further clues. Now that I'm thinking about it, I will try these macros on much smaller worksheets. I will post back shortly with the results. Thanks Joel. :) -- If you can measure it, you can improve it! "Joel" wrote: The code assumes there is a header row in both cases. I asked the questions becasue I want to keep the code as simple as possible and to make it run quicker. Becasue the columns lengths varied I had to check every row for the last column. To make the code run faster instead of deleting rows one at a time. I placed an X in column IV for rows that needed to be deleted. Then filtered the X's using Autofilter and removed the rows with the X's. See comments in the code. Sub RemoveDuplicatesAllColumns() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'find last row 'use column A to determine last row LastRow = Range("A" & Rows.Count).End(xlUp).Row 'find last column 'check every row to determine last column EndColumn = 0 For RowCount = 1 To LastRow LastColumn = Cells(RowCount).End(xlUp).Column If LastColumn EndColumn Then EndColumn = LastColumn End If Next RowCount 'sort cells three columns at a time For ColCount = 1 To EndColumn Step 3 Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, ColCount), _ order1:=xlAscending, _ key2:=Cells(1, ColCount + 1), _ order2:=xlAscending, _ key3:=Cells(1, ColCount + 1), _ order3:=xlAscending Next ColCount 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If 'check if column lengths are equal if not skip LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column Match = True If LastCol = LastCompareCol Then For ColCount = 1 To LastCol If Cells(RowCount, ColCount) < _ Cells(CompareRow, ColCount) Then Match = False Exit For End If Next ColCount If Match = True Then Range("IV" & CompareRow) = "X" End If Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub Sub RemoveDuplicatesOneColumn() 'clear columnn IV incase data ther is data from last run Columns("IV").Delete 'select column Set Selected = Application.InputBox(Prompt:="Select one column", _ Title:="Select one column", Type:=8) SelectCol = Selected.Column 'find last row 'use column A to determine last row LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row 'sort by column selected Rows("1:" & LastRow).Sort _ header:=xlYes, _ key1:=Cells(1, SelectCol), _ order1:=xlAscending 'compare rows putting a X in column IV where duplicates exist RowCount = 2 Match = False 'use to indicate that last compared rows 'either matched or didn't match Do While RowCount < LastRow If Match = False Then CompareRow = RowCount + 1 Else CompareRow = CompareRow + 1 End If If Cells(RowCount, SelectCol) = _ Cells(CompareRow, SelectCol) Then Range("IV" & CompareRow) = "X" Match = True Else Match = False RowCount = RowCount + 1 End If Loop 'remove rows with X's in column IV Set c = Columns("IV").Find(what:="X", _ LookIn:=xlValues, lookat:=xlWhole) 'only filer if at least one X is found If Not c Is Nothing Then Columns("IV").AutoFilter Field:=1, Criteria1:="X" Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete Columns("IV").Delete End If End Sub "Monomeeth" wrote: Hi Joel Thanks very much for your help. My answers are as follows: 1) Is it acceptable to sort the worksheet in the macro? Yes. 2) For macro ONE what is the last column? This will be different as I want to be able to run the macro on various spreadsheets. Although I suppose you could set the last column as IV (using Excel 2003) and therefore it wouldn't matter as some cells may be blank. 3) Will the last column always be the same when you run the macro? |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
comparing rows | Excel Programming | |||
comparing rows | Excel Programming | |||
Comparing rows | Excel Programming | |||
comparing two rows, then highting? | Links and Linking in Excel | |||
Comparing two rows | Excel Programming |