Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Code to delete duplicate and extraneous data
I'm a novice VBA user - any assistance is appreciated.
I am using Excel 2003 and have a worksheet with about 15K rows of data. I need VBA code that will help to delete duplicate data and extraneous data based on specific dates or blank cells. My worksheet looks like this: HRID UID LName FName MI Region StartDate AdjStartDate CurrentDate TermDate 0001 lj1 Jones Lily USA 5/19/1999 5/19/1999 0002 js2 Smith Jon E Can 5/19/1999 5/19/1999 7/1/2000 0002 js2 Smith Jon E Can 5/19/1999 5/19/1999 0004 bh3 Ham Bill G USA 6/1/2000 6/1/2000 9/1/2003 9/1/2003 0005 vv5 Vish V V Ind 7/1/2003 7/1/2003 8/31/2007 8/31/2007 I am looking for VBA code that will detect the duplicates and do the following base on dates: If the CurrentDate and TermDate are blank ignore If the CurrentDate is blank and the TermDate is populated delete only if there is another row for the user where the current and term dates are blank, else delete If both current and termdates are populated and the dates are prior to 1/1/2007 then delete the row. Keep all rows with delete dates after 1/1/2007 Once I have this down to only 2007 data and open dates I can begin to calculate duration time in years and quarters. Thank you in advance for any assistance. |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Code to delete duplicate and extraneous data
Hi Sue,
First and foremost. Make sure you have a backup of your data before attempting to run the macros. I have developed some code for you but instead of the code going ahead and deleting the rows, I have written two macros; one to mark the rows to be deleted and the second one to actually delete the rows. My reason for this is that you need to do a reasonable sample check after the rows are marked for deletion to ensure that the correct rows are being deleted. My assumptions:- 1. That your data is in columns A to I and 2. Column A (headed HRID) does not have any blank cells before the bottom of the data. If it does then I will have to modify the macro because I use this column to identify the end of the data. The delete reference and notes are written to columns J and K. If you have got anything in these columns then I suggest that you insert 2 columns after column I and they can be deleted after completion. You run the first macro (Mark_for_Delete) and then check the rows to be deleted. You will see that in column K will be written the row number of a duplicate record (If any) for records where CurrentDate is blank and TermDate is populated. (It only finds one duplicate and if more then they are ignored). After checking, if you are happy that the deletes are correct then run the second macro (Delete_Rows) which will do the deletions for you. Records with CurrentDate and NO TermDate have no comment because you did not specify anything for them. Feel free to get back to me if you have any problems. Sub Mark_for_Delete() Dim c As Range 'Cell in column H Dim strHRID As String Dim foundHRID As Range Dim strFirstAddr As String Dim datePrior As Date 'NOTE: If altering the date between # # then enter date 'using the the alpha characters for the month and let 'Excel reset to numerics. 'Example Jan 1 2007 or 1 Jan 2007 is acceptable. datePrior = #1/1/2007# Sheets("Sheet1").Select For Each c In Range("H:H") 'Next line tests for blank in column A 'and assumes end of data if it is blank If c.Offset(0, -7) = "" Then Exit For 'Test if Current and Term are blank If c.Value = "" And c.Offset(0, 1).Value = "" Then c.Offset(0, 2).Value = "Ignore" GoTo skipToNextc End If 'Test if Current blank and Term populated If c.Value = "" And c.Offset(0, 1).Value 0 Then 'Save value and address of HRID strHRID = c.Offset(0, -7).Value strFirstAddr = c.Offset(0, -7).Address 'Find strHRID Set foundHRID = Columns("A:A") _ .Find(What:=strHRID, _ After:=c.Offset(0, -7), _ LookIn:=xlFormulas, _ LookAt:=xlWhole, _ SearchOrder:=xlByColumns, _ SearchDirection:=xlNext, _ MatchCase:=False, _ SearchFormat:=False) 'Not necessary to test if found because 'must find at least one at the current c.row. Do 'Test if not found at current row If foundHRID.Address < strFirstAddr Then 'Found but not at current c.row 'Test if Current and Term are blank at new location If foundHRID.Offset(0, 7) = "" And foundHRID.Offset(0, 8) = "" Then 'Both blank so write Delete to column J. c.Offset(0, 2).Value = "Delete" c.Offset(0, 2).Interior.ColorIndex = 3 'Write duplicate row number to column K c.Offset(0, 3).Value = "Duplicate Row " & foundHRID.Row Exit Do 'No additional find require Else 'Not both blank so look for another record Set foundHRID = Columns("A:A").FindNext(foundHRID) End If Else 'Only one record Current blank and Term populated c.Offset(0, 2).Value = "No Match with Current & Term blank" Exit Do End If Loop While Not foundHRID Is Nothing And foundHRID.Address < firstAddress End If 'Test if current and term are populated If c.Value 0 And c.Offset(0, 1).Value 0 Then 'test if current and term less than datePrior If c.Value < datePrior And c.Offset(0, 1).Value < datePrior Then c.Offset(0, 2).Value = "Delete" c.Offset(0, 2).Interior.ColorIndex = 3 Else c.Offset(0, 2).Value = "Recent" End If End If skipToNextc: Next c End Sub Sub Delete_Rows() Dim c As Range 'Cell in column H Sheets("Sheet1").Select For Each c In Range("J:J") 'Next line tests for blank in column A 'and assumes end of data if it is blank If c.Offset(0, -7) = "" Then Exit For If c.Value = "Delete" Then c.EntireRow.Delete End If Next c End Sub Regards, OssieMac |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Code to delete duplicate and extraneous data
Hi again Sue,
I don't know how savvy you are with VBA so this message is just in case you are a novice and the following gets you into problems. I notice that some of the lines in my code broke into 2 lines when posted. When you copy them into the VBA editor they will appear red. Simply place the cursor at the end of the first red line and press delete a few times until the second red line comes up to join the end of the first (leave a space) and move the cursor off the line and it should turn black. Comments lines are preceded by a single quote and appear in green. Regards, OssieMac |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Code to delete duplicate and extraneous data
OssieMac,
Thank you so much, this helped a great deal. I had to run the DeleteRow macro a couple of times, but I can live with that since it only takes a couple of minutes. I did notice what you stated about the CurrentDate and No TermDate populated. Here's what I see. In the worksheet now, I have duplicate rows for any given HRID where the CurrentDate and the TermDate are both populated (with a 2007 date) and on the next row there is a CurrentDate (with a 2007 date) and no termdate populated. I want to delete this row and keep only the row with both the Current Date and the TermDate populated. What would I add to the previous code you supplied. Thank you again, Sue "OssieMac" wrote: Hi Sue, First and foremost. Make sure you have a backup of your data before attempting to run the macros. I have developed some code for you but instead of the code going ahead and deleting the rows, I have written two macros; one to mark the rows to be deleted and the second one to actually delete the rows. My reason for this is that you need to do a reasonable sample check after the rows are marked for deletion to ensure that the correct rows are being deleted. My assumptions:- 1. That your data is in columns A to I and 2. Column A (headed HRID) does not have any blank cells before the bottom of the data. If it does then I will have to modify the macro because I use this column to identify the end of the data. The delete reference and notes are written to columns J and K. If you have got anything in these columns then I suggest that you insert 2 columns after column I and they can be deleted after completion. You run the first macro (Mark_for_Delete) and then check the rows to be deleted. You will see that in column K will be written the row number of a duplicate record (If any) for records where CurrentDate is blank and TermDate is populated. (It only finds one duplicate and if more then they are ignored). After checking, if you are happy that the deletes are correct then run the second macro (Delete_Rows) which will do the deletions for you. Records with CurrentDate and NO TermDate have no comment because you did not specify anything for them. Feel free to get back to me if you have any problems. Sub Mark_for_Delete() Dim c As Range 'Cell in column H Dim strHRID As String Dim foundHRID As Range Dim strFirstAddr As String Dim datePrior As Date 'NOTE: If altering the date between # # then enter date 'using the the alpha characters for the month and let 'Excel reset to numerics. 'Example Jan 1 2007 or 1 Jan 2007 is acceptable. datePrior = #1/1/2007# Sheets("Sheet1").Select For Each c In Range("H:H") 'Next line tests for blank in column A 'and assumes end of data if it is blank If c.Offset(0, -7) = "" Then Exit For 'Test if Current and Term are blank If c.Value = "" And c.Offset(0, 1).Value = "" Then c.Offset(0, 2).Value = "Ignore" GoTo skipToNextc End If 'Test if Current blank and Term populated If c.Value = "" And c.Offset(0, 1).Value 0 Then 'Save value and address of HRID strHRID = c.Offset(0, -7).Value strFirstAddr = c.Offset(0, -7).Address 'Find strHRID Set foundHRID = Columns("A:A") _ .Find(What:=strHRID, _ After:=c.Offset(0, -7), _ LookIn:=xlFormulas, _ LookAt:=xlWhole, _ SearchOrder:=xlByColumns, _ SearchDirection:=xlNext, _ MatchCase:=False, _ SearchFormat:=False) 'Not necessary to test if found because 'must find at least one at the current c.row. Do 'Test if not found at current row If foundHRID.Address < strFirstAddr Then 'Found but not at current c.row 'Test if Current and Term are blank at new location If foundHRID.Offset(0, 7) = "" And foundHRID.Offset(0, 8) = "" Then 'Both blank so write Delete to column J. c.Offset(0, 2).Value = "Delete" c.Offset(0, 2).Interior.ColorIndex = 3 'Write duplicate row number to column K c.Offset(0, 3).Value = "Duplicate Row " & foundHRID.Row Exit Do 'No additional find require Else 'Not both blank so look for another record Set foundHRID = Columns("A:A").FindNext(foundHRID) End If Else 'Only one record Current blank and Term populated c.Offset(0, 2).Value = "No Match with Current & Term blank" Exit Do End If Loop While Not foundHRID Is Nothing And foundHRID.Address < firstAddress End If 'Test if current and term are populated If c.Value 0 And c.Offset(0, 1).Value 0 Then 'test if current and term less than datePrior If c.Value < datePrior And c.Offset(0, 1).Value < datePrior Then c.Offset(0, 2).Value = "Delete" c.Offset(0, 2).Interior.ColorIndex = 3 Else c.Offset(0, 2).Value = "Recent" End If End If skipToNextc: Next c End Sub Sub Delete_Rows() Dim c As Range 'Cell in column H Sheets("Sheet1").Select For Each c In Range("J:J") 'Next line tests for blank in column A 'and assumes end of data if it is blank If c.Offset(0, -7) = "" Then Exit For If c.Value = "Delete" Then c.EntireRow.Delete End If Next c End Sub Regards, OssieMac |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
VBA Code to delete duplicate and extraneous data
Hi again Sue,
As per my message yesterday, don't forget to keep a backup of your data. Copy both macros in again. I have rewritten the delete one. Rows should be deleted in the reverse order from the bottom not from the top otherwise there are problems deleting adjacent rows because the row where the code is up to gets deleted and the program looses its place. I have put proper line breaks (Space and underscore) in the long lines of code so that hopefully you should not have to edit them. (I usually do this and I forgot yesterday.) Sub Mark_for_Delete() Dim c As Range 'Cell in column H Dim strHRID As String Dim foundHRID As Range Dim strFirstAddr As String Dim datePrior As Date 'NOTE: If altering the date between # # then enter date 'using the the alpha characters for the month and let 'Excel reset to numerics. 'Example Jan 1 2007 or 1 Jan 2007 is acceptable. datePrior = #1/1/2007# Sheets("Sheet1").Select For Each c In Range("H:H") 'Next line tests for blank in column A 'and assumes end of data if it is blank If c.Offset(0, -7) = "" Then Exit For 'Test if Current and Term are blank If c.Value = "" And c.Offset(0, 1).Value = "" Then c.Offset(0, 2).Value = "Ignore" GoTo skipToNextc Else 'Test for text column header If c.Value < "" And IsDate(c.Value) = False Then GoTo skipToNextc End If End If 'Test if Current blank and Term populated If c.Value = "" And c.Offset(0, 1).Value 0 Then 'Save value and address of HRID strHRID = c.Offset(0, -7).Value strFirstAddr = c.Offset(0, -7).Address 'Find strHRID Set foundHRID = Columns("A:A") _ .Find(What:=strHRID, _ After:=c.Offset(0, -7), _ LookIn:=xlFormulas, _ LookAt:=xlWhole, _ SearchOrder:=xlByColumns, _ SearchDirection:=xlNext, _ MatchCase:=False, _ SearchFormat:=False) 'Not necessary to test if found because 'must find at least one at the current c.row. Do 'Test if not found at current row If foundHRID.Address < strFirstAddr Then 'Found but not at current c.row 'Test if Current and Term are blank If foundHRID.Offset(0, 7) = "" And _ foundHRID.Offset(0, 8) = "" Then 'Both blank so write Delete to column J. c.Offset(0, 2).Value = "Delete" c.Offset(0, 2).Interior.ColorIndex = 3 'Write duplicate row number to column K c.Offset(0, 3).Value = "Duplicate Row " & _ foundHRID.Row Exit Do 'No additional find require Else 'Not both blank so look for another record Set foundHRID = Columns("A:A") _ .FindNext(foundHRID) End If Else 'Only one record Current blank and Term populated Exit Do End If Loop While Not foundHRID Is Nothing And _ foundHRID.Address < strFirstAddr GoTo skipToNextc End If 'Test if current and term are populated If c.Value 0 And c.Offset(0, 1).Value 0 Then 'test if current and term less than datePrior If c.Value < datePrior And _ c.Offset(0, 1).Value < datePrior Then c.Offset(0, 2).Value = "Delete" c.Offset(0, 2).Interior.ColorIndex = 3 GoTo skipToNextc Else 'Save value HRID strHRID = c.Offset(0, -7).Value 'Find strHRID strFirstAddr = c.Offset(0, -7).Address Set foundHRID = Columns("A:A") _ .Find(What:=strHRID, _ LookIn:=xlFormulas, _ LookAt:=xlWhole, _ SearchOrder:=xlByColumns, _ SearchDirection:=xlNext, _ MatchCase:=False, _ SearchFormat:=False) Do 'Test if NOT found at current row If foundHRID.Address < strFirstAddr Then 'Found but NOT at current c.row 'Test for Current populated and Term blank If foundHRID.Offset(0, 7) 0 And _ foundHRID.Offset(0, 8) = "" Then 'Current populated Term blank true.Delete row foundHRID.Offset(0, 9).Value = "Delete" foundHRID.Offset(0, 9).Interior.ColorIndex = 3 'Write duplicate row number to column K foundHRID.Offset(0, 10).Value = _ "Duplicate Row " & c.Row End If Set foundHRID = Columns("A:A").FindNext(foundHRID) Else 'Columns("A:A").FindNext(foundHRID).Activate 'Set foundHRID = ActiveCell Set foundHRID = Columns("A:A").FindNext(foundHRID) End If Loop While Not foundHRID Is Nothing And _ foundHRID.Address < strFirstAddr c.Offset(0, 2).Value = "Recent" End If End If skipToNextc: Next c 'Auto fir columns J and K Columns("J:K").Columns.AutoFit End Sub Sub Delete_Rows() Dim c As Range 'Cell in column H Sheets("Sheet1").Select For Each c In Range("J:J") 'Next line tests for blank in column A 'and assumes end of data if it is blank If c.Offset(0, -7) = "" Then Exit For If c.Value = "Delete" Then c.EntireRow.Delete End If Next c End Sub Sub Delete_Rows_2() Dim rngColJ As Range Dim c As Long 'Note: looks in column J for deletes. With Sheets("Sheet1") Set rngColJ = Range(.Cells(2, "J"), .Cells(.Rows.Count, "J").End(xlUp)) End With With rngColJ 'Step backwards from bottom For c = .Rows.Count To 2 Step -1 If .Cells(c, 1) = "Delete" Then .Cells(c, 1).EntireRow.Delete End If Next c End With End Sub Regards, OssieMac |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
How do I delete extraneous EXCEL columns? | Excel Discussion (Misc queries) | |||
How do I delete extraneous EXCEL columns? | Excel Discussion (Misc queries) | |||
Delete Every Second Duplicate Code | Excel Discussion (Misc queries) | |||
i want to delete duplicate items in a list using code. | Excel Programming | |||
How to delete extraneous rows and columns | Excel Discussion (Misc queries) |