![]() |
Record Comparison Between Two Databases
I am writing a procedure to compare thousands of contact records from two
databases for any differences that exist between duplicate contacts in 17 different fields (Company, Address, phone number, email, etc.). Since both databases are updated frequently and independently (that's unavoidable), we need to compare the two on a monthly basis so that both contain the most updated information as possible. My approach has been to export all the contacts into one spreadsheet and sort them by DisplayName (which includes first and last name). My procedure first compares a DisplayName to the record below it. If it is listed only once, obviously no duplicate exists and that record's entire row is removed and pasted elsewhere. If the DisplayName is listed two or more times, the fields of the first record are used as the basis of comparision to determine if the other records are different. Differences are highlighted in yellow. My question has to do with finding the most efficient way (via VBA) to compare duplicate records. As an example, let's say that Joe Smith is listed four times. My code currently concatenates all 17 fields for each record into one string that can be quickly compared to each other. If the strings for each are identical, there are no differences and those records are removed. However, if the strings for two or more records are not the same, then my code starts with the first of the 17 fields (Company) and compares all the values for each Joe Smith record, highlighting differences that exist from the first record. It repeats this process for all the fields and then continues on for all other contacts. If anyone has a better, more efficient suggestion for improving the comparison process, I would appreciate hearing from you! Thanks. -- Steve C |
Record Comparison Between Two Databases
My first recomendation no matter which methoid is used is to mark the rows to
delete and then delete all the rows at one time. I usually put an Z in the auxilary column When I find a duplicate. Then sort descending order the auxilary column with an X that will bring all the X's to the top of the worksheet. I like finding duplicates by using a worksheet formula that can put automatically placed on the worksheet Put in auxilary column and the copy down all the cells. =if(COUNTIF(A$1:A1,A1)1,"X","") Try this code. I had the first row of the worksheet the names without a header row. The names were in column A and the auxilary row is incolumn D. The code may seem a little complicated but runs very quickly. Sub RemoveDuplicates() 'Use column D as Auxilary column to place X on duplicate rows With Sheets("Sheet1") LastRow = .Range("A" & Rows.Count).End(xlUp).Row 'put formula in auxilary row D .Range("D1").Formula = "=if(COUNTIF(A$1:A1,A1)1,""X"","""")" .Range("D1").Copy Destination:=.Range("D1:D" & LastRow) 'replace formula with value Range("A1:D" & LastRow).Copy Range("A1:D" & LastRow).PasteSpecial xlPasteValues Range("A1:D" & LastRow).Sort _ Key1:=Range("D1"), _ Order1:=xlDescending, _ Header:=xlGuess CountX = WorksheetFunction.CountIf( _ .Range("C1:D" & LastRow), "X") .Rows("1:" & CountX).Delete End With End Sub "Steve C" wrote: I am writing a procedure to compare thousands of contact records from two databases for any differences that exist between duplicate contacts in 17 different fields (Company, Address, phone number, email, etc.). Since both databases are updated frequently and independently (that's unavoidable), we need to compare the two on a monthly basis so that both contain the most updated information as possible. My approach has been to export all the contacts into one spreadsheet and sort them by DisplayName (which includes first and last name). My procedure first compares a DisplayName to the record below it. If it is listed only once, obviously no duplicate exists and that record's entire row is removed and pasted elsewhere. If the DisplayName is listed two or more times, the fields of the first record are used as the basis of comparision to determine if the other records are different. Differences are highlighted in yellow. My question has to do with finding the most efficient way (via VBA) to compare duplicate records. As an example, let's say that Joe Smith is listed four times. My code currently concatenates all 17 fields for each record into one string that can be quickly compared to each other. If the strings for each are identical, there are no differences and those records are removed. However, if the strings for two or more records are not the same, then my code starts with the first of the 17 fields (Company) and compares all the values for each Joe Smith record, highlighting differences that exist from the first record. It repeats this process for all the fields and then continues on for all other contacts. If anyone has a better, more efficient suggestion for improving the comparison process, I would appreciate hearing from you! Thanks. -- Steve C |
Record Comparison Between Two Databases
Thanks, Joel. Your suggestion to process as many records at one time is a
good one. Thanks for the formula and the code. -- Steve C "Joel" wrote: My first recomendation no matter which methoid is used is to mark the rows to delete and then delete all the rows at one time. I usually put an Z in the auxilary column When I find a duplicate. Then sort descending order the auxilary column with an X that will bring all the X's to the top of the worksheet. I like finding duplicates by using a worksheet formula that can put automatically placed on the worksheet Put in auxilary column and the copy down all the cells. =if(COUNTIF(A$1:A1,A1)1,"X","") Try this code. I had the first row of the worksheet the names without a header row. The names were in column A and the auxilary row is incolumn D. The code may seem a little complicated but runs very quickly. Sub RemoveDuplicates() 'Use column D as Auxilary column to place X on duplicate rows With Sheets("Sheet1") LastRow = .Range("A" & Rows.Count).End(xlUp).Row 'put formula in auxilary row D .Range("D1").Formula = "=if(COUNTIF(A$1:A1,A1)1,""X"","""")" .Range("D1").Copy Destination:=.Range("D1:D" & LastRow) 'replace formula with value Range("A1:D" & LastRow).Copy Range("A1:D" & LastRow).PasteSpecial xlPasteValues Range("A1:D" & LastRow).Sort _ Key1:=Range("D1"), _ Order1:=xlDescending, _ Header:=xlGuess CountX = WorksheetFunction.CountIf( _ .Range("C1:D" & LastRow), "X") .Rows("1:" & CountX).Delete End With End Sub "Steve C" wrote: I am writing a procedure to compare thousands of contact records from two databases for any differences that exist between duplicate contacts in 17 different fields (Company, Address, phone number, email, etc.). Since both databases are updated frequently and independently (that's unavoidable), we need to compare the two on a monthly basis so that both contain the most updated information as possible. My approach has been to export all the contacts into one spreadsheet and sort them by DisplayName (which includes first and last name). My procedure first compares a DisplayName to the record below it. If it is listed only once, obviously no duplicate exists and that record's entire row is removed and pasted elsewhere. If the DisplayName is listed two or more times, the fields of the first record are used as the basis of comparision to determine if the other records are different. Differences are highlighted in yellow. My question has to do with finding the most efficient way (via VBA) to compare duplicate records. As an example, let's say that Joe Smith is listed four times. My code currently concatenates all 17 fields for each record into one string that can be quickly compared to each other. If the strings for each are identical, there are no differences and those records are removed. However, if the strings for two or more records are not the same, then my code starts with the first of the 17 fields (Company) and compares all the values for each Joe Smith record, highlighting differences that exist from the first record. It repeats this process for all the fields and then continues on for all other contacts. If anyone has a better, more efficient suggestion for improving the comparison process, I would appreciate hearing from you! Thanks. -- Steve C |
All times are GMT +1. The time now is 09:08 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com