Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
I receive two files at the end of each month.
One called "Tracking" has a list of all changes to all invoices (some invoices have actually been invoiced and others are still in progress), the other just has the invoices called "Invoices" (just the invoices that have been finalized). These are sent automatically at the end of the month from an SQL server. For some reason the invoices file is now coming with duplicates in it. I have asked the people who send this to me to remove the duplicates but they say they can't change the SQL on their end. (I thought SQL is supposed to be flexible enough to make changes to queries?????). I thought I could just delete the duplicate invoice numbers, but found out that the values are not equal in some cases so I have to go to the "Tracking" spreadsheet to find the most recent change and keep that copy of the Invoice. The question: Is it possible to compare the most recent change in the "Tracking" spreadsheet to the "Invoices" spreadsheet, then delete the duplicate records in the "Invoices" spreadsheet. Currently, I delete all the extra changes in "Tracking" so that I am only seeing the most recent of Invoice changes. Then I am going Invoice Number by Invoice Number and comparing values to the "Invoices". Once I have no duplicates in the "Invoices" file, it gets uploaded into Access for further manipulation. The only reason I noticed this is because in Access, some of the Invoice values were doubled because the invoices were "summed" during the upload phase in Access. Any suggestions would be great. Thanks, Lynn |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
I am sure you can do that, but you haven't given enough clear information to
give any specific advice. You can check out Chip Pearson's pages on dealing with duplicates: http://www.cpearson.com/excel/duplicat.htm more important, however, it the fact that you are getting flawed data. If they can't change the SQL, then you need to find out who can. How do you know the information you are getting is even correct? You should not have to search very far to find someone who will both be interested in the problem and have the power to get it corrected. You are trying to mask the symptoms, but someone needs to treat the disease. -- Regards, Tom Ogilvy "Lynn A." wrote in message om... I receive two files at the end of each month. One called "Tracking" has a list of all changes to all invoices (some invoices have actually been invoiced and others are still in progress), the other just has the invoices called "Invoices" (just the invoices that have been finalized). These are sent automatically at the end of the month from an SQL server. For some reason the invoices file is now coming with duplicates in it. I have asked the people who send this to me to remove the duplicates but they say they can't change the SQL on their end. (I thought SQL is supposed to be flexible enough to make changes to queries?????). I thought I could just delete the duplicate invoice numbers, but found out that the values are not equal in some cases so I have to go to the "Tracking" spreadsheet to find the most recent change and keep that copy of the Invoice. The question: Is it possible to compare the most recent change in the "Tracking" spreadsheet to the "Invoices" spreadsheet, then delete the duplicate records in the "Invoices" spreadsheet. Currently, I delete all the extra changes in "Tracking" so that I am only seeing the most recent of Invoice changes. Then I am going Invoice Number by Invoice Number and comparing values to the "Invoices". Once I have no duplicates in the "Invoices" file, it gets uploaded into Access for further manipulation. The only reason I noticed this is because in Access, some of the Invoice values were doubled because the invoices were "summed" during the upload phase in Access. Any suggestions would be great. Thanks, Lynn |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
"Tom Ogilvy" wrote in ...
more important, however, it the fact that you are getting flawed data. If they can't change the SQL, then you need to find out who can. The OP saying, 'they can't change the SQL on their end' may mean, 'they lack the knowledge required to change the SQL on their end'. I've been looking at some code this week where sql was used to get the whole (ordered) data set then VBA was used to loop through the rows to copy the required data to an array. My approach would be to write sql to fetch the data set as required in one hit but it would need a quite complex query. I'm up for the challenge (when time allows <g) but perhaps the OP's situation is the sql guy couldn't come with the goods so said, 'Here *all* the data, filter it in Excel.' The question: Is it possible to compare the most recent change in the "Tracking" spreadsheet to the "Invoices" spreadsheet, then delete the duplicate records in the "Invoices" spreadsheet. No doubt. Using sql on the Excel data, one can't DELETE rows but one can create a new data set (e.g. a new worksheet) by making an INNER JOIN between the two sheets using the common key and using either a subquery or a GROUP BY (using MAX(<<date_column) as the criteria) to get the most recent rows. As Tom said, you need to post more details. Jamie. -- |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
Thanks for your quick responses. I hope this is enough detail to my
issue. I get two files from the server e-mailed monthly. I delete all but the most recent transaction record for each invoice in the "Tracking" workbook. This leaves me with the final changes that occured for each Invoice. This does not have any billing amounts on it but it does have the last date of changes on it. I sort it by invoice number. Next I split my screen and look at each line in the "Invoices" workbook. For any invoice that has duplicates, I look at the "Tracking" workbook and I match the line that has the same invoice status as the most recent change in the "Tracking" workbook. All the billing information is in the "Invoices" workbook. Once clean, the "Invoices" workbook is uploaded into Access to do budgeting analysis in a third party product. Unfortunately, all the columns aren't the same between the two workbooks so I can't use the Compare workbooks that I found referenced several times in the archives. I have a little macro that identifies the duplicates in the "Invoices" workbook so I'm only having to look through about 150 records of 5000. Could I run a loop or something? For each duplicated record in the "Invoices" workbook, goto the "Tracking" workbook, lookup the invoice number, find the invoice status value, match the invoice status value, delete the other occurences of that invoice number. Is this descriptive enough? The person who will be taking this responsibility over, doesn't have much excel experience and I can see them getting really frustrated with the process. I don't mind it, in takes me about an hour or so to clean the data depending on how many duplicates are there, it would be nice to click a button and have all of the steps done and import the data immediately. I have given up trying to get the source to fix the problem. Not very friendly. Anything you can suggest is greatly appreciated. I have about a week to work this through before I turn it over to the other person. Thanks, Lynn (Jamie Collins) wrote in message . com... "Tom Ogilvy" wrote in ... more important, however, it the fact that you are getting flawed data. If they can't change the SQL, then you need to find out who can. The OP saying, 'they can't change the SQL on their end' may mean, 'they lack the knowledge required to change the SQL on their end'. I've been looking at some code this week where sql was used to get the whole (ordered) data set then VBA was used to loop through the rows to copy the required data to an array. My approach would be to write sql to fetch the data set as required in one hit but it would need a quite complex query. I'm up for the challenge (when time allows <g) but perhaps the OP's situation is the sql guy couldn't come with the goods so said, 'Here *all* the data, filter it in Excel.' The question: Is it possible to compare the most recent change in the "Tracking" spreadsheet to the "Invoices" spreadsheet, then delete the duplicate records in the "Invoices" spreadsheet. No doubt. Using sql on the Excel data, one can't DELETE rows but one can create a new data set (e.g. a new worksheet) by making an INNER JOIN between the two sheets using the common key and using either a subquery or a GROUP BY (using MAX(<<date_column) as the criteria) to get the most recent rows. As Tom said, you need to post more details. Jamie. -- |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
|
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
|
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
Jamie, I truly appreciate everything that you have mentioned and the
fact that you have taken the time to respond. Unfortunately, this isn't really what I wanted. I believe what you are suggesting is more complicated than what I require. Here is the process again. I have two files Invoices and Tracking. (Both of these are uploaded into Access once all duplicates are removed) Tracking CREATEDATE INVOICENUMBER OLDCODE NEWCODE 09012004 6:00 am 0001 O P 09012004 6:03 am 0001 P CH 09012004 6:05 am 0001 CH C 09062004 12:12 pm 0002 O P 09062004 1:45 pm 0002 P CH 09172004 9:00 am 0003 0 P 09222004 11:15 am 0003 P CH 09272004 2:22 pm 0003 CH P I cycle through and delete the duplicates by invoice number and come up with this (I have a little delete duplicates code). This is what I end up with. CREATEDATE INVOICENUMBER OLDCODE NEWCODE 09012004 6:05 am 0001 CH C 09062004 1:45 pm 0002 P CH 09272004 2:22 pm 0003 CH P Invoices (this does not have a date, I have requested but again issues with having format changed) Sometimes there are duplicates and sometimes there isn't. I'm not sure why this spreadsheet gets a duplicate but this is what it would look like. Sometimes there are duplicates, sometimes there isn't. INVOICENUMBER CODE 0001 P 0001 CH 0001 C 0002 CH 0003 P 0003 CH What I would like ...... I would like to delete the duplicates in the invoice spreadsheet so that the code in the Invoices table matches the New Code in the Tracking table. The results I would like INVOICENUMBER CODE 0001 C 0002 CH 0003 P I am deleting duplicates from the Invoices spreadsheet (with the same code that I use to delete duplicates from the tracking table), then I compare the invoices that were duplicated with the NewCode from the Tracking table to make sure they are the same. I don't want a new spreadsheet, I don't want to query anything. I want to DELETE duplicates based on the match between the code in the invoices table and the newcode in the tracking table. Can it be done, if not just say so and I will continue doing what I am doing. Thanks again for all your time. I do appreciate what it takes to recreate everything. yt, Lynn (Jamie Collins) wrote in message . com... (Lynn A.) wrote ... Thank you Jamie, if changing the original SQL query was an option, your suggestions would be great. You can use sql to maniplulate the Excel data. Here is more detail. You again omitted test data and expected results: if you don't supply the former, someone has to put effort into creating test data which may not logically fit your schema; if you don't supply the latter, the person replying can't be sure their answer is correct because the results aren't verifiable. So again I'm guessing here... First, let's create a workbook from which to run some code: in the Excel app, create a new blank workbook and save as e.g. C:\test.xls. Open MS Query by choosing: Data, Import External Data, New Database Query, Excel Files, OK, navigate to the open workbook (e.g. C:\test.xls), OK, close the Add Tables dialog. In MS Query, open the SQL window by hitting the SQL button (or choose: View, SQL). Now to create a workbook e.g. C:\TempDB.xls to hold the test data and results. Paste the following text into the SQL window: CREATE TABLE [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].Tracking ( CREATEDATE DATETIME, INVOICENUMBER VARCHAR(255), OLDCODE VARCHAR(255), NEWCODE VARCHAR(255) ) ; Hit OK and the SQL is executed. You should get a success message. Open the SQL window again, delete the text and paste in the following: CREATE TABLE [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].Invoicing ( INVOICENUMBER VARCHAR(255), CODE VARCHAR(255) ) ; Hit OK and the second sheet should now be created. Now for some test data. For each of the following lines (separated by a semicolon), paste into the SQL window and hit OK to execute: INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Tracking$] (CREATEDATE, INVOICENUMBER, OLDCODE, NEWCODE) VALUES ('2004-09-01', 'ABC123', 'AA', 'BB'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Tracking$] (CREATEDATE, INVOICENUMBER, OLDCODE, NEWCODE) VALUES ('2004-09-02', 'ABC123', 'BB', 'CC'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Tracking$] (CREATEDATE, INVOICENUMBER, OLDCODE, NEWCODE) VALUES ('2004-09-03', 'ABC123', 'CC', 'DD'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Tracking$] (CREATEDATE, INVOICENUMBER, OLDCODE, NEWCODE) VALUES ('2004-09-01', 'XYZ987', 'AB', 'CD'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Tracking$] (CREATEDATE, INVOICENUMBER, OLDCODE, NEWCODE) VALUES ('2004-09-02', 'XYZ987', 'CD', 'EF'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Tracking$] (CREATEDATE, INVOICENUMBER, OLDCODE, NEWCODE) VALUES ('2004-09-01', 'JMC555', 'JJ', 'MC'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Invoicing$] (INVOICENUMBER, CODE) VALUES ('ABC123', 'AA'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Invoicing$] (INVOICENUMBER, CODE) VALUES ('ABC123', 'BB'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Invoicing$] (INVOICENUMBER, CODE) VALUES ('ABC123', 'CC'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Invoicing$] (INVOICENUMBER, CODE) VALUES ('ABC123', 'DD'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Invoicing$] (INVOICENUMBER, CODE) VALUES ('XYZ987', 'CD'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Invoicing$] (INVOICENUMBER, CODE) VALUES ('XYZ987', 'EF'); INSERT INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Invoicing$] (INVOICENUMBER, CODE) VALUES ('JMC555', 'MC'); Now for the results, as I understand them to be. I delete any duplicate invoice numbers based on the most recent date so that this only contains one row per invoice number. So, SELECT only the most recent rows, using a subquery to find the maximum date for each INVOICENUMBER, into a new sheet i.e. by executing: SELECT T1.CREATEDATE, T1.INVOICENUMBER, T1.OLDCODE, T1.NEWCODE INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].NewTracking FROM [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Tracking$] T1 WHERE T1.CREATEDATE = ( SELECT MAX(T2.CREATEDATE) FROM [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Tracking$] T2 WHERE T1.INVOICENUMBER = T2.INVOICENUMBER ) ; I manually loop through each invoice number that had duplicates. I manually delete the line(s) that don't match the "newcode" from the Tracking spreadsheet is the same as the "code" from the Invoicing spreadsheet. I not sure what you want here but suspect you want to JOIN to the 'NewTracking' table using INVOICENUMBER and CODE/NEWCODE i.e. SELECT T1.INVOICENUMBER, T1.CODE INTO [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].NewInvoicing FROM [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[Invoicing$] T1 INNER JOIN [Excel 8.0;HDR=YES;Database=C:\TempDB.xls;].[NewTracking$] T2 ON T1.INVOICENUMBER = T2.INVOICENUMBER AND T1.CODE = T2.NEWCODE ; Take a look at the results e.g. by quitting MS Query and opening C:\TempDB.xls. If the 'new' tables are not you expected results, post back with some test data and what you expect to see. Jamie. -- |
#9
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
|
#10
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
Thanks Jamie, I appreciate your response and your time.
I will repost Lynn (Jamie Collins) wrote in message om... (Lynn A.) wrote ... I don't want to query anything. Then I'm not the guy for you <g. Can it be done, if not just say so and I will continue doing what I am doing. It can be done using the approach you've outlined but it requires a lot more effort than writing a sql query (the set-based sql language was invented so that we don't have to use a cursor/loop on an ordered set). From the details you've posted you should get a good response. However, a re-post may be in order; many people are put off by the mention of sql e.g. Dave Peterson famously doesn't do sql either (hopefully that will summon him for you <g). Jamie. -- |
#11
Posted to microsoft.public.excel.programming
|
|||
|
|||
Comparing contents - is it possible
|
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Comparing cell contents via VB | Excel Discussion (Misc queries) | |||
Comparing Cell Contents in 2 or More Columns | Excel Discussion (Misc queries) | |||
Comparing Field Contents | Excel Discussion (Misc queries) | |||
Comparing cell contents with different reference cells | Excel Worksheet Functions | |||
Comparing Workbook contents | Excel Discussion (Misc queries) |