![]() |
Lookup from a very heavy file
Hi,
My colleague has a file of size 31 MB (56000 rows) in which 2 columns are of interest (Column B and F) He also has another smaller file of 7MB which has 10000 rows. In this file, he has to check whether data in Column Z appears in Column B and F of bigger file or not. (Its like BigFile!B2&BigFile!F2 can be equal to SmallFile!Z). If it exists then we use a IF formula along with ISERROR to say that "Data exists" or "Data Not Exist". Once we get the results we just do copy -- paste special -- values so that each time editing of some other column of the small file excel doesnt hang excel because of recalculation of lookups. The formula is working fine and am getting correct results. But because of the size of the big file, it takes a lot of time to calculate. I wanted to know as to whether we may load the big data file in to access and then from excel I perform something like a lookup to the access database. Is such a thing possible? If yes how. Secondly, would this method offer significant speed as compared to my earlier method. Thirdly are there other methods which might speed up this whole process. Regards, HP India |
Lookup from a very heavy file
Hi Hari,
Sort your data and try this: it will calculate hundreds or thousands of times faster. Sorted Data with Missing Values. Two approximate Lookups are usually faster than one exact Lookup. If you can sort your data but still cannot use approximate match because you can't be sure that the value you are looking up exists in the lookup range, then try this: IF(lookup_val=Index(lookup_array,MATCH(lookup_val, lookup_list),1) ,Index(lookup_array,MATCH(lookup_val,lookup_array) , colnum),"notexist") This does an approximate lookup on the lookup list, and if the lookup value = the answer in the lookup column you have found an exact match, so redo the approximate lookup on the column you want, otherwise it's a missing value. Note that this assumes you never lookup a value smaller than the smallest value in the list, so you may need to add a dummy very small entry into the list. Two approximate matches are significantly faster than one exact match for a lookup over a large number of rows (breakeven point is about 10-20 rows). -- Charles ______________________ Decision Models FastExcel 2.2 Beta now available www.DecisionModels.com "Hari" wrote in message ups.com... Hi, My colleague has a file of size 31 MB (56000 rows) in which 2 columns are of interest (Column B and F) He also has another smaller file of 7MB which has 10000 rows. In this file, he has to check whether data in Column Z appears in Column B and F of bigger file or not. (Its like BigFile!B2&BigFile!F2 can be equal to SmallFile!Z). If it exists then we use a IF formula along with ISERROR to say that "Data exists" or "Data Not Exist". Once we get the results we just do copy -- paste special -- values so that each time editing of some other column of the small file excel doesnt hang excel because of recalculation of lookups. The formula is working fine and am getting correct results. But because of the size of the big file, it takes a lot of time to calculate. I wanted to know as to whether we may load the big data file in to access and then from excel I perform something like a lookup to the access database. Is such a thing possible? If yes how. Secondly, would this method offer significant speed as compared to my earlier method. Thirdly are there other methods which might speed up this whole process. Regards, HP India |
Lookup from a very heavy file
Charles,
Thanks for your solution. Since, my file is of 31 MB, it takes a lot of time to load and usually it ends up freezing every application. Thats why I wanted to even avoid opening this file. Iam not sure as to how much trouble it will give in case we try to sort this 55000 row file. (I dont have the file in front of me presently). Morever I think my colleague is expectign the number of rows to increase in the near future and would cross the 65536 limit. In such a case, we would have to anyway keep the source data for lookup outside of excel. While writing this post, I realize that probably I could move both my Lookup value table and Table array to access and then use some basic SQL to produce the kind of output I want (I think I can accomplish this). Only thing is Iam slightly more keen for a solution in which I could perfrom my lookup from an excel sheet to an access table. One more thing. Your method of 2 approxiamte lookups is novel to me. Though I couldnt appreciate as to why your method works, I will try to figure it out. Iam sure I can put this to use for other lookup situations I encounter. Regards, HP India |
Lookup from a very heavy file
Best option might be to do this with SQL directly on the text files.
Output would be to a third text file. If you want to go this way I can post some example code. RBS "Hari" wrote in message ups.com... Hi, My colleague has a file of size 31 MB (56000 rows) in which 2 columns are of interest (Column B and F) He also has another smaller file of 7MB which has 10000 rows. In this file, he has to check whether data in Column Z appears in Column B and F of bigger file or not. (Its like BigFile!B2&BigFile!F2 can be equal to SmallFile!Z). If it exists then we use a IF formula along with ISERROR to say that "Data exists" or "Data Not Exist". Once we get the results we just do copy -- paste special -- values so that each time editing of some other column of the small file excel doesnt hang excel because of recalculation of lookups. The formula is working fine and am getting correct results. But because of the size of the big file, it takes a lot of time to calculate. I wanted to know as to whether we may load the big data file in to access and then from excel I perform something like a lookup to the access database. Is such a thing possible? If yes how. Secondly, would this method offer significant speed as compared to my earlier method. Thirdly are there other methods which might speed up this whole process. Regards, HP India |
Lookup from a very heavy file
RBS,
Thanks for your response. I have used SQL within SAS (and in a small way in Oracle) but havent come across where Joins are done directly on text files. I would like to learn this method of yours. When you say text file, can it be done on CSV format files? (Just to clue you in, as to how I was planning to approach this was by doing an non-equi join to produce a table which doesnt have te lookup values and secondly an equi join to produce a table which contains the lookup values, then I was planning to stack these tables one below the other with an additional column indicating as to what table they are from.) Regards, HP India |
Lookup from a very heavy file
Here some code snippets and an example query.
It is not working code, but if needed I can post that later. Dim rs As ADODB.Recordset Dim TempTextConn As String TempTextConn = _ "Provider=Microsoft.Jet.OLEDB.4.0;" & _ "Data Source=" & _ "C:\TempTables\;" & _ "Extended Properties=Text;" 'query example : '--------------- SELECT P.PATIENT_ID, P.READ_CODE, P.TERM_TEXT, P.ADDED_DATE, P.ENTRY_ID INTO ResultFile.txt IN 'C:\TempTables\' 'Text;FMT=Delimited' FROM all_auth.txt P INNER JOIN all_current_repeats.txt E ON (P.ENTRY_ID = E.ENTRY_ID) Set rs = New ADODB.Recordset rs.Open Source:=strQuery, _ ActiveConnection:=TempTextConn, _ CursorType:=adOpenForwardOnly, _ LockType:=adLockReadOnly, _ Options:=adCmdText RBS "Hari" wrote in message oups.com... RBS, Thanks for your response. I have used SQL within SAS (and in a small way in Oracle) but havent come across where Joins are done directly on text files. I would like to learn this method of yours. When you say text file, can it be done on CSV format files? (Just to clue you in, as to how I was planning to approach this was by doing an non-equi join to produce a table which doesnt have te lookup values and secondly an equi join to produce a table which contains the lookup values, then I was planning to stack these tables one below the other with an additional column indicating as to what table they are from.) Regards, HP India |
Lookup from a very heavy file
Have tested now and this works fine:
Sub test() Dim rs As ADODB.Recordset Dim TempTextConn As String Dim strQuery As String TempTextConn = _ "Provider=Microsoft.Jet.OLEDB.4.0;" & _ "Data Source=" & _ "C:\TempTables\;" & _ "Extended Properties=Text;" 'query example : '--------------- strQuery = "SELECT " & _ "P.PATIENT_ID, " & _ "P.OLD_EXTERNAL_NO, " & _ "P.FORENAME_1, " & _ "P.SURNAME, " & _ "P.AGE, " & _ "P.GENDER_TYPE, " & _ "P.ADDRESS_LINE_2, " & _ "P.REGISTERED_GP, " & _ "E.READ_CODE, " & _ "E.START_DATE, " & _ "E.ADDED_DATE " & _ "INTO ResultFile.txt IN " & _ "'C:\TempTables\' " & _ "'Text;FMT=Delimited' " & _ "FROM " & _ "2IDALL.txt P INNER JOIN 3Morb_FULL.txt E ON " & _ "(P.PATIENT_ID = E.PATIENT_ID) " Set rs = New ADODB.Recordset rs.Open Source:=strQuery, _ ActiveConnection:=TempTextConn, _ CursorType:=adOpenForwardOnly, _ LockType:=adLockReadOnly, _ Options:=adCmdText 'can't close RecordSet here as it is closed already Set rs = Nothing End Sub RBS "Hari" wrote in message oups.com... RBS, Thanks for your response. I have used SQL within SAS (and in a small way in Oracle) but havent come across where Joins are done directly on text files. I would like to learn this method of yours. When you say text file, can it be done on CSV format files? (Just to clue you in, as to how I was planning to approach this was by doing an non-equi join to produce a table which doesnt have te lookup values and secondly an equi join to produce a table which contains the lookup values, then I was planning to stack these tables one below the other with an additional column indicating as to what table they are from.) Regards, HP India |
Lookup from a very heavy file
I'm also interested in solution of this problem.
I have more than 65536 rows of data so I'm doing a vlookup in two sheets, with if iserror vlookup in sheet1 vlookup in sheet2. But it's slow and sheet2 in couple of weeks will become more than 65000 limit, so doing lookup in three sheets almost imposable. Going a little bit from subject, I want to say that new excel 2007 support more than million rows, and there is beta version available and it calculates much faster and uses both parts of my dual core processor, so I connected the text file of my original data trough ODBC and database import and it works, but there is a problem with beta version of this new excel, when you save such big file you will not be able to reopen it later, excel says that data is corrupted and he needs to recover it, but can't. So prior to save it I'm deleting data sheet. Saying the truth I'm doing lookup from two different and such a big text files and it's getting worse. I know that there should be a better solution. Since I don't know any SQL maybe somebody can help. Just an idea. It sholud be a custom excel function, just like vlookup, so it could be dragged down using excel's functionality, so if the function is =ourlookup(A1,2) when you drag it down it changes to =ourlookup(A2,2). Inside of this custom function should be SQL which access to any database, in this case to my ODBC connected text file. But if it would be function does it necessary for the function to connect to database and select and close for each row? How to do that? Sorry for my English. RB Smissaert wrote: Have tested now and this works fine: Sub test() Dim rs As ADODB.Recordset Dim TempTextConn As String Dim strQuery As String TempTextConn = _ "Provider=Microsoft.Jet.OLEDB.4.0;" & _ "Data Source=" & _ "C:\TempTables\;" & _ "Extended Properties=Text;" 'query example : '--------------- strQuery = "SELECT " & _ "P.PATIENT_ID, " & _ "P.OLD_EXTERNAL_NO, " & _ "P.FORENAME_1, " & _ "P.SURNAME, " & _ "P.AGE, " & _ "P.GENDER_TYPE, " & _ "P.ADDRESS_LINE_2, " & _ "P.REGISTERED_GP, " & _ "E.READ_CODE, " & _ "E.START_DATE, " & _ "E.ADDED_DATE " & _ "INTO ResultFile.txt IN " & _ "'C:\TempTables\' " & _ "'Text;FMT=Delimited' " & _ "FROM " & _ "2IDALL.txt P INNER JOIN 3Morb_FULL.txt E ON " & _ "(P.PATIENT_ID = E.PATIENT_ID) " Set rs = New ADODB.Recordset rs.Open Source:=strQuery, _ ActiveConnection:=TempTextConn, _ CursorType:=adOpenForwardOnly, _ LockType:=adLockReadOnly, _ Options:=adCmdText 'can't close RecordSet here as it is closed already Set rs = Nothing End Sub RBS "Hari" wrote in message oups.com... RBS, Thanks for your response. I have used SQL within SAS (and in a small way in Oracle) but havent come across where Joins are done directly on text files. I would like to learn this method of yours. When you say text file, can it be done on CSV format files? (Just to clue you in, as to how I was planning to approach this was by doing an non-equi join to produce a table which doesnt have te lookup values and secondly an equi join to produce a table which contains the lookup values, then I was planning to stack these tables one below the other with an additional column indicating as to what table they are from.) Regards, HP India |
Lookup from a very heavy file
RBS,
Thanks a lot for your code. I will test it one my data and will let you know in case I have any syntax problems. Regards, HP India RB Smissaert wrote: Have tested now and this works fine: Sub test() Dim rs As ADODB.Recordset Dim TempTextConn As String Dim strQuery As String TempTextConn = _ "Provider=Microsoft.Jet.OLEDB.4.0;" & _ "Data Source=" & _ "C:\TempTables\;" & _ "Extended Properties=Text;" 'query example : '--------------- strQuery = "SELECT " & _ "P.PATIENT_ID, " & _ "P.OLD_EXTERNAL_NO, " & _ "P.FORENAME_1, " & _ "P.SURNAME, " & _ "P.AGE, " & _ "P.GENDER_TYPE, " & _ "P.ADDRESS_LINE_2, " & _ "P.REGISTERED_GP, " & _ "E.READ_CODE, " & _ "E.START_DATE, " & _ "E.ADDED_DATE " & _ "INTO ResultFile.txt IN " & _ "'C:\TempTables\' " & _ "'Text;FMT=Delimited' " & _ "FROM " & _ "2IDALL.txt P INNER JOIN 3Morb_FULL.txt E ON " & _ "(P.PATIENT_ID = E.PATIENT_ID) " Set rs = New ADODB.Recordset rs.Open Source:=strQuery, _ ActiveConnection:=TempTextConn, _ CursorType:=adOpenForwardOnly, _ LockType:=adLockReadOnly, _ Options:=adCmdText 'can't close RecordSet here as it is closed already Set rs = Nothing End Sub RBS "Hari" wrote in message oups.com... RBS, Thanks for your response. I have used SQL within SAS (and in a small way in Oracle) but havent come across where Joins are done directly on text files. I would like to learn this method of yours. When you say text file, can it be done on CSV format files? (Just to clue you in, as to how I was planning to approach this was by doing an non-equi join to produce a table which doesnt have te lookup values and secondly an equi join to produce a table which contains the lookup values, then I was planning to stack these tables one below the other with an additional column indicating as to what table they are from.) Regards, HP India |
All times are GMT +1. The time now is 04:03 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com