ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   Lookup from a very heavy file (https://www.excelbanter.com/excel-programming/361978-lookup-very-heavy-file.html)

Hari

Lookup from a very heavy file
 
Hi,

My colleague has a file of size 31 MB (56000 rows) in which 2 columns
are of interest (Column B and F)

He also has another smaller file of 7MB which has 10000 rows. In this
file, he has to check whether data in Column Z appears in Column B and
F of bigger file or not. (Its like BigFile!B2&BigFile!F2 can be equal
to SmallFile!Z).

If it exists then we use a IF formula along with ISERROR to say that
"Data exists" or "Data Not Exist". Once we get the results we just do
copy -- paste special -- values so that each time editing of some other
column of the small file excel doesnt hang excel because of
recalculation of lookups.

The formula is working fine and am getting correct results.

But because of the size of the big file, it takes a lot of time to
calculate. I wanted to know as to whether we may load the big data file
in to access and then from excel I perform something like a lookup to
the access database. Is such a thing possible? If yes how. Secondly,
would this method offer significant speed as compared to my earlier
method. Thirdly are there other methods which might speed up this whole
process.

Regards,
HP
India


Charles Williams

Lookup from a very heavy file
 
Hi Hari,

Sort your data and try this: it will calculate hundreds or thousands of
times faster.
Sorted Data with Missing Values.
Two approximate Lookups are usually faster than one exact Lookup.
If you can sort your data but still cannot use approximate match
because you can't be sure that the value you are looking up exists in the
lookup range, then try this:

IF(lookup_val=Index(lookup_array,MATCH(lookup_val, lookup_list),1)
,Index(lookup_array,MATCH(lookup_val,lookup_array) , colnum),"notexist")

This does an approximate lookup on the lookup list, and if the lookup
value = the answer in the lookup column you have found an exact match, so
redo the approximate lookup on the column you want, otherwise it's a missing
value. Note that this assumes you never lookup a value smaller than the
smallest value in the list, so you may need to add a dummy very small entry
into the list.
Two approximate matches are significantly faster than one exact match
for a lookup over a large number of rows (breakeven point is about 10-20
rows).


--
Charles
______________________
Decision Models
FastExcel 2.2 Beta now available
www.DecisionModels.com


"Hari" wrote in message
ups.com...
Hi,

My colleague has a file of size 31 MB (56000 rows) in which 2 columns
are of interest (Column B and F)

He also has another smaller file of 7MB which has 10000 rows. In this
file, he has to check whether data in Column Z appears in Column B and
F of bigger file or not. (Its like BigFile!B2&BigFile!F2 can be equal
to SmallFile!Z).

If it exists then we use a IF formula along with ISERROR to say that
"Data exists" or "Data Not Exist". Once we get the results we just do
copy -- paste special -- values so that each time editing of some other
column of the small file excel doesnt hang excel because of
recalculation of lookups.

The formula is working fine and am getting correct results.

But because of the size of the big file, it takes a lot of time to
calculate. I wanted to know as to whether we may load the big data file
in to access and then from excel I perform something like a lookup to
the access database. Is such a thing possible? If yes how. Secondly,
would this method offer significant speed as compared to my earlier
method. Thirdly are there other methods which might speed up this whole
process.

Regards,
HP
India




Hari

Lookup from a very heavy file
 
Charles,

Thanks for your solution.

Since, my file is of 31 MB, it takes a lot of time to load and usually
it ends up freezing every application. Thats why I wanted to even avoid
opening this file. Iam not sure as to how much trouble it will give in
case we try to sort this 55000 row file. (I dont have the file in front
of me presently). Morever I think my colleague is expectign the number
of rows to increase in the near future and would cross the 65536 limit.
In such a case, we would have to anyway keep the source data for lookup
outside of excel. While writing this post, I realize that probably I
could move both my Lookup value table and Table array to access and
then use some basic SQL to produce the kind of output I want (I think I
can accomplish this). Only thing is Iam slightly more keen for a
solution in which I could perfrom my lookup from an excel sheet to an
access table.

One more thing. Your method of 2 approxiamte lookups is novel to me.
Though I couldnt appreciate as to why your method works, I will try to
figure it out. Iam sure I can put this to use for other lookup
situations I encounter.

Regards,
HP
India


RB Smissaert

Lookup from a very heavy file
 
Best option might be to do this with SQL directly on the text files.
Output would be to a third text file. If you want to go this way I can
post some example code.

RBS

"Hari" wrote in message
ups.com...
Hi,

My colleague has a file of size 31 MB (56000 rows) in which 2 columns
are of interest (Column B and F)

He also has another smaller file of 7MB which has 10000 rows. In this
file, he has to check whether data in Column Z appears in Column B and
F of bigger file or not. (Its like BigFile!B2&BigFile!F2 can be equal
to SmallFile!Z).

If it exists then we use a IF formula along with ISERROR to say that
"Data exists" or "Data Not Exist". Once we get the results we just do
copy -- paste special -- values so that each time editing of some other
column of the small file excel doesnt hang excel because of
recalculation of lookups.

The formula is working fine and am getting correct results.

But because of the size of the big file, it takes a lot of time to
calculate. I wanted to know as to whether we may load the big data file
in to access and then from excel I perform something like a lookup to
the access database. Is such a thing possible? If yes how. Secondly,
would this method offer significant speed as compared to my earlier
method. Thirdly are there other methods which might speed up this whole
process.

Regards,
HP
India



Hari

Lookup from a very heavy file
 
RBS,

Thanks for your response. I have used SQL within SAS (and in a small
way in Oracle) but havent come across where Joins are done directly on
text files. I would like to learn this method of yours. When you say
text file, can it be done on CSV format files?

(Just to clue you in, as to how I was planning to approach this was by
doing an non-equi join to produce a table which doesnt have te lookup
values and secondly an equi join to produce a table which contains the
lookup values, then I was planning to stack these tables one below the
other with an additional column indicating as to what table they are
from.)

Regards,
HP
India


RB Smissaert

Lookup from a very heavy file
 
Here some code snippets and an example query.
It is not working code, but if needed I can post that later.

Dim rs As ADODB.Recordset
Dim TempTextConn As String

TempTextConn = _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & _
"C:\TempTables\;" & _
"Extended Properties=Text;"

'query example :
'---------------
SELECT
P.PATIENT_ID,
P.READ_CODE,
P.TERM_TEXT,
P.ADDED_DATE,
P.ENTRY_ID
INTO ResultFile.txt IN
'C:\TempTables\'
'Text;FMT=Delimited'
FROM
all_auth.txt P INNER JOIN all_current_repeats.txt E ON
(P.ENTRY_ID = E.ENTRY_ID)

Set rs = New ADODB.Recordset

rs.Open Source:=strQuery, _
ActiveConnection:=TempTextConn, _
CursorType:=adOpenForwardOnly, _
LockType:=adLockReadOnly, _
Options:=adCmdText


RBS


"Hari" wrote in message
oups.com...
RBS,

Thanks for your response. I have used SQL within SAS (and in a small
way in Oracle) but havent come across where Joins are done directly on
text files. I would like to learn this method of yours. When you say
text file, can it be done on CSV format files?

(Just to clue you in, as to how I was planning to approach this was by
doing an non-equi join to produce a table which doesnt have te lookup
values and secondly an equi join to produce a table which contains the
lookup values, then I was planning to stack these tables one below the
other with an additional column indicating as to what table they are
from.)

Regards,
HP
India



RB Smissaert

Lookup from a very heavy file
 
Have tested now and this works fine:

Sub test()

Dim rs As ADODB.Recordset
Dim TempTextConn As String
Dim strQuery As String

TempTextConn = _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & _
"C:\TempTables\;" & _
"Extended Properties=Text;"

'query example :
'---------------
strQuery = "SELECT " & _
"P.PATIENT_ID, " & _
"P.OLD_EXTERNAL_NO, " & _
"P.FORENAME_1, " & _
"P.SURNAME, " & _
"P.AGE, " & _
"P.GENDER_TYPE, " & _
"P.ADDRESS_LINE_2, " & _
"P.REGISTERED_GP, " & _
"E.READ_CODE, " & _
"E.START_DATE, " & _
"E.ADDED_DATE " & _
"INTO ResultFile.txt IN " & _
"'C:\TempTables\' " & _
"'Text;FMT=Delimited' " & _
"FROM " & _
"2IDALL.txt P INNER JOIN 3Morb_FULL.txt E ON " & _
"(P.PATIENT_ID = E.PATIENT_ID) "

Set rs = New ADODB.Recordset

rs.Open Source:=strQuery, _
ActiveConnection:=TempTextConn, _
CursorType:=adOpenForwardOnly, _
LockType:=adLockReadOnly, _
Options:=adCmdText

'can't close RecordSet here as it is closed already
Set rs = Nothing

End Sub


RBS


"Hari" wrote in message
oups.com...
RBS,

Thanks for your response. I have used SQL within SAS (and in a small
way in Oracle) but havent come across where Joins are done directly on
text files. I would like to learn this method of yours. When you say
text file, can it be done on CSV format files?

(Just to clue you in, as to how I was planning to approach this was by
doing an non-equi join to produce a table which doesnt have te lookup
values and secondly an equi join to produce a table which contains the
lookup values, then I was planning to stack these tables one below the
other with an additional column indicating as to what table they are
from.)

Regards,
HP
India



Aram[_2_]

Lookup from a very heavy file
 
I'm also interested in solution of this problem.
I have more than 65536 rows of data so I'm doing a vlookup in two
sheets, with if iserror vlookup in sheet1 vlookup in sheet2. But it's
slow and sheet2 in couple of weeks will become more than 65000 limit,
so doing lookup in three sheets almost imposable.
Going a little bit from subject, I want to say that new excel 2007
support more than million rows, and there is beta version available and
it calculates much faster and uses both parts of my dual core
processor, so I connected the text file of my original data trough ODBC
and database import and it works, but there is a problem with beta
version of this new excel, when you save such big file you will not be
able to reopen it later, excel says that data is corrupted and he needs
to recover it, but can't. So prior to save it I'm deleting data sheet.
Saying the truth I'm doing lookup from two different and such a big
text files and it's getting worse. I know that there should be a better
solution. Since I don't know any SQL maybe somebody can help.
Just an idea. It sholud be a custom excel function, just like vlookup,
so it could be dragged down using excel's functionality, so if the
function is =ourlookup(A1,2) when you drag it down it changes to
=ourlookup(A2,2). Inside of this custom function should be SQL which
access to any database, in this case to my ODBC connected text file.
But if it would be function does it necessary for the function to
connect to database and select and close for each row?

How to do that?

Sorry for my English.

RB Smissaert wrote:
Have tested now and this works fine:

Sub test()

Dim rs As ADODB.Recordset
Dim TempTextConn As String
Dim strQuery As String

TempTextConn = _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & _
"C:\TempTables\;" & _
"Extended Properties=Text;"

'query example :
'---------------
strQuery = "SELECT " & _
"P.PATIENT_ID, " & _
"P.OLD_EXTERNAL_NO, " & _
"P.FORENAME_1, " & _
"P.SURNAME, " & _
"P.AGE, " & _
"P.GENDER_TYPE, " & _
"P.ADDRESS_LINE_2, " & _
"P.REGISTERED_GP, " & _
"E.READ_CODE, " & _
"E.START_DATE, " & _
"E.ADDED_DATE " & _
"INTO ResultFile.txt IN " & _
"'C:\TempTables\' " & _
"'Text;FMT=Delimited' " & _
"FROM " & _
"2IDALL.txt P INNER JOIN 3Morb_FULL.txt E ON " & _
"(P.PATIENT_ID = E.PATIENT_ID) "

Set rs = New ADODB.Recordset

rs.Open Source:=strQuery, _
ActiveConnection:=TempTextConn, _
CursorType:=adOpenForwardOnly, _
LockType:=adLockReadOnly, _
Options:=adCmdText

'can't close RecordSet here as it is closed already
Set rs = Nothing

End Sub


RBS


"Hari" wrote in message
oups.com...
RBS,

Thanks for your response. I have used SQL within SAS (and in a small
way in Oracle) but havent come across where Joins are done directly on
text files. I would like to learn this method of yours. When you say
text file, can it be done on CSV format files?

(Just to clue you in, as to how I was planning to approach this was by
doing an non-equi join to produce a table which doesnt have te lookup
values and secondly an equi join to produce a table which contains the
lookup values, then I was planning to stack these tables one below the
other with an additional column indicating as to what table they are
from.)

Regards,
HP
India



Hari

Lookup from a very heavy file
 
RBS,

Thanks a lot for your code. I will test it one my data and will let you
know in case I have any syntax problems.

Regards,
HP
India


RB Smissaert wrote:
Have tested now and this works fine:

Sub test()

Dim rs As ADODB.Recordset
Dim TempTextConn As String
Dim strQuery As String

TempTextConn = _
"Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & _
"C:\TempTables\;" & _
"Extended Properties=Text;"

'query example :
'---------------
strQuery = "SELECT " & _
"P.PATIENT_ID, " & _
"P.OLD_EXTERNAL_NO, " & _
"P.FORENAME_1, " & _
"P.SURNAME, " & _
"P.AGE, " & _
"P.GENDER_TYPE, " & _
"P.ADDRESS_LINE_2, " & _
"P.REGISTERED_GP, " & _
"E.READ_CODE, " & _
"E.START_DATE, " & _
"E.ADDED_DATE " & _
"INTO ResultFile.txt IN " & _
"'C:\TempTables\' " & _
"'Text;FMT=Delimited' " & _
"FROM " & _
"2IDALL.txt P INNER JOIN 3Morb_FULL.txt E ON " & _
"(P.PATIENT_ID = E.PATIENT_ID) "

Set rs = New ADODB.Recordset

rs.Open Source:=strQuery, _
ActiveConnection:=TempTextConn, _
CursorType:=adOpenForwardOnly, _
LockType:=adLockReadOnly, _
Options:=adCmdText

'can't close RecordSet here as it is closed already
Set rs = Nothing

End Sub


RBS


"Hari" wrote in message
oups.com...
RBS,

Thanks for your response. I have used SQL within SAS (and in a small
way in Oracle) but havent come across where Joins are done directly on
text files. I would like to learn this method of yours. When you say
text file, can it be done on CSV format files?

(Just to clue you in, as to how I was planning to approach this was by
doing an non-equi join to produce a table which doesnt have te lookup
values and secondly an equi join to produce a table which contains the
lookup values, then I was planning to stack these tables one below the
other with an additional column indicating as to what table they are
from.)

Regards,
HP
India




All times are GMT +1. The time now is 04:03 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com