#1
Posted to microsoft.public.excel.programming
VBA Excel: Opening Very Large Text Files
I am trying to process very large text files (200 MB+) with an Excel VBA program. I first open the file with the "Open text_filename For Input As #1" command. I then run a "Do While Not EOF(1)" loop and, inside it, use the "Line Input #1" command to read one line at a time and check it for a specific text string. If the line contains the desired string, I print the entire line to a new text output file, which will be much smaller than the input.

I tested my program on smaller text files and it works, but the 200 MB files cause Excel to lock up.

Questions:
- I thought "Line Input #1" would let me read one line of text at a time? So even though the text file is very large, I only look at one line at a time.
- Or does the "Open filename For Input" command still make Excel open the entire 200 MB file?
- Is there some other way I can parse out the text lines I want from such a huge text file?
#2
Posted to microsoft.public.excel.programming
Thanks.

Unfortunately it does not have a column structure at all. It's basically a big ugly logfile off a SUN server. The only consistent structure is that every line begins with a date and time stamp, but after the stamp it's just text (no consistent commas or other delimiters).

I know what I really need to do: Learn Unix shell or perl scripting so I can do the text parsing in Unix, and then bring the output file into Excel.

On Tue, 16 Sep 2003 20:16:07 -0700, "Rob Bovey" wrote:

If your text file is well-formed (i.e. it's got a database-like column and row structure throughout) then you can use ADO to query the contents of the text file. Here's some generic code that shows how to do this. Note that you have to set a reference to the ADO object library in order to use this code.

''' Set a reference to the Microsoft ActiveX Data
''' Objects 2.X Library under Tools/References.
Public Sub GenericQueryTextFile()

    Dim rsData As ADODB.Recordset
    Dim szPath As String
    Dim szConnect As String
    Dim szSQL As String

    ''' This is the path where the text file
    ''' is located (not the filename).
    szPath = ThisWorkbook.Path & "\"

    ''' Create the connection string.
    szConnect = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
                "Data Source=" & szPath & ";" & _
                "Extended Properties=""Text;HDR=NO"";"

    ''' If the first row of the text file has column
    ''' names, the last line above should be:
    ''' "Extended Properties=Text;"

    ''' Create the SQL statement. This can be as
    ''' complex as you like to limit the number
    ''' of records returned.
    ''' "MyData.txt" should be the name of the
    ''' text file you're trying to query.
    szSQL = "SELECT * FROM MyData.txt;"

    Set rsData = New ADODB.Recordset
    rsData.Open szSQL, szConnect, adOpenForwardOnly, _
                adLockReadOnly, adCmdText

    ''' Check to make sure we received data.
    If Not rsData.EOF Then
        ''' Either dump the whole data set onto Sheet1.
        'Sheet1.UsedRange.Clear
        'Sheet1.Range("A1").CopyFromRecordset rsData
        ''' Or check each record individually.
        Do While Not rsData.EOF
            If rsData.Fields(0).Value > 100 Then
                ''' Do something here.
            End If
            rsData.MoveNext
        Loop
    Else
        MsgBox "No records returned.", vbCritical
    End If

    ''' Clean up our Recordset object.
    rsData.Close
    Set rsData = Nothing

End Sub
#4
Posted to microsoft.public.excel.programming
wrote in message ...
I know what I really need to do: Learn Unix shell or perl scripting so I can do the text parsing in Unix, and then bring the output file into Excel.

Ask, and you shall receive:

grep search_string source_file > output_file

(Obviously, replace search_string, source_file, and output_file accordingly.)

--
HTH -
-Frank
Microsoft Excel MVP
Dolphin Technology Corp.
http://vbapro.com
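For anyone trying the grep suggestion, here is a minimal sketch of the filter step with the matching lines redirected into a new file. The helper name filter_lines and the sample data are made up for illustration; only grep itself and shell redirection come from the suggestion above. grep reads its input line by line, so the size of the source file should not matter much.

```shell
#!/bin/sh
# Hypothetical helper: copy every line of SOURCE that contains TEXT into OUTPUT.
# Usage: filter_lines TEXT SOURCE OUTPUT
filter_lines() {
    # -F matches TEXT literally (no regex); "--" guards against TEXT starting with "-".
    # grep exits with status 1 when nothing matches, and OUTPUT is then empty.
    grep -F -- "$1" "$2" > "$3"
}

# Small demonstration with made-up sample data.
demo_dir=$(mktemp -d)
printf '%s\n' 'look at that dog1' 'look at that cat' 'look at that dog2' \
    > "$demo_dir/source_file"
filter_lines 'dog' "$demo_dir/source_file" "$demo_dir/output_file"
cat "$demo_dir/output_file"
```

Drop the -F option if the search string is meant to be a regular expression.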
#5
Posted to microsoft.public.excel.programming
And there are versions of grep that have been ported to the Windows platform. Take a look at www.shareware.com for grep.

Or maybe a DOS equivalent:

find /i "searchstring" source_file > output_file

(That looks kind of familiar???)

But I think I'd check to see how big a file DOS's Find will work with. (I don't recall if there's a limit.)

Frank Isaacs wrote:

wrote in message ...
I know what I really need to do: Learn Unix shell or perl scripting so I can do the text parsing in Unix, and then bring the output file into Excel.

Ask, and you shall receive:

grep search_string source_file > output_file

(Obviously, replace search_string, source_file, and output_file accordingly.)

--
HTH -
-Frank
Microsoft Excel MVP
Dolphin Technology Corp.
http://vbapro.com

--
Dave Peterson
#6
Posted to microsoft.public.excel.programming
Thanks for all the suggestions, I appreciate everyone taking the time.

Guess I have to open the can of worms further :-) I was actually starting with a 600MB+ file and using grep to pull out lines with the desired text string. But there are many, many lines with duplicate text strings (see example below) and the resulting grep output was still 200MB+.

For example, say that I have the following lines in the text file (this is my 600MB file):

look at that dog1
look at that cat
look at that dog1
look at that mouse
look at that dog1
look at that dog2
look at that dog2
look at that bird
look at that dog3
look at that dog4
look at that dog4

I can easily grep out only those lines with "dog" in them and I get this output (this is my 200MB file):

look at that dog1
look at that dog1
look at that dog1
look at that dog2
look at that dog2
look at that dog3
look at that dog4
look at that dog4

But I don't know of a way to use grep so that I also eliminate duplicates, so that the output looks like this:

look at that dog1
look at that dog2
look at that dog3
look at that dog4

Any ideas (short of a Perl script)? Thanks again.
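One possible answer to the duplicate question, sketched under the assumption that a standard sort is available next to grep: pipe the grep output through sort -u, which keeps one copy of each distinct line. The helper name unique_matches and the file names are made up; the sample lines are the ones from the post above.

```shell
#!/bin/sh
# Hypothetical helper: keep one copy of each line of SOURCE that contains TEXT.
# Usage: unique_matches TEXT SOURCE OUTPUT
unique_matches() {
    # sort -u sorts the matching lines and drops repeated ones.
    # LC_ALL=C selects plain byte-order sorting regardless of locale.
    grep -F -- "$1" "$2" | LC_ALL=C sort -u > "$3"
}

# Demonstration using the sample lines from the post.
sample_dir=$(mktemp -d)
printf '%s\n' \
    'look at that dog1' 'look at that cat' 'look at that dog1' \
    'look at that mouse' 'look at that dog1' 'look at that dog2' \
    'look at that dog2' 'look at that bird' 'look at that dog3' \
    'look at that dog4' 'look at that dog4' > "$sample_dir/big.log"
unique_matches 'dog' "$sample_dir/big.log" "$sample_dir/unique.log"
cat "$sample_dir/unique.log"
```

Two caveats: the output comes back in sorted order rather than file order, and lines only count as duplicates when they are identical from start to end, so lines that differ in a leading date and time stamp are all kept.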
#7
Posted to microsoft.public.excel.programming
Well, if Windows is your primary platform, I'd say Rob's ADO idea is going to be your best bet. If you can play around on the *nix side of things, in addition to grep, you've got awk and sed. Since the data is starting there, you may want to do as much of that raw processing as possible before moving it over to Excel, and let Excel do what it's good at.

--
HTH -
-Frank
Microsoft Excel MVP
Dolphin Technology Corp.
http://vbapro.com
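As a rough illustration of the awk route mentioned above, the sketch below filters and removes repeats in a single pass while keeping the first occurrence of each line in its original order. The helper name first_matches, the file names, and the sample data are all made up for illustration. awk remembers every distinct matching line in memory, so this assumes the number of distinct lines is modest even when the file is huge.

```shell
#!/bin/sh
# Hypothetical helper: one awk pass that filters SOURCE for TEXT and drops
# repeated lines, keeping the first occurrence of each in file order.
# Usage: first_matches TEXT SOURCE OUTPUT
first_matches() {
    # index($0, text) is non-zero when the line contains text literally.
    # seen[$0]++ is 0 the first time a line appears, so !seen[$0]++ is
    # true only for the first occurrence of each matching line.
    # Note: awk -v interprets backslash escapes in the assigned value.
    awk -v text="$1" 'index($0, text) && !seen[$0]++' "$2" > "$3"
}

# Demonstration with made-up sample data.
work_dir=$(mktemp -d)
printf '%s\n' 'look at that dog2' 'look at that cat' 'look at that dog1' \
    'look at that dog2' 'look at that dog1' > "$work_dir/raw.log"
first_matches 'dog' "$work_dir/raw.log" "$work_dir/first.log"
cat "$work_dir/first.log"
```

Unlike a sort-based approach, no sorting step is needed and the original line order survives, at the cost of the in-memory table of lines already seen.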