Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
Hello
Would anyone know a better way to remove the first line of a huge txt file? I am using a FileSystemObject/TextStream, doing a first readline, and then looping: Destination.WriteLine Source.ReadLine but the two files are located on drives and it is taking hours to run. Is anyone aware of a smarter way? (I guess the solution would be to directly delete the first row in a file and save the change...without loading it all to memory, but not sure of how to achieve that) Thanks in advance Charles |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
I think you're pretty much stuck with how you're doing it now.
How large is the file. Do you mean hours literally ? Tim "Charles" wrote in message ... Hello Would anyone know a better way to remove the first line of a huge txt file? I am using a FileSystemObject/TextStream, doing a first readline, and then looping: Destination.WriteLine Source.ReadLine but the two files are located on drives and it is taking hours to run. Is anyone aware of a smarter way? (I guess the solution would be to directly delete the first row in a file and save the change...without loading it all to memory, but not sure of how to achieve that) Thanks in advance Charles |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
How huge is "huge"? I would think the following should work fairly quickly
on a file up to 20 Megs in size and, while slowing down on file sizes larger than this, it should still work much, much faster what you are describing for your present code. Give it a try and let us know... Sub RemoveFirstLine(PathAndFileName As String) Dim FileNum As Long Dim TotalFile As String FileNum = FreeFile Open PathAndFileName For Binary As #FileNum TotalFile = Space(LOF(FileNum)) Get #FileNum, , TotalFile Close #FileNum FileNum = FreeFile Open PathAndFileName For Output As #FileNum Print #FileNum, Mid$(TotalFile, InStr(TotalFile, vbCrLf) + 2) Close #FileNum End Sub You would call this subroutine from within your own code like this... Sub Test() ' ' <<< Some code ' RemoveFirstLine "C:\TEMP\TestData.txt" ' ' <<< Some more code ' End Sub where, of course, you would substitute your own path and filename for the sample one I used. If you think you might need it (I don't), you can add a slight measure of safety to the RemoveFirstLine subroutine by Output'ting the modified TotalFile (the Mid$ function call) to a different filename in the path, then executing a Kill statement for the original file, and finally executing a Name..As statement to rename the Output'ted filename back to the original filename. Rick "Charles" wrote in message ... Hello Would anyone know a better way to remove the first line of a huge txt file? I am using a FileSystemObject/TextStream, doing a first readline, and then looping: Destination.WriteLine Source.ReadLine but the two files are located on drives and it is taking hours to run. Is anyone aware of a smarter way? (I guess the solution would be to directly delete the first row in a file and save the change...without loading it all to memory, but not sure of how to achieve that) Thanks in advance Charles |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
Hi
by huge I mean 200Meg+. The problem is that in top of that the file is stored on a network drive, so it's really slow. The solution I found to improve it is to copy it to the C drive first, process it, and then copy it back. That's substentially faster than processing it directly from the network drive. It doesn't take hours but it takes a good 4-5min, which feels like hours when you are waiting in front of your screen! Rick: I'll try your solution. Thanks! Charles |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
Actually, not sure it works. Your code requires to load the whole
200MB file into memory in one go. Which would be probably faster than having an access to the drive for each line read/written. But that's going to be a big ask for Excel. Actually can a single string contain 200MB of data? I actually have another question: I never really understood the meaning of the "#" character before variables, or the "$" sign after the name of the function. How different is it from simply using mid() or filenum? Charles |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
Yes, a single string can contain 200MB of data; the actual limit is
approximately 2 billion (2^31) characters. So, the only restriction on how well the code I posted will work is available RAM (I think a total of 400MB would be necessary for your file... 200MB for the string as it is read into the TotalFile variable and another 200MB for the temporary string I think gets created by the Mid$ function call). If there is not enough available RAM, the code should still work, but the system paging back out to the hard drive would be a performance killer. (All this is why I asked you how huge "huge" was.) The # signs are part of the syntax for addressing a channel number to a file (although in some of the locations I used it in, it may be optional). As for the $ sign, String functions have two forms... without the $ sign, they return variants with a subtype of String (variants are slow and memory wasters... not usually a concern outside of large loops); with the $ sign, they return a pure String data type (and hence, when used in further String manipulations, are as efficient as the function will get). Rick "Charles" wrote in message ... Actually, not sure it works. Your code requires to load the whole 200MB file into memory in one go. Which would be probably faster than having an access to the drive for each line read/written. But that's going to be a big ask for Excel. Actually can a single string contain 200MB of data? I actually have another question: I never really understood the meaning of the "#" character before variables, or the "$" sign after the name of the function. How different is it from simply using mid() or filenum? Charles |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
Interesting. That's the kind of post to archive in the section "useful
stuff to know". Thanks very much for your help! Charles |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
Just out of curiosity, did you try my code against your file? While your
file is **much** larger than what I'd consider an ideal size, I'd still be interested how it fairs against your existing methods (assuming you have a decent amount of RAM in your system to handle the file all in memory). Rick "Charles" wrote in message ... Interesting. That's the kind of post to archive in the section "useful stuff to know". Thanks very much for your help! Charles |
#9
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
the line Space(LOF(FileNum)) is generating an error. It is actually
the Space function that generates the error. LOF returns 190,504,020. It says it ran out of string space. also tried with String(, " ") function, same result. Don't know if this is a limit of the string function or if I reached maximum capacity of a string in VBA (or at least by VBA) Charles |
#10
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
I did a test using the following loop
Sub test2() Dim txt As String, i As Long For i = 1 To 190 Application.StatusBar = i txt = txt & Space(1000000) Next i MsgBox "Success" End Sub it breaks at the step 105, which seems to mean that VBA is running out of memory with a string larger than 104m characters. Excel total memory usage was around 350MB, so far below the 2GB XL limit, and my installed memory (about 2GB as well) So it looks like there is another memory limit for strings. Charles |
#11
Posted to microsoft.public.excel.programming
|
|||
|
|||
Smart way to delete the first row of a huge file
Okay, I also get the loop quitting well below the maximum string size. Part
of that may be the tight loop (probably interfering with the memory "garbage collection" routine) and continual concatenations. Each concatenation will require a larger and larger chunk of contiguous memory which I guess becomes harder and harder to find within the available (unused, non-garbage collected) RAM. Anyway, another approach is to process the file in large "chunks" which I am guessing will still be faster than what you are now doing. Below my signature is a subroutine that handles the process this way. You can experiment with this a little bit. I set the chunk size at 20MB (via the ChunkSize constant assignment)... you can try upping that to, say, 40MB and see if your system will handle it (I don't have a 200MB file to test the code out on). Anyway, give it a try and let me know how it works out. Rick Sub RemoveFirstLine(PathAndFileName As String) Dim InFileNum As Long Dim OutFileNum As Long Dim ReadLocation As Long Dim LoopChunk As Long Dim BackSlash As Long Dim PartialFile As String Const ChunkSize As Long = 20000000 BackSlash = InStrRev(PathAndFileName, "\") ' The next 4 lines must be executed in the order shown InFileNum = FreeFile Open PathAndFileName For Binary As #InFileNum OutFileNum = FreeFile Open Left$(PathAndFileName, BackSlash) & "TEMP_" & _ Mid$(PathAndFileName, BackSlash + 1) For Append As #OutFileNum LoopChunk = 5000 PartialFile = Space$(LoopChunk) Get #InFileNum, , PartialFile Print #OutFileNum, Mid$(PartialFile, InStr(PartialFile, vbCrLf) + 2); Do While Not (EOF(InFileNum)) PartialFile = Space$(ChunkSize) Get #InFileNum, , PartialFile If FileLen(PathAndFileName) - Seek(InFileNum) <= 0 Then PartialFile = Left(PartialFile, InStr(PartialFile, Chr$(0)) - 1) End If Print #OutFileNum, PartialFile; Loop Close #InFileNum Close #OutFileNum Kill PathAndFileName Name Left$(PathAndFileName, BackSlash) & "TEMP_" & _ Mid$(PathAndFileName, BackSlash + 1) As PathAndFileName End Sub "Charles" wrote in message ... I did a test using the following loop Sub test2() Dim txt As String, i As Long For i = 1 To 190 Application.StatusBar = i txt = txt & Space(1000000) Next i MsgBox "Success" End Sub it breaks at the step 105, which seems to mean that VBA is running out of memory with a string larger than 104m characters. Excel total memory usage was around 350MB, so far below the 2GB XL limit, and my installed memory (about 2GB as well) So it looks like there is another memory limit for strings. Charles |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
huge huge excel file... why? | Excel Discussion (Misc queries) | |||
File size is huge | Excel Programming | |||
File got huge | Excel Discussion (Misc queries) | |||
The XLS file is huge! | Excel Programming | |||
The XLS file is huge! | Excel Programming |