#1
Posted to microsoft.public.excel.programming
Hi to all,
I have a text file with more than 6,000,000 rows. I have to read the file, manipulate some rows, and save the new file. I'm doing it this way, but it takes about 4 minutes:
- First I open the file for Input and count the rows.
- I close the file and open it a second time, reading each line and putting it into a string array.
- While I read, I manipulate the data.
- I open another file in Output mode and write the whole array to the file.
Is there a more efficient way to do what I'd like to do?
Thanks in advance
Salmec
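[Editor's note: for reference, the two-pass pattern Salmec describes looks roughly like the sketch below. This is a reconstruction from his description, not his actual code; the procedure and variable names are invented. The second full pass over the file and the per-line Line Input calls are where most of the 4 minutes goes, which is what the replies address.]

```vba
'Reconstruction of the approach described above (assumed, not the OP's code).
Sub SlowTwoPass(ByVal sIn As String, ByVal sOut As String)
    Dim iFile As Integer, sLine As String
    Dim nRows As Long, i As Long

    'Pass 1: count the rows
    iFile = FreeFile
    Open sIn For Input As #iFile
    Do Until EOF(iFile)
        Line Input #iFile, sLine
        nRows = nRows + 1
    Loop
    Close #iFile
    If nRows = 0 Then Exit Sub    'nothing to do for an empty file

    'Pass 2: read each line into an array, manipulating as we go
    Dim vaText() As String
    ReDim vaText(1 To nRows)
    iFile = FreeFile
    Open sIn For Input As #iFile
    For i = 1 To nRows
        Line Input #iFile, vaText(i)
        'manipulate vaText(i) here
    Next i
    Close #iFile

    'Write the whole array out
    iFile = FreeFile
    Open sOut For Output As #iFile
    For i = 1 To nRows
        Print #iFile, vaText(i)
    Next i
    Close #iFile
End Sub
```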
#2
No need to open the file more than once! You could *dump* the file contents into an array, work the array for the rows you need to manipulate, then *dump* the array back into a new file. If your manipulation adds or removes rows, then you'll have to use a temp array (or string) for the output text.

If the file has a header row in the 1st line, then the number of lines = UBound(vaText); otherwise, the number of lines is UBound(vaText)+1.

I suggest vaText be dimmed as a Variant data type, not dimmed as an array...

  Dim vaText As Variant
  vaText = Split(Read_TextFromFile(sFilename), vbCrLf)

...and when ready to put it back...

  Write_TextToFile Join(vaText, vbCrLf), sNewFilename

...where Read_TextFromFile is a function that takes a filename and returns a string, and Write_TextToFile is a sub that takes the text and the filename. Depending on available resources, this more efficient approach should be quite fast.

--
Garry
Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
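[Editor's note: Garry's Read_TextFromFile and Write_TextToFile helpers are named but not shown in his post. Below is one possible sketch of what they might look like; the names come from his reply, but the implementations are assumed. Read uses Binary mode to pull the whole file in one shot, which is the fast part.]

```vba
'Hypothetical implementations of the two helpers named above.
'Assumes the whole file fits comfortably in memory.
Function Read_TextFromFile(ByVal sFilename As String) As String
    Dim iFile As Integer
    iFile = FreeFile
    Open sFilename For Binary Access Read As #iFile
    Read_TextFromFile = Space$(LOF(iFile))   'pre-size the buffer to the file size
    Get #iFile, 1, Read_TextFromFile         'one-shot read of the entire file
    Close #iFile
End Function

Sub Write_TextToFile(ByVal sText As String, ByVal sFilename As String)
    Dim iFile As Integer
    iFile = FreeFile
    Open sFilename For Output As #iFile
    Print #iFile, sText;                     'trailing ; suppresses an extra CrLf
    Close #iFile
End Sub
```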
#3
GS wrote:
[CUT...]

Yes, "depending on available resources" is the operative phrase here. On my admittedly-older workstation (1GHz Athlon64, ~700MB RAM), Split() made Excel "run off into the bushes and not come back" (bonus points for anyone who recognizes the quote), with the added bonus that you can't break in the middle of a Split() operation.

Depending on Salmec's specific situation, it might very well be worth it to just skip the Split() step and work directly on the string itself. Perhaps you (or someone else) can improve upon my code.

Salmec, I'm assuming this is part of what you needed help with in the other group (microsoft.public.office.developer.vba, if anyone else is interested), so I'll just roll them together. This makes the code somewhat more complex than before, but that can't be helped.

Public Sub Parsedxf()
    '***IMPORTANT!*** If there are leading or trailing spaces,
    'put them inside the quotes or you won't get *any* matches.
    Const LOOKFOR As String = "HATCH" & vbCrLf
    'Ditto this one (sample data had 1 leading space):
    Const DATAMATCH As String = vbCrLf & " 62" & vbCrLf

    Dim vaText As Variant, stmp As String
    Dim sFileName As String
    Dim iFileNum As Integer
    Dim sBuf As String
    'Longs added so this compiles with Option Explicit:
    Dim LOOKFORLEN As Long, DATAMATCHLEN As Long
    Dim where As Long, vatmp1 As Long, vatmp2 As Long, ltmp As Long
    'Don't know what the below are for;
    'not going to delete them, but they serve no purpose here.
    'Dim Fields As String
    'Dim TempStr As String
    'Dim strGenerator(0 To 3) As String
    'Dim i As Integer
    'i = 0

    LOOKFORLEN = Len(LOOKFOR)
    DATAMATCHLEN = Len(DATAMATCH)
    sFileName = "E:\Batch\parse.txt"
    'Does the file exist?
    If Len(Dir$(sFileName)) Then
        iFileNum = FreeFile
        Open sFileName For Binary As iFileNum
        sBuf = Space$(LOF(iFileNum))
        Get #iFileNum, 1, sBuf
        Close iFileNum
        Do
            'This is where the actual work is done. Here there be tigers!
            where = InStr(where + 1, sBuf, LOOKFOR)
            If where Then
                vatmp1 = InStr(where + LOOKFORLEN, sBuf, vbCrLf)
                If vatmp1 Then
                    stmp = Mid$(sBuf, where + LOOKFORLEN, _
                                vatmp1 - where - LOOKFORLEN)
                    Select Case Trim$(stmp)
                        Case "5"
                            vatmp2 = InStr(vatmp1 + 2, sBuf, vbCrLf)
                            If vatmp2 Then
                                stmp = Mid$(sBuf, vatmp1 + 2, vatmp2 - vatmp1 - 2)
                                If Trim$(stmp) = "A7123" Then
                                    vatmp1 = InStr(vatmp2, sBuf, DATAMATCH)
                                    If vatmp1 Then
                                        vatmp2 = InStr(vatmp1 + DATAMATCHLEN, sBuf, vbCrLf)
                                        'Time to extract the data...
                                        If vatmp2 Then
                                            stmp = Mid$(sBuf, vatmp1 + DATAMATCHLEN, _
                                                        vatmp2 - vatmp1 - DATAMATCHLEN)
                                        Else
                                            stmp = Mid$(sBuf, vatmp1 + DATAMATCHLEN)
                                        End If
                                        ltmp = Len(stmp) 'Save this; important later.
                                        'At this point stmp should have the data you want;
                                        'manipulate it here. Minor example:
                                        stmp = stmp + 1
                                        'Then put it back into the extracted data
                                        '(assumes the new value is no longer than the old):
                                        Mid$(sBuf, vatmp1 + DATAMATCHLEN, ltmp) = _
                                            Space$(ltmp - Len(stmp)) & stmp
                                    Else
                                        'No more instances of 62? No need to keep looking.
                                        Exit Do
                                    End If
                                End If
                            End If
                    End Select
                End If
            Else
                Exit Do
            End If
        Loop
        iFileNum = FreeFile
        Open "outputfile" For Binary As iFileNum
        Put #iFileNum, 1, sBuf
        Close iFileNum
    End If
End Sub

On my workstation (the above-mentioned Athlon64), this takes about 3 seconds to run, with your appropriate data being found every 36 lines. (This will be affected by how often "HATCH" appears in the actual data, how often the rest of the data matches what you're looking for, and what kind of data manipulation you actually do.)

--
And you're surprised by this *why*?
#4
"Yes, "depending on available resources" is the operative phrase here. On my admittedly-older workstation (1GHz Athlon64, ~700MB RAM), Split() made Excel "run off into the bushes and not come back" (bonus points for anyone who recognizes the quote), with the added bonus that you can't break in the middle of a Split() operation."

In cases of low memory resources, it's better to manage large data in blocks. Fact is, the data is pulled from the file into a string. This can be done in one shot or in blocks. The same goes for writing back to file. In any case, the key element in this OP's scenario is the content of each line. IMO, this is better handled using an array and looking for matches in each element. That obviates having to find the next linebreak via additional InStr() calls.

==============================

"On my workstation (the above-mentioned Athlon64), this takes about 3 seconds to run, with your appropriate data being found every 36 lines. (This will be affected by how often "HATCH" is in the actual data, how often the rest of the data matches what you're looking for, and what kinds of data manipulation you actually do.)"

Well..., that's got to be a really old machine! Most systems have run 1GB or more of RAM for over a decade now, so I suspect any business using computers today has the resources available. My point was about how much of those resources is being used by other processes. If the PageFile comes into play, then the process will take forever.

The OP states there are 6M lines of text, so I'd recommend processing in blocks of 50k to 100k lines at a time, *IF* the match text is in the string, *before* dumping it into an array for processing. If it's not there, then grab the next block, and so on. You definitely don't want to loop the array if the match text isn't even in the string, right? <g>

--
Garry
Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
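[Editor's note: a rough sketch of the block-wise approach Garry describes. The block size, procedure name, and carry-over handling are illustrative, not from the thread; the key idea from his post is the cheap InStr() test on the raw block *before* it is split into an array.]

```vba
'Illustrative only: process a large file in fixed-size blocks,
'skipping the Split/loop entirely when a block can't contain a match.
Sub ProcessInBlocks(ByVal sFilename As String, ByVal sMatch As String)
    Const BLOCKSIZE As Long = 1048576        '1MB per read (tunable)
    Dim iFile As Integer
    Dim sBuf As String, sCarry As String
    Dim nPos As Long, nLen As Long, nCut As Long
    Dim vaLines As Variant, i As Long

    iFile = FreeFile
    Open sFilename For Binary Access Read As #iFile
    nLen = LOF(iFile)
    nPos = 1
    Do While nPos <= nLen
        'Read the next block (shorter at end of file)
        sBuf = Space$(IIf(nLen - nPos + 1 < BLOCKSIZE, nLen - nPos + 1, BLOCKSIZE))
        Get #iFile, nPos, sBuf
        nPos = nPos + Len(sBuf)
        sBuf = sCarry & sBuf                 'prepend partial line from last block

        'Carry any incomplete final line over to the next block
        nCut = InStrRev(sBuf, vbCrLf)
        If nPos <= nLen And nCut Then
            sCarry = Mid$(sBuf, nCut + 2)
            sBuf = Left$(sBuf, nCut + 1)
        ElseIf nPos <= nLen Then
            sCarry = sBuf: sBuf = ""         'no complete line in this block
        Else
            sCarry = ""
        End If

        'Cheap test first: don't loop the array if no match exists at all
        If InStr(sBuf, sMatch) Then
            vaLines = Split(sBuf, vbCrLf)
            For i = LBound(vaLines) To UBound(vaLines)
                'manipulate matching elements here
            Next i
        End If
    Loop
    Close #iFile
End Sub
```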
#5
On Saturday, January 26, 2013 at 03:01:19 UTC+1, Auric__ wrote:
GS wrote: [CUT...]

Thanks to all, especially to Auric__; the solution proposed works perfectly and is very fast: a few seconds for more than 6,000,000 rows. Now I can edit a DXF (CAD) file directly from Excel VBA.

Kind regards
Salmec
#6
Salmec wrote:
[CUT...]
Thanks to all, especially to Auric__; the solution proposed works perfectly and is very fast: a few seconds for more than 6,000,000 rows. Now I can edit a DXF (CAD) file directly from Excel VBA.

My pleasure, but do pay attention to what Garry (GS) said; he has some valid points.

--
C programmers do not use cute names like ThisVariableIsATemporaryCounter. A C programmer would call that variable "tmp", which is much easier to write, and not the least more difficult to understand.
  -- Linux kernel coding style doc