Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 35
Default Ever wondered how to work with large text imports 256 columns?

Since I found a rather nice solution to a problem I had, I wanted to
share it - as is:

I needed to import a rather large (ca. 50,000 rows) text file, which had
a good bit over 256 columns, but only needed a few of those columns in
an Excel sheet, then export most of the rest to a text file again, but
without *some* columns.

To "trick" Excel into doing the file reading for me, I used
Workbooks.OpenText but instead of giving it the true column separator
(tab), I told the function Tab:=False and Semicolon:=True (using a
separator that did not occur at all in my data).

That way, Excel read the whole file at once, into column 1 of a new
workbook. 1-liner instead of bothering to do all the file reading
yourself :)

Then, for the columns I needed in an Excel Worksheet, I wrote a little
VBA function to get those columns from a string that contains a whole
tab-separated (or other separator) row, as stored in column 1 now.

I then looped over the number of rows and assigned the return value of
my function to the cell content in column 1 of my (new) target
worksheet. Now you can use the Excel Range.TextToColumns function on
that sheet and you got the first part of the job done.

For the data I wanted to re-export, I wrote another function that saves
a text file *directly* from the imported worksheets column 1, by looping
over each row, re-assembling the line without the columns I wanted to
remove, and then saving to a text file.

Done. I processed 266 columns, 49195 rows, extracting the first 11
columns to an Excel Sheet, re-exporting all but 1 column to a text file,
on a 2.21 Dual Core System with 2GB RAM and WinXP SP2 within about 16
seconds, that includes opening the original text file (28.9MB) and
saving both the new Workbook and the re-export of the text without an
unwanted column.

For the code used in my case, see below if interested.

HTH someone!

Lars


VBA Code used (might need some adaptation for your needs):

' Function: getFirstXColumns
' Purpose: return the first colNumber tab-separated values from _
' strLine, including the trailing tabs
' Example Usage (returns "abc" & vbTab & "def" & vbTab):
' columnValues = getFirstXColumns ("abc" & vbTab & "def" & vbTab _
' & "ghi" & vbTab, 2)
Public Function getFirstXColumns(strLine As String, _
colNumber As Integer) As String
Dim col As Integer
Dim pos As Long

pos = 0
For col = 1 To colNumber
pos = InStr(pos + 1, strLine, vbTab)
If (pos = 0) Then
getFirstXColumns = strLine
Exit Function
End If
Next col

getFirstXColumns = Left(strLine, pos)
End Function

' Function: saveTextWithoutColumn
' Purpose: consider column 1 of ws a tab-separated table, determine _
' the position of the column indicated by colName and save the _
' table to the indicated file without that column
' Example Usage:
' saveTextWithoutColumn ws, "myfilename.txt", "unwanted column name"
Private Sub saveTextWithoutColumn(ws As Worksheet, file As String, _
colName As String)
Dim buffer As String
Dim col As Long, colNumber As Long
Dim pos As Long, endpos As Long
Dim lastRow As Long, row As Long
Dim fileHandle As Integer
Dim skipColStart As Long, skipColEnd As Long

buffer = ws.Cells(1, 1).value

pos = 1
endpos = 1
col = 0
Do While (endpos 0) ' while a new col separator was found (vbTab)
endpos = InStr(pos, buffer, vbTab)
If (endpos 0) Then
col = col + 1
If (Mid(buffer, pos, endpos - pos) = colName) Then
Exit Do
End If
Else
col = 0
End If
pos = endpos + 1
Loop
colNumber = col

' Determine the highest used row number in the imported data
lastRow = ws.Cells.Find("*", SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious).row

fileHandle = FreeFile()

On Error Resume Next
Kill file
On Error GoTo 0

Open file For Binary Access Write As #fileHandle

' if-statement wrapped around whole loop in case of non-existant
' column for speed optimisation
If colNumber = 0 Then
For row = 1 To lastRow
' attach line break to line and write to file
buffer = ws.Cells(row, 1).value & lineBreak
Put #fileHandle, , buffer
Next row
Else
For row = 1 To lastRow
buffer = ws.Cells(row, 1).value ' get current line

' determine start and end positions of column to remove
pos = 0
For col = 1 To colNumber - 1
pos = InStr(pos + 1, buffer, vbTab)
If (pos = 0) Then Exit For
Next col
skipColStart = pos + 1
skipColEnd = InStr(skipColStart, buffer, vbTab)

' assemble & write line to file without column to be skipped
buffer = Left(buffer, skipColStart - 1) _
& Mid(buffer, skipColEnd + 1) & lineBreak
Put #fileHandle, , buffer
Next row
End If

Close #fileHandle
End Sub


Sub handleData(path As String, filename As String)
Dim wb As Workbook
Dim ws As Worksheet
Dim wsNew As Worksheet
Dim lastRow As Long
Dim row As Long

Workbooks.OpenText path & filename & ".txt", Origin:=xlMSDOS, _
DataType:=xlDelimited, Tab:=False, Semicolon:=True, _
DecimalSeparator:=".", ThousandsSeparator:=","

Set wb = Workbooks(filename & ".txt")
Set ws = wb.Worksheets(filename)

' Determine the highest used row number in the imported data
lastRow = ws.Cells.Find("*", SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious).row

Set wsNew = wb.Worksheets.Add()
wsNew.Name = "my data"

For row = 1 To lastRow
' first 11 columns wanted in my case - might differ for you
wsNew.Cells(row, 1).value = getFirstXColumns( _
ws.Cells(row, 1).value, 11)
Next row

wsNew.Range(wsNew.Cells(1, 1), _
wsNew.Cells(lastRow, 1)).TextToColumns DataType:=xlDelimited, _
Tab:=True, DecimalSeparator:=".", ThousandsSeparator:=","

saveTextWithoutColumn ws, "myfilename.txt", "unwanted column name"

Application.DisplayAlerts = False
ws.Delete
Set ws = Nothing

wb.SaveAs path & wsNew.Name & ".xls", xlNormal
Application.DisplayAlerts = True
wb.Close

Set wsNew = Nothing
Set wb = Nothing
End Sub
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Imports Excel = Microsoft.Office.Interop.Excel doesn't work Uncle Martin Excel Programming 3 July 29th 07 01:44 AM
need way to handle large csv imports that overflow cell size remay Excel Discussion (Misc queries) 3 March 6th 07 02:04 PM
repetitive text imports Knox Excel Programming 2 June 21st 06 09:42 PM
ODBC imports numbers as text LarryPH Excel Programming 3 February 9th 05 11:47 AM
Excel,Auto fit - does work for large text fields ancastle Excel Discussion (Misc queries) 4 January 6th 05 12:15 AM


All times are GMT +1. The time now is 05:29 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"