ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   Filtering A CSV file during the import (https://www.excelbanter.com/excel-programming/417347-filtering-csv-file-during-import.html)

Steve White

Filtering A CSV file during the import
 
I have a situation where a .CSV files(approx 380,000 rows) containes a lot of
data that I do not need. being rather new to scripting I am unsure how to
get this loaded properly. I has found ways to split the file and to have it
import into multiple sheets where I may then manually scrub to the data I
need. I was wondering if it is possible to select only the rows in the CSV
file that I need on the way in thereby saving me a lot of time by not having
to manually scrub the 300,000 rows I do not need.

I do not have access available to me for any of this and I only have Excel
2003(hence the splitting and mutiple worksheets).

Is this possible at all?

Thank,

Steve

Mike

Filtering A CSV file during the import
 
Answer to your question is yes. What determins the data that you dont need ?
What column is it in ? Maybe post some or all of your code so I can or
someone can may be able to help you.


"Steve White" wrote:

I have a situation where a .CSV files(approx 380,000 rows) containes a lot of
data that I do not need. being rather new to scripting I am unsure how to
get this loaded properly. I has found ways to split the file and to have it
import into multiple sheets where I may then manually scrub to the data I
need. I was wondering if it is possible to select only the rows in the CSV
file that I need on the way in thereby saving me a lot of time by not having
to manually scrub the 300,000 rows I do not need.

I do not have access available to me for any of this and I only have Excel
2003(hence the splitting and mutiple worksheets).

Is this possible at all?

Thank,

Steve


Steve White[_2_]

Filtering A CSV file during the import
 

Thanks for the reply.
I apologize for this long-ish post.

I cannot offer out any data (of course) and I do not know where to start
with the code. I guess I'm a total noob. The code I have is snippets
collected of the web trying to be able to load the file into a single
workbook(multiple sheets). There was the one on the MS site that drops all
of the text into one column then needs to be split out with the text to
columns tool - takes about 10 minutes per sheet. then there was a splitting
tool that breaks the CSV file into smaller chunks so you can directly import
each one into a worksheet. These get me started and I can do the rest
manually, but that takes about an hour or so to get the whole thing done. I
guess I do not know where to start to get the process automated. Other
searches on the net don't seem to pull anything up. Perhaps I am not looking
in the right place or for the right thing.

With that said, the data can be sorted by a particular column denoting the
location that I am at. This is the 15th column in the CSV data file and I
would like to load only rows that are positive for this one field. The
overall size of the source CSV file is in excess of 100MB so anything i do on
the entire files will take more than a few minutes. If I could get this
working, the data I need will fit in one sheet.

I'm handicapped in this process of learning what I need by some of the
policies in effect at my workplace - no paper (so no books) no data transfer
mechinisms (USB keys and such) so I cannot work the issue at home where I
have the resources.

I realize this makes it more difficult to help me and I am sorry. I'm not
sure what else to offer though.

Steve
(Appreciatng you time very much - thank you)



joel

Filtering A CSV file during the import
 
You can read any text file one line at a time by opening the file as text.
Then looking at any character on the line do determine if the data should or
shouldn't be put in the worksheet. I have done this a number of times
before. What are you filter requirements. The code below look for any line
starting with 100. After the data is put on the worksheet you can run Text
to columns to seperate out the CSV data.

there is a Split function which will put the seperated data into an array
which makes it easier to filer


Sub EditInput()
Const ReadFile = "c:\temp\event.txt"

Const ForReading = 1, ForWriting = 2, _
ForAppending = 3

Set fs = CreateObject("Scripting.FileSystemObject")
Set fin = fs.OpenTextFile(ReadFile, _
ForReading, TristateFalse)

RowCount = 1
Do While fin.AtEndOfStream < True
Readdata = fin.readline
If Readdata < Val(Readdata) = 100 Then
Range("A" & RowCount) = Readdata
RowCount = RowCount + 1
End If
Loop
fin.Close
End Sub


"Steve White" wrote:


Thanks for the reply.
I apologize for this long-ish post.

I cannot offer out any data (of course) and I do not know where to start
with the code. I guess I'm a total noob. The code I have is snippets
collected of the web trying to be able to load the file into a single
workbook(multiple sheets). There was the one on the MS site that drops all
of the text into one column then needs to be split out with the text to
columns tool - takes about 10 minutes per sheet. then there was a splitting
tool that breaks the CSV file into smaller chunks so you can directly import
each one into a worksheet. These get me started and I can do the rest
manually, but that takes about an hour or so to get the whole thing done. I
guess I do not know where to start to get the process automated. Other
searches on the net don't seem to pull anything up. Perhaps I am not looking
in the right place or for the right thing.

With that said, the data can be sorted by a particular column denoting the
location that I am at. This is the 15th column in the CSV data file and I
would like to load only rows that are positive for this one field. The
overall size of the source CSV file is in excess of 100MB so anything i do on
the entire files will take more than a few minutes. If I could get this
working, the data I need will fit in one sheet.

I'm handicapped in this process of learning what I need by some of the
policies in effect at my workplace - no paper (so no books) no data transfer
mechinisms (USB keys and such) so I cannot work the issue at home where I
have the resources.

I realize this makes it more difficult to help me and I am sorry. I'm not
sure what else to offer though.

Steve
(Appreciatng you time very much - thank you)



Steve White[_2_]

Filtering A CSV file during the import
 
Thank you.

I'll have to study that so that I can modify it for my purposes. Looks like
a great help in the right direction though.

Thank you all.

Steve

Steve White[_2_]

Filtering A CSV file during the import
 

OK. thank again for the help. I'm a scripting noobie and any help is
greatly appreciated.

My filtering requirements are to filter on a string. Data is very regular
and the filster string is "phoenix" (w/o quotes). You used the val()
function to return a numeric value to decide if the line is processed. Do I
follow your code correctly?

I would need to read the file line by line and only import the lines into
the worksheet that contain the filter string. The site name is always in the
same column. I've seen some mentions of the split() funtion, but I'm not
entirely sure how to use it. I'm still reading though.

Steve

Steve White[_2_]

Filtering A CSV file during the import
 
I have tried this script and it doesn't load anything into the worksheet. Am
I doing something wrong?

My modifications are limited, Here's the code...

Sub EditInput()
Const ReadFile = "U:\script
stuff\OSP_Closed0922.txt_Pieces\OSP_Closed0922_6.t xt"

Const ForReading = 1, ForWriting = 2, _
ForAppending = 3

Set fs = CreateObject("Scripting.FileSystemObject")
Set fin = fs.OpenTextFile(ReadFile, _
ForReading, TristateFalse)

RowCount = 1
Do While fin.AtEndOfStream < True
Readdata = fin.readline
If Readdata < Val(Readdata) = "phoenix" Then
Range("A" & RowCount) = Readdata
RowCount = RowCount + 1
End If
Loop
fin.Close
End Sub

it obviously reads through the file as it takes an appropriate amount of
time, it doesn't return add any rows to the worksheet though. What am I
missing?

TIA

Steve


All times are GMT +1. The time now is 12:32 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com