#1
Posted to microsoft.public.excel.programming
I have a large CSV file that I need to read and filter into various sheets, based on rules that test values in particular columns. There are more than 64k rows, so I can't load it into one sheet first and then split it up from there, as this needs to be 2003-compatible.

I'm currently reading it line by line in a macro, breaking each line into its fields, applying the rules, and then writing each row to the appropriate sheet. This is quite slow (I think it's mostly the rules part of the code), but I was wondering if there is any way of parsing a single row of CSV other than doing it the hard way. I suppose I could try writing the entire line to a single cell, calling Text To Columns on it, and then sending it off to the appropriate sheet. Maybe I could do, say, 10,000 rows at a time this way, and perhaps use AutoFilter to apply the rules.

Any other suggestions?

Phil Hibbs.
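For what it's worth, a minimal sketch of that Text To Columns idea: paste one raw line into a scratch cell and let Excel's own parser split it, which does respect quoted commas. The "Scratch" sheet name is an illustrative assumption.

Sub ParseLineViaTextToColumns(ByVal sLine As String)
  ' Sketch only: write one CSV line to a scratch cell, then split it
  ' with Excel's parser. Sheet name "Scratch" is an assumption.
  Dim ws As Worksheet
  Set ws = ThisWorkbook.Worksheets("Scratch")
  ws.Rows(1).ClearContents
  ws.Range("A1").Value = sLine
  ws.Range("A1").TextToColumns Destination:=ws.Range("A1"), _
      DataType:=xlDelimited, _
      TextQualifier:=xlTextQualifierDoubleQuote, _
      Comma:=True, Tab:=False, Semicolon:=False, Space:=False
End Sub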
#2
Posted to microsoft.public.excel.programming
Chip Pearson has a macro here:

http://www.cpearson.com/excel/ImportBigFiles.aspx

which allows you to import files with more than 64k records - it just splits them up into different sheets. This will be quicker than reading the file one line at a time. Perhaps you could sort the data in the imported sheets to reflect your rules (or apply AutoFilter), and then copy blocks of data into the sheets you need.

Hope this helps.

Pete

On Nov 15, 11:11 am, Phil Hibbs wrote:
<original post snipped>
#3
Posted to microsoft.public.excel.programming
Also, you can import up to ~98,000 lines using the Excel Text Import Wizard ("Import External Data" on the Data menu). You do this by specifying the start line of the imported data. See...

http://support.microsoft.com/kb/119770

--
Jim Cone
Portland, Oregon USA
http://www.mediafire.com/PrimitiveSoftware

"Phil Hibbs" wrote in message ...
<original post snipped>
#4
Posted to microsoft.public.excel.programming
On Nov 15, 1:36 pm, Pete_UK wrote:
> Chip Pearson has a macro here:
> http://www.cpearson.com/excel/ImportBigFiles.aspx
> Hope this helps.
> Pete

That suffers from the same problem that my code does (sorry, I should have mentioned that): it does not respect quoted strings that contain the separator character. I need proper quoted-CSV parsing.

Phil Hibbs.
#5
Posted to microsoft.public.excel.programming
Phil Hibbs formulated on Monday:
<original post snipped>

If it's a properly constructed CSV then you could load it into an array of arrays, then loop the array for the desired value at its expected position in each element.

Also, if it's a properly constructed CSV with the first line containing the field names, you could use ADO and filter the data via a WHERE clause, specifying the field name and value to filter for. If the first line doesn't contain the field names, then simply prepend a line to the file contents and save it before running the ADO query.

--
Garry
Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
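A minimal sketch of that ADO route, using late binding and the Jet text driver. The folder, file name, field name, and destination sheet are illustrative assumptions, and it presumes the headings sit on the first line of the file:

Sub QueryCsvWithAdo()
  ' Sketch: query a CSV via the Jet text driver and pull only rows
  ' matching a WHERE clause. Names below are illustrative assumptions.
  Dim cn As Object, rs As Object
  Set cn = CreateObject("ADODB.Connection")
  cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
          "Data Source=C:\Data\;" & _
          "Extended Properties=""text;HDR=Yes;FMT=Delimited"""
  Set rs = cn.Execute( _
      "SELECT * FROM [report.csv] WHERE [Category] = 'A'")
  If Not rs.EOF Then
    ' Dump the entire recordset to a sheet in one shot
    ThisWorkbook.Worksheets("SheetA").Range("A2").CopyFromRecordset rs
  End If
  rs.Close: cn.Close
End Sub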
#6
Posted to microsoft.public.excel.programming
GS wrote:
> Also, if it's a properly constructed CSV with the first line
> containing the field names, you could use ADO and filter the data
> via a WHERE clause, specifying the field name and value to filter for.

Unfortunately, the 4th line contains the column headings. That's one of the reasons I need to process it line-by-line in VBA and decode each line separately.

I was kind of hoping there was some easy way to take a string and parse it as a quoted CSV (or tab-separated, in the case of another file I need to do something similar with). The Split function is not good enough, as the data contains commas within quoted values.

Phil Hibbs.
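There's no built-in VBA one-liner for this, but a hand-rolled quote-aware splitter is short. A sketch, assuming the usual CSV convention (fields optionally wrapped in double quotes, embedded quotes doubled), with the delimiter passed in so the same routine serves the tab-separated file:

Function SplitQuoted(ByVal sLine As String, _
                     ByVal sDelim As String) As String()
  ' Sketch: split one line on sDelim, ignoring delimiters inside
  ' double-quoted fields; "" inside quotes becomes a literal quote.
  Dim sFields() As String, sField As String, sCh As String
  Dim i As Long, n As Long, bInQuotes As Boolean
  ReDim sFields(0 To 0)
  For i = 1 To Len(sLine)
    sCh = Mid$(sLine, i, 1)
    If sCh = """" Then
      If bInQuotes And Mid$(sLine, i + 1, 1) = """" Then
        sField = sField & """"    ' doubled quote -> literal quote
        i = i + 1
      Else
        bInQuotes = Not bInQuotes
      End If
    ElseIf sCh = sDelim And Not bInQuotes Then
      sFields(n) = sField         ' field complete
      n = n + 1
      ReDim Preserve sFields(0 To n)
      sField = ""
    Else
      sField = sField & sCh
    End If
  Next i
  sFields(n) = sField
  SplitQuoted = sFields
End Function

'Usage: SplitQuoted(sLine, ",") for the CSV, SplitQuoted(sLine, vbTab)
'for the tab-delimited file.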
#7
Posted to microsoft.public.excel.programming
It happens that Phil Hibbs formulated:
> Unfortunately, the 4th line contains the column headings. That's one
> of the reasons I need to process it line-by-line in VBA and decode
> each line separately.

This prompts me to ask what the first 3 lines contain. Can these be eliminated so the file starts with the headings? Perhaps a temp file that starts at line 4! If so, filtering by heading is a good way to capture records for dumping to a worksheet in one shot. That way it shouldn't matter what each line contains, because the entire recordset will be only lines that meet your WHERE clause criteria.

I'm thinking you could dump the entire file into a variant using the Split function, specifying vbCrLf as the delimiter. Then empty the first 3 elements and use Filter() to get rid of them. Then dump the array back into a temp file using Join with vbCrLf as the delimiter, and run your ADO query.

It may seem like a lot of work, but the coding is rather simple and straightforward. The process should be fairly fast, and (I suspect) much easier to manage than your current parsing process.

--
Garry
Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
#8
Posted to microsoft.public.excel.programming
GS wrote:
> This prompts me to ask what the first 3 lines contain. Can these be
> eliminated so the file starts with the headings? Perhaps a temp file
> that starts at line 4!

Client name, date and time of report, and a blank line. I guess I could write a temp file and then use the built-in facilities. I'm reluctant to do individual ADO queries, though, as I need to split the data out into 31 different sheets, and I don't want to have to read through a 70,000-line file 31 times.

Phil.
#9
Posted to microsoft.public.excel.programming
Phil Hibbs pretended:
> Client name, date and time of report, and a blank line. I guess I
> could write a temp file and then use the built-in facilities. I'm
> reluctant to do individual ADO queries, though, as I need to split
> the data out into 31 different sheets, and I don't want to have to
> read through a 70,000-line file 31 times.

Well, the tab-delimited files would be easy to dump into an array of arrays. It's the comma-delimited files that are going to be a problem if, as you say, some individual values also contain commas. The easiest way, IMO, to handle these is by loading lines into ADO recordsets, which you'd only have to process if the recordset is not empty.

As it stands now, you have to evaluate each one of those 70,000 lines to determine which sheet to put the data on, then redirect code to the appropriate place to do that. Making 31 recordsets sounds easier and quicker to me!<g>

Here's some code to restructure the CSVs:

Sub RestructureCSVs(ByVal FileIn As String, _
                    ByVal LinesToRemove As Long, _
                    Optional FileOut As String = "tmpCSV.dat")
' Opens a CSV file for editing.
' (Used to remove auxiliary lines before the line containing headings)
' Removes a specified number of lines from the start of the file.
' Dumps the filtered array into a reusable temp file.
' Requires ReadTextFileContents(), WriteTextFileContents()
  Dim saLines() As String, i As Long

  'Get all lines from the file
  saLines() = Split(ReadTextFileContents(FileIn), vbCrLf)

  'To quickly delete unwanted/empty lines, load them with vbNullChar
  'and use the Filter() function to delete them
  If LinesToRemove > 0 Then
    For i = 0 To LinesToRemove - 1
      saLines(i) = vbNullChar
    Next
    saLines() = Filter(saLines(), vbNullChar, False)

    'Dump the contents into a temp file
    FileOut = ThisWorkbook.Path & "\" & FileOut
    WriteTextFileContents Join(saLines, vbCrLf), FileOut
  End If
End Sub

Function ReadTextFileContents(Filename As String) As String
' Reuseable proc to read large amounts of data from a text file
  Dim iNum As Integer, bFileIsOpen As Boolean
  On Error GoTo ErrHandler
  iNum = FreeFile()
  Open Filename For Input As #iNum
  bFileIsOpen = True '//if we got here the file opened successfully
  ReadTextFileContents = Space$(LOF(iNum)) '//size our return string
  'Read the entire contents in one single step
  ReadTextFileContents = Input(LOF(iNum), iNum)

ErrHandler:
  'Close the file
  If bFileIsOpen Then Close #iNum
  If Err Then Err.Raise Err.Number, , Err.Description
End Function 'ReadTextFileContents()

Sub WriteTextFileContents(Text As String, _
                          Filename As String, _
                          Optional AppendMode As Boolean = False)
' Reuseable proc to write/append large amounts of data to a text file
  Dim iNum As Integer, bFileIsOpen As Boolean
  On Error GoTo ErrHandler
  iNum = FreeFile()
  If AppendMode Then
    Open Filename For Append As #iNum
  Else
    Open Filename For Output As #iNum
  End If
  bFileIsOpen = True '//if we got here the file opened successfully
  Print #iNum, Text '//print to the file in one single step

ErrHandler:
  'Close the file
  If bFileIsOpen Then Close #iNum
  If Err Then Err.Raise Err.Number, , Err.Description
End Sub 'WriteTextFileContents()

HTH

--
Garry
Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
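A hypothetical call, with an assumed path, that strips the three preamble lines so the headings land on line 1 of the temp file:

Sub TestRestructure()
  ' Hypothetical usage of GS's routine above; path is an assumption.
  RestructureCSVs "C:\Data\report.csv", 3
End Sub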
#10
Posted to microsoft.public.excel.programming
GS wrote:
> Well, the tab-delimited files would be easy to dump into an array of
> arrays. It's the comma-delimited files that are going to be a problem
> if, as you say, some individual values also contain commas.

You'd think so. However, it gets better: it turns out the tab-delimited file also contains tabs within quoted fields.

Phil.
#11
Posted to microsoft.public.excel.programming
Phil Hibbs used his keyboard to write:
> You'd think so. However, it gets better: it turns out the
> tab-delimited file also contains tabs within quoted fields.

Well, that tells you something about how the files are configured. Now if you can only get the source to do it right...!<g>

Again, it's looking even more like ADO is the way to go!

--
Garry
Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
#12
Posted to microsoft.public.excel.programming
GS wrote:
> Well, that tells you something about how the files are configured.
> Now if you can only get the source to do it right...!<g>

The data IS right. It genuinely contains tabs, in quoted values.

> Again, it's looking even more like ADO is the way to go!

I just thought of another reason why I can't use ADO: one of the 31 sheets is an "otherwise" bucket, and anything that doesn't match any of the other 30 sets of criteria gets written to the bucket sheet. That's going to be one complicated query.

Phil.
#13
Posted to microsoft.public.excel.programming
Phil Hibbs wrote on 11/16/2010:
> The data IS right. It genuinely contains tabs, in quoted values.

The data may very well be 'right' as is. I was referring to the structuring of the CSV as a proper data store. This usually includes headings on the first line, and strict use of unambiguous delimiters that would not be confused with system delimiters OR punctuation in the data values - for example, the pipe "|" or tilde "~" characters.

> I just thought of another reason why I can't use ADO: one of the 31
> sheets is an "otherwise" bucket, and anything that doesn't match any
> of the other 30 sets of criteria gets written to the bucket sheet.
> That's going to be one complicated query.

Phil,
I would loop the file for the 'bucket' records first so they're out of the way. Then I'd grab the other 30 sets with ADO and dump them on their respective sheets. Alternatively, you could do the 30 sets first and delete the records after each dump, so all that remains are the 'bucket' records (those that didn't get grabbed by a query). Since they reside in a temp file, deleting them should not be a problem.

--
Garry
Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
#14
Posted to microsoft.public.excel.programming
If the CSV file is properly constructed then you could load it into Notepad first and do a Find and Replace on it:

Find: ","
Replace with: "|"

(the quotes are needed), so that you are defining a delimiter different from the commas which appear within your data. Save the file with an extension of .txt.

Then when you try to open that within Excel you will be taken to the Data Import Wizard, where you can specify that it is delimited, and specify the delimiter | on the second panel. You can also specify the start row (up to 32k), so that with 2 passes you can effectively import 96k rows of data into two separate sheets, where you then need to get rid of the duplicates which appear on rows 32k to 64k.

With it all in Excel, you could then apply your rules within a macro to split the data into your 30+ sheets.

Hope this helps.

Pete

On Nov 16, 9:30 pm, GS wrote:
<previous post snipped>
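Pete's Find and Replace could also be scripted rather than done in Notepad, reusing the ReadTextFileContents/WriteTextFileContents helpers from post #9. Note (as the next reply points out) this only works if every field is quoted:

Sub RedelimitQuotedCsv(ByVal FileIn As String, ByVal FileOut As String)
  ' Sketch of Pete's idea in VBA: turn the "," between quoted fields
  ' into "|" so the pipe becomes the delimiter. Only valid when EVERY
  ' field is wrapped in double quotes.
  Dim s As String
  s = ReadTextFileContents(FileIn)
  s = Replace(s, """,""", """|""")   ' "," -> "|"
  WriteTextFileContents s, FileOut
End Sub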
#15
Posted to microsoft.public.excel.programming
It happens that Pete_UK formulated:
> If the CSV file is properly constructed...

And here lies the challenge.

> then you could load it into Notepad first and do a Find and Replace
> on it:

I suspect the OP wants to automate the entire process.

> Find: ","
> Replace with: "|"
>
> (the quotes are needed), so that you are defining a delimiter
> different from the commas which appear within your data.

How does that resolve? For example:

Lastname,Firstname,Street,Town_State_Zip,Phone
Jones,Davey,123 Somestreet,Sometown, AnyState 1234,1234567890

...where 'Sometown, AnyState 1234' is a single field of data. Albeit that's not 'proper' data file construction, since state and zip should be singled out, but it's not uncommon to find.

<rest of quoted post snipped>

--
Garry
Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
#16
Posted to microsoft.public.excel.programming
On 11/16/2010 9:26 AM, Phil Hibbs wrote:
> I just thought of another reason why I can't use ADO: one of the 31
> sheets is an "otherwise" bucket, and anything that doesn't match any
> of the other 30 sets of criteria gets written to the bucket sheet.
> That's going to be one complicated query.

Did you ever resolve this problem? If not, please post a line from the csv file that contains quoted values and tabs, and show how it is supposed to be parsed (including something like [tab] so the newsreader doesn't try to render non-visible tabs).

I was thinking that I'd parse the whole file into arrays, one for each sheet, then build the sheets when all of the file has been read.
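A sketch of that array-per-sheet approach: once every parsed row has been classified into a 2-D variant array for its sheet, each block can be written in one assignment rather than row by row. The (1 To rows, 1 To cols) array bounds and the sheet name are assumptions:

Sub DumpArrayToSheet(ByVal sSheet As String, vData As Variant)
  ' Sketch: write an accumulated 2-D variant array to its sheet in a
  ' single assignment. Assumes vData is dimensioned (1 To r, 1 To c).
  With ThisWorkbook.Worksheets(sSheet)
    .Cells.ClearContents
    .Range("A1").Resize(UBound(vData, 1), UBound(vData, 2)).Value = vData
  End With
End Sub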