![]() |
HTML files into Excel
Hi,
I have 1,000 html fairly short html files, which I want to extract into a spreadsheet. The html were originally generated from a template, but there are about 3 different types of template within the 1000 files. Is it possible to write a script that will extract all the data from within the html files, and put everything between the tags into a cell. I am not concerned about retaining the html tags, but need to extract all the info into excel, so i can manipulate, and regenerate the html....any hepl appreciated! Thanks |
HTML files into Excel
HTML files are text files. You would read them like any textt files and then
apply any filtering you need. Here is a genral format of VBA code that reads and writtes text data. You can see the HTML text by opening the HTML with Notepad or in Internet explorer sectect View Menu - Source. It is basically string manipulations using Left(),MID,and Right() functions. I have used VBA to modify XML files. "alfred" wrote: Hi, I have 1,000 html fairly short html files, which I want to extract into a spreadsheet. The html were originally generated from a template, but there are about 3 different types of template within the 1000 files. Is it possible to write a script that will extract all the data from within the html files, and put everything between the tags into a cell. I am not concerned about retaining the html tags, but need to extract all the info into excel, so i can manipulate, and regenerate the html....any hepl appreciated! Thanks |
HTML files into Excel
Have you tried just opening one of the html files in Excel using File=Open.
It should put the data in various cells. If you want to remove any formatting, you can select all the cells, do edit=copy, go to a new sheet and do Edit=Paste special and select values. If this works, then you could write a macro to do it on all the files and accumulate the results. -- Regards, Tom Ogilvy "alfred" wrote: Hi, I have 1,000 html fairly short html files, which I want to extract into a spreadsheet. The html were originally generated from a template, but there are about 3 different types of template within the 1000 files. Is it possible to write a script that will extract all the data from within the html files, and put everything between the tags into a cell. I am not concerned about retaining the html tags, but need to extract all the info into excel, so i can manipulate, and regenerate the html....any hepl appreciated! Thanks |
HTML files into Excel
Tom: With Hyper-text files some tagged items are formatting and some are
actual text. Alfred is looking for data which I believe is the text and not the formating. The task would be to identify each tagged field and only extractt the tags that pertain to the text strings. "Tom Ogilvy" wrote: Have you tried just opening one of the html files in Excel using File=Open. It should put the data in various cells. If you want to remove any formatting, you can select all the cells, do edit=copy, go to a new sheet and do Edit=Paste special and select values. If this works, then you could write a macro to do it on all the files and accumulate the results. -- Regards, Tom Ogilvy "alfred" wrote: Hi, I have 1,000 html fairly short html files, which I want to extract into a spreadsheet. The html were originally generated from a template, but there are about 3 different types of template within the 1000 files. Is it possible to write a script that will extract all the data from within the html files, and put everything between the tags into a cell. I am not concerned about retaining the html tags, but need to extract all the info into excel, so i can manipulate, and regenerate the html....any hepl appreciated! Thanks |
HTML files into Excel
If I agreed, I wouldn't have posted.
Opening the files in excel should do exactly what I feel the OP wants to do. Nonetheless, it is offered for his consideration to accept or reject. I suspect he is still waiting for your promise: Here is a genral format of VBA code that reads and writtes text data. I hope it wasn't the simple paragraph that followed. Based on that, I would have expected a generalized html parser that strips off all tags. (which is essentially what I offered). -- Regards, Tom Ogilvy "Joel" wrote: Tom: With Hyper-text files some tagged items are formatting and some are actual text. Alfred is looking for data which I believe is the text and not the formating. The task would be to identify each tagged field and only extractt the tags that pertain to the text strings. "Tom Ogilvy" wrote: Have you tried just opening one of the html files in Excel using File=Open. It should put the data in various cells. If you want to remove any formatting, you can select all the cells, do edit=copy, go to a new sheet and do Edit=Paste special and select values. If this works, then you could write a macro to do it on all the files and accumulate the results. -- Regards, Tom Ogilvy "alfred" wrote: Hi, I have 1,000 html fairly short html files, which I want to extract into a spreadsheet. The html were originally generated from a template, but there are about 3 different types of template within the 1000 files. Is it possible to write a script that will extract all the data from within the html files, and put everything between the tags into a cell. I am not concerned about retaining the html tags, but need to extract all the info into excel, so i can manipulate, and regenerate the html....any hepl appreciated! Thanks |
HTML files into Excel
forgot to paste sample code. Tom just notified me of my mistake.
Sub ConvertCSV() Const Sourcefile = "c:\temp\Origin.csv" Const Destfile = "c:\temp\Destination.csv" Const ForReading = 1 Const ForWriting = 2 Const ForAppending = 3 Set OriginCSV = _ CreateObject("Scripting.FileSystemObject") Set FOrigin = _ OriginCSV.GetFile(Sourcefile) Set FSOrigin = _ FOrigin.OpenAsTextStream _ (ForReading) Set DestinationCSV = _ CreateObject("Scripting.FileSystemObject") DestinationCSV.CreateTextFile Destfile Set DestinationCSV = DestinationCSV. _ GetFile(Destfile) Set FSDestination = DestinationCSV. _ OpenAsTextStream _ (ForWriting) Do While FSOrigin.ATENDOFSTREAM = False InputString = FSOrigin.readline Else 'Loop until no more characters in line First = True Do While Len(InputString) 0 'enter your code here Loop End If Loop FSOrigin.Close FSDestination.Close End Sub "alfred" wrote: Hi, I have 1,000 html fairly short html files, which I want to extract into a spreadsheet. The html were originally generated from a template, but there are about 3 different types of template within the 1000 files. Is it possible to write a script that will extract all the data from within the html files, and put everything between the tags into a cell. I am not concerned about retaining the html tags, but need to extract all the info into excel, so i can manipulate, and regenerate the html....any hepl appreciated! Thanks |
All times are GMT +1. The time now is 05:18 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com