ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   HTML files into Excel (https://www.excelbanter.com/excel-programming/385412-html-files-into-excel.html)

alfred

HTML files into Excel
 
Hi,

I have 1,000 html fairly short html files, which I want to extract
into a spreadsheet. The html were originally generated from a
template, but there are about 3 different types of template within the
1000 files. Is it possible to write a script that will extract all
the data from within the html files, and put everything between the
tags into a cell. I am not concerned about retaining the html tags,
but need to extract all the info into excel, so i can manipulate, and
regenerate the html....any hepl appreciated!

Thanks


joel

HTML files into Excel
 
HTML files are text files. You would read them like any textt files and then
apply any filtering you need. Here is a genral format of VBA code that reads
and writtes text data.

You can see the HTML text by opening the HTML with Notepad or in Internet
explorer sectect View Menu - Source. It is basically string manipulations
using Left(),MID,and Right() functions. I have used VBA to modify XML files.

"alfred" wrote:

Hi,

I have 1,000 html fairly short html files, which I want to extract
into a spreadsheet. The html were originally generated from a
template, but there are about 3 different types of template within the
1000 files. Is it possible to write a script that will extract all
the data from within the html files, and put everything between the
tags into a cell. I am not concerned about retaining the html tags,
but need to extract all the info into excel, so i can manipulate, and
regenerate the html....any hepl appreciated!

Thanks



Tom Ogilvy

HTML files into Excel
 
Have you tried just opening one of the html files in Excel using File=Open.

It should put the data in various cells.

If you want to remove any formatting, you can select all the cells, do
edit=copy, go to a new sheet and do Edit=Paste special and select values.

If this works, then you could write a macro to do it on all the files and
accumulate the results.

--
Regards,
Tom Ogilvy


"alfred" wrote:

Hi,

I have 1,000 html fairly short html files, which I want to extract
into a spreadsheet. The html were originally generated from a
template, but there are about 3 different types of template within the
1000 files. Is it possible to write a script that will extract all
the data from within the html files, and put everything between the
tags into a cell. I am not concerned about retaining the html tags,
but need to extract all the info into excel, so i can manipulate, and
regenerate the html....any hepl appreciated!

Thanks



joel

HTML files into Excel
 
Tom: With Hyper-text files some tagged items are formatting and some are
actual text. Alfred is looking for data which I believe is the text and not
the formating. The task would be to identify each tagged field and only
extractt the tags that pertain to the text strings.

"Tom Ogilvy" wrote:

Have you tried just opening one of the html files in Excel using File=Open.

It should put the data in various cells.

If you want to remove any formatting, you can select all the cells, do
edit=copy, go to a new sheet and do Edit=Paste special and select values.

If this works, then you could write a macro to do it on all the files and
accumulate the results.

--
Regards,
Tom Ogilvy


"alfred" wrote:

Hi,

I have 1,000 html fairly short html files, which I want to extract
into a spreadsheet. The html were originally generated from a
template, but there are about 3 different types of template within the
1000 files. Is it possible to write a script that will extract all
the data from within the html files, and put everything between the
tags into a cell. I am not concerned about retaining the html tags,
but need to extract all the info into excel, so i can manipulate, and
regenerate the html....any hepl appreciated!

Thanks



Tom Ogilvy

HTML files into Excel
 
If I agreed, I wouldn't have posted.

Opening the files in excel should do exactly what I feel the OP wants to do.
Nonetheless, it is offered for his consideration to accept or reject.

I suspect he is still waiting for your promise:
Here is a genral format of VBA code that reads and writtes text data.


I hope it wasn't the simple paragraph that followed. Based on that, I
would have expected a generalized html parser that strips off all tags.
(which is essentially what I offered).


--
Regards,
Tom Ogilvy


"Joel" wrote:

Tom: With Hyper-text files some tagged items are formatting and some are
actual text. Alfred is looking for data which I believe is the text and not
the formating. The task would be to identify each tagged field and only
extractt the tags that pertain to the text strings.

"Tom Ogilvy" wrote:

Have you tried just opening one of the html files in Excel using File=Open.

It should put the data in various cells.

If you want to remove any formatting, you can select all the cells, do
edit=copy, go to a new sheet and do Edit=Paste special and select values.

If this works, then you could write a macro to do it on all the files and
accumulate the results.

--
Regards,
Tom Ogilvy


"alfred" wrote:

Hi,

I have 1,000 html fairly short html files, which I want to extract
into a spreadsheet. The html were originally generated from a
template, but there are about 3 different types of template within the
1000 files. Is it possible to write a script that will extract all
the data from within the html files, and put everything between the
tags into a cell. I am not concerned about retaining the html tags,
but need to extract all the info into excel, so i can manipulate, and
regenerate the html....any hepl appreciated!

Thanks



joel

HTML files into Excel
 
forgot to paste sample code. Tom just notified me of my mistake.

Sub ConvertCSV()

Const Sourcefile = "c:\temp\Origin.csv"
Const Destfile = "c:\temp\Destination.csv"
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 3

Set OriginCSV = _
CreateObject("Scripting.FileSystemObject")
Set FOrigin = _
OriginCSV.GetFile(Sourcefile)
Set FSOrigin = _
FOrigin.OpenAsTextStream _
(ForReading)


Set DestinationCSV = _
CreateObject("Scripting.FileSystemObject")
DestinationCSV.CreateTextFile Destfile
Set DestinationCSV = DestinationCSV. _
GetFile(Destfile)
Set FSDestination = DestinationCSV. _
OpenAsTextStream _
(ForWriting)



Do While FSOrigin.ATENDOFSTREAM = False


InputString = FSOrigin.readline


Else
'Loop until no more characters in line
First = True
Do While Len(InputString) 0

'enter your code here

Loop

End If
Loop

FSOrigin.Close
FSDestination.Close

End Sub


"alfred" wrote:

Hi,

I have 1,000 html fairly short html files, which I want to extract
into a spreadsheet. The html were originally generated from a
template, but there are about 3 different types of template within the
1000 files. Is it possible to write a script that will extract all
the data from within the html files, and put everything between the
tags into a cell. I am not concerned about retaining the html tags,
but need to extract all the info into excel, so i can manipulate, and
regenerate the html....any hepl appreciated!

Thanks




All times are GMT +1. The time now is 05:18 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com