#1 (alfred)
Posted to microsoft.public.excel.programming
HTML files into Excel

Hi,

I have 1,000 fairly short HTML files whose contents I want to extract
into a spreadsheet. The files were originally generated from a
template, but there are about 3 different types of template within the
1,000 files. Is it possible to write a script that will extract all
the data from within the HTML files and put everything between the
tags into a cell? I am not concerned about retaining the HTML tags,
but I need to extract all the info into Excel so I can manipulate it and
regenerate the HTML. Any help appreciated!

Thanks

#2 (Joel)

HTML files are text files. You would read them like any text file and then
apply any filtering you need. Here is a general format of VBA code that reads
and writes text data.

You can see the HTML text by opening the file with Notepad, or in Internet
Explorer by selecting View menu - Source. It is basically string manipulation
using the Left(), Mid(), and Right() functions. I have used VBA to modify XML files.
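
As a minimal sketch of the string-manipulation approach described above, the function below pulls out the text between one opening tag and its closing tag using InStr() and Mid(). The function name TextBetween and its parameters are hypothetical, and it assumes the opening tag appears in the file exactly as passed in (no attributes).

```vba
' Returns the text between the first occurrence of openTag and the
' closeTag that follows it, or "" if either tag is missing.
' TextBetween is an illustrative name, not part of any library.
Function TextBetween(ByVal html As String, ByVal openTag As String, _
                     ByVal closeTag As String) As String
    Dim startPos As Long
    Dim endPos As Long

    ' Case-insensitive search, since HTML tag names may be upper or lower case
    startPos = InStr(1, html, openTag, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(openTag)

    endPos = InStr(startPos, html, closeTag, vbTextCompare)
    If endPos = 0 Then Exit Function

    TextBetween = Mid$(html, startPos, endPos - startPos)
End Function
```

For example, TextBetween("<title>Price list</title>", "<title>", "</title>") returns "Price list".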

#3 (Tom Ogilvy)

Have you tried just opening one of the HTML files in Excel using File=>Open?

It should put the data in various cells.

If you want to remove any formatting, you can select all the cells, do
Edit=>Copy, go to a new sheet, do Edit=>Paste Special, and select Values.

If this works, then you could write a macro to do it on all the files and
accumulate the results.
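
A macro along those lines might look like the sketch below. It assumes Excel can open each HTML file as a workbook with the data on its first sheet; the folder path and the macro name ImportHtmlFolder are hypothetical, and the layout (file name in column A, data from column B onward) is only one possible way to accumulate the results.

```vba
Sub ImportHtmlFolder()
    ' Hypothetical folder holding the HTML files; adjust as needed
    Const SourceFolder As String = "C:\temp\html\"

    Dim fileName As String
    Dim src As Workbook
    Dim dest As Worksheet
    Dim used As Range
    Dim nextRow As Long

    Set dest = ThisWorkbook.Worksheets(1)
    nextRow = 1
    Application.ScreenUpdating = False

    ' "*.htm*" matches both .htm and .html files
    fileName = Dir(SourceFolder & "*.htm*")
    Do While fileName <> ""
        Set src = Workbooks.Open(SourceFolder & fileName, ReadOnly:=True)
        Set used = src.Worksheets(1).UsedRange

        ' Record which file the rows came from
        dest.Cells(nextRow, 1).Value = fileName
        ' Assigning .Value transfers values only, like Paste Special > Values
        dest.Cells(nextRow, 2).Resize(used.Rows.Count, _
                                      used.Columns.Count).Value = used.Value
        nextRow = nextRow + used.Rows.Count

        src.Close SaveChanges:=False
        fileName = Dir   ' next matching file, or "" when done
    Loop

    Application.ScreenUpdating = True
End Sub
```

It is worth opening one file from each of the three templates by hand first, to check that Excel places the data where you expect.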

--
Regards,
Tom Ogilvy


#4 (Joel)

Tom: With hypertext files, some tagged items are formatting and some are
actual text. Alfred is looking for the data, which I believe is the text and not
the formatting. The task would be to identify each tagged field and
extract only the tags that pertain to the text strings.
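
One way to pick out only the tags that carry data is sketched below: a function that collects the contents of every occurrence of a single named tag, such as td. The name TagContents is hypothetical, and the sketch assumes the tag is not nested inside itself and that each opening tag has a matching closing tag. Note that Trim$ removes spaces only, not line breaks or tabs.

```vba
' Collects the text inside every <tagName ...>...</tagName> pair.
' TagContents is an illustrative name; nested tags of the same name
' are not handled.
Function TagContents(ByVal html As String, ByVal tagName As String) As Collection
    Dim found As New Collection
    Dim openTag As String, closeTag As String, nextChar As String
    Dim openPos As Long, gtPos As Long, closePos As Long, searchFrom As Long

    openTag = "<" & tagName
    closeTag = "</" & tagName & ">"

    openPos = InStr(1, html, openTag, vbTextCompare)
    Do While openPos > 0
        searchFrom = openPos + Len(openTag)
        ' Require ">" or a space next, so "b" does not match "<body" or "<br"
        nextChar = Mid$(html, searchFrom, 1)
        If nextChar = ">" Or nextChar = " " Then
            gtPos = InStr(openPos, html, ">")
            If gtPos = 0 Then Exit Do
            closePos = InStr(gtPos + 1, html, closeTag, vbTextCompare)
            If closePos = 0 Then Exit Do
            found.Add Trim$(Mid$(html, gtPos + 1, closePos - gtPos - 1))
            searchFrom = closePos + Len(closeTag)
        End If
        openPos = InStr(searchFrom, html, openTag, vbTextCompare)
    Loop

    Set TagContents = found
End Function
```

For example, TagContents("<tr><td>Widget</td><td class=""n""> 4 </td></tr>", "td") returns a collection holding "Widget" and "4", which can then be written to cells one item at a time.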

#5 (Tom Ogilvy)

If I agreed, I wouldn't have posted.

Opening the files in excel should do exactly what I feel the OP wants to do.
Nonetheless, it is offered for his consideration to accept or reject.

I suspect he is still waiting for your promise:
"Here is a general format of VBA code that reads and writes text data."


I hope it wasn't the simple paragraph that followed. Based on that, I
would have expected a generalized HTML parser that strips off all tags
(which is essentially what I offered).


--
Regards,
Tom Ogilvy


#6 (Joel)

I forgot to paste the sample code. Tom just notified me of my mistake.

Sub ConvertCSV()

    Const Sourcefile = "c:\temp\Origin.csv"
    Const Destfile = "c:\temp\Destination.csv"
    Const ForReading = 1
    Const ForWriting = 2
    Const ForAppending = 8

    Dim FSO As Object
    Dim FSOrigin As Object
    Dim FSDestination As Object
    Dim InputString As String
    Dim First As Boolean

    ' One FileSystemObject can open both files
    Set FSO = CreateObject("Scripting.FileSystemObject")
    Set FSOrigin = FSO.OpenTextFile(Sourcefile, ForReading)
    ' True = overwrite the destination file if it already exists
    Set FSDestination = FSO.CreateTextFile(Destfile, True)

    Do While Not FSOrigin.AtEndOfStream

        InputString = FSOrigin.ReadLine

        ' The original If condition was lost from the post; testing for
        ' a blank line here is an assumption.
        If Len(InputString) = 0 Then
            FSDestination.WriteLine
        Else
            'Loop until no more characters in line
            First = True
            Do While Len(InputString) > 0

                'enter your code here; it must shorten InputString on
                'every pass, or this loop never ends.
                'Placeholder: copy the line unchanged and mark it consumed.
                FSDestination.WriteLine InputString
                InputString = ""

            Loop

        End If
    Loop

    FSOrigin.Close
    FSDestination.Close

End Sub
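
Since the HTML files in question are short, reading each file into a single string may be simpler than the line-by-line loop above, because a tag can span more than one line. A minimal sketch, with ReadAllText as a hypothetical helper name:

```vba
' Returns the full contents of a text file as one string.
' ReadAllText is an illustrative name, not a built-in function.
Function ReadAllText(ByVal path As String) As String
    Const ForReading As Long = 1
    Dim fso As Object
    Dim stream As Object

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set stream = fso.OpenTextFile(path, ForReading)

    ' ReadAll raises an error on an empty file, so check first
    If Not stream.AtEndOfStream Then ReadAllText = stream.ReadAll
    stream.Close
End Function
```

The returned string can then be searched with InStr() and Mid() in one pass, without tracking state between lines.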

