#1  Mark
Posted to microsoft.public.excel.programming
Subject: Need to open HTML Document

Hello, I have code in my program that can open HTML Files and
search them for hrefs. There are also HTML Documents that I need to
search. If selected, they open in IE; if I right-click and choose
Open With, they open as a text file. Is there a way to do this in my
code?
#2  Tom Lavedas

What do you mean by HTML *files* as opposed to HTML *documents*? Are
the files local and the documents accessed over the web? Also, what
do you mean by *selected*? I thought you said they were being
accessed by code - as text files with FSO, I suppose. If so, I don't
see how they could *open*. Please explain how this happens.

I believe you posted some code yesterday, but since I didn't
understand the distinction between files and documents, I didn't
respond. Since you are asking the question again, I thought I'd chime
in to get clarification. Then maybe I or someone else will be able to
respond with something useful.

Tom Lavedas
===========
http://members.cox.net/tglbatch/wsh/
#3  Mark

Hello

Well, basically, there were two types of file: if you double-clicked
them, one would open in IE and the other in Notepad, but both are
HTML. The one that opens in IE is listed as HTML Document, and the
one that opens in Notepad as HTML File. I went into Folder Options
and changed it so that both open in Notepad, so that my program can
search the source code for hrefs. I did this because it was finding
hrefs in some files but not others. After researching a bit more, I
have found what the problem is. Where the hrefs are being found, they
sit on one line in the source, between the <P></P> tags. Where they
are not being found, there are multiple hrefs within the tags. The
code I use to search for the hrefs is below, and I think I need to
add more code to grab these others. Any ideas?

Private Function GetHrefs(ByVal html, ByVal strFilex, ByVal strTitlex)
    Dim re, matches, match, d, uri, name, r As Long, c As Range, Lrow As Long
    Dim saveLink
    Dim iRet
    saveLink = False

    Set re = CreateObject("vbscript.regexp")
    ' Group 1 captures the href value, group 2 the link text.
    re.Pattern = "<a\s+.*?href=[""\']?([^""\' ]*)[""\']?[^>]*>(.*?)<\/a>"
    re.IgnoreCase = True
    re.MultiLine = True
    re.Global = True
    Set matches = re.Execute(html)
    For Each match In matches
        iRet = InspectLink(GetURLAddress(match))
        ' The comparison operator was lost in posting; ">" is assumed here.
        If (iRet > 0) Then
            ' One output row per link: file, page title, link text, URL, type.
            Cells(Globalindx, 2) = strFilex
            Cells(Globalindx, 3) = strTitlex
            Cells(Globalindx, 4) = GetURLTitle(match)
            Cells(Globalindx, 5) = GetURLAddress(match)
            Cells(Globalindx, 6) = GetType(iRet)

            Globalindx = Globalindx + 1
        End If
    Next
    Set matches = Nothing
    Set re = Nothing
End Function
#4  Tom Lavedas

Yes, I would enlist IE to do all of the searching and parsing,
something like this ...

Function ListHref(sHTMLText)
    Dim s, sTagType, tags, tag
    With CreateObject("htmlfile")
        .write (sHTMLText) ' originally posted as .write(sText), an undefined name
        .close
        For Each sTagType In Array("A", "Base", "Link", "Area")
            Set tags = .parentWindow.document.body.all.tags(sTagType)
            For Each tag In tags
                s = s & tag.href & vbNewLine
            Next ' tag
        Next ' tagtype
        ListHref = Split(s, vbNewLine) ' returns an array; its last element is empty
    End With
End Function

Just pass it the test read from your HTML file and it returns an array
of all the HREFs, on reference per element. You can use a For Each
loop on the array to place the results into your spreadsheet.
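
A minimal sketch of that last step, under some assumptions: the ListHref function above sits in the same module (with sHTMLText passed to .write), and the HTML is read from disk with the FileSystemObject. ReadAllText, WriteHrefsToSheet, the file path, and the choice of column A are all made up for illustration, not part of the original code.

```vba
' Hypothetical helper: reads a whole text file into a string via FSO.
Function ReadAllText(sPath)
    Dim fso, ts
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(sPath, 1)   ' 1 = ForReading
    If ts.AtEndOfStream Then
        ReadAllText = ""                  ' ReadAll raises an error on an empty file
    Else
        ReadAllText = ts.ReadAll
    End If
    ts.Close
End Function

' Hypothetical driver: writes the HREFs of one file down column A
' of the active sheet, one per row.
Sub WriteHrefsToSheet()
    Dim sPath, href, r As Long
    sPath = "C:\temp\sample.html"         ' placeholder path, change to a real file
    r = 1
    For Each href In ListHref(ReadAllText(sPath))
        If Len(href) > 0 Then             ' skip the empty element Split leaves at the end
            ActiveSheet.Cells(r, 1).Value = href
            r = r + 1
        End If
    Next href
End Sub
```

The loop variable is left as a Variant because For Each over an array requires one in VBA. To feed the existing report layout instead, the body of the loop could write to the Globalindx row in the same way GetHrefs does.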

Tom Lavedas
===========
http://members.cox.net/tglbatch/wsh/
#5  Tom Lavedas


Oops, there's an error in the posted code. This line ...

.write(sText)

should read ...

.write(sHTMLText)

Also, there are two typos in the last paragraph. It should have
read ...

"Just pass it the text read from your HTML file and it returns an
array of all the HREFs, one reference per element. You can use a
For Each loop on the array to place the results into your spreadsheet."

Tom Lavedas