ExcelBanter

Mark[_66_]

Need to open HTML Document
 
Hello, I have code in my program that can open HTML Files and
search them for hrefs. There are also HTML Documents that I need to
search. If selected, they open in IE; if I right-click and choose Open
With, they open as a text file. Is there a way to do this in my
code?

T Lavedas

On May 15, 10:35 am, Mark wrote:
Hello, I have code in my program that can open HTML Files and
search them for hrefs. There are also HTML Documents that I need to
search. If selected, they open in IE; if I right-click and choose Open
With, they open as a text file. Is there a way to do this in my
code?


What do you mean by HTML *files* as opposed to HTML *documents*? Are
the files local and the documents accessed over the web? Also, what
do you mean by *selected*? I thought you said they were being
accessed by code - as text files with FSO, I suppose. If so, I don't
see how they could *open*. Please explain how this happens.

I believe you posted some code yesterday, but since I didn't
understand the distinction between files and documents, I didn't
respond. Since you are asking the question again, I thought I'd chime
in to get clarification. Then maybe I or someone else will be able to
respond with something useful.

Tom Lavedas
===========
http://members.cox.net/tglbatch/wsh/

Mark[_66_]

On May 15, 9:47 am, T Lavedas wrote:

{quoted text snipped}

Hello

Well, basically, there were two types of files. When double-clicked,
one would open in IE and the other in Notepad, but both are HTML. The
one that opens in IE is listed as HTML Document, and the one that
opens in Notepad as HTML File. I went into Folder Options and changed
it so that both open in Notepad, so that my program can search the
source code for hrefs. I did this because it was finding hrefs in some
files but not in others. After researching a bit more, I have found
what the problem is. Where the hrefs are being found, each one is on a
single line in the source code, between the <P> and </P> tags. Where
they are not being found, there are multiple hrefs within the tags.
The code I use to search for the hrefs is below, and I think I need to
add more code to grab these others. Any ideas?

Private Function GetHrefs(ByVal html, ByVal strFilex, ByVal strTitlex)
    Dim re, matches, match, d, uri, name, r As Long, c As Range, Lrow As Long
    Dim saveLink
    Dim iRet
    saveLink = False

    Set re = CreateObject("vbscript.regexp")
    ' Submatch 0 is the href value; submatch 1 is the link text.
    re.Pattern = "<a\s+.*?href=[""\']?([^""\' >]*)[""\']?[^>]*>(.*?)<\/a>"
    re.IgnoreCase = True
    re.MultiLine = True
    re.Global = True
    Set matches = re.Execute(html)
    For Each match In matches
        iRet = InspectLink(GetURLAddress(match))
        If (iRet > 0) Then
            ' Globalindx (not declared in this function) tracks the next output row.
            Cells(Globalindx, 2) = strFilex
            Cells(Globalindx, 3) = strTitlex
            Cells(Globalindx, 4) = GetURLTitle(match)
            Cells(Globalindx, 5) = GetURLAddress(match)
            Cells(Globalindx, 6) = GetType(iRet)

            Globalindx = Globalindx + 1
        End If
    Next
    Set matches = Nothing
    Set re = Nothing
End Function
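
One possible cause, offered as a guess rather than a diagnosis: in
VBScript.RegExp the dot does not match a line feed, so an anchor whose
link text wraps onto a second line would be skipped by (.*?). The
sketch below compares the pattern from GetHrefs with a variant that
uses [\s\S]*? for the link text. CountLinks and DemoCountLinks are
hypothetical names, and the sample HTML is made up for illustration.

```vba
Option Explicit

' Hypothetical helper: counts how many anchors a pattern finds in some HTML.
Function CountLinks(ByVal html As String, ByVal sPattern As String) As Long
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = sPattern
    re.IgnoreCase = True
    re.Global = True            ' find every match, not just the first
    CountLinks = re.Execute(html).Count
End Function

Sub DemoCountLinks()
    Dim html As String, dotPattern As String, anyCharPattern As String

    ' Made-up sample: two links on one line, then one whose text wraps.
    html = "<P><a href=""a.htm"">One</a> <a href=""b.htm"">Two</a></P>" & vbCrLf & _
           "<P><a href=""c.htm"">Three" & vbCrLf & "lines</a></P>"

    ' Pattern as used in GetHrefs: "." stops at a line feed.
    dotPattern = "<a\s+.*?href=[""\']?([^""\' >]*)[""\']?[^>]*>(.*?)<\/a>"

    ' Variant: [^>]*? stays inside the opening tag, [\s\S]*? spans line breaks.
    anyCharPattern = "<a\s+[^>]*?href=[""\']?([^""\' >]*)[""\']?[^>]*>([\s\S]*?)<\/a>"

    ' Compare the two counts in the Immediate window.
    Debug.Print "dot pattern:    "; CountLinks(html, dotPattern)
    Debug.Print "[\s\S] pattern: "; CountLinks(html, anyCharPattern)
End Sub
```

If the two counts differ on one of the problem files, wrapped link text
is a likely culprit. If they match, the cause lies elsewhere.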

T Lavedas

On May 15, 11:20 am, Mark wrote:

{quoted text and code snipped}

Yes, I would enlist IE to do all of the searching and parsing,
something like this ...

Function ListHref(sHTMLText)
    Dim s
    With CreateObject("htmlfile")
        .write(sText)
        .close
        For Each sTagType In Array("A", "Base", "Link", "Area")
            Set tags = .parentWindow.document.body.all.tags(sTagType)
            For Each tag In tags
                s = s & tag.href & vbNewLine
            Next ' tag
        Next ' tagtype
        ListHref = Split(s, vbNewLine) ' returns an array
    End With
End Function

Just pass it the test read from your HTML file and it returns an array
of all the HREFs, on reference per element. You can use a For Each
loop on the array to place the results into your spreadsheet.

Tom Lavedas
===========
http://members.cox.net/tglbatch/wsh/

T Lavedas

On May 15, 12:30 pm, T Lavedas wrote:

{quoted text snipped}

Oops, there's an error in the posted code. This line ...

.write(sText)

should read ...

.write(sHTMLText)

Also, there are two typos in the last paragraph. It should have
read ...

"Just pass it the text read from your HTML file and it returns an
array of all the HREFs, one reference per element. You can use a For
Each loop on the array to place the results into your spreadsheet."

Tom Lavedas
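
Putting the correction together with the suggested For Each loop, a
minimal sketch might look like the following. It is written for a VBA
module rather than plain VBScript, and it is an illustration, not
tested against the original poster's files. DemoListHref, the file
path, and the choice of column A are assumptions. Note that href
values read this way may come back resolved by the HTML engine rather
than exactly as written in the source.

```vba
Option Explicit

' ListHref as posted in the thread, with the sText -> sHTMLText fix applied.
Function ListHref(ByVal sHTMLText As String) As Variant
    Dim doc As Object, tags As Object, tag As Object
    Dim sTagType As Variant
    Dim s As String

    Set doc = CreateObject("htmlfile")
    doc.write sHTMLText
    doc.Close

    For Each sTagType In Array("A", "Base", "Link", "Area")
        Set tags = doc.parentWindow.document.body.all.tags(sTagType)
        For Each tag In tags
            s = s & tag.href & vbNewLine
        Next tag
    Next sTagType

    ListHref = Split(s, vbNewLine)   ' last element is blank (trailing vbNewLine)
End Function

' Hypothetical caller: reads one HTML file and lists its hrefs in column A.
Sub DemoListHref()
    Dim fso As Object, ts As Object
    Dim sText As String
    Dim href As Variant
    Dim r As Long

    ' Assumed path, for illustration only; point this at a real file.
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile("C:\temp\sample.html", 1)   ' 1 = ForReading
    sText = ts.ReadAll
    ts.Close

    r = 1
    For Each href In ListHref(sText)
        If Len(href) > 0 Then        ' skip blank entries
            ActiveSheet.Cells(r, 1).Value = href
            r = r + 1
        End If
    Next href
End Sub
```

The Len check is there because Split leaves one empty element after
the final vbNewLine, and because a tag without an href contributes an
empty string.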

