#1
Posted to microsoft.public.excel.programming
Need to open HTML Document
Hello, I have code in my program that can open HTML Files and search them for hrefs. There are also HTML Documents that I need to search. If selected, they open in IE; if I right-click and choose "Open with", they open as a text file. Is there a way to do this in my code?
#2
On May 15, 10:35 am, Mark wrote:
> Hello, I have code in my program that can open HTML Files and search them for hrefs. There are also HTML Documents that I need to search. If selected they open in IE, if I right click and say open with, then they open as a text file. Is there a way to do this in my code?

What do you mean by HTML *files* as opposed to HTML *documents*? Are the files local and the documents accessed over the web? Also, what do you mean by *selected*? I thought you said they were being accessed by code - as text files with FSO, I suppose. If so, I don't see how they could *open*. Please explain how this happens.

I believe you posted some code yesterday, but since I didn't understand the distinction between files and documents, I didn't respond. Since you are asking the question again, I thought I'd chime in to get clarification. Then maybe I or someone else will be able to respond with something useful.

Tom Lavedas
===========
http://members.cox.net/tglbatch/wsh/
#3
On May 15, 9:47 am, T Lavedas wrote:
> {quoted text snipped}

Hello,

Basically, there were two types of files: when double-clicked, one would open in IE and the other in Notepad, but both are HTML. The one that opens in IE is named HTML Document, and the one that opens in Notepad is named HTML File. I went into Folder Options and changed it so that both of them open in Notepad, so that my program can search the source code for hrefs. I did this because it was finding hrefs in some files but not in others.

After researching a bit more, I have found what the problem is. Where the hrefs are being found, they are on one line in the source code, between the <P></P> tags. Where they are not being found, there are multiple hrefs within the tags. The code that I have to search for the hrefs is below, and I think I need to add more code to grab these others. Any ideas?

    Private Function GetHrefs(ByVal html, ByVal strFilex, ByVal strTitlex)
        Dim re, matches, match, d, uri, name, r As Long, c As Range, Lrow As Long
        Dim saveLink
        Dim iRet

        saveLink = False
        Set re = CreateObject("vbscript.regexp")
        ' The ">" characters in the pattern and in the comparison below were
        ' lost in the archived post; they are restored here as the likely original.
        re.Pattern = "<a\s+.*?href=[""\']?([^""\' >]*)[""\']?[^>]*>(.*?)<\/a>"
        re.IgnoreCase = True
        re.MultiLine = True
        re.Global = True

        Set matches = re.Execute(html)
        For Each match In matches
            iRet = InspectLink(GetURLAddress(match))
            If (iRet > 0) Then
                ' One row per accepted link: file, page title, link text, URL, type
                Cells(Globalindx, 2) = strFilex
                Cells(Globalindx, 3) = strTitlex
                Cells(Globalindx, 4) = GetURLTitle(match)
                Cells(Globalindx, 5) = GetURLAddress(match)
                Cells(Globalindx, 6) = GetType(iRet)
                Globalindx = Globalindx + 1
            End If
        Next

        Set matches = Nothing
        Set re = Nothing
    End Function
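
The thread does not establish why the pattern misses some links. One possibility, assuming the missed anchors span more than one line of source, is that `.` in a VBScript regular expression does not match a newline, and `MultiLine` only changes how `^` and `$` behave. The sketch below swaps `.` for `[^>]` inside the tag and `[\s\S]` for the link text. `FindAnchors` and `DemoFindAnchors` are hypothetical names, not part of the posted code.

```vba
' Sketch only: FindAnchors is a hypothetical helper, not from the original post.
Function FindAnchors(ByVal html As String) As Object
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    ' [^>]* stays inside a single tag; [\s\S]*? matches link text across line breaks.
    re.Pattern = "<a\s+[^>]*?href\s*=\s*[""']?([^""' >]*)[""']?[^>]*>([\s\S]*?)</a>"
    re.IgnoreCase = True
    re.Global = True
    Set FindAnchors = re.Execute(html)   ' a MatchCollection
End Function

Sub DemoFindAnchors()
    Dim m As Object
    Dim html As String
    ' Two anchors inside one <P>; the second one spans two source lines.
    html = "<P><a href=""one.htm"">One</a>" & vbCrLf & _
           "<a href=""two.htm"">Two" & vbCrLf & "words</a></P>"
    For Each m In FindAnchors(html)
        ' SubMatches(0) is the URL, SubMatches(1) is the link text
        Debug.Print m.SubMatches(0), Replace(m.SubMatches(1), vbCrLf, " ")
    Next m
End Sub
```

If the missed links turn out to be on a single line after all, this change will not help, and parsing the HTML instead of pattern-matching it is probably the safer route.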
#4
On May 15, 11:20 am, Mark wrote:
> {quoted text snipped}
>
> Private Function GetHrefs(ByVal html, ByVal strFilex, ByVal strTitlex)
> {code snipped}

Yes, I would enlist IE to do all of the searching and parsing, something like this ...

    Function ListHref(sHTMLText)
      Dim s
      with CreateObject("htmlfile")
        .write(sText)
        .close
        for each sTagType in Array("A", "Base", "Link", "Area")
          set tags = .parentWindow.document.body.all.tags(sTagType)
          for each tag in tags
            s = s & tag.href & vbnewline
          next ' tag
        next ' tagtype
        ListHref = Split(s, vbnewline) ' returns an array
      end with
    End Function

Just pass it the test read from your HTML file and it returns an array of all the HREFs, on reference per element. You can use a For Each loop on the array to place the results into your spreadsheet.

Tom Lavedas
===========
http://members.cox.net/tglbatch/wsh/
#5
On May 15, 12:30 pm, T Lavedas wrote:
> {quoted text snipped}

Oops, there's an error in the posted code. This line ...

    .write(sText)

should read ...

    .write(sHTMLText)

Also, there are two typos in the last paragraph. It should have read ...

"Just pass it the text read from your HTML file and it returns an array of all the HREFs, one reference per element. You can use a For Each loop on the array to place the results into your spreadsheet."

Tom Lavedas
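
For reference, here is a minimal sketch of the function with the correction applied, written for VBA with declared variables. It is an adaptation, not Tom's exact code: `ListHrefFixed` and `DemoListHref` are hypothetical names, the lookup uses `getElementsByTagName` on the whole document (an assumption made so that tags in the head, such as Base and Link, are also visited), and empty hrefs and the trailing blank element are dropped.

```vba
' Sketch only: ListHrefFixed is a hypothetical name for a corrected variant.
Function ListHrefFixed(ByVal sHTMLText As String) As Variant
    Dim doc As Object, tags As Object, tag As Object
    Dim sTagType As Variant
    Dim s As String

    Set doc = CreateObject("htmlfile")   ' late-bound MSHTML document
    doc.write sHTMLText                  ' corrected: sHTMLText, not sText
    doc.Close

    For Each sTagType In Array("A", "Base", "Link", "Area")
        Set tags = doc.getElementsByTagName(sTagType)
        For Each tag In tags
            ' Skip elements with no href (for example, named anchors)
            If Len(tag.href) > 0 Then s = s & tag.href & vbNewLine
        Next tag
    Next sTagType

    ' Drop the final separator so Split does not return a trailing empty item
    If Len(s) > 0 Then s = Left$(s, Len(s) - Len(vbNewLine))
    ListHrefFixed = Split(s, vbNewLine)
End Function

Sub DemoListHref()
    Dim href As Variant
    Dim r As Long
    r = 1
    ' Writes one href per row into column A of the active sheet
    For Each href In ListHrefFixed("<p><a href=""http://example.com/a"">A</a> " & _
                                   "<a href=""http://example.com/b"">B</a></p>")
        ActiveSheet.Cells(r, 1).Value = href
        r = r + 1
    Next href
End Sub
```

In real use, the string passed in would be the text read from the HTML file, and the target cells would be whatever columns the existing GetHrefs routine fills.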