Home |
Search |
Today's Posts |
|
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
HTML Document.Links Issues
Thanks to dicks-blog.com I've just recently managed to find out how to access
web pages and bring the data back to Excel (2000) VBA even when it's not in a table. The possibilities are very exciting. But I have run into two vexing problems. 1) The same code that retrieves the collection of links on a page -- essentially, Document.Links -- works with two homepages and fails on a third. The links on the third look the same to me -- static links with plain text anchors. 2) One of my tasks is to list out the links in a sort of tree structure. It would make sense to do this recursively, up to a user-specified number of levels deep. But I started with a non-recursive model, figuring it would be easier to debug. Moving to the second level, I load the page that the first link points to, and grab ITS link collection, storing that in a different variable. I process the page and come back to the first level. At that point, trying to access the links collection so I can process the next link gives me an Error 70, Permission Denied. This happens even if I reload the first-level page and then try to re-retrieve the collection. Since I've already stored the links collection in a variable, it shouldn't matter that I've left the page where I got it from, should it? And why can't I re-retrieve it even if I reload the page? Reloading is an abuse of bandwidth and would make the macro take much longer to run, so I really want to preserve the loaded links collection and just keep referring to it. TIA, Gregg Roberts |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
HTML Document.Links Issues
Gergg,
Hard to tell exactly what the problem is without seeing any code. Are you automating IE to do your work ? Tim "Gregg Roberts" wrote in message ... Thanks to dicks-blog.com I've just recently managed to find out how to access web pages and bring the data back to Excel (2000) VBA even when it's not in a table. The possibilities are very exciting. But I have run into two vexing problems. 1) The same code that retrieves the collection of links on a page -- essentially, Document.Links -- works with two homepages and fails on a third. The links on the third look the same to me -- static links with plain text anchors. 2) One of my tasks is to list out the links in a sort of tree structure. It would make sense to do this recursively, up to a user-specified number of levels deep. But I started with a non-recursive model, figuring it would be easier to debug. Moving to the second level, I load the page that the first link points to, and grab ITS link collection, storing that in a different variable. I process the page and come back to the first level. At that point, trying to access the links collection so I can process the next link gives me an Error 70, Permission Denied. This happens even if I reload the first-level page and then try to re-retrieve the collection. Since I've already stored the links collection in a variable, it shouldn't matter that I've left the page where I got it from, should it? And why can't I re-retrieve it even if I reload the page? Reloading is an abuse of bandwidth and would make the macro take much longer to run, so I really want to preserve the loaded links collection and just keep referring to it. TIA, Gregg Roberts |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
HTML Document.Links Issues
Hard to tell exactly what the problem is without seeing any code.
Are you automating IE to do your work ? Yes. I solved the first problem, which was occurring because it was a frames site. Here's the relevant code for the second problem: IeApp.Navigate LinkURL Do Loop Until IeApp.ReadyState = READYSTATE_COMPLETE Set IEDoc = IeApp.Document <snip Set LinkTags1 = IEDoc.links <snip CurRow = 3 CurCol = 1 MsgBox IsEmpty(LinkTags1) ' <-- This comes back False MsgBox LinkTags1.Length ' <-- Error 70 occurs here on the SECOND iteration... For LinkIndex1 = 0 To LinkTags1.Length - 1 ' <-- ...and here. CurRow = CurRow + 1 LinkURL = LinkTags1(LinkIndex1).href If LinkURL < "" Then ' An A tag with blank href is a bookmark, so ignore Range(Cells(CurRow, CurCol), Cells(CurRow, CurCol)).Value = _ LinkTags1(LinkIndex1).innerText Range(Cells(CurRow, CurCol + 1), Cells(CurRow, CurCol + 1)).Value = _ LinkURL ' if unique and internal, then get links at next level down If URL_Is_Unique(LinkURL) And URL_Is_Internal(LinkURL) Then IeApp.Navigate LinkURL Do Loop Until IeApp.ReadyState = READYSTATE_COMPLETE Set IEDoc = IeApp.Document Set LinkTags2 = IEDoc.links <snip ... process second-level page and return to top |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
HTML Document.Links Issues
I'm not certain you can still access the links collection if the page has
already been unloaded. You might have to copy the links into a collection, dictionary or array while the page is still loaded, and only then loop through them to get the sub-links. Tim "Gregg Roberts" wrote in message ... Hard to tell exactly what the problem is without seeing any code. Are you automating IE to do your work ? Yes. I solved the first problem, which was occurring because it was a frames site. Here's the relevant code for the second problem: IeApp.Navigate LinkURL Do Loop Until IeApp.ReadyState = READYSTATE_COMPLETE Set IEDoc = IeApp.Document <snip Set LinkTags1 = IEDoc.links <snip CurRow = 3 CurCol = 1 MsgBox IsEmpty(LinkTags1) ' <-- This comes back False MsgBox LinkTags1.Length ' <-- Error 70 occurs here on the SECOND iteration... For LinkIndex1 = 0 To LinkTags1.Length - 1 ' <-- ...and here. CurRow = CurRow + 1 LinkURL = LinkTags1(LinkIndex1).href If LinkURL < "" Then ' An A tag with blank href is a bookmark, so ignore Range(Cells(CurRow, CurCol), Cells(CurRow, CurCol)).Value = _ LinkTags1(LinkIndex1).innerText Range(Cells(CurRow, CurCol + 1), Cells(CurRow, CurCol + 1)).Value = _ LinkURL ' if unique and internal, then get links at next level down If URL_Is_Unique(LinkURL) And URL_Is_Internal(LinkURL) Then IeApp.Navigate LinkURL Do Loop Until IeApp.ReadyState = READYSTATE_COMPLETE Set IEDoc = IeApp.Document Set LinkTags2 = IEDoc.links <snip ... process second-level page and return to top |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
links issues | Excel Discussion (Misc queries) | |||
Opening Excel Document in ASP.NET and having issues...please help | Excel Programming | |||
Update links - hyperlink issues | Excel Discussion (Misc queries) | |||
Links in html-file | Excel Discussion (Misc queries) | |||
Get a value from a table in an HTML document | Excel Programming |