Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 25
Default HTML Document.Links Issues

Thanks to dicks-blog.com I've just recently managed to find out how to access
web pages and bring the data back to Excel (2000) VBA even when it's not in a
table. The possibilities are very exciting. But I have run into two vexing
problems.

1) The same code that retrieves the collection of links on a page --
essentially, Document.Links -- works with two homepages and fails on a third.
The links on the third look the same to me -- static links with plain text
anchors.

2) One of my tasks is to list out the links in a sort of tree structure. It
would make sense to do this recursively, up to a user-specified number of
levels deep. But I started with a non-recursive model, figuring it would be
easier to debug. Moving to the second level, I load the page that the first
link points to, and grab ITS link collection, storing that in a different
variable. I process the page and come back to the first level. At that point,
trying to access the links collection so I can process the next link gives me
an Error 70, Permission Denied. This happens even if I reload the first-level
page and then try to re-retrieve the collection.

Since I've already stored the links collection in a variable, it shouldn't
matter that I've left the page where I got it from, should it? And why can't
I re-retrieve it even if I reload the page? Reloading is an abuse of
bandwidth and would make the macro take much longer to run, so I really want
to preserve the loaded links collection and just keep referring to it.

TIA,
Gregg Roberts
  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,588
Default HTML Document.Links Issues

Gergg,

Hard to tell exactly what the problem is without seeing any code.
Are you automating IE to do your work ?

Tim


"Gregg Roberts" wrote in message
...
Thanks to dicks-blog.com I've just recently managed to find out how to
access
web pages and bring the data back to Excel (2000) VBA even when it's not
in a
table. The possibilities are very exciting. But I have run into two vexing
problems.

1) The same code that retrieves the collection of links on a page --
essentially, Document.Links -- works with two homepages and fails on a
third.
The links on the third look the same to me -- static links with plain text
anchors.

2) One of my tasks is to list out the links in a sort of tree structure.
It
would make sense to do this recursively, up to a user-specified number of
levels deep. But I started with a non-recursive model, figuring it would
be
easier to debug. Moving to the second level, I load the page that the
first
link points to, and grab ITS link collection, storing that in a different
variable. I process the page and come back to the first level. At that
point,
trying to access the links collection so I can process the next link gives
me
an Error 70, Permission Denied. This happens even if I reload the
first-level
page and then try to re-retrieve the collection.

Since I've already stored the links collection in a variable, it shouldn't
matter that I've left the page where I got it from, should it? And why
can't
I re-retrieve it even if I reload the page? Reloading is an abuse of
bandwidth and would make the macro take much longer to run, so I really
want
to preserve the loaded links collection and just keep referring to it.

TIA,
Gregg Roberts



  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 25
Default HTML Document.Links Issues

Hard to tell exactly what the problem is without seeing any code.
Are you automating IE to do your work ?


Yes.

I solved the first problem, which was occurring because it was a frames site.
Here's the relevant code for the second problem:

IeApp.Navigate LinkURL
Do
Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
Set IEDoc = IeApp.Document
<snip
Set LinkTags1 = IEDoc.links

<snip

CurRow = 3
CurCol = 1
MsgBox IsEmpty(LinkTags1) ' <-- This comes back False
MsgBox LinkTags1.Length ' <-- Error 70 occurs here on the SECOND
iteration...
For LinkIndex1 = 0 To LinkTags1.Length - 1 ' <-- ...and here.
CurRow = CurRow + 1
LinkURL = LinkTags1(LinkIndex1).href
If LinkURL < "" Then ' An A tag with blank href is a bookmark, so ignore
Range(Cells(CurRow, CurCol), Cells(CurRow, CurCol)).Value = _
LinkTags1(LinkIndex1).innerText
Range(Cells(CurRow, CurCol + 1), Cells(CurRow, CurCol + 1)).Value = _
LinkURL
' if unique and internal, then get links at next level down
If URL_Is_Unique(LinkURL) And URL_Is_Internal(LinkURL) Then
IeApp.Navigate LinkURL
Do
Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
Set IEDoc = IeApp.Document
Set LinkTags2 = IEDoc.links

<snip ... process second-level page and return to top

  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,588
Default HTML Document.Links Issues

I'm not certain you can still access the links collection if the page has
already been unloaded.
You might have to copy the links into a collection, dictionary or array
while the page is still loaded, and only then loop through them to get the
sub-links.

Tim



"Gregg Roberts" wrote in message
...
Hard to tell exactly what the problem is without seeing any code.
Are you automating IE to do your work ?


Yes.

I solved the first problem, which was occurring because it was a frames
site.
Here's the relevant code for the second problem:

IeApp.Navigate LinkURL
Do
Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
Set IEDoc = IeApp.Document
<snip
Set LinkTags1 = IEDoc.links

<snip

CurRow = 3
CurCol = 1
MsgBox IsEmpty(LinkTags1) ' <-- This comes back False
MsgBox LinkTags1.Length ' <-- Error 70 occurs here on the SECOND
iteration...
For LinkIndex1 = 0 To LinkTags1.Length - 1 ' <-- ...and here.
CurRow = CurRow + 1
LinkURL = LinkTags1(LinkIndex1).href
If LinkURL < "" Then ' An A tag with blank href is a bookmark, so
ignore
Range(Cells(CurRow, CurCol), Cells(CurRow, CurCol)).Value = _
LinkTags1(LinkIndex1).innerText
Range(Cells(CurRow, CurCol + 1), Cells(CurRow, CurCol + 1)).Value =
_
LinkURL
' if unique and internal, then get links at next level down
If URL_Is_Unique(LinkURL) And URL_Is_Internal(LinkURL) Then
IeApp.Navigate LinkURL
Do
Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
Set IEDoc = IeApp.Document
Set LinkTags2 = IEDoc.links

<snip ... process second-level page and return to top



  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 25
Default HTML Document.Links Issues

Tim,

This is awesome! Thank you so much.

Since you've been so helpful, I'm going to ask you a few more questions.
Obviously you're free to say "enough" at any point. I think these additional
questions still fit my subject line...

1. Are there other ReadyStates that I should handle besides
READYSTATE_COMPLETE? What if an OK/Cancel prompt appears upon navigation,
such as "You are about to visit a page that has unsecure items..." Will that
give ReadyState = READYSTATE_INTERACTIVE? What about a 404 error -- will that
still give READYSTATE_COMPLETE?

2. What if a link href is to a non-HTML file such as a .WMV, .DOC, or .PDF?
I don't want the macro to try to visit such links, even though it should not
cause an error on my machine. Short of building and maintaining a list of
what file types to visit or not, how do I determine whether to navigate to
the URL? Parsing the file extension is not straightforward because of active
pages, which can contain spurious "file extension" matches.

I really want this code to be robust and there's a lot to think about.

Gregg Roberts

P.S. I was already saving the links collection to a variable, LinkTags1 --
not just accessing it via IEDoc.Links. However, if I navigate to a new page
without setting IEDoc = IEApp.Document, why would IEDoc.Links be affected at
all -- much less be made *inaccessible*? Remember, the error message was
Permission Denied, not Object Required, Object Does Not Exist, or some more
generic error.

P.P.S. Where can I read more about how all this works? The MSDN site seems
to refer to a lot of environments I am not using, hence the examples don't
work.


  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,588
Default HTML Document.Links Issues

Gregg,

I'm at work right now so can't give you any comprehensive advice on how to
proceed. It's enough to say that what you're trying to do isn't very simple
if you want to be able to handle all possible outcomes.

You can easily check what happens with a 404 just by starting with a
non-existent URL. I haven't tried with pages which pop up a prompt so
couln't say what will happen other than your routine will hang at this
point.

File extensions: very tricky given the number of possible values. You could
decide only to follow htm, html, shtml, etc and ignore any others, but as
you note it's not always possible to tell from looking at the URL what will
be returned. As you note, it shouldn't cause an error, but not much point in
following refs which will not have links.

Re your saving of the links collection: you have to actually copy the links
to another object: by just setting a variable to point to the links
collection you are just creating a *reference* to the collection, not making
a copy of it. Since the collection only exists in the context of the loaded
page, you will lose your connection when you reload the page. "Access
denied" is the typical response to cross-domain access violations so IE may
have interpreted your actions in this context.


Tim

--
Tim Williams
Palo Alto, CA


"Gregg Roberts" wrote in message
...
Tim,

This is awesome! Thank you so much.

Since you've been so helpful, I'm going to ask you a few more questions.
Obviously you're free to say "enough" at any point. I think these

additional
questions still fit my subject line...

1. Are there other ReadyStates that I should handle besides
READYSTATE_COMPLETE? What if an OK/Cancel prompt appears upon navigation,
such as "You are about to visit a page that has unsecure items..." Will

that
give ReadyState = READYSTATE_INTERACTIVE? What about a 404 error -- will

that
still give READYSTATE_COMPLETE?

2. What if a link href is to a non-HTML file such as a .WMV, .DOC, or

..PDF?
I don't want the macro to try to visit such links, even though it should

not
cause an error on my machine. Short of building and maintaining a list of
what file types to visit or not, how do I determine whether to navigate to
the URL? Parsing the file extension is not straightforward because of

active
pages, which can contain spurious "file extension" matches.

I really want this code to be robust and there's a lot to think about.

Gregg Roberts

P.S. I was already saving the links collection to a variable, LinkTags1 --
not just accessing it via IEDoc.Links. However, if I navigate to a new

page
without setting IEDoc = IEApp.Document, why would IEDoc.Links be affected

at
all -- much less be made *inaccessible*? Remember, the error message was
Permission Denied, not Object Required, Object Does Not Exist, or some

more
generic error.

P.P.S. Where can I read more about how all this works? The MSDN site seems
to refer to a lot of environments I am not using, hence the examples don't
work.



  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 25
Default HTML Document.Links Issues

You can easily check what happens with a 404 just by starting with a
non-existent URL


Duh. Good idea. I could do the same with a page that always results in a
prompt, if I can find one.

by just setting a variable to point to the links
collection you are just creating a *reference* to the collection, not making
a copy of it. Since the collection only exists in the context of the loaded
page, you will lose your connection when you reload the page.


I thought that might be it.

Thanks for all the help.

Gregg Roberts
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
links issues freekrill Excel Discussion (Misc queries) 1 November 29th 05 05:31 PM
Opening Excel Document in ASP.NET and having issues...please help Josh Behl[_2_] Excel Programming 0 November 11th 05 03:04 AM
Update links - hyperlink issues Shan Excel Discussion (Misc queries) 0 March 3rd 05 01:57 PM
Links in html-file Mar Excel Discussion (Misc queries) 0 February 1st 05 12:01 PM
Get a value from a table in an HTML document Terry V Excel Programming 6 October 5th 04 04:37 AM


All times are GMT +1. The time now is 09:59 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"