Posted to microsoft.public.excel.programming
Loane Sharp

programmatically retrieve links from web page

Hi there

I am using the Microsoft XML v6.0 library to retrieve a web page from the
Internet, as follows:

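Dim oHttp As Object
Dim content As String
' "MSXML2.XMLHTTP" (no version) maps to MSXML 3.0; ".6.0" selects v6.0
Set oHttp = CreateObject("MSXML2.XMLHTTP.6.0")
' Third argument False = synchronous request (Send blocks until done)
oHttp.Open "GET", "http://www.microsoft.com/default.aspx", False
oHttp.Send
If oHttp.Status = 200 Then content = oHttp.responseText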

Once the page is downloaded, I want to search it for all URLs that link
to other web pages (i.e. the href values of <a>...</a> tags). The problem
is that, given the huge diversity of link formats (relative and
absolute references, URL encoding, etc.), I'm struggling to handle all
the possibilities in code.
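Because a page mixes absolute and relative references, each extracted href usually has to be resolved against the URL of the page it came from. The sketch below is one minimal way to do that, assuming an absolute http/https base URL without a query string. It covers only three cases: links that are already absolute, root-relative links starting with "/", and links relative to the current directory. It does not handle "../", "//host/...", "#fragment", "mailto:" or URL-encoded links; a complete resolver would follow RFC 3986. ResolveLink is a hypothetical helper name.

```vb
' Hypothetical helper: resolves a link against the page URL it came from.
' Handles only absolute, root-relative ("/x") and same-directory ("x") links;
' "../", "//host", "#frag", "mailto:" and query strings are NOT handled.
Function ResolveLink(ByVal baseUrl As String, ByVal link As String) As String
    Dim schemeEnd As Long, hostEnd As Long, lastSlash As Long

    ' Already absolute: return unchanged
    If LCase$(Left$(link, 7)) = "http://" Or LCase$(Left$(link, 8)) = "https://" Then
        ResolveLink = link
        Exit Function
    End If

    schemeEnd = InStr(baseUrl, "://") + 3       ' first character of the host
    hostEnd = InStr(schemeEnd, baseUrl, "/")    ' slash that starts the path
    ' Base such as "http://example.com" has no path: add a trailing slash
    If hostEnd = 0 Then baseUrl = baseUrl & "/": hostEnd = Len(baseUrl)

    If Left$(link, 1) = "/" Then
        ' Root-relative: keep scheme and host, replace the whole path
        ResolveLink = Left$(baseUrl, hostEnd - 1) & link
    Else
        ' Directory-relative: keep everything up to the last slash
        lastSlash = InStrRev(baseUrl, "/")
        ResolveLink = Left$(baseUrl, lastSlash) & link
    End If
End Function
```

For example, against the page fetched above, "/en-us/x.aspx" and "x.aspx" both resolve under http://www.microsoft.com/.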

Is there an easier way to retrieve the contents of a specific element in a
web page or, even better, to iterate over collections of elements? I've
tried parsing the page as XML (MSXML2.DOMDocument.4.0), but that fails on
most HTML pages because their loose structure isn't well-formed XML.
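One possible approach, sketched here under stated assumptions, is to hand the downloaded text to an HTML parser instead of an XML one. The sketch assumes Windows with the MSHTML engine available through the late-bound "htmlfile" object, which tolerates markup that is not well-formed XML and exposes the page as a DOM, so getElementsByTagName("a") returns every anchor as a collection. It also assumes that passing flag 2 to getAttribute returns the href exactly as written in the source, because the in-memory document has no real URL to resolve relative links against. GetLinks is a hypothetical helper name, and results may vary with the installed Internet Explorer version.

```vb
' Hypothetical helper: returns the raw href of every <a> element.
' Assumes Windows with the MSHTML engine ("htmlfile") available.
Function GetLinks(ByVal content As String) As Collection
    Dim doc As Object, anchors As Object
    Dim href As Variant
    Dim i As Long
    Dim result As New Collection

    Set doc = CreateObject("htmlfile")
    ' MSHTML parses loose HTML (unclosed tags, etc.) without complaint;
    ' scripts inserted through innerHTML are not executed
    doc.body.innerHTML = content

    Set anchors = doc.getElementsByTagName("a")
    For i = 0 To anchors.length - 1
        ' Flag 2: the attribute value as written in the source, not resolved
        href = anchors.Item(i).getAttribute("href", 2)
        ' Anchors like <a name="top"> have no href: skip Null/empty values
        ' (nested Ifs because VBA's And does not short-circuit)
        If Not IsNull(href) Then
            If Len(href) > 0 Then result.Add CStr(href)
        End If
    Next i

    Set GetLinks = result
End Function
```

With content from the request above, looping For Each link In GetLinks(content) visits each raw href once, and each value can then be resolved against the page URL before it is followed.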

Best regards
Loane