#1
Scraping web data
I want to download data which is spread out over several web addresses (www.website.com/id=1 etc) and which cannot be exported to CSV. Is there a way to download this data from several thousand URLs into Excel automatically overnight? I can use a regular web query to download the data from one page, but not from more than one, because it will time out.
Can I use the InternetExplorer.Application method?
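One common alternative to a web query is to fetch each page directly with MSXML2.XMLHTTP in a loop, which avoids the web-query timeout. The following is only a sketch under assumptions not in the thread: the URL pattern, the id range, and the sheet layout are all hypothetical, and it retrieves raw HTML without running any script on the page.

Sub DownloadAllPages()
' Hedged sketch: loops over hypothetical ids 1 to 1000 and stores each
' page's raw HTML in column B of the first worksheet.
    Dim http As Object
    Dim ws As Worksheet
    Dim id As Long

    Set http = CreateObject("MSXML2.XMLHTTP")
    Set ws = ThisWorkbook.Worksheets(1)

    For id = 1 To 1000 ' hypothetical range
        http.Open "GET", "http://www.website.com/id=" & id, False
        http.send
        If http.Status = 200 Then
            ws.Cells(id, 1).Value = id
            ws.Cells(id, 2).Value = http.responseText ' raw HTML; parse as needed
        End If
        DoEvents ' keep Excel responsive during a long overnight run
    Next id
End Sub

Note that this approach only sees the HTML the server sends; if the data is filled in afterwards by JavaScript, the response will not contain it and browser automation is needed instead.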
#2
Posted to microsoft.public.excel.programming
Scraping web data
In message of Sun, 9 Nov 2014 22:04:48 in microsoft.public.excel.programming, davidman writes:

> I want to download data which is spread out over several web addresses
> (www.website.com/id=1 etc), which cannot be exported to CSV - is there
> a way to download this data over several thousand URLs into Excel
> automatically overnight? I can use a regular web query to download the
> data from one page but not from more than one because it will time out.
> Can I use the InternetExplorer.Application method?

I have been doing this for some time. I find that I need to take a snapshot of a page to work out what I need to do with it. I tried doing 10 interactions in parallel, but my old laptop and Internet connection did not have enough power. The following may be helpful.

1) Code to write significant data from the current page to a file.

Public Sub Crawl(ByVal o As Object, Optional ByVal t As String = "")
' When t is empty, this recursive subroutine writes a description of
' an open web page passed in o to snapshot.txt in the default folder.
' t is not empty for a recursive call.
    Dim c As Object  ' DispDOMChildrenCollection results in error 13 on assignment
    Static f As Long ' File number used to reference snapshot.txt
    Dim i As Long    ' Loop counter
    Dim m As Object  ' Synonym of c.Item(i)
    Dim s As String  ' Synonym of c.Item(i).innerHTML

    ChDir ActiveWorkbook.Path ' Put output in current directory
    Set c = o.childNodes
    If t = "" Then
        f = FreeFile
        Open "snapshot.txt" For Output As #f
        On Error Resume Next ' Ignore errors
        Print #f, IIf(o.URL <> "", o.URL, "Not a document")
        On Error GoTo 0
        i = 0
        For Each m In c
            Crawl m, t & " " & i: i = i + 1
        Next
        Close #f
        Exit Sub
    End If
    For Each m In c
        If m.nodeName = "#text" Then
            Print #f, t & ", " & i & (" value=""" & m.nodeValue & """")
        Else
            s = m.innerHTML
            Print #f, t & ", " & i; " id=" & m.ID & " innerHTML=""" & s & """"
            If s <> "" Then Crawl m, t & ", " & i
        End If
        i = i + 1
    Next m
End Sub

2) It is useful to navigate a tree, when there is no obvious named access to the relevant data.
Public Function GetBranch(ByVal o As Object, ParamArray Path() As Variant) As Object
' Walks down the DOM tree along the child indexes in Path.
' Doc is HTMLDocument, Doc.Body is HTMLBody.
    Dim i As Integer
    Dim t As Object
    Dim v As Variant

    Set t = o.childNodes
    For Each v In Path
        Set t = t.Item(v).childNodes
        i = i + 1
    Next v
    Set GetBranch = t
End Function

Typical uses are

Dim SearchResult As String
SearchResult = GetBranch(Doc, 1, 1, 1, 0, 0, 4, 0, 1, 0, 1, 0, 13, 3, 0, 0).Item(54).href

and

Click GetBranch(Doc, 1, 1, 2, 5).Item(1) ' Grab Text Version

I use the following public variables:

Public Doc As HTMLDocument            ' IE.Document; needs Tools/References/HTML Object Library
Public IE As SHDocVw.InternetExplorer ' Needs Tools/References/Microsoft Internet Controls

I find the following useful:

Public Sub TxURL(ByVal URL1 As String)

I am sure the OP can Google for appropriate code and report any problems. HTH. ;)
--
Walter Briscoe
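The body of TxURL is not shown in the post above. A minimal sketch of what such a navigate-and-wait helper might look like, assuming it simply drives the public IE variable to a URL and refreshes Doc (this is a guess at the intent, not Walter's actual code), is:

Public Sub TxURL(ByVal URL1 As String)
' Hypothetical sketch: navigates IE to URL1, waits for the page to
' finish loading, then refreshes the public Doc reference.
    IE.navigate URL1
    Do While IE.Busy Or IE.readyState <> 4 ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop
    Set Doc = IE.Document
End Sub

With a helper like this, each page can be loaded with a single call (TxURL "http://www.website.com/id=1") before Crawl or GetBranch is used on Doc.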
#3
Sorry Walter, I didn't understand any of that (I struggle to get my head around IE automation at the best of times).
I've looked at the source code, and it turns out I need to go to each URL (for ID numbers 1 to 1000) and copy everything into an Excel sheet, because it's a JavaScript table which looks up the data on the internal server.