ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   Querying websites -beginner question (https://www.excelbanter.com/excel-programming/345645-querying-websites-beginner-question.html)

Shiny

Querying websites -beginner question
 
Hi,

I am trying to compile a database of restaurant inspection records that
are posted on the web.

The reports are generated through a search engine, but the results just
pull up static websites.

The URL Of the website results look typically like this:

http://www.region.waterloo.on.ca/web...gent&pid=03102

The number at the end of the URL changes for each record that is pulled
up. There are hundreds (maybe thousands) of records, each with their
own URL. I can't seem to identify the range of report numbers, they
seem to have no set order. However, even if if you type in a report
number that doesn't exist, it still brings up a webpage.

Is there any way for me to run a macro in Excel that will repeatedly
query the website, each time changing the 5-digit number at the end of
the URL.

The front page of the website is he
http://www.region.waterloo.on.ca/web...nAgent&valid=y

Thanks,
Tamsin


Dick Kusleika

Querying websites -beginner question
 
Shiny wrote:
The URL Of the website results look typically like this:

http://www.region.waterloo.on.ca/web...gent&pid=03102

The number at the end of the URL changes for each record that is
pulled up. There are hundreds (maybe thousands) of records, each with
their own URL. I can't seem to identify the range of report numbers,
they seem to have no set order. However, even if if you type in a
report number that doesn't exist, it still brings up a webpage.

Is there any way for me to run a macro in Excel that will repeatedly
query the website, each time changing the 5-digit number at the end of
the URL.

The front page of the website is he
http://www.region.waterloo.on.ca/web...nAgent&valid=y


Tamsin:

The "beginner" answer is no. The best way to get at that data is to get
access to the database that's behind that site. Usually, that's not
possible unless you work there. You can automate Internet Explorer and try
to scrape the information off of the site. This macro loads each page in
turn and reads through all the hyperlinks until it gets to one that look
like a restaurant. Paste it in a new workbook and set a reference (tools -
references) to Microsoft Internet Controls. Sheet1 should then contain a
list of all the five digit page id's that exist on that site.

Sub ListPageIDs()

Dim appIE As InternetExplorer
Dim docIE As Object
Dim vaCityCode As Variant
Dim i As Long, j As Long, k As Long
Dim idxLinks As Long
Dim cResults As Collection
Dim lStart As Long
Dim sPid As String
Dim vaTemp As Variant
Dim dTimeOut As Date

'starting url doesn't change
Const sURL As String = _
"http://www.region.waterloo.on.ca/web/foodinspection.nsf/Index?OpenAgent"
'this will identify links we want
Const sJAVALINK As String =
"onclick=""popUp('aaShowDetails?openagent&pid= "

'the city codes used in the url
vaCityCode = Array("k", "c", "w", "x")

Set appIE = New InternetExplorer
Set cResults = New Collection

For i = LBound(vaCityCode) To UBound(vaCityCode) 'loop through cities
For j = 97 To 122 'loop a to z
For k = 1 To 1000 Step 15 'loop through pages
appIE.Navigate sURL & "&city=" & vaCityCode(i) & _
"&prefix=" & Chr$(j) & _
"&start=" & k & _
"&valid=y"

'Loop until the web page is fully loaded
dTimeOut = Timer
Do
DoEvents
If Timer - dTimeOut 120 Then
'if the page won't load, put in column C
ThisWorkbook.Sheets(1).Range("C65000").End(xlUp).O ffset(1,
0).Value = _
appIE.Document.URL
Exit For
End If
Loop Until appIE.ReadyState = READYSTATE_COMPLETE

Set docIE = appIE.Document

'if there are no records, skip to the next letter
If InStr(1, docIE.body.innertext, "No Record Found") 0
Then
Exit For
Else
'loop through all the links looking for SJAVALINK
For idxLinks = 1 To docIE.links.Length - 1
lStart = InStr(1, docIE.links(idxLinks).outerhtml,
sJAVALINK)
Debug.Print docIE.links(idxLinks).innertext
If lStart 0 Then
sPid = Mid(docIE.links(idxLinks).outerhtml,
lStart + Len(sJAVALINK), 5)
'add link to the collection with sPid as key to
avoid duplicates
On Error Resume Next
cResults.Add sPid & "-" &
docIE.links(idxLinks).innertext, CStr(sPid)
On Error GoTo 0
End If
Next idxLinks
End If
Next k
Next j
Next i

'write the results to sheet1
For i = 1 To cResults.Count
vaTemp = Split(cResults(i), "-")
With ThisWorkbook.Sheets(1)
.Cells(i, 1).Value = Format(vaTemp(0), "00000")
.Cells(i, 2).Value = vaTemp(1)
End With
Next i

appIE.Quit

End Sub

--
Dick Kusleika
MVP - Excel
Daily Dose of Excel
http://www.dicks-blog.com



Shiny

Querying websites -beginner question
 
Hi Dick,

Thanks for taking the time to create that coding. When I run it, I get
this error: "Compile error: syntax error"
It highlights the line:
" Const sJAVALINK As String ="

Any suggestions on what I can do?

This is all, currently, Greek to me...


Dick Kusleika

Querying websites -beginner question
 
Shiny wrote:
Thanks for taking the time to create that coding. When I run it, I get
this error: "Compile error: syntax error"
It highlights the line:
" Const sJAVALINK As String ="

Any suggestions on what I can do?


I'd guess that it's a word wrapping problem. That line and the line below
it should all be on the same line. You may run into that again later in the
code once you fix that one.


--
Dick Kusleika
MVP - Excel
Daily Dose of Excel
http://www.dicks-blog.com



Shiny

Querying websites -beginner question
 
Fixed it and it worked, thanks!



All times are GMT +1. The time now is 03:45 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com