![]() |
Querying websites -beginner question
Hi,
I am trying to compile a database of restaurant inspection records that are posted on the web. The reports are generated through a search engine, but the results just pull up static websites. The URL Of the website results look typically like this: http://www.region.waterloo.on.ca/web...gent&pid=03102 The number at the end of the URL changes for each record that is pulled up. There are hundreds (maybe thousands) of records, each with their own URL. I can't seem to identify the range of report numbers, they seem to have no set order. However, even if if you type in a report number that doesn't exist, it still brings up a webpage. Is there any way for me to run a macro in Excel that will repeatedly query the website, each time changing the 5-digit number at the end of the URL. The front page of the website is he http://www.region.waterloo.on.ca/web...nAgent&valid=y Thanks, Tamsin |
Querying websites -beginner question
Shiny wrote:
The URL Of the website results look typically like this: http://www.region.waterloo.on.ca/web...gent&pid=03102 The number at the end of the URL changes for each record that is pulled up. There are hundreds (maybe thousands) of records, each with their own URL. I can't seem to identify the range of report numbers, they seem to have no set order. However, even if if you type in a report number that doesn't exist, it still brings up a webpage. Is there any way for me to run a macro in Excel that will repeatedly query the website, each time changing the 5-digit number at the end of the URL. The front page of the website is he http://www.region.waterloo.on.ca/web...nAgent&valid=y Tamsin: The "beginner" answer is no. The best way to get at that data is to get access to the database that's behind that site. Usually, that's not possible unless you work there. You can automate Internet Explorer and try to scrape the information off of the site. This macro loads each page in turn and reads through all the hyperlinks until it gets to one that look like a restaurant. Paste it in a new workbook and set a reference (tools - references) to Microsoft Internet Controls. Sheet1 should then contain a list of all the five digit page id's that exist on that site. Sub ListPageIDs() Dim appIE As InternetExplorer Dim docIE As Object Dim vaCityCode As Variant Dim i As Long, j As Long, k As Long Dim idxLinks As Long Dim cResults As Collection Dim lStart As Long Dim sPid As String Dim vaTemp As Variant Dim dTimeOut As Date 'starting url doesn't change Const sURL As String = _ "http://www.region.waterloo.on.ca/web/foodinspection.nsf/Index?OpenAgent" 'this will identify links we want Const sJAVALINK As String = "onclick=""popUp('aaShowDetails?openagent&pid= " 'the city codes used in the url vaCityCode = Array("k", "c", "w", "x") Set appIE = New InternetExplorer Set cResults = New Collection For i = LBound(vaCityCode) To UBound(vaCityCode) 'loop through cities For j = 97 To 122 'loop a to z For k = 1 To 1000 Step 15 'loop through pages appIE.Navigate sURL & "&city=" & vaCityCode(i) & _ "&prefix=" & Chr$(j) & _ "&start=" & k & _ "&valid=y" 'Loop until the web page is fully loaded dTimeOut = Timer Do DoEvents If Timer - dTimeOut 120 Then 'if the page won't load, put in column C ThisWorkbook.Sheets(1).Range("C65000").End(xlUp).O ffset(1, 0).Value = _ appIE.Document.URL Exit For End If Loop Until appIE.ReadyState = READYSTATE_COMPLETE Set docIE = appIE.Document 'if there are no records, skip to the next letter If InStr(1, docIE.body.innertext, "No Record Found") 0 Then Exit For Else 'loop through all the links looking for SJAVALINK For idxLinks = 1 To docIE.links.Length - 1 lStart = InStr(1, docIE.links(idxLinks).outerhtml, sJAVALINK) Debug.Print docIE.links(idxLinks).innertext If lStart 0 Then sPid = Mid(docIE.links(idxLinks).outerhtml, lStart + Len(sJAVALINK), 5) 'add link to the collection with sPid as key to avoid duplicates On Error Resume Next cResults.Add sPid & "-" & docIE.links(idxLinks).innertext, CStr(sPid) On Error GoTo 0 End If Next idxLinks End If Next k Next j Next i 'write the results to sheet1 For i = 1 To cResults.Count vaTemp = Split(cResults(i), "-") With ThisWorkbook.Sheets(1) .Cells(i, 1).Value = Format(vaTemp(0), "00000") .Cells(i, 2).Value = vaTemp(1) End With Next i appIE.Quit End Sub -- Dick Kusleika MVP - Excel Daily Dose of Excel http://www.dicks-blog.com |
Querying websites -beginner question
Hi Dick,
Thanks for taking the time to create that coding. When I run it, I get this error: "Compile error: syntax error" It highlights the line: " Const sJAVALINK As String =" Any suggestions on what I can do? This is all, currently, Greek to me... |
Querying websites -beginner question
Shiny wrote:
Thanks for taking the time to create that coding. When I run it, I get this error: "Compile error: syntax error" It highlights the line: " Const sJAVALINK As String =" Any suggestions on what I can do? I'd guess that it's a word wrapping problem. That line and the line below it should all be on the same line. You may run into that again later in the code once you fix that one. -- Dick Kusleika MVP - Excel Daily Dose of Excel http://www.dicks-blog.com |
Querying websites -beginner question
Fixed it and it worked, thanks!
|
All times are GMT +1. The time now is 03:45 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com