Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3
Default Querying websites -beginner question

Hi,

I am trying to compile a database of restaurant inspection records that
are posted on the web.

The reports are generated through a search engine, but the results just
pull up static websites.

The URL Of the website results look typically like this:

http://www.region.waterloo.on.ca/web...gent&pid=03102

The number at the end of the URL changes for each record that is pulled
up. There are hundreds (maybe thousands) of records, each with their
own URL. I can't seem to identify the range of report numbers, they
seem to have no set order. However, even if if you type in a report
number that doesn't exist, it still brings up a webpage.

Is there any way for me to run a macro in Excel that will repeatedly
query the website, each time changing the 5-digit number at the end of
the URL.

The front page of the website is he
http://www.region.waterloo.on.ca/web...nAgent&valid=y

Thanks,
Tamsin

  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 179
Default Querying websites -beginner question

Shiny wrote:
The URL Of the website results look typically like this:

http://www.region.waterloo.on.ca/web...gent&pid=03102

The number at the end of the URL changes for each record that is
pulled up. There are hundreds (maybe thousands) of records, each with
their own URL. I can't seem to identify the range of report numbers,
they seem to have no set order. However, even if if you type in a
report number that doesn't exist, it still brings up a webpage.

Is there any way for me to run a macro in Excel that will repeatedly
query the website, each time changing the 5-digit number at the end of
the URL.

The front page of the website is he
http://www.region.waterloo.on.ca/web...nAgent&valid=y


Tamsin:

The "beginner" answer is no. The best way to get at that data is to get
access to the database that's behind that site. Usually, that's not
possible unless you work there. You can automate Internet Explorer and try
to scrape the information off of the site. This macro loads each page in
turn and reads through all the hyperlinks until it gets to one that look
like a restaurant. Paste it in a new workbook and set a reference (tools -
references) to Microsoft Internet Controls. Sheet1 should then contain a
list of all the five digit page id's that exist on that site.

Sub ListPageIDs()

Dim appIE As InternetExplorer
Dim docIE As Object
Dim vaCityCode As Variant
Dim i As Long, j As Long, k As Long
Dim idxLinks As Long
Dim cResults As Collection
Dim lStart As Long
Dim sPid As String
Dim vaTemp As Variant
Dim dTimeOut As Date

'starting url doesn't change
Const sURL As String = _
"http://www.region.waterloo.on.ca/web/foodinspection.nsf/Index?OpenAgent"
'this will identify links we want
Const sJAVALINK As String =
"onclick=""popUp('aaShowDetails?openagent&pid= "

'the city codes used in the url
vaCityCode = Array("k", "c", "w", "x")

Set appIE = New InternetExplorer
Set cResults = New Collection

For i = LBound(vaCityCode) To UBound(vaCityCode) 'loop through cities
For j = 97 To 122 'loop a to z
For k = 1 To 1000 Step 15 'loop through pages
appIE.Navigate sURL & "&city=" & vaCityCode(i) & _
"&prefix=" & Chr$(j) & _
"&start=" & k & _
"&valid=y"

'Loop until the web page is fully loaded
dTimeOut = Timer
Do
DoEvents
If Timer - dTimeOut 120 Then
'if the page won't load, put in column C
ThisWorkbook.Sheets(1).Range("C65000").End(xlUp).O ffset(1,
0).Value = _
appIE.Document.URL
Exit For
End If
Loop Until appIE.ReadyState = READYSTATE_COMPLETE

Set docIE = appIE.Document

'if there are no records, skip to the next letter
If InStr(1, docIE.body.innertext, "No Record Found") 0
Then
Exit For
Else
'loop through all the links looking for SJAVALINK
For idxLinks = 1 To docIE.links.Length - 1
lStart = InStr(1, docIE.links(idxLinks).outerhtml,
sJAVALINK)
Debug.Print docIE.links(idxLinks).innertext
If lStart 0 Then
sPid = Mid(docIE.links(idxLinks).outerhtml,
lStart + Len(sJAVALINK), 5)
'add link to the collection with sPid as key to
avoid duplicates
On Error Resume Next
cResults.Add sPid & "-" &
docIE.links(idxLinks).innertext, CStr(sPid)
On Error GoTo 0
End If
Next idxLinks
End If
Next k
Next j
Next i

'write the results to sheet1
For i = 1 To cResults.Count
vaTemp = Split(cResults(i), "-")
With ThisWorkbook.Sheets(1)
.Cells(i, 1).Value = Format(vaTemp(0), "00000")
.Cells(i, 2).Value = vaTemp(1)
End With
Next i

appIE.Quit

End Sub

--
Dick Kusleika
MVP - Excel
Daily Dose of Excel
http://www.dicks-blog.com


  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3
Default Querying websites -beginner question

Hi Dick,

Thanks for taking the time to create that coding. When I run it, I get
this error: "Compile error: syntax error"
It highlights the line:
" Const sJAVALINK As String ="

Any suggestions on what I can do?

This is all, currently, Greek to me...

  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 179
Default Querying websites -beginner question

Shiny wrote:
Thanks for taking the time to create that coding. When I run it, I get
this error: "Compile error: syntax error"
It highlights the line:
" Const sJAVALINK As String ="

Any suggestions on what I can do?


I'd guess that it's a word wrapping problem. That line and the line below
it should all be on the same line. You may run into that again later in the
code once you fix that one.


--
Dick Kusleika
MVP - Excel
Daily Dose of Excel
http://www.dicks-blog.com


  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3
Default Querying websites -beginner question

Fixed it and it worked, thanks!



Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Beginner Question Bernie Charts and Charting in Excel 0 February 13th 07 04:31 PM
Beginner question! Pat Excel Discussion (Misc queries) 3 August 7th 06 09:19 AM
beginner question Roshintosh Excel Programming 2 November 9th 05 02:11 AM
Beginner VBA question light Excel Programming 0 November 10th 04 05:40 PM
Beginner VBA question light Excel Programming 1 November 10th 04 01:44 PM


All times are GMT +1. The time now is 11:10 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"