Extracting web data
On May 18, 10:48*am, ron wrote:
Hi Peter...The following code should get you going...Ron
Sub Webpage()
' Retrieve the source code for the first page
* * J = 1
* * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * my_obj.Open "GET", my_url, False
* * my_obj.send
* * my_var = my_obj.responsetext
* * Set my_obj = Nothing
' Determine the number of pages to examine
* * pos_1 = InStr(1, my_var, "search-intro", vbTextCompare)
* * pos_2 = InStr(pos_1, my_var, "", vbTextCompare)
* * pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare)
* * pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare)
* * no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3)))
* * no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare)
* * no_pages = Val(no_pages)
* * If (no_pages Mod 10) = 0 Then
* * * * no_pages = no_pages / 10
* * Else
* * * * no_pages = 1 + Int(no_pages / 10)
* * End If
' Begin iteration
* * For J = 1 To no_pages
* * * * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * * * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * * * my_obj.Open "GET", my_url, False
* * * * my_obj.send
* * * * my_var = my_obj.responsetext
* * * * Set my_obj = Nothing
' Extractdata
* * * * yy = 1
* * * * Do Until yy = 0
* * * * * * pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare)
* * * * * * pos_6 = InStr(pos_5, my_var, "", vbTextCompare)
* * * * * * pos_7 = InStr(pos_6, my_var, "<", vbTextCompare)
* * * * * * sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6))
* * * * * * yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare)
' Use instr, instrrev and mid to extract other info you're interested in
' * * * * * '
' * * * * * '
' * * * * * '
' Put the current schooldatainto the workbook
* * * * * * ActiveCell = sc_name
* * * * * * ActiveCell.Offset(1, 0).Select
* * * * Loop
* * Next
End Sub
Unbelievable, you saved me weeks of work. I have to admit that your
programming skills are far superior to mine. If you have any hints on
the remaining data, you would save my day. The data I am interested in
is:
School name, street, City, State, Rank
So the final excel table would look like the following (no commas)
Hale Academy, 3443 Sw 20Th Street,
Ocala, FL, 10
Jackson County School-Sunland, 3700 Connally Dr,
Marianna, FL, 31
Indigo Christian Jr Acad, 401 N Williamson Blvd, Daytona
Beach, FL, 175
College Park Elementary School, 1330 Sw 33Rd Ave,
Ocala, FL, 179
..
..
..
..
..
I also need the zip codes, but it looks like they might not be part of
the data.
Thanks again,
Peter
|