ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   Extracting web data (https://www.excelbanter.com/excel-programming/444600-re-extracting-web-data.html)

ron

Extracting web data
 
Hi Peter...The following code should get you going...Ron

Sub Webpage()
' Retrieve the source code for the first page
J = 1
my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search/FL/~/~/~/rank/~/" & J & "/%5B/url%5D"
Set my_obj = CreateObject("MSXML2.XMLHTTP")
my_obj.Open "GET", my_url, False
my_obj.send
my_var = my_obj.responsetext
Set my_obj = Nothing

' Determine the number of pages to examine
pos_1 = InStr(1, my_var, "search-intro", vbTextCompare)
pos_2 = InStr(pos_1, my_var, "", vbTextCompare)
pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare)
pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare)
no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3)))
no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare)
no_pages = Val(no_pages)
If (no_pages Mod 10) = 0 Then
no_pages = no_pages / 10
Else
no_pages = 1 + Int(no_pages / 10)
End If

' Begin iteration
For J = 1 To no_pages
my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search/FL/~/~/~/rank/~/" & J & "/%5B/url%5D"
Set my_obj = CreateObject("MSXML2.XMLHTTP")
my_obj.Open "GET", my_url, False
my_obj.send
my_var = my_obj.responsetext
Set my_obj = Nothing

' Extract data
yy = 1
Do Until yy = 0
pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare)
pos_6 = InStr(pos_5, my_var, "", vbTextCompare)
pos_7 = InStr(pos_6, my_var, "<", vbTextCompare)
sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6))
yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare)

' Use instr, instrrev and mid to extract other info you're interested in
' '
' '
' '

' Put the current school data into the workbook
ActiveCell = sc_name
ActiveCell.Offset(1, 0).Select
Loop
Next
End Sub

PeterSo

Extracting web data
 
On May 18, 10:48*am, ron wrote:
Hi Peter...The following code should get you going...Ron

Sub Webpage()
' Retrieve the source code for the first page
* * J = 1
* * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * my_obj.Open "GET", my_url, False
* * my_obj.send
* * my_var = my_obj.responsetext
* * Set my_obj = Nothing

' Determine the number of pages to examine
* * pos_1 = InStr(1, my_var, "search-intro", vbTextCompare)
* * pos_2 = InStr(pos_1, my_var, "", vbTextCompare)
* * pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare)
* * pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare)
* * no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3)))
* * no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare)
* * no_pages = Val(no_pages)
* * If (no_pages Mod 10) = 0 Then
* * * * no_pages = no_pages / 10
* * Else
* * * * no_pages = 1 + Int(no_pages / 10)
* * End If

' Begin iteration
* * For J = 1 To no_pages
* * * * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * * * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * * * my_obj.Open "GET", my_url, False
* * * * my_obj.send
* * * * my_var = my_obj.responsetext
* * * * Set my_obj = Nothing

' Extractdata
* * * * yy = 1
* * * * Do Until yy = 0
* * * * * * pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare)
* * * * * * pos_6 = InStr(pos_5, my_var, "", vbTextCompare)
* * * * * * pos_7 = InStr(pos_6, my_var, "<", vbTextCompare)
* * * * * * sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6))
* * * * * * yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare)

' Use instr, instrrev and mid to extract other info you're interested in
' * * * * * '
' * * * * * '
' * * * * * '

' Put the current schooldatainto the workbook
* * * * * * ActiveCell = sc_name
* * * * * * ActiveCell.Offset(1, 0).Select
* * * * Loop
* * Next
End Sub


Unbelievable, you saved me weeks of work. I have to admit that your
programming skills are far superior to mine. If you have any hints on
the remaining data, you would save my day. The data I am interested in
is:

School name, street, City, State, Rank

So the final excel table would look like the following (no commas)

Hale Academy, 3443 Sw 20Th Street,
Ocala, FL, 10
Jackson County School-Sunland, 3700 Connally Dr,
Marianna, FL, 31
Indigo Christian Jr Acad, 401 N Williamson Blvd, Daytona
Beach, FL, 175
College Park Elementary School, 1330 Sw 33Rd Ave,
Ocala, FL, 179

..
..
..
..
..

I also need the zip codes, but it looks like they might not be part of
the data.

Thanks again,
Peter






PeterSo

Extracting web data
 
On May 18, 10:48*am, ron wrote:
Hi Peter...The following code should get you going...Ron

Sub Webpage()
' Retrieve the source code for the first page
* * J = 1
* * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * my_obj.Open "GET", my_url, False
* * my_obj.send
* * my_var = my_obj.responsetext
* * Set my_obj = Nothing

' Determine the number of pages to examine
* * pos_1 = InStr(1, my_var, "search-intro", vbTextCompare)
* * pos_2 = InStr(pos_1, my_var, "", vbTextCompare)
* * pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare)
* * pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare)
* * no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3)))
* * no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare)
* * no_pages = Val(no_pages)
* * If (no_pages Mod 10) = 0 Then
* * * * no_pages = no_pages / 10
* * Else
* * * * no_pages = 1 + Int(no_pages / 10)
* * End If

' Begin iteration
* * For J = 1 To no_pages
* * * * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * * * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * * * my_obj.Open "GET", my_url, False
* * * * my_obj.send
* * * * my_var = my_obj.responsetext
* * * * Set my_obj = Nothing

' Extractdata
* * * * yy = 1
* * * * Do Until yy = 0
* * * * * * pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare)
* * * * * * pos_6 = InStr(pos_5, my_var, "", vbTextCompare)
* * * * * * pos_7 = InStr(pos_6, my_var, "<", vbTextCompare)
* * * * * * sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6))
* * * * * * yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare)

' Use instr, instrrev and mid to extract other info you're interested in
' * * * * * '
' * * * * * '
' * * * * * '

' Put the current schooldatainto the workbook
* * * * * * ActiveCell = sc_name
* * * * * * ActiveCell.Offset(1, 0).Select
* * * * Loop
* * Next
End Sub


Unbelievable, you saved me weeks of work. I have to admit that your
programming skills are far superior to mine. If you have any hints
on
the remaining data, you would save my day. The data I am interested
in
is:

| A | B | C | D | E |
-------------------------------------------------------------
| School name | street | City | State | Rank |
-------------------------------------------------------------

So the final excel table would look like the following (no commas)

Hale Academy,3443 Sw 20Th Street,Ocala,FL, 10
Jackson County School-Sunland,3700 Connally Dr,Marianna,FL, 31
Indigo Christian Jr Acad,401 N Williamson Blvd,Daytona Beach, FL,175
College Park Elementary School,1330 Sw 33Rd Ave,Ocala,FL,179

..


All times are GMT +1. The time now is 10:03 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com