![]() |
Extracting web data
Hi Peter...The following code should get you going...Ron
Sub Webpage() ' Retrieve the source code for the first page J = 1 my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search/FL/~/~/~/rank/~/" & J & "/%5B/url%5D" Set my_obj = CreateObject("MSXML2.XMLHTTP") my_obj.Open "GET", my_url, False my_obj.send my_var = my_obj.responsetext Set my_obj = Nothing ' Determine the number of pages to examine pos_1 = InStr(1, my_var, "search-intro", vbTextCompare) pos_2 = InStr(pos_1, my_var, "", vbTextCompare) pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare) pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare) no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3))) no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare) no_pages = Val(no_pages) If (no_pages Mod 10) = 0 Then no_pages = no_pages / 10 Else no_pages = 1 + Int(no_pages / 10) End If ' Begin iteration For J = 1 To no_pages my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search/FL/~/~/~/rank/~/" & J & "/%5B/url%5D" Set my_obj = CreateObject("MSXML2.XMLHTTP") my_obj.Open "GET", my_url, False my_obj.send my_var = my_obj.responsetext Set my_obj = Nothing ' Extract data yy = 1 Do Until yy = 0 pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare) pos_6 = InStr(pos_5, my_var, "", vbTextCompare) pos_7 = InStr(pos_6, my_var, "<", vbTextCompare) sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6)) yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare) ' Use instr, instrrev and mid to extract other info you're interested in ' ' ' ' ' ' ' Put the current school data into the workbook ActiveCell = sc_name ActiveCell.Offset(1, 0).Select Loop Next End Sub |
Extracting web data
On May 18, 10:48*am, ron wrote:
Hi Peter...The following code should get you going...Ron Sub Webpage() ' Retrieve the source code for the first page * * J = 1 * * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D" * * Set my_obj = CreateObject("MSXML2.XMLHTTP") * * my_obj.Open "GET", my_url, False * * my_obj.send * * my_var = my_obj.responsetext * * Set my_obj = Nothing ' Determine the number of pages to examine * * pos_1 = InStr(1, my_var, "search-intro", vbTextCompare) * * pos_2 = InStr(pos_1, my_var, "", vbTextCompare) * * pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare) * * pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare) * * no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3))) * * no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare) * * no_pages = Val(no_pages) * * If (no_pages Mod 10) = 0 Then * * * * no_pages = no_pages / 10 * * Else * * * * no_pages = 1 + Int(no_pages / 10) * * End If ' Begin iteration * * For J = 1 To no_pages * * * * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D" * * * * Set my_obj = CreateObject("MSXML2.XMLHTTP") * * * * my_obj.Open "GET", my_url, False * * * * my_obj.send * * * * my_var = my_obj.responsetext * * * * Set my_obj = Nothing ' Extractdata * * * * yy = 1 * * * * Do Until yy = 0 * * * * * * pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare) * * * * * * pos_6 = InStr(pos_5, my_var, "", vbTextCompare) * * * * * * pos_7 = InStr(pos_6, my_var, "<", vbTextCompare) * * * * * * sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6)) * * * * * * yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare) ' Use instr, instrrev and mid to extract other info you're interested in ' * * * * * ' ' * * * * * ' ' * * * * * ' ' Put the current schooldatainto the workbook * * * * * * ActiveCell = sc_name * * * * * * ActiveCell.Offset(1, 0).Select * * * * Loop * * Next End Sub Unbelievable, you saved me weeks of work. I have to admit that your programming skills are far superior to mine. If you have any hints on the remaining data, you would save my day. The data I am interested in is: School name, street, City, State, Rank So the final excel table would look like the following (no commas) Hale Academy, 3443 Sw 20Th Street, Ocala, FL, 10 Jackson County School-Sunland, 3700 Connally Dr, Marianna, FL, 31 Indigo Christian Jr Acad, 401 N Williamson Blvd, Daytona Beach, FL, 175 College Park Elementary School, 1330 Sw 33Rd Ave, Ocala, FL, 179 .. .. .. .. .. I also need the zip codes, but it looks like they might not be part of the data. Thanks again, Peter |
Extracting web data
On May 18, 10:48*am, ron wrote:
Hi Peter...The following code should get you going...Ron Sub Webpage() ' Retrieve the source code for the first page * * J = 1 * * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D" * * Set my_obj = CreateObject("MSXML2.XMLHTTP") * * my_obj.Open "GET", my_url, False * * my_obj.send * * my_var = my_obj.responsetext * * Set my_obj = Nothing ' Determine the number of pages to examine * * pos_1 = InStr(1, my_var, "search-intro", vbTextCompare) * * pos_2 = InStr(pos_1, my_var, "", vbTextCompare) * * pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare) * * pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare) * * no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3))) * * no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare) * * no_pages = Val(no_pages) * * If (no_pages Mod 10) = 0 Then * * * * no_pages = no_pages / 10 * * Else * * * * no_pages = 1 + Int(no_pages / 10) * * End If ' Begin iteration * * For J = 1 To no_pages * * * * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D" * * * * Set my_obj = CreateObject("MSXML2.XMLHTTP") * * * * my_obj.Open "GET", my_url, False * * * * my_obj.send * * * * my_var = my_obj.responsetext * * * * Set my_obj = Nothing ' Extractdata * * * * yy = 1 * * * * Do Until yy = 0 * * * * * * pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare) * * * * * * pos_6 = InStr(pos_5, my_var, "", vbTextCompare) * * * * * * pos_7 = InStr(pos_6, my_var, "<", vbTextCompare) * * * * * * sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6)) * * * * * * yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare) ' Use instr, instrrev and mid to extract other info you're interested in ' * * * * * ' ' * * * * * ' ' * * * * * ' ' Put the current schooldatainto the workbook * * * * * * ActiveCell = sc_name * * * * * * ActiveCell.Offset(1, 0).Select * * * * Loop * * Next End Sub Unbelievable, you saved me weeks of work. I have to admit that your programming skills are far superior to mine. If you have any hints on the remaining data, you would save my day. The data I am interested in is: | A | B | C | D | E | ------------------------------------------------------------- | School name | street | City | State | Rank | ------------------------------------------------------------- So the final excel table would look like the following (no commas) Hale Academy,3443 Sw 20Th Street,Ocala,FL, 10 Jackson County School-Sunland,3700 Connally Dr,Marianna,FL, 31 Indigo Christian Jr Acad,401 N Williamson Blvd,Daytona Beach, FL,175 College Park Elementary School,1330 Sw 33Rd Ave,Ocala,FL,179 .. |
All times are GMT +1. The time now is 10:03 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com