Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
ron ron is offline
external usenet poster
 
Posts: 118
Default Extracting web data

Hi Peter...The following code should get you going...Ron

Sub Webpage()
' Retrieve the source code for the first page
J = 1
my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search/FL/~/~/~/rank/~/" & J & "/%5B/url%5D"
Set my_obj = CreateObject("MSXML2.XMLHTTP")
my_obj.Open "GET", my_url, False
my_obj.send
my_var = my_obj.responsetext
Set my_obj = Nothing

' Determine the number of pages to examine
pos_1 = InStr(1, my_var, "search-intro", vbTextCompare)
pos_2 = InStr(pos_1, my_var, "", vbTextCompare)
pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare)
pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare)
no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3)))
no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare)
no_pages = Val(no_pages)
If (no_pages Mod 10) = 0 Then
no_pages = no_pages / 10
Else
no_pages = 1 + Int(no_pages / 10)
End If

' Begin iteration
For J = 1 To no_pages
my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search/FL/~/~/~/rank/~/" & J & "/%5B/url%5D"
Set my_obj = CreateObject("MSXML2.XMLHTTP")
my_obj.Open "GET", my_url, False
my_obj.send
my_var = my_obj.responsetext
Set my_obj = Nothing

' Extract data
yy = 1
Do Until yy = 0
pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare)
pos_6 = InStr(pos_5, my_var, "", vbTextCompare)
pos_7 = InStr(pos_6, my_var, "<", vbTextCompare)
sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6))
yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare)

' Use instr, instrrev and mid to extract other info you're interested in
' '
' '
' '

' Put the current school data into the workbook
ActiveCell = sc_name
ActiveCell.Offset(1, 0).Select
Loop
Next
End Sub
  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3
Default Extracting web data

On May 18, 10:48*am, ron wrote:
Hi Peter...The following code should get you going...Ron

Sub Webpage()
' Retrieve the source code for the first page
* * J = 1
* * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * my_obj.Open "GET", my_url, False
* * my_obj.send
* * my_var = my_obj.responsetext
* * Set my_obj = Nothing

' Determine the number of pages to examine
* * pos_1 = InStr(1, my_var, "search-intro", vbTextCompare)
* * pos_2 = InStr(pos_1, my_var, "", vbTextCompare)
* * pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare)
* * pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare)
* * no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3)))
* * no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare)
* * no_pages = Val(no_pages)
* * If (no_pages Mod 10) = 0 Then
* * * * no_pages = no_pages / 10
* * Else
* * * * no_pages = 1 + Int(no_pages / 10)
* * End If

' Begin iteration
* * For J = 1 To no_pages
* * * * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * * * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * * * my_obj.Open "GET", my_url, False
* * * * my_obj.send
* * * * my_var = my_obj.responsetext
* * * * Set my_obj = Nothing

' Extractdata
* * * * yy = 1
* * * * Do Until yy = 0
* * * * * * pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare)
* * * * * * pos_6 = InStr(pos_5, my_var, "", vbTextCompare)
* * * * * * pos_7 = InStr(pos_6, my_var, "<", vbTextCompare)
* * * * * * sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6))
* * * * * * yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare)

' Use instr, instrrev and mid to extract other info you're interested in
' * * * * * '
' * * * * * '
' * * * * * '

' Put the current schooldatainto the workbook
* * * * * * ActiveCell = sc_name
* * * * * * ActiveCell.Offset(1, 0).Select
* * * * Loop
* * Next
End Sub


Unbelievable, you saved me weeks of work. I have to admit that your
programming skills are far superior to mine. If you have any hints on
the remaining data, you would save my day. The data I am interested in
is:

School name, street, City, State, Rank

So the final excel table would look like the following (no commas)

Hale Academy, 3443 Sw 20Th Street,
Ocala, FL, 10
Jackson County School-Sunland, 3700 Connally Dr,
Marianna, FL, 31
Indigo Christian Jr Acad, 401 N Williamson Blvd, Daytona
Beach, FL, 175
College Park Elementary School, 1330 Sw 33Rd Ave,
Ocala, FL, 179

..
..
..
..
..

I also need the zip codes, but it looks like they might not be part of
the data.

Thanks again,
Peter





  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3
Default Extracting web data

On May 18, 10:48*am, ron wrote:
Hi Peter...The following code should get you going...Ron

Sub Webpage()
' Retrieve the source code for the first page
* * J = 1
* * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * my_obj.Open "GET", my_url, False
* * my_obj.send
* * my_var = my_obj.responsetext
* * Set my_obj = Nothing

' Determine the number of pages to examine
* * pos_1 = InStr(1, my_var, "search-intro", vbTextCompare)
* * pos_2 = InStr(pos_1, my_var, "", vbTextCompare)
* * pos_3 = InStr(1 + pos_2, my_var, "", vbTextCompare)
* * pos_4 = InStr(pos_3, my_var, "Schools", vbTextCompare)
* * no_pages = (Mid(my_var, 1 + pos_3, -1 + pos_4 - (1 + pos_3)))
* * no_pages = Replace(no_pages, ",", "", 1, -1, vbTextCompare)
* * no_pages = Val(no_pages)
* * If (no_pages Mod 10) = 0 Then
* * * * no_pages = no_pages / 10
* * Else
* * * * no_pages = 1 + Int(no_pages / 10)
* * End If

' Begin iteration
* * For J = 1 To no_pages
* * * * my_url = "http://content.usatoday.com/news/nation/environment/smokestack/search..." & J & "/%5B/url%5D"
* * * * Set my_obj = CreateObject("MSXML2.XMLHTTP")
* * * * my_obj.Open "GET", my_url, False
* * * * my_obj.send
* * * * my_var = my_obj.responsetext
* * * * Set my_obj = Nothing

' Extractdata
* * * * yy = 1
* * * * Do Until yy = 0
* * * * * * pos_5 = InStr(yy, my_var, "smokestack/school", vbTextCompare)
* * * * * * pos_6 = InStr(pos_5, my_var, "", vbTextCompare)
* * * * * * pos_7 = InStr(pos_6, my_var, "<", vbTextCompare)
* * * * * * sc_name = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6))
* * * * * * yy = InStr(pos_7, my_var, "smokestack/school", vbTextCompare)

' Use instr, instrrev and mid to extract other info you're interested in
' * * * * * '
' * * * * * '
' * * * * * '

' Put the current schooldatainto the workbook
* * * * * * ActiveCell = sc_name
* * * * * * ActiveCell.Offset(1, 0).Select
* * * * Loop
* * Next
End Sub


Unbelievable, you saved me weeks of work. I have to admit that your
programming skills are far superior to mine. If you have any hints
on
the remaining data, you would save my day. The data I am interested
in
is:

| A | B | C | D | E |
-------------------------------------------------------------
| School name | street | City | State | Rank |
-------------------------------------------------------------

So the final excel table would look like the following (no commas)

Hale Academy,3443 Sw 20Th Street,Ocala,FL, 10
Jackson County School-Sunland,3700 Connally Dr,Marianna,FL, 31
Indigo Christian Jr Acad,401 N Williamson Blvd,Daytona Beach, FL,175
College Park Elementary School,1330 Sw 33Rd Ave,Ocala,FL,179

..
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Extracting data SMD Excel Worksheet Functions 7 August 28th 09 08:03 PM
extracting data from one sheet based on data in another - VLookup? des Excel Worksheet Functions 3 February 4th 09 07:27 PM
etract unique data from multiple workbooks after extracting data [email protected] Excel Programming 3 December 27th 07 06:56 AM
Text parsing - Extracting data from inconsistent data entry format. u473 Excel Programming 2 August 26th 07 01:51 AM
Extracting Data for .Txt Files By Unique Field Data La Excel Discussion (Misc queries) 3 July 17th 06 01:30 PM


All times are GMT +1. The time now is 06:48 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"