Posted to microsoft.public.excel.programming
|
|
Web Query? Is This Possible?
I just retested the code and it works perfectly. I also notice if you want
the US teritories add "TR" to the array definiation. I already had the
District of columbia in the List.
"ryguy7272" wrote:
Thanks for the Effort Joel!!! This is awesome!!!
The code looks right, but does actually this work for you? When I run it
here IE opens and hangs for a long time. Then, I close the browser and this
line is yellow:
Do While IE.busy = True Or IE.readystate < 4
No data was imported in about 7 minutes or so. I have a cable connection;
pretty fast. I imagine it will take a while to run, but I don't think it was
actually doing anything. unless, everything has to import and then update
all at onece. however, I don't think that's what it is doing. Any thoughts?
Suggestions? I'm going to try to troubleshoot, but if you can think of
something that may prevent this from running (maybe a setting in IE), please
let me know.
Kindest of regards,
Ryan---
--
Ryan---
If this information was helpful, please indicate this by clicking ''Yes''.
"Joel" wrote:
Hre is the final results. Use two macros. I wrote the 1st which gets you
all the URL's. Then write the 2nd macro which gets the actual data .
Sub USA()
'define states in searching webpage
Const FindCategories = 0
Const ExtractCategory = 1
USAStates = Array("AK", "AL", "AR", "AZ", _
"CA", "CO", "CT", "DC", "DE", "FL", "GA", _
"HI", "IA", "ID", "IL", "IN", "KS", "KY", _
"LA", "MA", "MD", "ME", "MI", "MN", "MO", _
"MS", "MT", "NC", "ND", "NE", "NH", "NJ", _
"NM", "NV", "NY", "OH", "OK", "OR", "PA", _
"RI", "SC", "SD", "TN", "TX", "UT", "VA", _
"VT", "WA", "WI", "WV", "WY")
BaseName = "http://www.dsireusa.org/incentives/index.cfm?" & _
"re=1&ee=1&spv=0&st=0&srp=1&state="
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
With Sheets("USA")
.Cells.ClearContents
RowCount = 1
.Range("A" & RowCount) = "State"
.Range("B" & RowCount) = "Category"
.Range("C" & RowCount) = "Topic"
.Range("D" & RowCount) = "SubTopic"
.Range("E" & RowCount) = "URL"
RowCount = 2
For Each USAState In USAStates
URL = BaseName & USAState
IE.Navigate2 URL
Do While IE.busy = True Or _
IE.readystate < 4
DoEvents
Loop
Set document = IE.document
'Call Dump(document)
'code for extracting table
State = FindCategories
For Each itm In IE.document.all
Select Case itm.classname
Case "categorytype"
Select Case Trim(itm.innertext)
Case "Financial Incentives", _
"Rules, Regulations & Policies"
Category = Trim(itm.innertext)
State = ExtractCategory
Case Else
State = FindCategories
End Select
End Select
If State = ExtractCategory Then
Select Case itm.classname
Case "copybold"
Topic = Trim(itm.innertext)
Case "copy"
.Range("A" & RowCount) = USAState
.Range("B" & RowCount) = Category
.Range("C" & RowCount) = Topic
Subtopic = Trim(itm.innertext)
.Range("D" & RowCount) = Subtopic
HREF = itm.FirstChild.HREF
.Range("E" & RowCount) = HREF
RowCount = RowCount + 1
End Select
End If
Next itm
.Columns.AutoFit
Next USAState
End With
IE.Quit
Set IE = Nothing
End Sub
Sub Dump(document)
With Sheets("dump")
.Cells.ClearContents
RowCount = 1
For Each itm In document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.classname
.Range("C" & RowCount) = itm.ID
.Range("D" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
Next itm
End With
End Sub
"Joel" wrote:
This is what I have so far. I'm only doing California and NY as a test.
California is working. New York I 'm having problems with one of the links.
It is not working like the others and giving me an error. Will Look at it
tomorrow.
You need to create a worksheet called USA.
Sub USA()
'define states in searching webpage
Const FindCategories = 0
Const ExtractCategory = 1
USAStates = Array("CA", "NY")
BaseName = "http://www.dsireusa.org/incentives/index.cfm?" & _
"re=1&ee=1&spv=0&st=0&srp=1&state="
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
With Sheets("USA")
.Cells.ClearContents
RowCount = 1
For Each USAState In USAStates
URL = BaseName & USAState
IE.Navigate2 URL
Do While IE.busy = True And _
IE.readystate < 4
DoEvents
Loop
.Range("A" & RowCount) = USAState
RowCount = RowCount + 1
Set document = IE.document
'Call Dump(document)
'code for extracting table
State = FindCategories
For Each itm In IE.document.all
Select Case itm.classname
Case "categorytype"
Select Case Trim(itm.innertext)
Case "Financial Incentives", _
"Rules, Regulations & Policies"
.Range("B" & RowCount) = itm.innertext
RowCount = RowCount + 1
State = ExtractCategory
Case Else
State = FindCategories
End Select
End Select
If State = ExtractCategory Then
Select Case itm.classname
Case "copybold"
.Range("C" & RowCount) = itm.innertext
RowCount = RowCount + 1
Case "copy"
.Range("D" & RowCount) = itm.innertext
.Range("E" & RowCount) = _
itm.FirstChild.href
RowCount = RowCount + 1
End Select
End If
Next itm
.Columns.AutoFit
Next USAState
End With
IE.Quit
Set IE = Nothing
End Sub
Sub Dump(document)
With Sheets("dump")
.Cells.ClearContents
RowCount = 1
For Each itm In document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.classname
.Range("C" & RowCount) = itm.ID
.Range("D" & RowCount) = Left(itm.innertext, 1024)
RowCount = RowCount + 1
Next itm
End With
End Sub
"Joel" wrote:
I'm thinking that you may need to open two internet explorer applications.
the links will probably href properties. I probably won't get to look at
this until late tonight or tomorrow morning.
to get to each state webpage I would simply create a table of each atates
abbreviation and use it as part of the URL
California webpage is this
http://www.dsireusa.org/incentives/i...srp=1&state=CA
New York is this
http://www.dsireusa.org/incentives/i...srp=1&state=NY
So you can put all the state abbreviation on a spreadsheet
Then your loop would be
BaseName = "http://www.dsireusa.org/incentives/index.cfm?" &
"_re=1&ee=1&spv=0&st=0&srp=1&state="
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
for RowCount = 1 to 50
URL = BaseName & Range("A" & RowCount)
IE.Navigate2 URL
Do While IE.busy = True And _
IE.readystate < 4
DoEvents
Loop
Set document = IE.document
'add your code here
next rowCount
"ryguy7272" wrote:
Thanks for taking an interest in this Joel. I've seen your work with web
queries; quite remarkable. If anyone can do this, it's probably you.
Ok, this is the main page for the site:
http://www.dsireusa.org/
If you click on 'CA', for instance, you end up he
http://www.dsireusa.org/incentives/i...srp=1&state=CA
What I want to do on this page, is get Excel to drill down to each link,
between 'Financial Incentives' and 'Related Programs & Initiatives' (not
including the information below this).
Clicking on the first hyperlink takes you he
http://www.dsireusa.org/incentives/i...164F&re=1&ee=1
I want to import the date from .WebTables = "6,7,8", offset 2 or 3 rows and
then back out of the page, drill down to the next link, and so on and so
forth. I've done lots of web queries and recorded macros many times to
automate the process of importing this data from the web, but the loop is the
tricky thing here. I don't know how to handle the looping part. Does it
make sense so far?
|