View Single Post
  #15   Report Post  
Posted to microsoft.public.excel.programming
Herbert Chan Herbert Chan is offline
external usenet poster
 
Posts: 26
Default Stuck at Trying to Extract Data from a Website using JSP

I've finally got the time to sit down and study what you've written.

I've run your macro, and it seems that the tagname "html" contains all the
text on the page, including the popup box.

I know nothing about how to control IE. How can I access the content of the
tagname "html" on the page?

I've tried:
Sheets("Sheet2").Range("a1").Value = IE.document.HTML

But the above gives me an error.

Hope you are still following the thread.

Thanks.

Herbert

"Joel" ...
The data is in different tables. The pop up window is displaying specific
data from specific tables. The dollar amount of the bids are shown in the
code below. I included a debug tool that I use which is in the sub DUMP.
I
usually run this code when I working with a webpage.

I also add watch variables when I writing my code. Select variable like
TABLE and right click variable to add watch. I thedn single step through
the
code using F8.

The table vairable will have 18 tables. The index to the table starts at
zero so item 1 in the watch is table(0). You will see a property in the
watch window for tables call sourceindex. The source index starts at zero
and is the same data as the row number from dump (offset by 1). The data
you
need to get is the innertext property.

Sub GetHouse()
'URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"
URL = "http://proptx.midland.com.hk/unit/index.jsp"
Unit = "?est_id=E00005&unit_id=U000146982"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL & Unit
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With



Call dump(IE)
Set Table = IE.document.getelementsbytagname("Table")
a = 1
Set PopupWin = Table(2)
b = 1
Set PopupWin = Table(3)
b = 1

'code for extracting table
RowCount = 1
For Each Row In PopupWin.Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell '
RowCount = RowCount + 1
Next Row
End Sub
Sub dump(IE)

'test code
With Sheets("Sheet6")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.ID
.Range("C" & RowCount) = itm.classname
.Range("D" & RowCount) = Left(itm.innertext, 1024)

RowCount = RowCount + 1
Next itm
End With
End Sub



"Herbert Chan" wrote:

Thank you for your attempt indeed.

For this particular website, there is actually no corresponding English
version of the same data. The data provided under
http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as
the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005.

ron has pointed out that I can actually use a url such as
http://proptx.midland.com.hk/unit/in..._id=U000146982
to get to the pop up directly. That's a big piece of information for me.

After opening the webpage with the pop up, is there a way to directly
reference to the pop up box without having to copy and paste the whole
page
back to Excel? Is the pop up frame one of the tables in the page?

Now that a way is found, I just wonder if there is a more elegant way to
get
to the data.

HC

"Joel" ...

I'm having problems naviaging across pages. this code will work for 1
page.
I found the english language page for this website. I found a method
to
get
all the houses but only can span the 1st page. I used a 2nd IE to get
the
details. Found when the tagname was A and the innertext was "Details"
I
was
able to get the URL of the details page. See if this helps.



Sub GetHouses()

Dim Districts() As Variant

URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

Set IE2 = CreateObject("InternetExplorer.Application")
IE2.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop


a = 1
Set navigate = IE.document.getelementbyid("estatebox")
Pages = navigate.all.Length - 1
For pagecount = 1 To Pages
If pagecount < 1 Then

Set navigate = IE.document.getelementbyid("estatebox")
Set Form = IE.document.getelementsbytagname("Form")

'I keep on getting the 1st page with this code.
'I can put the page number in the box but the submit gets
'back to page one.
'change to next page
navigate.all(0).Value = pagecount
navigate.all(pagecount).Click
Form(1).submit

Do While IE.readystate < 4 Or _
IE.Busy = True

DoEvents
Loop
End If

Set Body = IE.document.getelementbyid("csBody")
'get table
Set Listings = Body.all(2)

For Each itm In Listings.all
If itm.tagname = "A" And _
itm.innertext = "Details" Then

'naviagate to detains in 2nd explorer
URL2 = itm.href
IE2.Navigate2 URL2
Do While IE2.readystate < 4 Or _
IE2.Busy = True
DoEvents
Loop

End If
Next itm

Next pagecount


End Sub



"HC" wrote:

Hello Joel,

Thank you for your help. Maybe I need to elaborate what I want to
achieve
more.

On the page:
http://proptx.midland.com.hk/unit/in...?est_id=E00005,
it
shows there are Flat A to Flat H on each floor. The number is the
size
of
that particular flat. When you click on, say "753'" in the cell
denoting
25/F Flat A, an orange box pops up. Inside the box, the first row of
purple
text denotes the location of the flat, the second row of purple text
states
again the area of the flat, and the third row of purple text states
that
there were previously 3 transactions for this particular flat. First
column
is the date of transaction, second column is the sold price, and the
third
column is cost per square feet. I want to be able to progammatically
extract these transaction records to Excel for charting.

I am looking to achieve either of the following:

1. Find out the way to directly access the popup box. After studying
the
codes, I know that the ID of each flat (unit) is in the saved html
file
and
that won't be too difficult to extract (I've done similar thing
before,
and
I guess I will be able to figure that out). After I've got all the
unit_id,
I want to be able to repeatedly open the corresponding jsp page and
extract
the transaction records for the flats (units).

2. As you have enlightened me in your post, or maybe I can just start
an
instant of IE and then copy the data from IE to Excel. So do you know
how I
can access the value displayed in the popped up orange box?

After some studying of the codes, it seems that the jsp page that
shows
the
orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp,
and
it uses unit_id to denote each unit. However, I can't figure out
further
how to get to the data.

Hope I've made myself clear. I'm totally stuck.

Hope you are able to help.

Thank you very much indeed.

HC

"Joel" ...

Here is code to get you started. I don't know chinese and not sure
what
data you are looking for. I belive the ID=E0005 the part of the
query
that
extracts a particular house. Not sure how to lookup the id's. You
can
create a string to get the URL like this:

----------------------------------------------------------------
ID = "E00005"

URL = "http://proptx.midland.com.hk/unit/index.jsp
Request = URL & "?est_id=" & ID
IE.Navigate2 URL
------------------------------------------------------------------
Below is code to dump the info from the house you had listed.


Sub GetHouse()

URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"


Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With




Set Table = IE.document.getElementsByTagname("Table")

RowCount = 1
For Each Row In Table(7).Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell

RowCount = RowCount + 1
Next Row
End Sub



"HC" wrote:

Hello,

There is this page that lists out the past transaction records of
houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in
Chinese)

If you click on a particular house, the past transaction records of
that
house is shown.

I want to be able to extract the past transaction data and make
charts
to
visualise the price trend of the houses. Now, I'm only able to use
the
"stupid" method of clicking on all the houses and typing in
manually
the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I
have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to
see
any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the
transactions
in
Excel. It seems the website has sort of exposed the data, but I
just
can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a
proper
table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC