Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1
Default New Q to old Post: Stuck at Trying to Extract Data from a Website using JSP

Hope I am not irritating anyone, but I indeed want to know how to extract
the information contained in the tagname "HTML" of an IE page to a cell.

My detailed question and previous discussion is copied below:

===================

I've run your macro, and it seems that the tagname "html" contains all the
text on the page, including the popup box.

I know nothing about how to control IE. How can I access the content of the
tagname "html" on the page?

I've tried:
Sheets("Sheet2").Range("a1").Value = IE.document.HTML

But the above gives me an error.

Hope you are still following the thread.

Thanks.

Herbert

"Joel" ...
The data is in different tables. The pop up window is displaying specific
data from specific tables. The dollar amount of the bids are shown in the
code below. I included a debug tool that I use which is in the sub DUMP.
I
usually run this code when I working with a webpage.

I also add watch variables when I writing my code. Select variable like
TABLE and right click variable to add watch. I thedn single step through
the
code using F8.

The table vairable will have 18 tables. The index to the table starts at
zero so item 1 in the watch is table(0). You will see a property in the
watch window for tables call sourceindex. The source index starts at zero
and is the same data as the row number from dump (offset by 1). The data
you
need to get is the innertext property.

Sub GetHouse()
'URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"
URL = "http://proptx.midland.com.hk/unit/index.jsp"
Unit = "?est_id=E00005&unit_id=U000146982"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL & Unit
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With



Call dump(IE)
Set Table = IE.document.getelementsbytagname("Table")
a = 1
Set PopupWin = Table(2)
b = 1
Set PopupWin = Table(3)
b = 1

'code for extracting table
RowCount = 1
For Each Row In PopupWin.Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell '
RowCount = RowCount + 1
Next Row
End Sub
Sub dump(IE)

'test code
With Sheets("Sheet6")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.ID
.Range("C" & RowCount) = itm.classname
.Range("D" & RowCount) = Left(itm.innertext, 1024)

RowCount = RowCount + 1
Next itm
End With
End Sub



"Herbert Chan" wrote:

Thank you for your attempt indeed.

For this particular website, there is actually no corresponding English
version of the same data. The data provided under
http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as
the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005.

ron has pointed out that I can actually use a url such as
http://proptx.midland.com.hk/unit/in..._id=U000146982
to get to the pop up directly. That's a big piece of information for me.

After opening the webpage with the pop up, is there a way to directly
reference to the pop up box without having to copy and paste the whole
page
back to Excel? Is the pop up frame one of the tables in the page?

Now that a way is found, I just wonder if there is a more elegant way to
get
to the data.

HC

"Joel" ...


I'm having problems naviaging across pages. this code will work for 1
page.
I found the english language page for this website. I found a method
to
get
all the houses but only can span the 1st page. I used a 2nd IE to get
the
details. Found when the tagname was A and the innertext was "Details"
I
was
able to get the URL of the details page. See if this helps.



Sub GetHouses()

Dim Districts() As Variant

URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

Set IE2 = CreateObject("InternetExplorer.Application")
IE2.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop


a = 1
Set navigate = IE.document.getelementbyid("estatebox")
Pages = navigate.all.Length - 1
For pagecount = 1 To Pages
If pagecount < 1 Then

Set navigate = IE.document.getelementbyid("estatebox")
Set Form = IE.document.getelementsbytagname("Form")

'I keep on getting the 1st page with this code.
'I can put the page number in the box but the submit gets
'back to page one.
'change to next page
navigate.all(0).Value = pagecount
navigate.all(pagecount).Click
Form(1).submit

Do While IE.readystate < 4 Or _
IE.Busy = True

DoEvents
Loop
End If

Set Body = IE.document.getelementbyid("csBody")
'get table
Set Listings = Body.all(2)

For Each itm In Listings.all
If itm.tagname = "A" And _
itm.innertext = "Details" Then

'naviagate to detains in 2nd explorer
URL2 = itm.href
IE2.Navigate2 URL2
Do While IE2.readystate < 4 Or _
IE2.Busy = True
DoEvents
Loop

End If
Next itm

Next pagecount


End Sub



"HC" wrote:

Hello Joel,

Thank you for your help. Maybe I need to elaborate what I want to
achieve
more.

On the page:
http://proptx.midland.com.hk/unit/in...?est_id=E00005,
it
shows there are Flat A to Flat H on each floor. The number is the
size
of
that particular flat. When you click on, say "753'" in the cell
denoting
25/F Flat A, an orange box pops up. Inside the box, the first row of
purple
text denotes the location of the flat, the second row of purple text
states
again the area of the flat, and the third row of purple text states
that
there were previously 3 transactions for this particular flat. First
column
is the date of transaction, second column is the sold price, and the
third
column is cost per square feet. I want to be able to progammatically
extract these transaction records to Excel for charting.

I am looking to achieve either of the following:

1. Find out the way to directly access the popup box. After studying
the
codes, I know that the ID of each flat (unit) is in the saved html
file
and
that won't be too difficult to extract (I've done similar thing
before,
and
I guess I will be able to figure that out). After I've got all the
unit_id,
I want to be able to repeatedly open the corresponding jsp page and
extract
the transaction records for the flats (units).

2. As you have enlightened me in your post, or maybe I can just start
an
instant of IE and then copy the data from IE to Excel. So do you know
how I
can access the value displayed in the popped up orange box?

After some studying of the codes, it seems that the jsp page that
shows
the
orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp,
and
it uses unit_id to denote each unit. However, I can't figure out
further
how to get to the data.

Hope I've made myself clear. I'm totally stuck.

Hope you are able to help.

Thank you very much indeed.

HC

"Joel" ...

Here is code to get you started. I don't know chinese and not sure
what
data you are looking for. I belive the ID=E0005 the part of the
query
that
extracts a particular house. Not sure how to lookup the id's. You
can
create a string to get the URL like this:

----------------------------------------------------------------
ID = "E00005"

URL = "http://proptx.midland.com.hk/unit/index.jsp
Request = URL & "?est_id=" & ID
IE.Navigate2 URL
------------------------------------------------------------------
Below is code to dump the info from the house you had listed.


Sub GetHouse()

URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"


Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With




Set Table = IE.document.getElementsByTagname("Table")

RowCount = 1
For Each Row In Table(7).Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell

RowCount = RowCount + 1
Next Row
End Sub



"HC" wrote:

Hello,

There is this page that lists out the past transaction records of
houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in
Chinese)

If you click on a particular house, the past transaction records of
that
house is shown.

I want to be able to extract the past transaction data and make
charts
to
visualise the price trend of the houses. Now, I'm only able to use
the
"stupid" method of clicking on all the houses and typing in
manually
the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I
have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to
see
any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the
transactions
in
Excel. It seems the website has sort of exposed the data, but I
just
can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a
proper
table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC












  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 82
Default New Q to old Post: Stuck at Trying to Extract Data from a Websiteusing JSP

Herbert

Take a look at innerHTML or outerHTML.

Perhaps something like this.

MsgBox doc.getelementsbytagname("HTML")(0).innerhtml

By the way what is it you actually want to do in the other thread?

If for example you want to get every SCRIPT element.

For Each scr In doc.getelementsbytagname("SCRIPT")
MsgBox scr.innerhtml
Next scr
  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9,101
Default New Q to old Post: Stuck at Trying to Extract Data from a Website

This is what you asked for.

Set HTMLTAG = IE.document.getElementsByTagName("HTMP")
RowCount = 1
for each itm in HTMLTag
Sheets("Sheet2").Range("a" & rowcount).Value = itm.innertext
RowCount = RowCount + 1
next itm


This is what I think you really want. the is no HTML tag. there are four
properties you can get.
1) innertext (in code below)
2) innerhtml
3) outertext
4) outerhtml

If you take the entire string the excel sheet will get an error because the
limits on the length of a text string. I only put the 1st 1024 character in
the worksheet cell in the code below.

RowCount = 1
with Sheets("Sheet2")
for each itm in IE.document.all
.Range("A" & rowcount).Value = itm.tagname
.Range("B" & rowcount).Value = itm.classname
.Range("C" & rowcount).Value = left(itm.innertext ,1024)
RowCount = RowCount + 1
next itm
end with

"H Chan" wrote:

Hope I am not irritating anyone, but I indeed want to know how to extract
the information contained in the tagname "HTML" of an IE page to a cell.

My detailed question and previous discussion is copied below:

===================

I've run your macro, and it seems that the tagname "html" contains all the
text on the page, including the popup box.

I know nothing about how to control IE. How can I access the content of the
tagname "html" on the page?

I've tried:
Sheets("Sheet2").Range("a1").Value = IE.document.HTML

But the above gives me an error.

Hope you are still following the thread.

Thanks.

Herbert

"Joel" ...
The data is in different tables. The pop up window is displaying specific
data from specific tables. The dollar amount of the bids are shown in the
code below. I included a debug tool that I use which is in the sub DUMP.
I
usually run this code when I working with a webpage.

I also add watch variables when I writing my code. Select variable like
TABLE and right click variable to add watch. I thedn single step through
the
code using F8.

The table vairable will have 18 tables. The index to the table starts at
zero so item 1 in the watch is table(0). You will see a property in the
watch window for tables call sourceindex. The source index starts at zero
and is the same data as the row number from dump (offset by 1). The data
you
need to get is the innertext property.

Sub GetHouse()
'URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"
URL = "http://proptx.midland.com.hk/unit/index.jsp"
Unit = "?est_id=E00005&unit_id=U000146982"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL & Unit
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With



Call dump(IE)
Set Table = IE.document.getelementsbytagname("Table")
a = 1
Set PopupWin = Table(2)
b = 1
Set PopupWin = Table(3)
b = 1

'code for extracting table
RowCount = 1
For Each Row In PopupWin.Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell '
RowCount = RowCount + 1
Next Row
End Sub
Sub dump(IE)

'test code
With Sheets("Sheet6")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.ID
.Range("C" & RowCount) = itm.classname
.Range("D" & RowCount) = Left(itm.innertext, 1024)

RowCount = RowCount + 1
Next itm
End With
End Sub



"Herbert Chan" wrote:

Thank you for your attempt indeed.

For this particular website, there is actually no corresponding English
version of the same data. The data provided under
http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as
the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005.

ron has pointed out that I can actually use a url such as
http://proptx.midland.com.hk/unit/in..._id=U000146982
to get to the pop up directly. That's a big piece of information for me.

After opening the webpage with the pop up, is there a way to directly
reference to the pop up box without having to copy and paste the whole
page
back to Excel? Is the pop up frame one of the tables in the page?

Now that a way is found, I just wonder if there is a more elegant way to
get
to the data.

HC

"Joel" ...


I'm having problems naviaging across pages. this code will work for 1
page.
I found the english language page for this website. I found a method
to
get
all the houses but only can span the 1st page. I used a 2nd IE to get
the
details. Found when the tagname was A and the innertext was "Details"
I
was
able to get the URL of the details page. See if this helps.



Sub GetHouses()

Dim Districts() As Variant

URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

Set IE2 = CreateObject("InternetExplorer.Application")
IE2.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop


a = 1
Set navigate = IE.document.getelementbyid("estatebox")
Pages = navigate.all.Length - 1
For pagecount = 1 To Pages
If pagecount < 1 Then

Set navigate = IE.document.getelementbyid("estatebox")
Set Form = IE.document.getelementsbytagname("Form")

'I keep on getting the 1st page with this code.
'I can put the page number in the box but the submit gets
'back to page one.
'change to next page
navigate.all(0).Value = pagecount
navigate.all(pagecount).Click
Form(1).submit

Do While IE.readystate < 4 Or _
IE.Busy = True

DoEvents
Loop
End If

Set Body = IE.document.getelementbyid("csBody")
'get table
Set Listings = Body.all(2)

For Each itm In Listings.all
If itm.tagname = "A" And _
itm.innertext = "Details" Then

'naviagate to detains in 2nd explorer
URL2 = itm.href
IE2.Navigate2 URL2
Do While IE2.readystate < 4 Or _
IE2.Busy = True
DoEvents
Loop

End If
Next itm

Next pagecount


End Sub



"HC" wrote:

Hello Joel,

Thank you for your help. Maybe I need to elaborate what I want to
achieve
more.

On the page:
http://proptx.midland.com.hk/unit/in...?est_id=E00005,
it
shows there are Flat A to Flat H on each floor. The number is the
size
of
that particular flat. When you click on, say "753'" in the cell
denoting
25/F Flat A, an orange box pops up. Inside the box, the first row of
purple
text denotes the location of the flat, the second row of purple text
states
again the area of the flat, and the third row of purple text states
that
there were previously 3 transactions for this particular flat. First
column
is the date of transaction, second column is the sold price, and the
third
column is cost per square feet. I want to be able to progammatically
extract these transaction records to Excel for charting.

I am looking to achieve either of the following:

1. Find out the way to directly access the popup box. After studying
the
codes, I know that the ID of each flat (unit) is in the saved html
file
and
that won't be too difficult to extract (I've done similar thing
before,
and
I guess I will be able to figure that out). After I've got all the
unit_id,
I want to be able to repeatedly open the corresponding jsp page and
extract
the transaction records for the flats (units).

2. As you have enlightened me in your post, or maybe I can just start
an
instant of IE and then copy the data from IE to Excel. So do you know
how I
can access the value displayed in the popped up orange box?

After some studying of the codes, it seems that the jsp page that
shows
the
orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp,
and
it uses unit_id to denote each unit. However, I can't figure out
further
how to get to the data.

Hope I've made myself clear. I'm totally stuck.

Hope you are able to help.

Thank you very much indeed.

HC

"Joel" ...

Here is code to get you started. I don't know chinese and not sure
what
data you are looking for. I belive the ID=E0005 the part of the
query
that
extracts a particular house. Not sure how to lookup the id's. You

Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Stuck at Trying to Extract Data from a Website using JSP HC[_2_] Excel Programming 16 August 9th 09 01:52 PM
How to Post data to website to VBA bob Excel Programming 1 March 25th 09 11:18 AM
How to extract one data from a website Lamb Chop[_2_] Excel Discussion (Misc queries) 1 June 20th 08 02:19 PM
code to extract data from a website runs of XP but not Vista Russ Excel Programming 1 May 16th 08 01:30 AM
Excel Macro - Extract data from website Icy Excel Programming 0 January 9th 08 02:39 PM


All times are GMT +1. The time now is 09:52 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"