Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to comp.lang.java.programmer,microsoft.public.excel.programming
external usenet poster
 
Posts: 2
Default Stuck at Trying to Extract Data from a Website using JSP

Hello,

There is this page that lists out the past transaction records of houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese)

If you click on a particular house, the past transaction records of that
house is shown.

I want to be able to extract the past transaction data and make charts to
visualise the price trend of the houses. Now, I'm only able to use the
"stupid" method of clicking on all the houses and typing in manually the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the transactions in
Excel. It seems the website has sort of exposed the data, but I just can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a proper table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC


  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9,101
Default Stuck at Trying to Extract Data from a Website using JSP

Here is code to get you started. I don't know chinese and not sure what
data you are looking for. I belive the ID=E0005 the part of the query that
extracts a particular house. Not sure how to lookup the id's. You can
create a string to get the URL like this:

----------------------------------------------------------------
ID = "E00005"

URL = "http://proptx.midland.com.hk/unit/index.jsp
Request = URL & "?est_id=" & ID
IE.Navigate2 URL
------------------------------------------------------------------
Below is code to dump the info from the house you had listed.


Sub GetHouse()

URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"


Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With




Set Table = IE.document.getElementsByTagname("Table")

RowCount = 1
For Each Row In Table(7).Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell

RowCount = RowCount + 1
Next Row
End Sub



"HC" wrote:

Hello,

There is this page that lists out the past transaction records of houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese)

If you click on a particular house, the past transaction records of that
house is shown.

I want to be able to extract the past transaction data and make charts to
visualise the price trend of the houses. Now, I'm only able to use the
"stupid" method of clicking on all the houses and typing in manually the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the transactions in
Excel. It seems the website has sort of exposed the data, but I just can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a proper table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC



  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9,101
Default Stuck at Trying to Extract Data from a Website using JSP

Here is something I'm working on. The webpage I dumped has a magnifier glass
which brings you to a search page. The search page is a form with a number
of drop down boxes. I don't understand chineese but I'm able to go to my IE
explorer and from the menu on thie IE explorer go to View - Source. the HTML
is in english and has some useful comments that I'm looking at right now.
Let you know what I find.

"HC" wrote:

Hello,

There is this page that lists out the past transaction records of houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese)

If you click on a particular house, the past transaction records of that
house is shown.

I want to be able to extract the past transaction data and make charts to
visualise the price trend of the houses. Now, I'm only able to use the
"stupid" method of clicking on all the houses and typing in manually the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the transactions in
Excel. It seems the website has sort of exposed the data, but I just can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a proper table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC



  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2
Default Stuck at Trying to Extract Data from a Website using JSP

Hello Joel,

Thank you for your help. Maybe I need to elaborate what I want to achieve
more.

On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005, it
shows there are Flat A to Flat H on each floor. The number is the size of
that particular flat. When you click on, say "753'" in the cell denoting
25/F Flat A, an orange box pops up. Inside the box, the first row of purple
text denotes the location of the flat, the second row of purple text states
again the area of the flat, and the third row of purple text states that
there were previously 3 transactions for this particular flat. First column
is the date of transaction, second column is the sold price, and the third
column is cost per square feet. I want to be able to progammatically
extract these transaction records to Excel for charting.

I am looking to achieve either of the following:

1. Find out the way to directly access the popup box. After studying the
codes, I know that the ID of each flat (unit) is in the saved html file and
that won't be too difficult to extract (I've done similar thing before, and
I guess I will be able to figure that out). After I've got all the unit_id,
I want to be able to repeatedly open the corresponding jsp page and extract
the transaction records for the flats (units).

2. As you have enlightened me in your post, or maybe I can just start an
instant of IE and then copy the data from IE to Excel. So do you know how I
can access the value displayed in the popped up orange box?

After some studying of the codes, it seems that the jsp page that shows the
orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp, and
it uses unit_id to denote each unit. However, I can't figure out further
how to get to the data.

Hope I've made myself clear. I'm totally stuck.

Hope you are able to help.

Thank you very much indeed.

HC

"Joel" ...
Here is code to get you started. I don't know chinese and not sure what
data you are looking for. I belive the ID=E0005 the part of the query
that
extracts a particular house. Not sure how to lookup the id's. You can
create a string to get the URL like this:

----------------------------------------------------------------
ID = "E00005"

URL = "http://proptx.midland.com.hk/unit/index.jsp
Request = URL & "?est_id=" & ID
IE.Navigate2 URL
------------------------------------------------------------------
Below is code to dump the info from the house you had listed.


Sub GetHouse()

URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"


Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With




Set Table = IE.document.getElementsByTagname("Table")

RowCount = 1
For Each Row In Table(7).Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell

RowCount = RowCount + 1
Next Row
End Sub



"HC" wrote:

Hello,

There is this page that lists out the past transaction records of houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in
Chinese)

If you click on a particular house, the past transaction records of that
house is shown.

I want to be able to extract the past transaction data and make charts to
visualise the price trend of the houses. Now, I'm only able to use the
"stupid" method of clicking on all the houses and typing in manually the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to see
any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the transactions in
Excel. It seems the website has sort of exposed the data, but I just
can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a proper
table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC





  #5   Report Post  
Posted to microsoft.public.excel.programming
ron ron is offline
external usenet poster
 
Posts: 118
Default Stuck at Trying to Extract Data from a Website using JSP

On Jul 26, 7:24*am, "HC" wrote...snip...
So do you know how I
can access the value displayed in the popped up orange box?


HC...The following url produces the original webpage and pop-up box
for flat 26/F A

http://proptx.midland.com.hk/unit/in..._id=U000146982

If you programatically select all and copy / paste as text into an
Excel spreadsheet, the pop-up box info that you want will be found
near the top of the excel sheet...Ron


  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9,101
Default Stuck at Trying to Extract Data from a Website using JSP

I'm having problems naviaging across pages. this code will work for 1 page.
I found the english language page for this website. I found a method to get
all the houses but only can span the 1st page. I used a 2nd IE to get the
details. Found when the tagname was A and the innertext was "Details" I was
able to get the URL of the details page. See if this helps.



Sub GetHouses()

Dim Districts() As Variant

URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

Set IE2 = CreateObject("InternetExplorer.Application")
IE2.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop


a = 1
Set navigate = IE.document.getelementbyid("estatebox")
Pages = navigate.all.Length - 1
For pagecount = 1 To Pages
If pagecount < 1 Then

Set navigate = IE.document.getelementbyid("estatebox")
Set Form = IE.document.getelementsbytagname("Form")

'I keep on getting the 1st page with this code.
'I can put the page number in the box but the submit gets
'back to page one.
'change to next page
navigate.all(0).Value = pagecount
navigate.all(pagecount).Click
Form(1).submit

Do While IE.readystate < 4 Or _
IE.Busy = True

DoEvents
Loop
End If

Set Body = IE.document.getelementbyid("csBody")
'get table
Set Listings = Body.all(2)

For Each itm In Listings.all
If itm.tagname = "A" And _
itm.innertext = "Details" Then

'naviagate to detains in 2nd explorer
URL2 = itm.href
IE2.Navigate2 URL2
Do While IE2.readystate < 4 Or _
IE2.Busy = True
DoEvents
Loop

End If
Next itm

Next pagecount


End Sub



"HC" wrote:

Hello Joel,

Thank you for your help. Maybe I need to elaborate what I want to achieve
more.

On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005, it
shows there are Flat A to Flat H on each floor. The number is the size of
that particular flat. When you click on, say "753'" in the cell denoting
25/F Flat A, an orange box pops up. Inside the box, the first row of purple
text denotes the location of the flat, the second row of purple text states
again the area of the flat, and the third row of purple text states that
there were previously 3 transactions for this particular flat. First column
is the date of transaction, second column is the sold price, and the third
column is cost per square feet. I want to be able to progammatically
extract these transaction records to Excel for charting.

I am looking to achieve either of the following:

1. Find out the way to directly access the popup box. After studying the
codes, I know that the ID of each flat (unit) is in the saved html file and
that won't be too difficult to extract (I've done similar thing before, and
I guess I will be able to figure that out). After I've got all the unit_id,
I want to be able to repeatedly open the corresponding jsp page and extract
the transaction records for the flats (units).

2. As you have enlightened me in your post, or maybe I can just start an
instant of IE and then copy the data from IE to Excel. So do you know how I
can access the value displayed in the popped up orange box?

After some studying of the codes, it seems that the jsp page that shows the
orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp, and
it uses unit_id to denote each unit. However, I can't figure out further
how to get to the data.

Hope I've made myself clear. I'm totally stuck.

Hope you are able to help.

Thank you very much indeed.

HC

"Joel" ...
Here is code to get you started. I don't know chinese and not sure what
data you are looking for. I belive the ID=E0005 the part of the query
that
extracts a particular house. Not sure how to lookup the id's. You can
create a string to get the URL like this:

----------------------------------------------------------------
ID = "E00005"

URL = "http://proptx.midland.com.hk/unit/index.jsp
Request = URL & "?est_id=" & ID
IE.Navigate2 URL
------------------------------------------------------------------
Below is code to dump the info from the house you had listed.


Sub GetHouse()

URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"


Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With




Set Table = IE.document.getElementsByTagname("Table")

RowCount = 1
For Each Row In Table(7).Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell

RowCount = RowCount + 1
Next Row
End Sub



"HC" wrote:

Hello,

There is this page that lists out the past transaction records of houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in
Chinese)

If you click on a particular house, the past transaction records of that
house is shown.

I want to be able to extract the past transaction data and make charts to
visualise the price trend of the houses. Now, I'm only able to use the
"stupid" method of clicking on all the houses and typing in manually the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to see
any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the transactions in
Excel. It seems the website has sort of exposed the data, but I just
can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a proper
table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC






  #7   Report Post  
Posted to comp.lang.java.programmer,microsoft.public.excel.programming
external usenet poster
 
Posts: 1
Default Stuck at Trying to Extract Data from a Website using JSP

In article , "HC" wrote:

There is this page that lists out the past transaction records of
houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005
(it's in Chinese)

If you click on a particular house, the past transaction records of
that house is shown.

I want to be able to extract the past transaction data and make
charts to visualise the price trend of the houses. Now, I'm only
able to use the "stupid" method of clicking on all the houses and
typing in manually the transaction records in Excel and then chart
the data.

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to
see any information will come up, but there's nothing in the page.


Well, there are eight empty lines, delimited by CR/LF. Possibly, they
don't want strangers scraping their data.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the transactions
in Excel. It seems the website has sort of exposed the data, but I
just can't find a way to get the data out, at least one house at a
time.


You could ask them for the data. Alternatively, you might be able to
interpret the JavaScript usefully.

I want to do webquery in Excel and then extract the data to a proper
table.


In my locale, the tax authority makes similar data available for
download in convenient CSV format.

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews
  #8   Report Post  
Posted to comp.lang.java.programmer,microsoft.public.excel.programming
external usenet poster
 
Posts: 1
Default Stuck at Trying to Extract Data from a Website using JSP

On Sun, 26 Jul 2009 17:26:30 +0800, "HC" wrote, quoted or
indirectly quoted someone who said :

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.


I got a temporarily unavailable on that URL.

I wrote a simple screenscrape to go to Oanda.com to pick up the daily
exchange rates. To my surprise I got a rude letter from their lawyers
and they blocked me from the site. They were willing to give the
information freely to the public, so long as it was not actually used.

So, I warn you, the same thing may happen to you if you succeed in
screenscraping that website.

See http://mindprod.com/jgloss/screenscraping.html

If you think they won't mind you screenscraping, perhaps they might be
willing to provide the data in XML, CSV, SOAP or other
computer-friendly format.
--
Roedy Green Canadian Mind Products
http://mindprod.com

"The industrial civilisation is based on the consumption of energy resources that are inherently limited in quantity, and that are about to become scarce. When they do, competition for what remains will trigger dramatic economic and geopolitical events; in the end, it may be impossible for even a single nation to sustain industrialism as we have know it in the twentieth century."
~ Richard Heinberg, The Party’s Over: Oil, War, and the Fate of Industrial Societies
  #9   Report Post  
Posted to microsoft.public.excel.programming
ron ron is offline
external usenet poster
 
Posts: 118
Default Stuck at Trying to Extract Data from a Website using JSP

On Jul 26, 10:04*am, ron wrote:
On Jul 26, 7:24*am, "HC" wrote...snip...

So do you know how I
can access the value displayed in the popped up orange box?


HC...The following url produces the original webpage and pop-up box
for flat 26/F A

http://proptx.midland.com.hk/unit/in...05&unit_id=U00....

If you programatically select all and copy / paste as text into an
Excel spreadsheet, the pop-up box info that you want will be found
near the top of the excel sheet...Ron


So just to be clear, all of the flats that have a pop-up window have a
unit ID which can be extracted from the source code (div id=) at

http://proptx.midland.com.hk/unit/in...?est_id=E00005

and assigned to an array. It sounds like HC has already collected all
of these unit IDs.

They can then be programatically added to the base url to create the
link I attached in my earlier post

my_url = base url + unit ID
= "http://proptx.midland.com.hk/unit/index.jsp?
est_id=E00005&" _
ID_array(J)

One can then step through these urls one at a time and open the web
page plus pop-up window for each url; scrape each window plus pop-up,
paste the contents into excel and extract the desired information that
was contained in the pop-up window...Ron
  #10   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 26
Default Stuck at Trying to Extract Data from a Website using JSP

Thank you for your attempt indeed.

For this particular website, there is actually no corresponding English
version of the same data. The data provided under
http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as
the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005.

ron has pointed out that I can actually use a url such as
http://proptx.midland.com.hk/unit/in..._id=U000146982
to get to the pop up directly. That's a big piece of information for me.

After opening the webpage with the pop up, is there a way to directly
reference to the pop up box without having to copy and paste the whole page
back to Excel? Is the pop up frame one of the tables in the page?

Now that a way is found, I just wonder if there is a more elegant way to get
to the data.

HC

"Joel" ...
I'm having problems naviaging across pages. this code will work for 1
page.
I found the english language page for this website. I found a method to
get
all the houses but only can span the 1st page. I used a 2nd IE to get
the
details. Found when the tagname was A and the innertext was "Details" I
was
able to get the URL of the details page. See if this helps.



Sub GetHouses()

Dim Districts() As Variant

URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

Set IE2 = CreateObject("InternetExplorer.Application")
IE2.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop


a = 1
Set navigate = IE.document.getelementbyid("estatebox")
Pages = navigate.all.Length - 1
For pagecount = 1 To Pages
If pagecount < 1 Then

Set navigate = IE.document.getelementbyid("estatebox")
Set Form = IE.document.getelementsbytagname("Form")

'I keep on getting the 1st page with this code.
'I can put the page number in the box but the submit gets
'back to page one.
'change to next page
navigate.all(0).Value = pagecount
navigate.all(pagecount).Click
Form(1).submit

Do While IE.readystate < 4 Or _
IE.Busy = True

DoEvents
Loop
End If

Set Body = IE.document.getelementbyid("csBody")
'get table
Set Listings = Body.all(2)

For Each itm In Listings.all
If itm.tagname = "A" And _
itm.innertext = "Details" Then

'naviagate to detains in 2nd explorer
URL2 = itm.href
IE2.Navigate2 URL2
Do While IE2.readystate < 4 Or _
IE2.Busy = True
DoEvents
Loop

End If
Next itm

Next pagecount


End Sub



"HC" wrote:

Hello Joel,

Thank you for your help. Maybe I need to elaborate what I want to
achieve
more.

On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005,
it
shows there are Flat A to Flat H on each floor. The number is the size
of
that particular flat. When you click on, say "753'" in the cell denoting
25/F Flat A, an orange box pops up. Inside the box, the first row of
purple
text denotes the location of the flat, the second row of purple text
states
again the area of the flat, and the third row of purple text states that
there were previously 3 transactions for this particular flat. First
column
is the date of transaction, second column is the sold price, and the
third
column is cost per square feet. I want to be able to progammatically
extract these transaction records to Excel for charting.

I am looking to achieve either of the following:

1. Find out the way to directly access the popup box. After studying
the
codes, I know that the ID of each flat (unit) is in the saved html file
and
that won't be too difficult to extract (I've done similar thing before,
and
I guess I will be able to figure that out). After I've got all the
unit_id,
I want to be able to repeatedly open the corresponding jsp page and
extract
the transaction records for the flats (units).

2. As you have enlightened me in your post, or maybe I can just start an
instant of IE and then copy the data from IE to Excel. So do you know
how I
can access the value displayed in the popped up orange box?

After some studying of the codes, it seems that the jsp page that shows
the
orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp,
and
it uses unit_id to denote each unit. However, I can't figure out further
how to get to the data.

Hope I've made myself clear. I'm totally stuck.

Hope you are able to help.

Thank you very much indeed.

HC

"Joel" ...

Here is code to get you started. I don't know chinese and not sure
what
data you are looking for. I belive the ID=E0005 the part of the query
that
extracts a particular house. Not sure how to lookup the id's. You can
create a string to get the URL like this:

----------------------------------------------------------------
ID = "E00005"

URL = "http://proptx.midland.com.hk/unit/index.jsp
Request = URL & "?est_id=" & ID
IE.Navigate2 URL
------------------------------------------------------------------
Below is code to dump the info from the house you had listed.


Sub GetHouse()

URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"


Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With




Set Table = IE.document.getElementsByTagname("Table")

RowCount = 1
For Each Row In Table(7).Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell

RowCount = RowCount + 1
Next Row
End Sub



"HC" wrote:

Hello,

There is this page that lists out the past transaction records of
houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in
Chinese)

If you click on a particular house, the past transaction records of
that
house is shown.

I want to be able to extract the past transaction data and make charts
to
visualise the price trend of the houses. Now, I'm only able to use
the
"stupid" method of clicking on all the houses and typing in manually
the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to
see
any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the transactions
in
Excel. It seems the website has sort of exposed the data, but I just
can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a proper
table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC










  #11   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9,101
Default Stuck at Trying to Extract Data from a Website using JSP

The data is in different tables. The pop up window is displaying specific
data from specific tables. The dollar amount of the bids are shown in the
code below. I included a debug tool that I use which is in the sub DUMP. I
usually run this code when I working with a webpage.

I also add watch variables when I writing my code. Select variable like
TABLE and right click variable to add watch. I thedn single step through the
code using F8.

The table vairable will have 18 tables. The index to the table starts at
zero so item 1 in the watch is table(0). You will see a property in the
watch window for tables call sourceindex. The source index starts at zero
and is the same data as the row number from dump (offset by 1). The data you
need to get is the innertext property.

Sub GetHouse()
'URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"
URL = "http://proptx.midland.com.hk/unit/index.jsp"
Unit = "?est_id=E00005&unit_id=U000146982"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL & Unit
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With



Call dump(IE)
Set Table = IE.document.getelementsbytagname("Table")
a = 1
Set PopupWin = Table(2)
b = 1
Set PopupWin = Table(3)
b = 1

'code for extracting table
RowCount = 1
For Each Row In PopupWin.Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell '
RowCount = RowCount + 1
Next Row
End Sub
Sub dump(IE)

'test code
With Sheets("Sheet6")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.ID
.Range("C" & RowCount) = itm.classname
.Range("D" & RowCount) = Left(itm.innertext, 1024)

RowCount = RowCount + 1
Next itm
End With
End Sub



"Herbert Chan" wrote:

Thank you for your attempt indeed.

For this particular website, there is actually no corresponding English
version of the same data. The data provided under
http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as
the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005.

ron has pointed out that I can actually use a url such as
http://proptx.midland.com.hk/unit/in..._id=U000146982
to get to the pop up directly. That's a big piece of information for me.

After opening the webpage with the pop up, is there a way to directly
reference to the pop up box without having to copy and paste the whole page
back to Excel? Is the pop up frame one of the tables in the page?

Now that a way is found, I just wonder if there is a more elegant way to get
to the data.

HC

"Joel" ...
I'm having problems naviaging across pages. this code will work for 1
page.
I found the english language page for this website. I found a method to
get
all the houses but only can span the 1st page. I used a 2nd IE to get
the
details. Found when the tagname was A and the innertext was "Details" I
was
able to get the URL of the details page. See if this helps.



Sub GetHouses()

Dim Districts() As Variant

URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

Set IE2 = CreateObject("InternetExplorer.Application")
IE2.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop


a = 1
Set navigate = IE.document.getelementbyid("estatebox")
Pages = navigate.all.Length - 1
For pagecount = 1 To Pages
If pagecount < 1 Then

Set navigate = IE.document.getelementbyid("estatebox")
Set Form = IE.document.getelementsbytagname("Form")

'I keep on getting the 1st page with this code.
'I can put the page number in the box but the submit gets
'back to page one.
'change to next page
navigate.all(0).Value = pagecount
navigate.all(pagecount).Click
Form(1).submit

Do While IE.readystate < 4 Or _
IE.Busy = True

DoEvents
Loop
End If

Set Body = IE.document.getelementbyid("csBody")
'get table
Set Listings = Body.all(2)

For Each itm In Listings.all
If itm.tagname = "A" And _
itm.innertext = "Details" Then

'naviagate to detains in 2nd explorer
URL2 = itm.href
IE2.Navigate2 URL2
Do While IE2.readystate < 4 Or _
IE2.Busy = True
DoEvents
Loop

End If
Next itm

Next pagecount


End Sub



"HC" wrote:

Hello Joel,

Thank you for your help. Maybe I need to elaborate what I want to
achieve
more.

On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005,
it
shows there are Flat A to Flat H on each floor. The number is the size
of
that particular flat. When you click on, say "753'" in the cell denoting
25/F Flat A, an orange box pops up. Inside the box, the first row of
purple
text denotes the location of the flat, the second row of purple text
states
again the area of the flat, and the third row of purple text states that
there were previously 3 transactions for this particular flat. First
column
is the date of transaction, second column is the sold price, and the
third
column is cost per square feet. I want to be able to progammatically
extract these transaction records to Excel for charting.

I am looking to achieve either of the following:

1. Find out the way to directly access the popup box. After studying
the
codes, I know that the ID of each flat (unit) is in the saved html file
and
that won't be too difficult to extract (I've done similar thing before,
and
I guess I will be able to figure that out). After I've got all the
unit_id,
I want to be able to repeatedly open the corresponding jsp page and
extract
the transaction records for the flats (units).

2. As you have enlightened me in your post, or maybe I can just start an
instant of IE and then copy the data from IE to Excel. So do you know
how I
can access the value displayed in the popped up orange box?

After some studying of the codes, it seems that the jsp page that shows
the
orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp,
and
it uses unit_id to denote each unit. However, I can't figure out further
how to get to the data.

Hope I've made myself clear. I'm totally stuck.

Hope you are able to help.

Thank you very much indeed.

HC

"Joel" ...

Here is code to get you started. I don't know chinese and not sure
what
data you are looking for. I belive the ID=E0005 the part of the query
that
extracts a particular house. Not sure how to lookup the id's. You can
create a string to get the URL like this:

----------------------------------------------------------------
ID = "E00005"

URL = "http://proptx.midland.com.hk/unit/index.jsp
Request = URL & "?est_id=" & ID
IE.Navigate2 URL
------------------------------------------------------------------
Below is code to dump the info from the house you had listed.


Sub GetHouse()

URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"


Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With




Set Table = IE.document.getElementsByTagname("Table")

RowCount = 1
For Each Row In Table(7).Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell

RowCount = RowCount + 1
Next Row
End Sub



"HC" wrote:

Hello,

There is this page that lists out the past transaction records of
houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in
Chinese)

If you click on a particular house, the past transaction records of
that
house is shown.

I want to be able to extract the past transaction data and make charts
to
visualise the price trend of the houses. Now, I'm only able to use
the
"stupid" method of clicking on all the houses and typing in manually
the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to
see
any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the transactions
in
Excel. It seems the website has sort of exposed the data, but I just
can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a proper
table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC









  #12   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9,101
Default Stuck at Trying to Extract Data from a Website using JSP

If the data is on the webpage and not visible it is still in public domain.
You can view the source code in IE by going to menu view - Source. If the
data is on the page then it is in puiblic domain. If you have to request the
data by using a java script then it is not in public domain. In this case
the data is on on webpage and is public.

"Roedy Green" wrote:

On Sun, 26 Jul 2009 17:26:30 +0800, "HC" wrote, quoted or
indirectly quoted someone who said :

I wish to be able to extract the data to Excel automatically. I have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.


I got a temporarily unavailable on that URL.

I wrote a simple screenscrape to go to Oanda.com to pick up the daily
exchange rates. To my surprise I got a rude letter from their lawyers
and they blocked me from the site. They were willing to give the
information freely to the public, so long as it was not actually used.

So, I warn you, the same thing may happen to you if you succeed in
screenscraping that website.

See http://mindprod.com/jgloss/screenscraping.html

If you think they won't mind you screenscraping, perhaps they might be
willing to provide the data in XML, CSV, SOAP or other
computer-friendly format.
--
Roedy Green Canadian Mind Products
http://mindprod.com

"The industrial civilisation is based on the consumption of energy resources that are inherently limited in quantity, and that are about to become scarce. When they do, competition for what remains will trigger dramatic economic and geopolitical events; in the end, it may be impossible for even a single nation to sustain industrialism as we have know it in the twentieth century."
~ Richard Heinberg, The Partys Over: Oil, War, and the Fate of Industrial Societies

  #13   Report Post  
Posted to microsoft.public.excel.programming
ron ron is offline
external usenet poster
 
Posts: 118
Default Stuck at Trying to Extract Data from a Website using JSP

On Jul 26, 8:24*pm, "Herbert Chan" wrote...snip...
Now that a way is found, I just wonder if there is a more elegant way to get
to the data.


HC...I agree it is not a very elegant approach. Opening IE and moving
from url to url takes time. I prefer to use the GET method whenever
possible as it does not require IE to be open. I have a macro that
checks real estate data at 800 different addresses. Using IE to move
from page to page, the maco takes 45 minutes to run; using the GET
method and parsing the source code behind each web page that is
assigned to a variable only requires 14 minutes. However when I tried
the GET method with the url I used above, the source code did not
contain the pop-up box information.

my_url = "http://proptx.midland.com.hk/unit/index.jsp?
est_id=E00005&unit_id=U000146982"
Set my_object = CreateObject("MSXML2.XMLHTTP")
my_object.Open "GET", my_url, False
my_object.send
my_var = RL.responsetext
Set my_object = Nothing

An alternative would be to still run IE, scrape the data to the
clipboard, then, rather than paste the data into a spreadsheet, the
clipboard contents can be transferred to a variable which could then
be parsed (instr, mid, left, etc.) and the desired information
extracted. I have tried this method and it works for the flats on the
web page.

Set my_object = CreateObject("htmlfile")
my_var = my_object.ParentWindow.ClipboardData.GetData("text ")

....Ron
  #14   Report Post  
Posted to microsoft.public.excel.programming
ron ron is offline
external usenet poster
 
Posts: 118
Default Stuck at Trying to Extract Data from a Website using JSP

On Jul 27, 9:38*am, ron wrote:
On Jul 26, 8:24*pm, "Herbert Chan" wrote...snip...

Now that a way is found, I just wonder if there is a more elegant way to get
to the data.


HC...I agree it is not a very elegant approach. *Opening IE and moving
from url to url takes time. *I prefer to use the GET method whenever
possible as it does not require IE to be open. *I have a macro that
checks real estate data at 800 different addresses. *Using IE to move
from page to page, the maco takes 45 minutes to run; using the GET
method and parsing the source code behind each web page that is
assigned to a variable only requires 14 minutes. *However when I tried
the GET method with the url I used above, the source code did not
contain the pop-up box information.

* * my_url = "http://proptx.midland.com.hk/unit/index.jsp?
est_id=E00005&unit_id=U000146982"
* * Set my_object = CreateObject("MSXML2.XMLHTTP")
* * my_object.Open "GET", my_url, False
* * my_object.send
* * my_var = RL.responsetext
* * Set my_object = Nothing

An alternative would be to still run IE, scrape the data to the
clipboard, then, rather than paste the data into a spreadsheet, the
clipboard contents can be transferred to a variable which could then
be parsed (instr, mid, left, etc.) and the desired information
extracted. *I have tried this method and it works for the flats on the
web page.

Set my_object = CreateObject("htmlfile")
my_var = my_object.ParentWindow.ClipboardData.GetData("text ")

...Ron


PS...to use the clipboard transfer method the Microsoft Forms 2.0
Object Library must be selected...Ron
  #15   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 26
Default Stuck at Trying to Extract Data from a Website using JSP

I've finally got the time to sit down and study what you've written.

I've run your macro, and it seems that the tagname "html" contains all the
text on the page, including the popup box.

I know nothing about how to control IE. How can I access the content of the
tagname "html" on the page?

I've tried:
Sheets("Sheet2").Range("a1").Value = IE.document.HTML

But the above gives me an error.

Hope you are still following the thread.

Thanks.

Herbert

"Joel" ...
The data is in different tables. The pop up window is displaying specific
data from specific tables. The dollar amount of the bids are shown in the
code below. I included a debug tool that I use which is in the sub DUMP.
I
usually run this code when I working with a webpage.

I also add watch variables when I writing my code. Select variable like
TABLE and right click variable to add watch. I thedn single step through
the
code using F8.

The table vairable will have 18 tables. The index to the table starts at
zero so item 1 in the watch is table(0). You will see a property in the
watch window for tables call sourceindex. The source index starts at zero
and is the same data as the row number from dump (offset by 1). The data
you
need to get is the innertext property.

Sub GetHouse()
'URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"
URL = "http://proptx.midland.com.hk/unit/index.jsp"
Unit = "?est_id=E00005&unit_id=U000146982"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL & Unit
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With



Call dump(IE)
Set Table = IE.document.getelementsbytagname("Table")
a = 1
Set PopupWin = Table(2)
b = 1
Set PopupWin = Table(3)
b = 1

'code for extracting table
RowCount = 1
For Each Row In PopupWin.Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell '
RowCount = RowCount + 1
Next Row
End Sub
Sub dump(IE)

'test code
With Sheets("Sheet6")
.Cells.ClearContents
RowCount = 1
For Each itm In IE.document.all
.Range("A" & RowCount) = itm.tagname
.Range("B" & RowCount) = itm.ID
.Range("C" & RowCount) = itm.classname
.Range("D" & RowCount) = Left(itm.innertext, 1024)

RowCount = RowCount + 1
Next itm
End With
End Sub



"Herbert Chan" wrote:

Thank you for your attempt indeed.

For this particular website, there is actually no corresponding English
version of the same data. The data provided under
http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as
the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005.

ron has pointed out that I can actually use a url such as
http://proptx.midland.com.hk/unit/in..._id=U000146982
to get to the pop up directly. That's a big piece of information for me.

After opening the webpage with the pop up, is there a way to directly
reference to the pop up box without having to copy and paste the whole
page
back to Excel? Is the pop up frame one of the tables in the page?

Now that a way is found, I just wonder if there is a more elegant way to
get
to the data.

HC

"Joel" ...

I'm having problems naviaging across pages. this code will work for 1
page.
I found the english language page for this website. I found a method
to
get
all the houses but only can span the 1st page. I used a 2nd IE to get
the
details. Found when the tagname was A and the innertext was "Details"
I
was
able to get the URL of the details page. See if this helps.



Sub GetHouses()

Dim Districts() As Variant

URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en"

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

Set IE2 = CreateObject("InternetExplorer.Application")
IE2.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop


a = 1
Set navigate = IE.document.getelementbyid("estatebox")
Pages = navigate.all.Length - 1
For pagecount = 1 To Pages
If pagecount < 1 Then

Set navigate = IE.document.getelementbyid("estatebox")
Set Form = IE.document.getelementsbytagname("Form")

'I keep on getting the 1st page with this code.
'I can put the page number in the box but the submit gets
'back to page one.
'change to next page
navigate.all(0).Value = pagecount
navigate.all(pagecount).Click
Form(1).submit

Do While IE.readystate < 4 Or _
IE.Busy = True

DoEvents
Loop
End If

Set Body = IE.document.getelementbyid("csBody")
'get table
Set Listings = Body.all(2)

For Each itm In Listings.all
If itm.tagname = "A" And _
itm.innertext = "Details" Then

'naviagate to detains in 2nd explorer
URL2 = itm.href
IE2.Navigate2 URL2
Do While IE2.readystate < 4 Or _
IE2.Busy = True
DoEvents
Loop

End If
Next itm

Next pagecount


End Sub



"HC" wrote:

Hello Joel,

Thank you for your help. Maybe I need to elaborate what I want to
achieve
more.

On the page:
http://proptx.midland.com.hk/unit/in...?est_id=E00005,
it
shows there are Flat A to Flat H on each floor. The number is the
size
of
that particular flat. When you click on, say "753'" in the cell
denoting
25/F Flat A, an orange box pops up. Inside the box, the first row of
purple
text denotes the location of the flat, the second row of purple text
states
again the area of the flat, and the third row of purple text states
that
there were previously 3 transactions for this particular flat. First
column
is the date of transaction, second column is the sold price, and the
third
column is cost per square feet. I want to be able to progammatically
extract these transaction records to Excel for charting.

I am looking to achieve either of the following:

1. Find out the way to directly access the popup box. After studying
the
codes, I know that the ID of each flat (unit) is in the saved html
file
and
that won't be too difficult to extract (I've done similar thing
before,
and
I guess I will be able to figure that out). After I've got all the
unit_id,
I want to be able to repeatedly open the corresponding jsp page and
extract
the transaction records for the flats (units).

2. As you have enlightened me in your post, or maybe I can just start
an
instant of IE and then copy the data from IE to Excel. So do you know
how I
can access the value displayed in the popped up orange box?

After some studying of the codes, it seems that the jsp page that
shows
the
orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp,
and
it uses unit_id to denote each unit. However, I can't figure out
further
how to get to the data.

Hope I've made myself clear. I'm totally stuck.

Hope you are able to help.

Thank you very much indeed.

HC

"Joel" ...

Here is code to get you started. I don't know chinese and not sure
what
data you are looking for. I belive the ID=E0005 the part of the
query
that
extracts a particular house. Not sure how to lookup the id's. You
can
create a string to get the URL like this:

----------------------------------------------------------------
ID = "E00005"

URL = "http://proptx.midland.com.hk/unit/index.jsp
Request = URL & "?est_id=" & ID
IE.Navigate2 URL
------------------------------------------------------------------
Below is code to dump the info from the house you had listed.


Sub GetHouse()

URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005"


Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True


'get web page
IE.Navigate2 URL
Do While IE.readystate < 4 Or _
IE.Busy = True
DoEvents
Loop

'test code
'With Sheets("Sheet3")
' RowCount = 1
' For Each itm In IE.document.all
' .Range("A" & RowCount) = itm.tagname
' .Range("B" & RowCount) = itm.ID
' .Range("C" & RowCount) = itm.classname
' .Range("D" & RowCount) = Left(itm.innertext, 1024)

' RowCount = RowCount + 1
' Next itm
'End With




Set Table = IE.document.getElementsByTagname("Table")

RowCount = 1
For Each Row In Table(7).Rows
Colcount = 1
For Each cell In Row.Cells
Cells(RowCount, Colcount) = cell.innertext
Colcount = Colcount + 1
Next cell

RowCount = RowCount + 1
Next Row
End Sub



"HC" wrote:

Hello,

There is this page that lists out the past transaction records of
houses:
http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in
Chinese)

If you click on a particular house, the past transaction records of
that
house is shown.

I want to be able to extract the past transaction data and make
charts
to
visualise the price trend of the houses. Now, I'm only able to use
the
"stupid" method of clicking on all the houses and typing in
manually
the
transaction records in Excel and then chart the data.

I wish to be able to extract the data to Excel automatically. I
have
studied the underlying jsp pages and it seems that the site uses
http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data.

I have tried typing in
http://proptx.midland.com.hk/unit/un..._id=U000146982 to
see
any
information will come up, but there's nothing in the page.

I'm totally stuck. As I want to monitor the trend of a number of
developments, it will be very tedious to type up all the
transactions
in
Excel. It seems the website has sort of exposed the data, but I
just
can't
find a way to get the data out, at least one house at a time.

I want to do webquery in Excel and then extract the data to a
proper
table.

Hope some experts can point me in the right direction.

Regards and thanks in advance,

HC













  #16   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 26
Default Stuck at Trying to Extract Data from a Website using JSP

How can I make a select all on the IE page and do the copy to clipboard?

URL = "http://proptx.midland.com.hk/unit/index.jsp"
Unit = "?est_id=E00005&unit_id=U000146982"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate2 URL & Unit

How to do the copy after the above?

Herbert

"ron" ...
On Jul 27, 9:38 am, ron wrote:
On Jul 26, 8:24 pm, "Herbert Chan" wrote...snip...

Now that a way is found, I just wonder if there is a more elegant way to
get
to the data.


HC...I agree it is not a very elegant approach. Opening IE and moving
from url to url takes time. I prefer to use the GET method whenever
possible as it does not require IE to be open. I have a macro that
checks real estate data at 800 different addresses. Using IE to move
from page to page, the maco takes 45 minutes to run; using the GET
method and parsing the source code behind each web page that is
assigned to a variable only requires 14 minutes. However when I tried
the GET method with the url I used above, the source code did not
contain the pop-up box information.

my_url = "http://proptx.midland.com.hk/unit/index.jsp?
est_id=E00005&unit_id=U000146982"
Set my_object = CreateObject("MSXML2.XMLHTTP")
my_object.Open "GET", my_url, False
my_object.send
my_var = RL.responsetext
Set my_object = Nothing

An alternative would be to still run IE, scrape the data to the
clipboard, then, rather than paste the data into a spreadsheet, the
clipboard contents can be transferred to a variable which could then
be parsed (instr, mid, left, etc.) and the desired information
extracted. I have tried this method and it works for the flats on the
web page.

Set my_object = CreateObject("htmlfile")
my_var = my_object.ParentWindow.ClipboardData.GetData("text ")

...Ron


PS...to use the clipboard transfer method the Microsoft Forms 2.0
Object Library must be selected...Ron


  #17   Report Post  
Posted to microsoft.public.excel.programming
ron ron is offline
external usenet poster
 
Posts: 118
Default Stuck at Trying to Extract Data from a Website using JSP

On Aug 8, 8:53*pm, "Herbert Chan" wrote:
How can I make a select all on the IE page and do the copy to clipboard?

URL = "http://proptx.midland.com.hk/unit/index.jsp"
Unit = "?est_id=E00005&unit_id=U000146982"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate2 URL & Unit

How to do the copy after the above?

Herbert

"ron" ...
On Jul 27, 9:38 am, ron wrote:





On Jul 26, 8:24 pm, "Herbert Chan" wrote...snip...


Now that a way is found, I just wonder if there is a more elegant way to
get
to the data.


HC...I agree it is not a very elegant approach. Opening IE and moving
from url to url takes time. I prefer to use the GET method whenever
possible as it does not require IE to be open. I have a macro that
checks real estate data at 800 different addresses. Using IE to move
from page to page, the maco takes 45 minutes to run; using the GET
method and parsing the source code behind each web page that is
assigned to a variable only requires 14 minutes. However when I tried
the GET method with the url I used above, the source code did not
contain the pop-up box information.


my_url = "http://proptx.midland.com.hk/unit/index.jsp?
est_id=E00005&unit_id=U000146982"
Set my_object = CreateObject("MSXML2.XMLHTTP")
my_object.Open "GET", my_url, False
my_object.send
my_var = RL.responsetext
Set my_object = Nothing


An alternative would be to still run IE, scrape the data to the
clipboard, then, rather than paste the data into a spreadsheet, the
clipboard contents can be transferred to a variable which could then
be parsed (instr, mid, left, etc.) and the desired information
extracted. I have tried this method and it works for the flats on the
web page.


Set my_object = CreateObject("htmlfile")
my_var = my_object.ParentWindow.ClipboardData.GetData("text ")


...Ron


PS...to use the clipboard transfer method the Microsoft Forms 2.0
Object Library must be selected...Ron- Hide quoted text -

- Show quoted text -


Herbert...Try the following to select the entire page and then copy to
the clipboard

ie.ExecWB OLECMDID_SELECTALL, OLECMDEXECOPT_DONTPROMPTUSER

ie.ExecWB OLECMDID_COPY, OLECMDEXECOPT_DODEFAULT

A reference to Microsoft Internet Controls must be set...Ron
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
HOW DO I ENTER DATA WHEN I SEEM TO BE STUCK IN A SUMMATION FORMULA kenneth j peterson New Users to Excel 4 January 12th 09 05:48 PM
How to extract one data from a website Lamb Chop[_2_] Excel Discussion (Misc queries) 1 June 20th 08 02:19 PM
code to extract data from a website runs of XP but not Vista Russ Excel Programming 1 May 16th 08 01:30 AM
Excel Macro - Extract data from website Icy Excel Programming 0 January 9th 08 02:39 PM
inporting data from website where website address frequently chang HP Excel Programming 3 June 24th 07 03:50 PM


All times are GMT +1. The time now is 07:22 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"