Home |
Search |
Today's Posts |
#1
Posted to comp.lang.java.programmer,microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
Hello,
There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. I want to do webquery in Excel and then extract the data to a proper table. Hope some experts can point me in the right direction. Regards and thanks in advance, HC |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
Here is code to get you started. I don't know chinese and not sure what
data you are looking for. I belive the ID=E0005 the part of the query that extracts a particular house. Not sure how to lookup the id's. You can create a string to get the URL like this: ---------------------------------------------------------------- ID = "E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp Request = URL & "?est_id=" & ID IE.Navigate2 URL ------------------------------------------------------------------ Below is code to dump the info from the house you had listed. Sub GetHouse() URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Set Table = IE.document.getElementsByTagname("Table") RowCount = 1 For Each Row In Table(7).Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell RowCount = RowCount + 1 Next Row End Sub "HC" wrote: Hello, There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. I want to do webquery in Excel and then extract the data to a proper table. Hope some experts can point me in the right direction. Regards and thanks in advance, HC |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
Here is something I'm working on. The webpage I dumped has a magnifier glass
which brings you to a search page. The search page is a form with a number of drop down boxes. I don't understand chineese but I'm able to go to my IE explorer and from the menu on thie IE explorer go to View - Source. the HTML is in english and has some useful comments that I'm looking at right now. Let you know what I find. "HC" wrote: Hello, There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. I want to do webquery in Excel and then extract the data to a proper table. Hope some experts can point me in the right direction. Regards and thanks in advance, HC |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
Hello Joel,
Thank you for your help. Maybe I need to elaborate what I want to achieve more. On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005, it shows there are Flat A to Flat H on each floor. The number is the size of that particular flat. When you click on, say "753'" in the cell denoting 25/F Flat A, an orange box pops up. Inside the box, the first row of purple text denotes the location of the flat, the second row of purple text states again the area of the flat, and the third row of purple text states that there were previously 3 transactions for this particular flat. First column is the date of transaction, second column is the sold price, and the third column is cost per square feet. I want to be able to progammatically extract these transaction records to Excel for charting. I am looking to achieve either of the following: 1. Find out the way to directly access the popup box. After studying the codes, I know that the ID of each flat (unit) is in the saved html file and that won't be too difficult to extract (I've done similar thing before, and I guess I will be able to figure that out). After I've got all the unit_id, I want to be able to repeatedly open the corresponding jsp page and extract the transaction records for the flats (units). 2. As you have enlightened me in your post, or maybe I can just start an instant of IE and then copy the data from IE to Excel. So do you know how I can access the value displayed in the popped up orange box? After some studying of the codes, it seems that the jsp page that shows the orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp, and it uses unit_id to denote each unit. However, I can't figure out further how to get to the data. Hope I've made myself clear. I'm totally stuck. Hope you are able to help. Thank you very much indeed. HC "Joel" ... Here is code to get you started. I don't know chinese and not sure what data you are looking for. I belive the ID=E0005 the part of the query that extracts a particular house. Not sure how to lookup the id's. You can create a string to get the URL like this: ---------------------------------------------------------------- ID = "E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp Request = URL & "?est_id=" & ID IE.Navigate2 URL ------------------------------------------------------------------ Below is code to dump the info from the house you had listed. Sub GetHouse() URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Set Table = IE.document.getElementsByTagname("Table") RowCount = 1 For Each Row In Table(7).Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell RowCount = RowCount + 1 Next Row End Sub "HC" wrote: Hello, There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. I want to do webquery in Excel and then extract the data to a proper table. Hope some experts can point me in the right direction. Regards and thanks in advance, HC |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
On Jul 26, 7:24*am, "HC" wrote...snip...
So do you know how I can access the value displayed in the popped up orange box? HC...The following url produces the original webpage and pop-up box for flat 26/F A http://proptx.midland.com.hk/unit/in..._id=U000146982 If you programatically select all and copy / paste as text into an Excel spreadsheet, the pop-up box info that you want will be found near the top of the excel sheet...Ron |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
I'm having problems naviaging across pages. this code will work for 1 page.
I found the english language page for this website. I found a method to get all the houses but only can span the 1st page. I used a 2nd IE to get the details. Found when the tagname was A and the innertext was "Details" I was able to get the URL of the details page. See if this helps. Sub GetHouses() Dim Districts() As Variant URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True Set IE2 = CreateObject("InternetExplorer.Application") IE2.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop a = 1 Set navigate = IE.document.getelementbyid("estatebox") Pages = navigate.all.Length - 1 For pagecount = 1 To Pages If pagecount < 1 Then Set navigate = IE.document.getelementbyid("estatebox") Set Form = IE.document.getelementsbytagname("Form") 'I keep on getting the 1st page with this code. 'I can put the page number in the box but the submit gets 'back to page one. 'change to next page navigate.all(0).Value = pagecount navigate.all(pagecount).Click Form(1).submit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop End If Set Body = IE.document.getelementbyid("csBody") 'get table Set Listings = Body.all(2) For Each itm In Listings.all If itm.tagname = "A" And _ itm.innertext = "Details" Then 'naviagate to detains in 2nd explorer URL2 = itm.href IE2.Navigate2 URL2 Do While IE2.readystate < 4 Or _ IE2.Busy = True DoEvents Loop End If Next itm Next pagecount End Sub "HC" wrote: Hello Joel, Thank you for your help. Maybe I need to elaborate what I want to achieve more. On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005, it shows there are Flat A to Flat H on each floor. The number is the size of that particular flat. When you click on, say "753'" in the cell denoting 25/F Flat A, an orange box pops up. Inside the box, the first row of purple text denotes the location of the flat, the second row of purple text states again the area of the flat, and the third row of purple text states that there were previously 3 transactions for this particular flat. First column is the date of transaction, second column is the sold price, and the third column is cost per square feet. I want to be able to progammatically extract these transaction records to Excel for charting. I am looking to achieve either of the following: 1. Find out the way to directly access the popup box. After studying the codes, I know that the ID of each flat (unit) is in the saved html file and that won't be too difficult to extract (I've done similar thing before, and I guess I will be able to figure that out). After I've got all the unit_id, I want to be able to repeatedly open the corresponding jsp page and extract the transaction records for the flats (units). 2. As you have enlightened me in your post, or maybe I can just start an instant of IE and then copy the data from IE to Excel. So do you know how I can access the value displayed in the popped up orange box? After some studying of the codes, it seems that the jsp page that shows the orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp, and it uses unit_id to denote each unit. However, I can't figure out further how to get to the data. Hope I've made myself clear. I'm totally stuck. Hope you are able to help. Thank you very much indeed. HC "Joel" ... Here is code to get you started. I don't know chinese and not sure what data you are looking for. I belive the ID=E0005 the part of the query that extracts a particular house. Not sure how to lookup the id's. You can create a string to get the URL like this: ---------------------------------------------------------------- ID = "E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp Request = URL & "?est_id=" & ID IE.Navigate2 URL ------------------------------------------------------------------ Below is code to dump the info from the house you had listed. Sub GetHouse() URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Set Table = IE.document.getElementsByTagname("Table") RowCount = 1 For Each Row In Table(7).Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell RowCount = RowCount + 1 Next Row End Sub "HC" wrote: Hello, There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. I want to do webquery in Excel and then extract the data to a proper table. Hope some experts can point me in the right direction. Regards and thanks in advance, HC |
#7
Posted to comp.lang.java.programmer,microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
In article , "HC" wrote:
There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. Well, there are eight empty lines, delimited by CR/LF. Possibly, they don't want strangers scraping their data. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. You could ask them for the data. Alternatively, you might be able to interpret the JavaScript usefully. I want to do webquery in Excel and then extract the data to a proper table. In my locale, the tax authority makes similar data available for download in convenient CSV format. -- John B. Matthews trashgod at gmail dot com <http://sites.google.com/site/drjohnbmatthews |
#8
Posted to comp.lang.java.programmer,microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
On Sun, 26 Jul 2009 17:26:30 +0800, "HC" wrote, quoted or
indirectly quoted someone who said : I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I got a temporarily unavailable on that URL. I wrote a simple screenscrape to go to Oanda.com to pick up the daily exchange rates. To my surprise I got a rude letter from their lawyers and they blocked me from the site. They were willing to give the information freely to the public, so long as it was not actually used. So, I warn you, the same thing may happen to you if you succeed in screenscraping that website. See http://mindprod.com/jgloss/screenscraping.html If you think they won't mind you screenscraping, perhaps they might be willing to provide the data in XML, CSV, SOAP or other computer-friendly format. -- Roedy Green Canadian Mind Products http://mindprod.com "The industrial civilisation is based on the consumption of energy resources that are inherently limited in quantity, and that are about to become scarce. When they do, competition for what remains will trigger dramatic economic and geopolitical events; in the end, it may be impossible for even a single nation to sustain industrialism as we have know it in the twentieth century." ~ Richard Heinberg, The Party’s Over: Oil, War, and the Fate of Industrial Societies |
#9
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
On Jul 26, 10:04*am, ron wrote:
On Jul 26, 7:24*am, "HC" wrote...snip... So do you know how I can access the value displayed in the popped up orange box? HC...The following url produces the original webpage and pop-up box for flat 26/F A http://proptx.midland.com.hk/unit/in...05&unit_id=U00.... If you programatically select all and copy / paste as text into an Excel spreadsheet, the pop-up box info that you want will be found near the top of the excel sheet...Ron So just to be clear, all of the flats that have a pop-up window have a unit ID which can be extracted from the source code (div id=) at http://proptx.midland.com.hk/unit/in...?est_id=E00005 and assigned to an array. It sounds like HC has already collected all of these unit IDs. They can then be programatically added to the base url to create the link I attached in my earlier post my_url = base url + unit ID = "http://proptx.midland.com.hk/unit/index.jsp? est_id=E00005&" _ ID_array(J) One can then step through these urls one at a time and open the web page plus pop-up window for each url; scrape each window plus pop-up, paste the contents into excel and extract the desired information that was contained in the pop-up window...Ron |
#10
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
Thank you for your attempt indeed.
For this particular website, there is actually no corresponding English version of the same data. The data provided under http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005. ron has pointed out that I can actually use a url such as http://proptx.midland.com.hk/unit/in..._id=U000146982 to get to the pop up directly. That's a big piece of information for me. After opening the webpage with the pop up, is there a way to directly reference to the pop up box without having to copy and paste the whole page back to Excel? Is the pop up frame one of the tables in the page? Now that a way is found, I just wonder if there is a more elegant way to get to the data. HC "Joel" ... I'm having problems naviaging across pages. this code will work for 1 page. I found the english language page for this website. I found a method to get all the houses but only can span the 1st page. I used a 2nd IE to get the details. Found when the tagname was A and the innertext was "Details" I was able to get the URL of the details page. See if this helps. Sub GetHouses() Dim Districts() As Variant URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True Set IE2 = CreateObject("InternetExplorer.Application") IE2.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop a = 1 Set navigate = IE.document.getelementbyid("estatebox") Pages = navigate.all.Length - 1 For pagecount = 1 To Pages If pagecount < 1 Then Set navigate = IE.document.getelementbyid("estatebox") Set Form = IE.document.getelementsbytagname("Form") 'I keep on getting the 1st page with this code. 'I can put the page number in the box but the submit gets 'back to page one. 'change to next page navigate.all(0).Value = pagecount navigate.all(pagecount).Click Form(1).submit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop End If Set Body = IE.document.getelementbyid("csBody") 'get table Set Listings = Body.all(2) For Each itm In Listings.all If itm.tagname = "A" And _ itm.innertext = "Details" Then 'naviagate to detains in 2nd explorer URL2 = itm.href IE2.Navigate2 URL2 Do While IE2.readystate < 4 Or _ IE2.Busy = True DoEvents Loop End If Next itm Next pagecount End Sub "HC" wrote: Hello Joel, Thank you for your help. Maybe I need to elaborate what I want to achieve more. On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005, it shows there are Flat A to Flat H on each floor. The number is the size of that particular flat. When you click on, say "753'" in the cell denoting 25/F Flat A, an orange box pops up. Inside the box, the first row of purple text denotes the location of the flat, the second row of purple text states again the area of the flat, and the third row of purple text states that there were previously 3 transactions for this particular flat. First column is the date of transaction, second column is the sold price, and the third column is cost per square feet. I want to be able to progammatically extract these transaction records to Excel for charting. I am looking to achieve either of the following: 1. Find out the way to directly access the popup box. After studying the codes, I know that the ID of each flat (unit) is in the saved html file and that won't be too difficult to extract (I've done similar thing before, and I guess I will be able to figure that out). After I've got all the unit_id, I want to be able to repeatedly open the corresponding jsp page and extract the transaction records for the flats (units). 2. As you have enlightened me in your post, or maybe I can just start an instant of IE and then copy the data from IE to Excel. So do you know how I can access the value displayed in the popped up orange box? After some studying of the codes, it seems that the jsp page that shows the orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp, and it uses unit_id to denote each unit. However, I can't figure out further how to get to the data. Hope I've made myself clear. I'm totally stuck. Hope you are able to help. Thank you very much indeed. HC "Joel" ... Here is code to get you started. I don't know chinese and not sure what data you are looking for. I belive the ID=E0005 the part of the query that extracts a particular house. Not sure how to lookup the id's. You can create a string to get the URL like this: ---------------------------------------------------------------- ID = "E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp Request = URL & "?est_id=" & ID IE.Navigate2 URL ------------------------------------------------------------------ Below is code to dump the info from the house you had listed. Sub GetHouse() URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Set Table = IE.document.getElementsByTagname("Table") RowCount = 1 For Each Row In Table(7).Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell RowCount = RowCount + 1 Next Row End Sub "HC" wrote: Hello, There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. I want to do webquery in Excel and then extract the data to a proper table. Hope some experts can point me in the right direction. Regards and thanks in advance, HC |
#11
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
The data is in different tables. The pop up window is displaying specific
data from specific tables. The dollar amount of the bids are shown in the code below. I included a debug tool that I use which is in the sub DUMP. I usually run this code when I working with a webpage. I also add watch variables when I writing my code. Select variable like TABLE and right click variable to add watch. I thedn single step through the code using F8. The table vairable will have 18 tables. The index to the table starts at zero so item 1 in the watch is table(0). You will see a property in the watch window for tables call sourceindex. The source index starts at zero and is the same data as the row number from dump (offset by 1). The data you need to get is the innertext property. Sub GetHouse() 'URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp" Unit = "?est_id=E00005&unit_id=U000146982" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL & Unit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Call dump(IE) Set Table = IE.document.getelementsbytagname("Table") a = 1 Set PopupWin = Table(2) b = 1 Set PopupWin = Table(3) b = 1 'code for extracting table RowCount = 1 For Each Row In PopupWin.Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell ' RowCount = RowCount + 1 Next Row End Sub Sub dump(IE) 'test code With Sheets("Sheet6") .Cells.ClearContents RowCount = 1 For Each itm In IE.document.all .Range("A" & RowCount) = itm.tagname .Range("B" & RowCount) = itm.ID .Range("C" & RowCount) = itm.classname .Range("D" & RowCount) = Left(itm.innertext, 1024) RowCount = RowCount + 1 Next itm End With End Sub "Herbert Chan" wrote: Thank you for your attempt indeed. For this particular website, there is actually no corresponding English version of the same data. The data provided under http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005. ron has pointed out that I can actually use a url such as http://proptx.midland.com.hk/unit/in..._id=U000146982 to get to the pop up directly. That's a big piece of information for me. After opening the webpage with the pop up, is there a way to directly reference to the pop up box without having to copy and paste the whole page back to Excel? Is the pop up frame one of the tables in the page? Now that a way is found, I just wonder if there is a more elegant way to get to the data. HC "Joel" ... I'm having problems naviaging across pages. this code will work for 1 page. I found the english language page for this website. I found a method to get all the houses but only can span the 1st page. I used a 2nd IE to get the details. Found when the tagname was A and the innertext was "Details" I was able to get the URL of the details page. See if this helps. Sub GetHouses() Dim Districts() As Variant URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True Set IE2 = CreateObject("InternetExplorer.Application") IE2.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop a = 1 Set navigate = IE.document.getelementbyid("estatebox") Pages = navigate.all.Length - 1 For pagecount = 1 To Pages If pagecount < 1 Then Set navigate = IE.document.getelementbyid("estatebox") Set Form = IE.document.getelementsbytagname("Form") 'I keep on getting the 1st page with this code. 'I can put the page number in the box but the submit gets 'back to page one. 'change to next page navigate.all(0).Value = pagecount navigate.all(pagecount).Click Form(1).submit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop End If Set Body = IE.document.getelementbyid("csBody") 'get table Set Listings = Body.all(2) For Each itm In Listings.all If itm.tagname = "A" And _ itm.innertext = "Details" Then 'naviagate to detains in 2nd explorer URL2 = itm.href IE2.Navigate2 URL2 Do While IE2.readystate < 4 Or _ IE2.Busy = True DoEvents Loop End If Next itm Next pagecount End Sub "HC" wrote: Hello Joel, Thank you for your help. Maybe I need to elaborate what I want to achieve more. On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005, it shows there are Flat A to Flat H on each floor. The number is the size of that particular flat. When you click on, say "753'" in the cell denoting 25/F Flat A, an orange box pops up. Inside the box, the first row of purple text denotes the location of the flat, the second row of purple text states again the area of the flat, and the third row of purple text states that there were previously 3 transactions for this particular flat. First column is the date of transaction, second column is the sold price, and the third column is cost per square feet. I want to be able to progammatically extract these transaction records to Excel for charting. I am looking to achieve either of the following: 1. Find out the way to directly access the popup box. After studying the codes, I know that the ID of each flat (unit) is in the saved html file and that won't be too difficult to extract (I've done similar thing before, and I guess I will be able to figure that out). After I've got all the unit_id, I want to be able to repeatedly open the corresponding jsp page and extract the transaction records for the flats (units). 2. As you have enlightened me in your post, or maybe I can just start an instant of IE and then copy the data from IE to Excel. So do you know how I can access the value displayed in the popped up orange box? After some studying of the codes, it seems that the jsp page that shows the orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp, and it uses unit_id to denote each unit. However, I can't figure out further how to get to the data. Hope I've made myself clear. I'm totally stuck. Hope you are able to help. Thank you very much indeed. HC "Joel" ... Here is code to get you started. I don't know chinese and not sure what data you are looking for. I belive the ID=E0005 the part of the query that extracts a particular house. Not sure how to lookup the id's. You can create a string to get the URL like this: ---------------------------------------------------------------- ID = "E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp Request = URL & "?est_id=" & ID IE.Navigate2 URL ------------------------------------------------------------------ Below is code to dump the info from the house you had listed. Sub GetHouse() URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Set Table = IE.document.getElementsByTagname("Table") RowCount = 1 For Each Row In Table(7).Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell RowCount = RowCount + 1 Next Row End Sub "HC" wrote: Hello, There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. I want to do webquery in Excel and then extract the data to a proper table. Hope some experts can point me in the right direction. Regards and thanks in advance, HC |
#12
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
If the data is on the webpage and not visible it is still in public domain.
You can view the source code in IE by going to menu view - Source. If the data is on the page then it is in puiblic domain. If you have to request the data by using a java script then it is not in public domain. In this case the data is on on webpage and is public. "Roedy Green" wrote: On Sun, 26 Jul 2009 17:26:30 +0800, "HC" wrote, quoted or indirectly quoted someone who said : I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I got a temporarily unavailable on that URL. I wrote a simple screenscrape to go to Oanda.com to pick up the daily exchange rates. To my surprise I got a rude letter from their lawyers and they blocked me from the site. They were willing to give the information freely to the public, so long as it was not actually used. So, I warn you, the same thing may happen to you if you succeed in screenscraping that website. See http://mindprod.com/jgloss/screenscraping.html If you think they won't mind you screenscraping, perhaps they might be willing to provide the data in XML, CSV, SOAP or other computer-friendly format. -- Roedy Green Canadian Mind Products http://mindprod.com "The industrial civilisation is based on the consumption of energy resources that are inherently limited in quantity, and that are about to become scarce. When they do, competition for what remains will trigger dramatic economic and geopolitical events; in the end, it may be impossible for even a single nation to sustain industrialism as we have know it in the twentieth century." ~ Richard Heinberg, The Partys Over: Oil, War, and the Fate of Industrial Societies |
#13
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
On Jul 26, 8:24*pm, "Herbert Chan" wrote...snip...
Now that a way is found, I just wonder if there is a more elegant way to get to the data. HC...I agree it is not a very elegant approach. Opening IE and moving from url to url takes time. I prefer to use the GET method whenever possible as it does not require IE to be open. I have a macro that checks real estate data at 800 different addresses. Using IE to move from page to page, the maco takes 45 minutes to run; using the GET method and parsing the source code behind each web page that is assigned to a variable only requires 14 minutes. However when I tried the GET method with the url I used above, the source code did not contain the pop-up box information. my_url = "http://proptx.midland.com.hk/unit/index.jsp? est_id=E00005&unit_id=U000146982" Set my_object = CreateObject("MSXML2.XMLHTTP") my_object.Open "GET", my_url, False my_object.send my_var = RL.responsetext Set my_object = Nothing An alternative would be to still run IE, scrape the data to the clipboard, then, rather than paste the data into a spreadsheet, the clipboard contents can be transferred to a variable which could then be parsed (instr, mid, left, etc.) and the desired information extracted. I have tried this method and it works for the flats on the web page. Set my_object = CreateObject("htmlfile") my_var = my_object.ParentWindow.ClipboardData.GetData("text ") ....Ron |
#14
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
On Jul 27, 9:38*am, ron wrote:
On Jul 26, 8:24*pm, "Herbert Chan" wrote...snip... Now that a way is found, I just wonder if there is a more elegant way to get to the data. HC...I agree it is not a very elegant approach. *Opening IE and moving from url to url takes time. *I prefer to use the GET method whenever possible as it does not require IE to be open. *I have a macro that checks real estate data at 800 different addresses. *Using IE to move from page to page, the maco takes 45 minutes to run; using the GET method and parsing the source code behind each web page that is assigned to a variable only requires 14 minutes. *However when I tried the GET method with the url I used above, the source code did not contain the pop-up box information. * * my_url = "http://proptx.midland.com.hk/unit/index.jsp? est_id=E00005&unit_id=U000146982" * * Set my_object = CreateObject("MSXML2.XMLHTTP") * * my_object.Open "GET", my_url, False * * my_object.send * * my_var = RL.responsetext * * Set my_object = Nothing An alternative would be to still run IE, scrape the data to the clipboard, then, rather than paste the data into a spreadsheet, the clipboard contents can be transferred to a variable which could then be parsed (instr, mid, left, etc.) and the desired information extracted. *I have tried this method and it works for the flats on the web page. Set my_object = CreateObject("htmlfile") my_var = my_object.ParentWindow.ClipboardData.GetData("text ") ...Ron PS...to use the clipboard transfer method the Microsoft Forms 2.0 Object Library must be selected...Ron |
#15
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
I've finally got the time to sit down and study what you've written.
I've run your macro, and it seems that the tagname "html" contains all the text on the page, including the popup box. I know nothing about how to control IE. How can I access the content of the tagname "html" on the page? I've tried: Sheets("Sheet2").Range("a1").Value = IE.document.HTML But the above gives me an error. Hope you are still following the thread. Thanks. Herbert "Joel" ... The data is in different tables. The pop up window is displaying specific data from specific tables. The dollar amount of the bids are shown in the code below. I included a debug tool that I use which is in the sub DUMP. I usually run this code when I working with a webpage. I also add watch variables when I writing my code. Select variable like TABLE and right click variable to add watch. I thedn single step through the code using F8. The table vairable will have 18 tables. The index to the table starts at zero so item 1 in the watch is table(0). You will see a property in the watch window for tables call sourceindex. The source index starts at zero and is the same data as the row number from dump (offset by 1). The data you need to get is the innertext property. Sub GetHouse() 'URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp" Unit = "?est_id=E00005&unit_id=U000146982" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL & Unit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Call dump(IE) Set Table = IE.document.getelementsbytagname("Table") a = 1 Set PopupWin = Table(2) b = 1 Set PopupWin = Table(3) b = 1 'code for extracting table RowCount = 1 For Each Row In PopupWin.Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell ' RowCount = RowCount + 1 Next Row End Sub Sub dump(IE) 'test code With Sheets("Sheet6") .Cells.ClearContents RowCount = 1 For Each itm In IE.document.all .Range("A" & RowCount) = itm.tagname .Range("B" & RowCount) = itm.ID .Range("C" & RowCount) = itm.classname .Range("D" & RowCount) = Left(itm.innertext, 1024) RowCount = RowCount + 1 Next itm End With End Sub "Herbert Chan" wrote: Thank you for your attempt indeed. For this particular website, there is actually no corresponding English version of the same data. The data provided under http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005. ron has pointed out that I can actually use a url such as http://proptx.midland.com.hk/unit/in..._id=U000146982 to get to the pop up directly. That's a big piece of information for me. After opening the webpage with the pop up, is there a way to directly reference to the pop up box without having to copy and paste the whole page back to Excel? Is the pop up frame one of the tables in the page? Now that a way is found, I just wonder if there is a more elegant way to get to the data. HC "Joel" ... I'm having problems naviaging across pages. this code will work for 1 page. I found the english language page for this website. I found a method to get all the houses but only can span the 1st page. I used a 2nd IE to get the details. Found when the tagname was A and the innertext was "Details" I was able to get the URL of the details page. See if this helps. Sub GetHouses() Dim Districts() As Variant URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True Set IE2 = CreateObject("InternetExplorer.Application") IE2.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop a = 1 Set navigate = IE.document.getelementbyid("estatebox") Pages = navigate.all.Length - 1 For pagecount = 1 To Pages If pagecount < 1 Then Set navigate = IE.document.getelementbyid("estatebox") Set Form = IE.document.getelementsbytagname("Form") 'I keep on getting the 1st page with this code. 'I can put the page number in the box but the submit gets 'back to page one. 'change to next page navigate.all(0).Value = pagecount navigate.all(pagecount).Click Form(1).submit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop End If Set Body = IE.document.getelementbyid("csBody") 'get table Set Listings = Body.all(2) For Each itm In Listings.all If itm.tagname = "A" And _ itm.innertext = "Details" Then 'naviagate to detains in 2nd explorer URL2 = itm.href IE2.Navigate2 URL2 Do While IE2.readystate < 4 Or _ IE2.Busy = True DoEvents Loop End If Next itm Next pagecount End Sub "HC" wrote: Hello Joel, Thank you for your help. Maybe I need to elaborate what I want to achieve more. On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005, it shows there are Flat A to Flat H on each floor. The number is the size of that particular flat. When you click on, say "753'" in the cell denoting 25/F Flat A, an orange box pops up. Inside the box, the first row of purple text denotes the location of the flat, the second row of purple text states again the area of the flat, and the third row of purple text states that there were previously 3 transactions for this particular flat. First column is the date of transaction, second column is the sold price, and the third column is cost per square feet. I want to be able to progammatically extract these transaction records to Excel for charting. I am looking to achieve either of the following: 1. Find out the way to directly access the popup box. After studying the codes, I know that the ID of each flat (unit) is in the saved html file and that won't be too difficult to extract (I've done similar thing before, and I guess I will be able to figure that out). After I've got all the unit_id, I want to be able to repeatedly open the corresponding jsp page and extract the transaction records for the flats (units). 2. As you have enlightened me in your post, or maybe I can just start an instant of IE and then copy the data from IE to Excel. So do you know how I can access the value displayed in the popped up orange box? After some studying of the codes, it seems that the jsp page that shows the orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp, and it uses unit_id to denote each unit. However, I can't figure out further how to get to the data. Hope I've made myself clear. I'm totally stuck. Hope you are able to help. Thank you very much indeed. HC "Joel" ... Here is code to get you started. I don't know chinese and not sure what data you are looking for. I belive the ID=E0005 the part of the query that extracts a particular house. Not sure how to lookup the id's. You can create a string to get the URL like this: ---------------------------------------------------------------- ID = "E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp Request = URL & "?est_id=" & ID IE.Navigate2 URL ------------------------------------------------------------------ Below is code to dump the info from the house you had listed. Sub GetHouse() URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Set Table = IE.document.getElementsByTagname("Table") RowCount = 1 For Each Row In Table(7).Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell RowCount = RowCount + 1 Next Row End Sub "HC" wrote: Hello, There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. I want to do webquery in Excel and then extract the data to a proper table. Hope some experts can point me in the right direction. Regards and thanks in advance, HC |
#16
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
How can I make a select all on the IE page and do the copy to clipboard?
URL = "http://proptx.midland.com.hk/unit/index.jsp" Unit = "?est_id=E00005&unit_id=U000146982" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True IE.Navigate2 URL & Unit How to do the copy after the above? Herbert "ron" ... On Jul 27, 9:38 am, ron wrote: On Jul 26, 8:24 pm, "Herbert Chan" wrote...snip... Now that a way is found, I just wonder if there is a more elegant way to get to the data. HC...I agree it is not a very elegant approach. Opening IE and moving from url to url takes time. I prefer to use the GET method whenever possible as it does not require IE to be open. I have a macro that checks real estate data at 800 different addresses. Using IE to move from page to page, the maco takes 45 minutes to run; using the GET method and parsing the source code behind each web page that is assigned to a variable only requires 14 minutes. However when I tried the GET method with the url I used above, the source code did not contain the pop-up box information. my_url = "http://proptx.midland.com.hk/unit/index.jsp? est_id=E00005&unit_id=U000146982" Set my_object = CreateObject("MSXML2.XMLHTTP") my_object.Open "GET", my_url, False my_object.send my_var = RL.responsetext Set my_object = Nothing An alternative would be to still run IE, scrape the data to the clipboard, then, rather than paste the data into a spreadsheet, the clipboard contents can be transferred to a variable which could then be parsed (instr, mid, left, etc.) and the desired information extracted. I have tried this method and it works for the flats on the web page. Set my_object = CreateObject("htmlfile") my_var = my_object.ParentWindow.ClipboardData.GetData("text ") ...Ron PS...to use the clipboard transfer method the Microsoft Forms 2.0 Object Library must be selected...Ron |
#17
Posted to microsoft.public.excel.programming
|
|||
|
|||
Stuck at Trying to Extract Data from a Website using JSP
On Aug 8, 8:53*pm, "Herbert Chan" wrote:
How can I make a select all on the IE page and do the copy to clipboard? URL = "http://proptx.midland.com.hk/unit/index.jsp" Unit = "?est_id=E00005&unit_id=U000146982" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True IE.Navigate2 URL & Unit How to do the copy after the above? Herbert "ron" ... On Jul 27, 9:38 am, ron wrote: On Jul 26, 8:24 pm, "Herbert Chan" wrote...snip... Now that a way is found, I just wonder if there is a more elegant way to get to the data. HC...I agree it is not a very elegant approach. Opening IE and moving from url to url takes time. I prefer to use the GET method whenever possible as it does not require IE to be open. I have a macro that checks real estate data at 800 different addresses. Using IE to move from page to page, the maco takes 45 minutes to run; using the GET method and parsing the source code behind each web page that is assigned to a variable only requires 14 minutes. However when I tried the GET method with the url I used above, the source code did not contain the pop-up box information. my_url = "http://proptx.midland.com.hk/unit/index.jsp? est_id=E00005&unit_id=U000146982" Set my_object = CreateObject("MSXML2.XMLHTTP") my_object.Open "GET", my_url, False my_object.send my_var = RL.responsetext Set my_object = Nothing An alternative would be to still run IE, scrape the data to the clipboard, then, rather than paste the data into a spreadsheet, the clipboard contents can be transferred to a variable which could then be parsed (instr, mid, left, etc.) and the desired information extracted. I have tried this method and it works for the flats on the web page. Set my_object = CreateObject("htmlfile") my_var = my_object.ParentWindow.ClipboardData.GetData("text ") ...Ron PS...to use the clipboard transfer method the Microsoft Forms 2.0 Object Library must be selected...Ron- Hide quoted text - - Show quoted text - Herbert...Try the following to select the entire page and then copy to the clipboard ie.ExecWB OLECMDID_SELECTALL, OLECMDEXECOPT_DONTPROMPTUSER ie.ExecWB OLECMDID_COPY, OLECMDEXECOPT_DODEFAULT A reference to Microsoft Internet Controls must be set...Ron |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
HOW DO I ENTER DATA WHEN I SEEM TO BE STUCK IN A SUMMATION FORMULA | New Users to Excel | |||
How to extract one data from a website | Excel Discussion (Misc queries) | |||
code to extract data from a website runs of XP but not Vista | Excel Programming | |||
Excel Macro - Extract data from website | Excel Programming | |||
inporting data from website where website address frequently chang | Excel Programming |