Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
New Q to old Post: Stuck at Trying to Extract Data from a Website using JSP
Hope I am not irritating anyone, but I indeed want to know how to extract
the information contained in the tagname "HTML" of an IE page to a cell. My detailed question and previous discussion is copied below: =================== I've run your macro, and it seems that the tagname "html" contains all the text on the page, including the popup box. I know nothing about how to control IE. How can I access the content of the tagname "html" on the page? I've tried: Sheets("Sheet2").Range("a1").Value = IE.document.HTML But the above gives me an error. Hope you are still following the thread. Thanks. Herbert "Joel" ... The data is in different tables. The pop up window is displaying specific data from specific tables. The dollar amount of the bids are shown in the code below. I included a debug tool that I use which is in the sub DUMP. I usually run this code when I working with a webpage. I also add watch variables when I writing my code. Select variable like TABLE and right click variable to add watch. I thedn single step through the code using F8. The table vairable will have 18 tables. The index to the table starts at zero so item 1 in the watch is table(0). You will see a property in the watch window for tables call sourceindex. The source index starts at zero and is the same data as the row number from dump (offset by 1). The data you need to get is the innertext property. Sub GetHouse() 'URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp" Unit = "?est_id=E00005&unit_id=U000146982" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL & Unit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Call dump(IE) Set Table = IE.document.getelementsbytagname("Table") a = 1 Set PopupWin = Table(2) b = 1 Set PopupWin = Table(3) b = 1 'code for extracting table RowCount = 1 For Each Row In PopupWin.Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell ' RowCount = RowCount + 1 Next Row End Sub Sub dump(IE) 'test code With Sheets("Sheet6") .Cells.ClearContents RowCount = 1 For Each itm In IE.document.all .Range("A" & RowCount) = itm.tagname .Range("B" & RowCount) = itm.ID .Range("C" & RowCount) = itm.classname .Range("D" & RowCount) = Left(itm.innertext, 1024) RowCount = RowCount + 1 Next itm End With End Sub "Herbert Chan" wrote: Thank you for your attempt indeed. For this particular website, there is actually no corresponding English version of the same data. The data provided under http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005. ron has pointed out that I can actually use a url such as http://proptx.midland.com.hk/unit/in..._id=U000146982 to get to the pop up directly. That's a big piece of information for me. After opening the webpage with the pop up, is there a way to directly reference to the pop up box without having to copy and paste the whole page back to Excel? Is the pop up frame one of the tables in the page? Now that a way is found, I just wonder if there is a more elegant way to get to the data. HC "Joel" ... I'm having problems naviaging across pages. this code will work for 1 page. I found the english language page for this website. I found a method to get all the houses but only can span the 1st page. I used a 2nd IE to get the details. Found when the tagname was A and the innertext was "Details" I was able to get the URL of the details page. See if this helps. Sub GetHouses() Dim Districts() As Variant URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True Set IE2 = CreateObject("InternetExplorer.Application") IE2.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop a = 1 Set navigate = IE.document.getelementbyid("estatebox") Pages = navigate.all.Length - 1 For pagecount = 1 To Pages If pagecount < 1 Then Set navigate = IE.document.getelementbyid("estatebox") Set Form = IE.document.getelementsbytagname("Form") 'I keep on getting the 1st page with this code. 'I can put the page number in the box but the submit gets 'back to page one. 'change to next page navigate.all(0).Value = pagecount navigate.all(pagecount).Click Form(1).submit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop End If Set Body = IE.document.getelementbyid("csBody") 'get table Set Listings = Body.all(2) For Each itm In Listings.all If itm.tagname = "A" And _ itm.innertext = "Details" Then 'naviagate to detains in 2nd explorer URL2 = itm.href IE2.Navigate2 URL2 Do While IE2.readystate < 4 Or _ IE2.Busy = True DoEvents Loop End If Next itm Next pagecount End Sub "HC" wrote: Hello Joel, Thank you for your help. Maybe I need to elaborate what I want to achieve more. On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005, it shows there are Flat A to Flat H on each floor. The number is the size of that particular flat. When you click on, say "753'" in the cell denoting 25/F Flat A, an orange box pops up. Inside the box, the first row of purple text denotes the location of the flat, the second row of purple text states again the area of the flat, and the third row of purple text states that there were previously 3 transactions for this particular flat. First column is the date of transaction, second column is the sold price, and the third column is cost per square feet. I want to be able to progammatically extract these transaction records to Excel for charting. I am looking to achieve either of the following: 1. Find out the way to directly access the popup box. After studying the codes, I know that the ID of each flat (unit) is in the saved html file and that won't be too difficult to extract (I've done similar thing before, and I guess I will be able to figure that out). After I've got all the unit_id, I want to be able to repeatedly open the corresponding jsp page and extract the transaction records for the flats (units). 2. As you have enlightened me in your post, or maybe I can just start an instant of IE and then copy the data from IE to Excel. So do you know how I can access the value displayed in the popped up orange box? After some studying of the codes, it seems that the jsp page that shows the orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp, and it uses unit_id to denote each unit. However, I can't figure out further how to get to the data. Hope I've made myself clear. I'm totally stuck. Hope you are able to help. Thank you very much indeed. HC "Joel" ... Here is code to get you started. I don't know chinese and not sure what data you are looking for. I belive the ID=E0005 the part of the query that extracts a particular house. Not sure how to lookup the id's. You can create a string to get the URL like this: ---------------------------------------------------------------- ID = "E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp Request = URL & "?est_id=" & ID IE.Navigate2 URL ------------------------------------------------------------------ Below is code to dump the info from the house you had listed. Sub GetHouse() URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Set Table = IE.document.getElementsByTagname("Table") RowCount = 1 For Each Row In Table(7).Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell RowCount = RowCount + 1 Next Row End Sub "HC" wrote: Hello, There is this page that lists out the past transaction records of houses: http://proptx.midland.com.hk/unit/in...?est_id=E00005 (it's in Chinese) If you click on a particular house, the past transaction records of that house is shown. I want to be able to extract the past transaction data and make charts to visualise the price trend of the houses. Now, I'm only able to use the "stupid" method of clicking on all the houses and typing in manually the transaction records in Excel and then chart the data. I wish to be able to extract the data to Excel automatically. I have studied the underlying jsp pages and it seems that the site uses http://proptx.midland.com.hk/unit/unit_tx.jsp to show the data. I have tried typing in http://proptx.midland.com.hk/unit/un..._id=U000146982 to see any information will come up, but there's nothing in the page. I'm totally stuck. As I want to monitor the trend of a number of developments, it will be very tedious to type up all the transactions in Excel. It seems the website has sort of exposed the data, but I just can't find a way to get the data out, at least one house at a time. I want to do webquery in Excel and then extract the data to a proper table. Hope some experts can point me in the right direction. Regards and thanks in advance, HC |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
New Q to old Post: Stuck at Trying to Extract Data from a Websiteusing JSP
Herbert
Take a look at innerHTML or outerHTML. Perhaps something like this. MsgBox doc.getelementsbytagname("HTML")(0).innerhtml By the way what is it you actually want to do in the other thread? If for example you want to get every SCRIPT element. For Each scr In doc.getelementsbytagname("SCRIPT") MsgBox scr.innerhtml Next scr |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
New Q to old Post: Stuck at Trying to Extract Data from a Website
This is what you asked for.
Set HTMLTAG = IE.document.getElementsByTagName("HTMP") RowCount = 1 for each itm in HTMLTag Sheets("Sheet2").Range("a" & rowcount).Value = itm.innertext RowCount = RowCount + 1 next itm This is what I think you really want. the is no HTML tag. there are four properties you can get. 1) innertext (in code below) 2) innerhtml 3) outertext 4) outerhtml If you take the entire string the excel sheet will get an error because the limits on the length of a text string. I only put the 1st 1024 character in the worksheet cell in the code below. RowCount = 1 with Sheets("Sheet2") for each itm in IE.document.all .Range("A" & rowcount).Value = itm.tagname .Range("B" & rowcount).Value = itm.classname .Range("C" & rowcount).Value = left(itm.innertext ,1024) RowCount = RowCount + 1 next itm end with "H Chan" wrote: Hope I am not irritating anyone, but I indeed want to know how to extract the information contained in the tagname "HTML" of an IE page to a cell. My detailed question and previous discussion is copied below: =================== I've run your macro, and it seems that the tagname "html" contains all the text on the page, including the popup box. I know nothing about how to control IE. How can I access the content of the tagname "html" on the page? I've tried: Sheets("Sheet2").Range("a1").Value = IE.document.HTML But the above gives me an error. Hope you are still following the thread. Thanks. Herbert "Joel" ... The data is in different tables. The pop up window is displaying specific data from specific tables. The dollar amount of the bids are shown in the code below. I included a debug tool that I use which is in the sub DUMP. I usually run this code when I working with a webpage. I also add watch variables when I writing my code. Select variable like TABLE and right click variable to add watch. I thedn single step through the code using F8. The table vairable will have 18 tables. The index to the table starts at zero so item 1 in the watch is table(0). You will see a property in the watch window for tables call sourceindex. The source index starts at zero and is the same data as the row number from dump (offset by 1). The data you need to get is the innertext property. Sub GetHouse() 'URL = "http://proptx.midland.com.hk/unit/index.jsp?est_id=E00005" URL = "http://proptx.midland.com.hk/unit/index.jsp" Unit = "?est_id=E00005&unit_id=U000146982" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True 'get web page IE.Navigate2 URL & Unit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop 'test code 'With Sheets("Sheet3") ' RowCount = 1 ' For Each itm In IE.document.all ' .Range("A" & RowCount) = itm.tagname ' .Range("B" & RowCount) = itm.ID ' .Range("C" & RowCount) = itm.classname ' .Range("D" & RowCount) = Left(itm.innertext, 1024) ' RowCount = RowCount + 1 ' Next itm 'End With Call dump(IE) Set Table = IE.document.getelementsbytagname("Table") a = 1 Set PopupWin = Table(2) b = 1 Set PopupWin = Table(3) b = 1 'code for extracting table RowCount = 1 For Each Row In PopupWin.Rows Colcount = 1 For Each cell In Row.Cells Cells(RowCount, Colcount) = cell.innertext Colcount = Colcount + 1 Next cell ' RowCount = RowCount + 1 Next Row End Sub Sub dump(IE) 'test code With Sheets("Sheet6") .Cells.ClearContents RowCount = 1 For Each itm In IE.document.all .Range("A" & RowCount) = itm.tagname .Range("B" & RowCount) = itm.ID .Range("C" & RowCount) = itm.classname .Range("D" & RowCount) = Left(itm.innertext, 1024) RowCount = RowCount + 1 Next itm End With End Sub "Herbert Chan" wrote: Thank you for your attempt indeed. For this particular website, there is actually no corresponding English version of the same data. The data provided under http://cybersearch.midland.com.hk/cybersearch/?lang=en is not the same as the one on http://proptx.midland.com.hk/unit/in...?est_id=E00005. ron has pointed out that I can actually use a url such as http://proptx.midland.com.hk/unit/in..._id=U000146982 to get to the pop up directly. That's a big piece of information for me. After opening the webpage with the pop up, is there a way to directly reference to the pop up box without having to copy and paste the whole page back to Excel? Is the pop up frame one of the tables in the page? Now that a way is found, I just wonder if there is a more elegant way to get to the data. HC "Joel" ... I'm having problems naviaging across pages. this code will work for 1 page. I found the english language page for this website. I found a method to get all the houses but only can span the 1st page. I used a 2nd IE to get the details. Found when the tagname was A and the innertext was "Details" I was able to get the URL of the details page. See if this helps. Sub GetHouses() Dim Districts() As Variant URL = "http://cybersearch.midland.com.hk/cybersearch/?lang=en" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True Set IE2 = CreateObject("InternetExplorer.Application") IE2.Visible = True 'get web page IE.Navigate2 URL Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop a = 1 Set navigate = IE.document.getelementbyid("estatebox") Pages = navigate.all.Length - 1 For pagecount = 1 To Pages If pagecount < 1 Then Set navigate = IE.document.getelementbyid("estatebox") Set Form = IE.document.getelementsbytagname("Form") 'I keep on getting the 1st page with this code. 'I can put the page number in the box but the submit gets 'back to page one. 'change to next page navigate.all(0).Value = pagecount navigate.all(pagecount).Click Form(1).submit Do While IE.readystate < 4 Or _ IE.Busy = True DoEvents Loop End If Set Body = IE.document.getelementbyid("csBody") 'get table Set Listings = Body.all(2) For Each itm In Listings.all If itm.tagname = "A" And _ itm.innertext = "Details" Then 'naviagate to detains in 2nd explorer URL2 = itm.href IE2.Navigate2 URL2 Do While IE2.readystate < 4 Or _ IE2.Busy = True DoEvents Loop End If Next itm Next pagecount End Sub "HC" wrote: Hello Joel, Thank you for your help. Maybe I need to elaborate what I want to achieve more. On the page: http://proptx.midland.com.hk/unit/in...?est_id=E00005, it shows there are Flat A to Flat H on each floor. The number is the size of that particular flat. When you click on, say "753'" in the cell denoting 25/F Flat A, an orange box pops up. Inside the box, the first row of purple text denotes the location of the flat, the second row of purple text states again the area of the flat, and the third row of purple text states that there were previously 3 transactions for this particular flat. First column is the date of transaction, second column is the sold price, and the third column is cost per square feet. I want to be able to progammatically extract these transaction records to Excel for charting. I am looking to achieve either of the following: 1. Find out the way to directly access the popup box. After studying the codes, I know that the ID of each flat (unit) is in the saved html file and that won't be too difficult to extract (I've done similar thing before, and I guess I will be able to figure that out). After I've got all the unit_id, I want to be able to repeatedly open the corresponding jsp page and extract the transaction records for the flats (units). 2. As you have enlightened me in your post, or maybe I can just start an instant of IE and then copy the data from IE to Excel. So do you know how I can access the value displayed in the popped up orange box? After some studying of the codes, it seems that the jsp page that shows the orange popped up box is http://proptx.midland.com.hk/unit/unit_tx.jsp, and it uses unit_id to denote each unit. However, I can't figure out further how to get to the data. Hope I've made myself clear. I'm totally stuck. Hope you are able to help. Thank you very much indeed. HC "Joel" ... Here is code to get you started. I don't know chinese and not sure what data you are looking for. I belive the ID=E0005 the part of the query that extracts a particular house. Not sure how to lookup the id's. You |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Stuck at Trying to Extract Data from a Website using JSP | Excel Programming | |||
How to Post data to website to VBA | Excel Programming | |||
How to extract one data from a website | Excel Discussion (Misc queries) | |||
code to extract data from a website runs of XP but not Vista | Excel Programming | |||
Excel Macro - Extract data from website | Excel Programming |