![]() |
Parse HMTL source to fill in cells?
I'm a newbie with VBA, but want to automate a procedure we are doing.
How would I write a function to look at the following HTML source: '<td<a href="/history/airport/KNYC/2008/7/2/DailyHistory.html"2</a</td ' <td class="bl gb" ' 85 '</td ' <td class="gb" ' 76 '</td ' <td class="br gb" ' 67 '</td And put the 85 into cell B1, 76 into B2, 67 into B3? Thanks! |
Parse HMTL source to fill in cells?
See my posting
http://www.microsoft.com/office/comm...e-ea1664ee9d7d I wouldn't use class, instead use ID in your HTML. With ID is easy find find the items using the internet browser Set SearchResults = IE.document.getElementById("bl gb") Range("A1") = SearchResults.innertext "Jason" wrote: I'm a newbie with VBA, but want to automate a procedure we are doing. How would I write a function to look at the following HTML source: '<td<a href="/history/airport/KNYC/2008/7/2/DailyHistory.html"2</a</td ' <td class="bl gb" ' 85 '</td ' <td class="gb" ' 76 '</td ' <td class="br gb" ' 67 '</td And put the 85 into cell B1, 76 into B2, 67 into B3? Thanks! |
Parse HMTL source to fill in cells?
On Wed, 17 Jun 2009 13:16:01 -0700, Jason
wrote: I'm a newbie with VBA, but want to automate a procedure we are doing. How would I write a function to look at the following HTML source: '<td<a href="/history/airport/KNYC/2008/7/2/DailyHistory.html"2</a</td ' <td class="bl gb" ' 85 '</td ' <td class="gb" ' 76 '</td ' <td class="br gb" ' 67 '</td And put the 85 into cell B1, 76 into B2, 67 into B3? Thanks! In general, it is a simple problem if you define how you are identifying these three numbers, but you don't give sufficient information. I assume that the 85, 76 and 67 are merely examples of what could be in those locations, otherwise you could just write those number into the respective cells. A few of the possibilities from examining your data: Are the numbers identified by being the last three values in the source, and are they all integers? Are the values always on the 3rd, 6th and 9th line? Will the values always be integers; are could they be decimal values (or even fractions)? Or is there something else special about these particular numbers to identify them? For example, is what you really want in B1:B3 the numbers associated with the different td classes "bl gb", "gb", and "br gb" ??? --ron |
Parse HMTL source to fill in cells?
Jason,
Is your HTML source a document, or a webpage? There are all sorts of different ways to approach this - depending on what the source actually is. For Example, reading the document from a web page: Dim myStr As String Dim test As Variant Dim Val As Variant Dim IE As Object Dim i As Integer Dim j As Integer j = 1 Set IE = CreateObject("InternetExplorer.Application") IE.Navigate "http://somewebpage.shtml" myStr = IE.Document.body.innerText test = Split(myStr, "</td") For i = LBound(test) To UBound(test) If test(i) < "" Then Val = Split(test(i), "") Cells(j, 2).Value = CDbl(Replace(Replace(Val(UBound(Val)), " ", ""), "'", "")) j = j + 1 End If Next i HTH, Bernie MS Excel MVP "Jason" wrote in message ... I'm a newbie with VBA, but want to automate a procedure we are doing. How would I write a function to look at the following HTML source: '<td<a href="/history/airport/KNYC/2008/7/2/DailyHistory.html"2</a</td ' <td class="bl gb" ' 85 '</td ' <td class="gb" ' 76 '</td ' <td class="br gb" ' 67 '</td And put the 85 into cell B1, 76 into B2, 67 into B3? Thanks! |
Parse HMTL source to fill in cells?
Sorry if I was too ambiguous with my original post...
This HTML source is from the following webpage on temperature data. http://www.wunderground.com/history/...tory.html?MR=1 In this case, it is for NYC on July 2, 2008 (from the ...NYC/2008/7/2/... in the address) 85 is the Max temperatue 76 is the mean 67 is the low temperature. Essentially we want to setup a process to 1. look up the date in column A 2. go the website and parse the data for that date 3. enter the max in column B, low in column C, mean in column D 4. then repeat this process for the next date in the next row Hopefully this helps. "Jason" wrote: I'm a newbie with VBA, but want to automate a procedure we are doing. How would I write a function to look at the following HTML source: '<td<a href="/history/airport/KNYC/2008/7/2/DailyHistory.html"2</a</td ' <td class="bl gb" ' 85 '</td ' <td class="gb" ' 76 '</td ' <td class="br gb" ' 67 '</td And put the 85 into cell B1, 76 into B2, 67 into B3? Thanks! |
Parse HMTL source to fill in cells?
Jason
Something like this perhaps? Private Sub MaxMinMeanTemps() Dim IE As Object Dim doc As Object Dim tbls As Object Dim tbl As Object Dim tblData As Object Dim tblRow As Object Dim rwCells As Object Dim ws As Worksheet Dim rng As Range Dim strDate As String Dim strURL As String Dim I As Long Set ws = Worksheets(1) Set rng = ws.Range("A2") Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True While rng.Value < "" strDate = Year(rng.Value) & "/" & Month(rng.Value) & "/" & Day(rng.Value) strURL = "http://www.wunderground.com/history/airport/KNYC/" & strDate & "/DailyHistory.html" IE.Navigate strURL Do While IE.Busy: DoEvents: Loop Do While IE.ReadyState < 4: DoEvents: Loop Set doc = IE.Document Set tbls = doc.getElementsByTagName("TABLE") For Each tbl In tbls If tbl.className = "dataTable tm10" Then Set tblData = tbl Exit For End If Next For I = 2 To 4 Set tblRow = tblData.Rows(I) Set rwCells = tblRow.Cells rng.Offset(, I - 1) = rwCells(1).innertext Next I Set rng = rng.Offset(1) Wend IE.Quit: Set IE = Nothing End Sub "Jason" wrote in message ... Sorry if I was too ambiguous with my original post... This HTML source is from the following webpage on temperature data. http://www.wunderground.com/history/...tory.html?MR=1 In this case, it is for NYC on July 2, 2008 (from the ...NYC/2008/7/2/... in the address) 85 is the Max temperatue 76 is the mean 67 is the low temperature. Essentially we want to setup a process to 1. look up the date in column A 2. go the website and parse the data for that date 3. enter the max in column B, low in column C, mean in column D 4. then repeat this process for the next date in the next row Hopefully this helps. "Jason" wrote: I'm a newbie with VBA, but want to automate a procedure we are doing. How would I write a function to look at the following HTML source: '<td<a href="/history/airport/KNYC/2008/7/2/DailyHistory.html"2</a</td ' <td class="bl gb" ' 85 '</td ' <td class="gb" ' 76 '</td ' <td class="br gb" ' 67 '</td And put the 85 into cell B1, 76 into B2, 67 into B3? Thanks! |
Parse HMTL source to fill in cells?
On Thu, 18 Jun 2009 06:52:00 -0700, Jason
wrote: Sorry if I was too ambiguous with my original post... This HTML source is from the following webpage on temperature data. http://www.wunderground.com/history/...tory.html?MR=1 In this case, it is for NYC on July 2, 2008 (from the ...NYC/2008/7/2/... in the address) 85 is the Max temperatue 76 is the mean 67 is the low temperature. Essentially we want to setup a process to 1. look up the date in column A 2. go the website and parse the data for that date 3. enter the max in column B, low in column C, mean in column D 4. then repeat this process for the next date in the next row Hopefully this helps. Much more clear description of your requirements. Perhaps the following Macro would help: To enter this Macro (Sub), <alt-F11 opens the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code below into the window that opens. To use this Macro (Sub), <alt-F8 opens the macro dialog box. Select the macro by name, and <RUN. As written, the entry cell is set to A2, and the results are in B2, C2 and D2. This can be easily changed. ========================================== Option Explicit Sub gettemps() Dim c As Range Dim myStr As String Dim test As Variant Dim Val As Variant Dim IE As Object Const sURL1 As String = "http://www.wunderground.com/history/airport/KNYC/" Dim sURLdate As String Const sURL2 As String = "/DailyHistory.html?MR=1" Set c = Range("A2") sURLdate = Format(c.Value2, "yyyy/m/d") Application.Cursor = xlWait Set IE = CreateObject("InternetExplorer.Application") IE.Navigate sURL1 & sURLdate & sURL2 While IE.ReadyState < 4 DoEvents Wend myStr = IE.Document.body.innerhtml c.Offset(0, 1).Value = RegexMid(myStr, "max") c.Offset(0, 2).Value = RegexMid(myStr, "min") c.Offset(0, 3).Value = RegexMid(myStr, "mean") Application.Cursor = xlDefault End Sub Private Function RegexMid(s As String, sTempType As String) As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Pattern = "\b" & sTempType & "\b\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If End Function ================================== --ron |
Parse HMTL source to fill in cells?
Norie,
Thanks, that did it. How can I get it to enter just the number, say "85" instead of "85 <degree signF"? I know that is simple, but I'm still learning. I changed Set rng = ws.Range("A2") to something similar to Joel's post RowCount = 1 Set rng = ws.Range("A" & RowCount) RowCount = RowCount + 1 so that I can enter in a whole list of dates and have it go through and do it for all the dates listed. Of course, now my mind is going... Any idea how I could have it look at the table, find the last date listed in column A, enter the next date below it, and fill in all the dates up to present day minus 1? Meaning from the last date until "yesterday". Of course "yesterday" would depend on when it is run. This way, we could run the script to fill in all our missing data "Norie" wrote: Jason Something like this perhaps? Private Sub MaxMinMeanTemps() Dim IE As Object Dim doc As Object Dim tbls As Object Dim tbl As Object Dim tblData As Object Dim tblRow As Object Dim rwCells As Object Dim ws As Worksheet Dim rng As Range Dim strDate As String Dim strURL As String Dim I As Long Set ws = Worksheets(1) Set rng = ws.Range("A2") Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True While rng.Value < "" strDate = Year(rng.Value) & "/" & Month(rng.Value) & "/" & Day(rng.Value) strURL = "http://www.wunderground.com/history/airport/KNYC/" & strDate & "/DailyHistory.html" IE.Navigate strURL Do While IE.Busy: DoEvents: Loop Do While IE.ReadyState < 4: DoEvents: Loop Set doc = IE.Document Set tbls = doc.getElementsByTagName("TABLE") For Each tbl In tbls If tbl.className = "dataTable tm10" Then Set tblData = tbl Exit For End If Next For I = 2 To 4 Set tblRow = tblData.Rows(I) Set rwCells = tblRow.Cells rng.Offset(, I - 1) = rwCells(1).innertext Next I Set rng = rng.Offset(1) Wend IE.Quit: Set IE = Nothing End Sub "Jason" wrote in message ... Sorry if I was too ambiguous with my original post... This HTML source is from the following webpage on temperature data. http://www.wunderground.com/history/...tory.html?MR=1 In this case, it is for NYC on July 2, 2008 (from the ...NYC/2008/7/2/... in the address) 85 is the Max temperatue 76 is the mean 67 is the low temperature. Essentially we want to setup a process to 1. look up the date in column A 2. go the website and parse the data for that date 3. enter the max in column B, low in column C, mean in column D 4. then repeat this process for the next date in the next row Hopefully this helps. "Jason" wrote: I'm a newbie with VBA, but want to automate a procedure we are doing. How would I write a function to look at the following HTML source: '<td<a href="/history/airport/KNYC/2008/7/2/DailyHistory.html"2</a</td ' <td class="bl gb" ' 85 '</td ' <td class="gb" ' 76 '</td ' <td class="br gb" ' 67 '</td And put the 85 into cell B1, 76 into B2, 67 into B3? Thanks! |
Parse HMTL source to fill in cells?
On Thu, 18 Jun 2009 11:14:01 -0700, Jason
wrote: Ron, Thanks, I like it alot. If you look at my reply to Norie. I setup a way to repeat it for a list of dates, but could not do that with yours...I am sure I missing something besides just changing c=A2 to: RowCount = 1 Set c = Range("A" & RowCount) RowCount = RowCount + 1 Also, if you notice from my reply to Norie, I now want to add a way to fill in column A Thanks! Jason It's relatively easy, but perhaps not intuitive, to extend my macro to handle a range of cells: ================================= Option Explicit Sub gettemps() Dim c As Range, rng As Range Dim j As Long Dim myStr As String Dim test As Variant Dim Val As Variant Dim IE As Object Const sURL1 As String = "http://www.wunderground.com/history/airport/KNYC/" Dim sURLdate As String Const sURL2 As String = "/DailyHistory.html?MR=1" Application.Cursor = xlWait Set IE = CreateObject("InternetExplorer.Application") Set rng = Range("A2").CurrentRegion Set rng = rng.Resize(rng.Rows.Count, 1) For Each c In rng sURLdate = Format(c.Value2, "yyyy/m/d") IE.Navigate sURL1 & sURLdate & sURL2 While IE.ReadyState < 4 DoEvents Wend myStr = IE.Document.body.innerhtml c.Offset(0, 1).Value = RegexMid(myStr, "max") c.Offset(0, 2).Value = RegexMid(myStr, "min") c.Offset(0, 3).Value = RegexMid(myStr, "mean") j = j + 1 Next c Application.Cursor = xlDefault End Sub Private Function RegexMid(s As String, sTempType As String) As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Pattern = "\b" & sTempType & "\b\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If End Function ========================================= But this just reads in each page one at a time. Whether it is worthwhile to do the coding to try to download the custom range of dates from the site, and then parse out the appropriate data, depends on how often you'll be downloading multiple dates, and how many dates each time. --ron |
Parse HMTL source to fill in cells?
It works great!
However, once I saw it run, I tried to run it in an existing spreadsheet where the first cell would be A3838 instead of A2 and that did not go too well... Right now, I'm just trying to speed it up, which you are helping with immensely. I'm sure the next step would be to put all this into a database and have it run automatically daily. "Ron Rosenfeld" wrote: On Thu, 18 Jun 2009 11:14:01 -0700, Jason wrote: Ron, Thanks, I like it alot. If you look at my reply to Norie. I setup a way to repeat it for a list of dates, but could not do that with yours...I am sure I missing something besides just changing c=A2 to: RowCount = 1 Set c = Range("A" & RowCount) RowCount = RowCount + 1 Also, if you notice from my reply to Norie, I now want to add a way to fill in column A Thanks! Jason It's relatively easy, but perhaps not intuitive, to extend my macro to handle a range of cells: ================================= Option Explicit Sub gettemps() Dim c As Range, rng As Range Dim j As Long Dim myStr As String Dim test As Variant Dim Val As Variant Dim IE As Object Const sURL1 As String = "http://www.wunderground.com/history/airport/KNYC/" Dim sURLdate As String Const sURL2 As String = "/DailyHistory.html?MR=1" Application.Cursor = xlWait Set IE = CreateObject("InternetExplorer.Application") Set rng = Range("A2").CurrentRegion Set rng = rng.Resize(rng.Rows.Count, 1) For Each c In rng sURLdate = Format(c.Value2, "yyyy/m/d") IE.Navigate sURL1 & sURLdate & sURL2 While IE.ReadyState < 4 DoEvents Wend myStr = IE.Document.body.innerhtml c.Offset(0, 1).Value = RegexMid(myStr, "max") c.Offset(0, 2).Value = RegexMid(myStr, "min") c.Offset(0, 3).Value = RegexMid(myStr, "mean") j = j + 1 Next c Application.Cursor = xlDefault End Sub Private Function RegexMid(s As String, sTempType As String) As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Pattern = "\b" & sTempType & "\b\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If End Function ========================================= But this just reads in each page one at a time. Whether it is worthwhile to do the coding to try to download the custom range of dates from the site, and then parse out the appropriate data, depends on how often you'll be downloading multiple dates, and how many dates each time. --ron |
Parse HMTL source to fill in cells?
Jason
I'm not quite sure what you are asking and it seems actually be more to do with inserting dates into a worksheet. The code I posted assumes you have dates starting in row 2, column A. It goes down the column, gets the data, inserts it in the relevant columns (B,C and D) and then moves on. It stops when a blank cell is found. So in theory if you have all the dates you want, in whatever order it should work. If you need help filling in dates there are various ways to do that but they really depend on what you've currently got and what you want to end up with. Perhaps a topic for another thread. As for just getting the number, shouldn't be a problem - I'll see if I can do something. Would you like it in Fahrenheit or Celsius?:) PS If you just need the number for calculations then you could probably do that without code, a simple formula should suffice. "Jason" wrote in message ... Norie, Thanks, that did it. How can I get it to enter just the number, say "85" instead of "85 <degree signF"? I know that is simple, but I'm still learning. I changed Set rng = ws.Range("A2") to something similar to Joel's post RowCount = 1 Set rng = ws.Range("A" & RowCount) RowCount = RowCount + 1 so that I can enter in a whole list of dates and have it go through and do it for all the dates listed. Of course, now my mind is going... Any idea how I could have it look at the table, find the last date listed in column A, enter the next date below it, and fill in all the dates up to present day minus 1? Meaning from the last date until "yesterday". Of course "yesterday" would depend on when it is run. This way, we could run the script to fill in all our missing data "Norie" wrote: Jason Something like this perhaps? Private Sub MaxMinMeanTemps() Dim IE As Object Dim doc As Object Dim tbls As Object Dim tbl As Object Dim tblData As Object Dim tblRow As Object Dim rwCells As Object Dim ws As Worksheet Dim rng As Range Dim strDate As String Dim strURL As String Dim I As Long Set ws = Worksheets(1) Set rng = ws.Range("A2") Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True While rng.Value < "" strDate = Year(rng.Value) & "/" & Month(rng.Value) & "/" & Day(rng.Value) strURL = "http://www.wunderground.com/history/airport/KNYC/" & strDate & "/DailyHistory.html" IE.Navigate strURL Do While IE.Busy: DoEvents: Loop Do While IE.ReadyState < 4: DoEvents: Loop Set doc = IE.Document Set tbls = doc.getElementsByTagName("TABLE") For Each tbl In tbls If tbl.className = "dataTable tm10" Then Set tblData = tbl Exit For End If Next For I = 2 To 4 Set tblRow = tblData.Rows(I) Set rwCells = tblRow.Cells rng.Offset(, I - 1) = rwCells(1).innertext Next I Set rng = rng.Offset(1) Wend IE.Quit: Set IE = Nothing End Sub "Jason" wrote in message ... Sorry if I was too ambiguous with my original post... This HTML source is from the following webpage on temperature data. http://www.wunderground.com/history/...tory.html?MR=1 In this case, it is for NYC on July 2, 2008 (from the ...NYC/2008/7/2/... in the address) 85 is the Max temperatue 76 is the mean 67 is the low temperature. Essentially we want to setup a process to 1. look up the date in column A 2. go the website and parse the data for that date 3. enter the max in column B, low in column C, mean in column D 4. then repeat this process for the next date in the next row Hopefully this helps. "Jason" wrote: I'm a newbie with VBA, but want to automate a procedure we are doing. How would I write a function to look at the following HTML source: '<td<a href="/history/airport/KNYC/2008/7/2/DailyHistory.html"2</a</td ' <td class="bl gb" ' 85 '</td ' <td class="gb" ' 76 '</td ' <td class="br gb" ' 67 '</td And put the 85 into cell B1, 76 into B2, 67 into B3? Thanks! |
Parse HMTL source to fill in cells?
On Thu, 18 Jun 2009 13:17:01 -0700, Jason
wrote: It works great! Good to hear! Thanks for the feedback. However, once I saw it run, I tried to run it in an existing spreadsheet where the first cell would be A3838 instead of A2 and that did not go too well... You really need to be specific. And you will probably find that by doing so, and by clearly elaborating your problem, a solution will become apparent. I'm guessing that the "did not go too well" problem perhaps has to do with repetitive calls to the web page? If that is the case, then you probably are not properly setting rng. But with limited information, it's hard to advise further. Right now, I'm just trying to speed it up, which you are helping with immensely. I suppose by "it", you mean the process of entering these dates. I'm sure the next step would be to put all this into a database and have it run automatically daily. If you are going to run it every day, then calling the web page as I've done should be adequate. If you were going to run it weekly or monthly, then a call to the web page that downloads a customizable series of dates might be better. --ron |
Parse HMTL source to fill in cells?
Ron,
I changed Range("A2").CurrentRegion to Range("A3838").CurrentRegion and it started putting temperatures in B1, C1, D1, and then on down instead of B3838, C3838, D3838 and then down. My last request, is more of a wish, and not an urgent need, but I want to setup a sheet with a list of city airport codes, for example: Sheet name "Airports" In "Airports" Cells B1-B25 would contain a list of 25 airport codes in a specific order A modified version of your existing script would then -take the airport code in cell B1, likely NYC as we have already done -insert it into the website address for sURL1 http://www.wunderground.com/history/airport/k -run your script, inserting the temps in B*,C*,D* -then read the next airport from "Airports" B2, likely LAX -modify the sURL1 to -http://www.wunderground.com/history/airport/kLAX -insert the temperatures into E*,F*,G* Essentially column A is the date, then next 3 columns are the Max, Min, Mean for City 1, then the next 3 columns are the Max, Min, Mean for City 2 and so on... I think what we have now is already leaps and bounds ahead of the manual copying and pasting. So this will definitely work for us, if we need more I am sure I will post those needs under a new thread. However, I would like to know in your script how you got it to look at the website and pull the data...if you wouldn't mind adding a couple comments to your code that would great. This way, I can hopefully learn a little instead of just copying and pasting your work. Thank you immensely, Jason "Ron Rosenfeld" wrote: On Thu, 18 Jun 2009 13:17:01 -0700, Jason wrote: It works great! Good to hear! Thanks for the feedback. However, once I saw it run, I tried to run it in an existing spreadsheet where the first cell would be A3838 instead of A2 and that did not go too well... You really need to be specific. And you will probably find that by doing so, and by clearly elaborating your problem, a solution will become apparent. I'm guessing that the "did not go too well" problem perhaps has to do with repetitive calls to the web page? If that is the case, then you probably are not properly setting rng. But with limited information, it's hard to advise further. Right now, I'm just trying to speed it up, which you are helping with immensely. I suppose by "it", you mean the process of entering these dates. I'm sure the next step would be to put all this into a database and have it run automatically daily. If you are going to run it every day, then calling the web page as I've done should be adequate. If you were going to run it weekly or monthly, then a call to the web page that downloads a customizable series of dates might be better. --ron |
Parse HMTL source to fill in cells?
On Fri, 19 Jun 2009 07:14:02 -0700, Jason
wrote: Ron, I changed Range("A2").CurrentRegion to Range("A3838").CurrentRegion and it started putting temperatures in B1, C1, D1, and then on down instead of B3838, C3838, D3838 and then down. And what, exactly, is the range that results from that command? Take a look at HELP for the CurrentRegion property and you may be able to figure out for yourself what's going on and how to fix it. Or else you'll have to provide more details of your setup. My last request, is more of a wish, and not an urgent need, but I want to setup a sheet with a list of city airport codes, for example: Sheet name "Airports" In "Airports" Cells B1-B25 would contain a list of 25 airport codes in a specific order A modified version of your existing script would then -take the airport code in cell B1, likely NYC as we have already done -insert it into the website address for sURL1 http://www.wunderground.com/history/airport/k -run your script, inserting the temps in B*,C*,D* -then read the next airport from "Airports" B2, likely LAX -modify the sURL1 to -http://www.wunderground.com/history/airport/kLAX -insert the temperatures into E*,F*,G* Essentially column A is the date, then next 3 columns are the Max, Min, Mean for City 1, then the next 3 columns are the Max, Min, Mean for City 2 and so on... That should be easy to do. But perhaps that is something you'd rather work on? I know I learn a lot by doing... I think what we have now is already leaps and bounds ahead of the manual copying and pasting. So this will definitely work for us, if we need more I am sure I will post those needs under a new thread. However, I would like to know in your script how you got it to look at the website and pull the data...if you wouldn't mind adding a couple comments to your code that would great. This way, I can hopefully learn a little instead of just copying and pasting your work. That is a matter of knowing a little (unfortunately too little) about the InternetExplorer object (see http://msdn.microsoft.com/en-us/libr...84(VS.85).aspx Constructing the proper URL to download the web page This was done by examining the URL when I had navigated to the web page in question. Examining the HTML source to try to figure out an unambiguous way of defining the data I wanted to obtain, using the VBScript Regular Expression engine. The URL parts all have "URL" as part of their name. The Regular Expression pattern parts all have "Pat" in their name. Hope this helps. Thank you immensely, Jason "Ron Rosenfeld" wrote: On Thu, 18 Jun 2009 13:17:01 -0700, Jason wrote: It works great! Good to hear! Thanks for the feedback. However, once I saw it run, I tried to run it in an existing spreadsheet where the first cell would be A3838 instead of A2 and that did not go too well... You really need to be specific. And you will probably find that by doing so, and by clearly elaborating your problem, a solution will become apparent. I'm guessing that the "did not go too well" problem perhaps has to do with repetitive calls to the web page? If that is the case, then you probably are not properly setting rng. But with limited information, it's hard to advise further. Right now, I'm just trying to speed it up, which you are helping with immensely. I suppose by "it", you mean the process of entering these dates. I'm sure the next step would be to put all this into a database and have it run automatically daily. If you are going to run it every day, then calling the web page as I've done should be adequate. If you were going to run it weekly or monthly, then a call to the web page that downloads a customizable series of dates might be better. --ron --ron |
Parse HMTL source to fill in cells?
You want to extract some numbers from a web page. The web page is http://www.wunderground.com/history/...DailyHistory.h tml?MR=1 In this case, it is for NYC on July 2, 2008 (from the ...NYC/2008/7/2/... in the address) You want to extract: 85 is the Max temperatue 76 is the mean 67 is the low temperature. Here is a quick script in biterscripting. # Script WeatherHistory.txt var str html ; cat "http://www.wunderground.com/history/airport/KNYC/2008/7/2/DailyHistory. html?MR=1" $html while ( { sen -c "^<span class=\"b\"^" $html } 0 ) do stex -c "^<span class=\"b\"^]" $html null stex -c "]^</span^" $html done Script is in biterscripting. I tested this script. It works. (The markers are <span class="b"...</span.) To try, 1. Save the above script as C:\WeatherHistory.txt. 2. Download biterscripting from http://www.biterscripting.com. 3. Start biterscripting. Enter the following command. script "C:\WeatherHistory.txt" You will get the numbers you want. I am seeing the following. 76 75 85 .. .. .. Email me if you have questions. Patrick *** Sent via Developersdex http://www.developersdex.com *** |
Parse HMTL source to fill in cells?
Ron, Just wanted to say thanks for your help. I had to leave town for a few days, but will work on this and hopefully get the results I need. Thanks! "Ron Rosenfeld" wrote: On Fri, 19 Jun 2009 07:14:02 -0700, Jason wrote: Ron, I changed Range("A2").CurrentRegion to Range("A3838").CurrentRegion and it started putting temperatures in B1, C1, D1, and then on down instead of B3838, C3838, D3838 and then down. And what, exactly, is the range that results from that command? Take a look at HELP for the CurrentRegion property and you may be able to figure out for yourself what's going on and how to fix it. Or else you'll have to provide more details of your setup. My last request, is more of a wish, and not an urgent need, but I want to setup a sheet with a list of city airport codes, for example: Sheet name "Airports" In "Airports" Cells B1-B25 would contain a list of 25 airport codes in a specific order A modified version of your existing script would then -take the airport code in cell B1, likely NYC as we have already done -insert it into the website address for sURL1 http://www.wunderground.com/history/airport/k -run your script, inserting the temps in B*,C*,D* -then read the next airport from "Airports" B2, likely LAX -modify the sURL1 to -http://www.wunderground.com/history/airport/kLAX -insert the temperatures into E*,F*,G* Essentially column A is the date, then next 3 columns are the Max, Min, Mean for City 1, then the next 3 columns are the Max, Min, Mean for City 2 and so on... That should be easy to do. But perhaps that is something you'd rather work on? I know I learn a lot by doing... I think what we have now is already leaps and bounds ahead of the manual copying and pasting. So this will definitely work for us, if we need more I am sure I will post those needs under a new thread. However, I would like to know in your script how you got it to look at the website and pull the data...if you wouldn't mind adding a couple comments to your code that would great. This way, I can hopefully learn a little instead of just copying and pasting your work. That is a matter of knowing a little (unfortunately too little) about the InternetExplorer object (see http://msdn.microsoft.com/en-us/libr...84(VS.85).aspx Constructing the proper URL to download the web page This was done by examining the URL when I had navigated to the web page in question. Examining the HTML source to try to figure out an unambiguous way of defining the data I wanted to obtain, using the VBScript Regular Expression engine. The URL parts all have "URL" as part of their name. The Regular Expression pattern parts all have "Pat" in their name. Hope this helps. Thank you immensely, Jason "Ron Rosenfeld" wrote: On Thu, 18 Jun 2009 13:17:01 -0700, Jason wrote: It works great! Good to hear! Thanks for the feedback. However, once I saw it run, I tried to run it in an existing spreadsheet where the first cell would be A3838 instead of A2 and that did not go too well... You really need to be specific. And you will probably find that by doing so, and by clearly elaborating your problem, a solution will become apparent. I'm guessing that the "did not go too well" problem perhaps has to do with repetitive calls to the web page? If that is the case, then you probably are not properly setting rng. But with limited information, it's hard to advise further. Right now, I'm just trying to speed it up, which you are helping with immensely. I suppose by "it", you mean the process of entering these dates. I'm sure the next step would be to put all this into a database and have it run automatically daily. If you are going to run it every day, then calling the web page as I've done should be adequate. If you were going to run it weekly or monthly, then a call to the web page that downloads a customizable series of dates might be better. --ron --ron |
Parse HMTL source to fill in cells?
On Thu, 25 Jun 2009 10:43:02 -0700, Jason
wrote: Ron, Just wanted to say thanks for your help. I had to leave town for a few days, but will work on this and hopefully get the results I need. Thanks! You're very welcome. Thanks for the feedback. Let me know what you need. --ron ----------------------------------------------------------------------------- Our Peering Groups change Visit : http://spacesst.com/peerin |
Parse HMTL source to fill in cells?
Ron,
Okay, I'm back... I was able to actually take your code and reference the sheet with the 25 airport codes and add the data for each code in 3 columns. Of course, then I got greedy and thought I wanted something faster. So I tried taking your code and getting it to work for the Custom Date Range so that data could be entered in much quicker. I'm still learning, so I just want to have it work for 1 city and a hardset date range. I think I can get it to repeat for the different airports later. For this example, I am trying to run for Houston from 9/1/2008 to 6/28/2009. I have setup a series of dates in column A of my Excel spreadsheet, beginning with 9/1/2008 in A1 and daily increments down. I have claimed a small victory in that when run, it does go the site and grab the correct max, min and mean from the list. The problem is that it enters the max, min, and mean for 9/1/2008 as the temperatures for every day following. Hopefully it is something simple, that will allow it to move down to the next date and grab that temperature. Hopefully you can understand my explanation, if not, you can simply enter a list of dates from 9/1/2008 to 9/10/2008 in A1:A10 and run the code below and see the repeat. As always thanks! Option Explicit Sub getCityTemps() Dim AirCode As Range, ACrng As Range Dim c As Range, rng As Range Dim j As Long Dim sURLairport As String Dim myStr As String Dim test As Variant Dim Val As Variant Dim IE As Object 'Set ACrng = Sheets("City_Airport").Range("B2:B26") 'For Each AirCode In ACrng ' sURLairport = AirCode Const sURL1 As String = "http://www.wunderground.com/history/airport/KHOU/2008/9/1/CustomHistory.html?dayend=28&monthend=6&yearend=20 09&req_city=NA&req_state=NA&req_statename=NA" 'Const sURL2 As String = "/" Dim sURLdate As String 'Const sURL3 As String = "/DailyHistory.html?MR=1" Application.Cursor = xlWait Set IE = CreateObject("InternetExplorer.Application") IE.Navigate sURL1 While IE.ReadyState < 4 DoEvents Wend myStr = IE.Document.body.innerhtml Set rng = Range("A2").CurrentRegion Set rng = rng.Resize(rng.Rows.Count, 1) For Each c In rng sURLdate = Format(c.Value2, "yyyy/m/d") c.Offset(0, 1).Value = RegexMid(myStr, "bl gb") c.Offset(0, 2).Value = RegexMid(myStr, "br gb") c.Offset(0, 3).Value = RegexMid(myStr, "class=gb") j = j + 1 Next c 'i = i + 3 'Next AirCode Application.Cursor = xlDefault End Sub Private Function RegexMid(s As String, sTempType As String) As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Pattern = "\b" & sTempType & "\b\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If End Function "Ron Rosenfeld" wrote: On Thu, 25 Jun 2009 10:43:02 -0700, Jason wrote: Ron, Just wanted to say thanks for your help. I had to leave town for a few days, but will work on this and hopefully get the results I need. Thanks! You're very welcome. Thanks for the feedback. Let me know what you need. --ron ----------------------------------------------------------------------------- Less Spam Better enjoyable experience Visit : news://spacesst.com |
Parse HMTL source to fill in cells?
On Mon, 29 Jun 2009 15:17:03 -0700, Jason
wrote: Of course, then I got greedy and thought I wanted something faster. So I tried taking your code and getting it to work for the Custom Date Range so that data could be entered in much quicker. I looked at that a bit when I was coming up with my original recommendations. But I did not pursue the Custom Date Range approach. I don't know when I'll have a chance to look at that in depth, though. This may be a busy week for me, but if I have time ... If I recall correctly, one problem is that the identifiers you are using: "bl gb" "br gb" "class=gb" may not uniquely identify that which you are looking for. --ron ----------------------------------------------------------------------------- Less Spam Better enjoyable experience Visit : news://spacesst.com |
Parse HMTL source to fill in cells?
Ron, No worries, thanks! Yeah, you are right about the un-unique identifiers. I am trying to come up with a way to iterate the code below for each date then look for those identifiers. I will take a look at the original link you sent a while back on the IE Object. <td<a href="/history/airport/KHOU/2008/9/1/DailyHistory.html"1</a</td <td class="bl gb" 91 </td <td class="gb" 84 </td <td class="br gb" 76 </td "Ron Rosenfeld" wrote: On Mon, 29 Jun 2009 15:17:03 -0700, Jason wrote: Of course, then I got greedy and thought I wanted something faster. So I tried taking your code and getting it to work for the Custom Date Range so that data could be entered in much quicker. I looked at that a bit when I was coming up with my original recommendations. But I did not pursue the Custom Date Range approach. I don't know when I'll have a chance to look at that in depth, though. This may be a busy week for me, but if I have time ... If I recall correctly, one problem is that the identifiers you are using: "bl gb" "br gb" "class=gb" may not uniquely identify that which you are looking for. --ron |
Parse HMTL source to fill in cells?
On Tue, 30 Jun 2009 09:19:01 -0700, Jason wrote: Ron, No worries, thanks! Yeah, you are right about the un-unique identifiers. I am trying to come up with a way to iterate the code below for each date then look for those identifiers. I will take a look at the original link you sent a while back on the IE Object. <td<a href="/history/airport/KHOU/2008/9/1/DailyHistory.html"1</a</td <td class="bl gb" 91 </td <td class="gb" 84 </td <td class="br gb" 76 </td At least in a single test, I believe the following modifications will uniquely identify the required segments. I suspect there are easier ways to do this, but ... Note that I changed the pattern; I also changed the RegexMid function and added some "clean-up" to both the main Sub and the private Sub. (If you don't explicitly quit IE, you wind up with multiple IE processes running; and it will eventually crash. In some other program, the limit was about sixteen). Anyway try this: ==================================== Option Explicit Sub getCityTemps() Dim AirCode As Range, ACrng As Range Dim c As Range, rng As Range Dim j As Long Dim sURLairport As String Dim myStr As String Dim test As Variant Dim Val As Variant Dim IE As Object 'Set ACrng = Sheets("City_Airport").Range("B2:B26") 'For Each AirCode In ACrng ' sURLairport = AirCode Const sURL1 As String = _ "http://www.wunderground.com/history/airport/KHOU/2008/9/1/CustomHistory.html?dayend=28&monthend=6&yearend=20 09&req_city=NA&req_state=NA&req_statename=NA" 'Const sURL2 As String = "/" Dim sURLdate As String 'Const sURL3 As String = "/DailyHistory.html?MR=1" Application.Cursor = xlWait Set IE = CreateObject("InternetExplorer.Application") IE.Navigate sURL1 While IE.ReadyState < 4 DoEvents Wend myStr = IE.Document.body.innerhtml Set rng = Range("A2").CurrentRegion Set rng = rng.Resize(rng.Rows.Count, 1) For Each c In rng sURLdate = Format(c.Value2, "yyyy/m/d") c.Offset(0, 1).Value = RegexMid(myStr, sURLdate, "bl gb") c.Offset(0, 2).Value = RegexMid(myStr, sURLdate, "br gb") c.Offset(0, 3).Value = RegexMid(myStr, sURLdate, "class=gb") Next c 'i = i + 3 'Next AirCode IE.Quit Set IE = Nothing Application.Cursor = xlDefault End Sub Private Function RegexMid(s As String, sDate As String, sTempType As String) _ As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Global = True re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType _ & "\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If Set re = Nothing End Function ==================================== --ron |
Parse HMTL source to fill in cells?
Ron, Worked like a champ! I was able to edit to include referencing the airport codes and get it to run for all the cities needed. I am still a bit confused on why I cannot get it to run on a worksheet that say, already has a couple hundred days of information listed. It will work when I want it to start with Row 1 and then go down, but if I wanted to start at Row 250 and go down it will not work. I have tried adjust rng and researching CurrentRegion, but keep getting stuck. Don't worry too much about this since it is not a big issue, just more of curiosity on my part. The existing code works great, since we can just run it in a blank sheet and then copy and and paste it into our existing data. THANKS! "Ron Rosenfeld" wrote: On Tue, 30 Jun 2009 09:19:01 -0700, Jason wrote: Ron, No worries, thanks! Yeah, you are right about the un-unique identifiers. I am trying to come up with a way to iterate the code below for each date then look for those identifiers. I will take a look at the original link you sent a while back on the IE Object. <td<a href="/history/airport/KHOU/2008/9/1/DailyHistory.html"1</a</td <td class="bl gb" 91 </td <td class="gb" 84 </td <td class="br gb" 76 </td At least in a single test, I believe the following modifications will uniquely identify the required segments. I suspect there are easier ways to do this, but ... Note that I changed the pattern; I also changed the RegexMid function and added some "clean-up" to both the main Sub and the private Sub. (If you don't explicitly quit IE, you wind up with multiple IE processes running; and it will eventually crash. In some other program, the limit was about sixteen). Anyway try this: ==================================== Option Explicit Sub getCityTemps() Dim AirCode As Range, ACrng As Range Dim c As Range, rng As Range Dim j As Long Dim sURLairport As String Dim myStr As String Dim test As Variant Dim Val As Variant Dim IE As Object 'Set ACrng = Sheets("City_Airport").Range("B2:B26") 'For Each AirCode In ACrng ' sURLairport = AirCode Const sURL1 As String = _ "http://www.wunderground.com/history/airport/KHOU/2008/9/1/CustomHistory.html?dayend=28&monthend=6&yearend=20 09&req_city=NA&req_state=NA&req_statename=NA" 'Const sURL2 As String = "/" Dim sURLdate As String 'Const sURL3 As String = "/DailyHistory.html?MR=1" Application.Cursor = xlWait Set IE = CreateObject("InternetExplorer.Application") IE.Navigate sURL1 While IE.ReadyState < 4 DoEvents Wend myStr = IE.Document.body.innerhtml Set rng = Range("A2").CurrentRegion Set rng = rng.Resize(rng.Rows.Count, 1) For Each c In rng sURLdate = Format(c.Value2, "yyyy/m/d") c.Offset(0, 1).Value = RegexMid(myStr, sURLdate, "bl gb") c.Offset(0, 2).Value = RegexMid(myStr, sURLdate, "br gb") c.Offset(0, 3).Value = RegexMid(myStr, sURLdate, "class=gb") Next c 'i = i + 3 'Next AirCode IE.Quit Set IE = Nothing Application.Cursor = xlDefault End Sub Private Function RegexMid(s As String, sDate As String, sTempType As String) _ As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.MultiLine = True re.Global = True re.Pattern = "\b" & sDate & "/DailyHistory[\s\S]+?" & sTempType _ & "\D+(\d+)" If re.test(s) = True Then Set mc = re.Execute(s) RegexMid = mc(0).submatches(0) End If Set re = Nothing End Function ==================================== --ron |
Parse HMTL source to fill in cells?
On Tue, 30 Jun 2009 12:27:00 -0700, Jason wrote: Ron, Worked like a champ! I was able to edit to include referencing the airport codes and get it to run for all the cities needed. I am still a bit confused on why I cannot get it to run on a worksheet that say, already has a couple hundred days of information listed. It will work when I want it to start with Row 1 and then go down, but if I wanted to start at Row 250 and go down it will not work. I have tried adjust rng and researching CurrentRegion, but keep getting stuck. Don't worry too much about this since it is not a big issue, just more of curiosity on my part. The existing code works great, since we can just run it in a blank sheet and then copy and and paste it into our existing data. THANKS! You're very welcome. Glad to help. Thanks for the feedback. Hopefully, the HTML code won't change in such a way as to break your routine. I suspect your problem is here, or whatever is the equivalent code for your real data. ------------------- Set rng = Range("A2").CurrentRegion Set rng = rng.Resize(rng.Rows.Count, 1) For Each c In rng --------------------- Something like the following would look for the first row that is not completely filled in, and set rg = to start at that row, and continue down as far as you have entered dates. For example, if you have dates entered A2:A30; and data in B2:D11, then rg would be set to A12:A30. This routine assumes that there is nothing in Column E or below row 30 that might extend CurrentRegion. You'd have to set this up for your own data range. You also need to make sure that the initial setting of rg is only where your data might be. But there are a variety of ways to show that. ============================ Option Explicit Sub SetRange() Dim rg As Range, c As Range Dim StartRow As Long Set rg = Range("A2").CurrentRegion 'look for last filled in row Set c = rg.SpecialCells(xlCellTypeBlanks).Areas(1) Set c = c.Resize(1, 1) StartRow = c.Row - rg.Row Set rg = rg.Offset(rowoffset:=StartRow).Resize( _ rowsize:=rg.Rows.Count - StartRow, columnsize:=1) Debug.Print rg.Address End Sub =============================== --ron |
Parse HMTL source to fill in cells?
I guess this wear my newness comes in. Am I supposed to run this sub, then run the original code or add this in? I've tried running this new sub then running the original code, but it seems to clear out the first 3 columns. Again, I know we are close and this is not major, but it is more of a learning thing now. It's not urgent, but if you would prefer me to just email you the spreadsheet, I can do that as well if that would be quicker. |
Parse HMTL source to fill in cells?
On Tue, 30 Jun 2009 20:30:01 -0700, Jason wrote: I guess this wear my newness comes in. Am I supposed to run this sub, then run the original code or add this in? I've tried running this new sub then running the original code, but it seems to clear out the first 3 columns. Again, I know we are close and this is not major, but it is more of a learning thing now. It's not urgent, but if you would prefer me to just email you the spreadsheet, I can do that as well if that would be quicker. It's just an example of code you can use to set rg to the range you want to process. You would include it in your own code, modified to appropriately select your range. You will need to study and understand what it is doing, though, in order to modify it. --ron |
Parse HMTL source to fill in cells?
Okay, I'll tinker with the range and hopefully get it to work. I believe the problem lies in some blank cells above the data where the header and title information are located. I've already managed to use the code to get ideas for other tasks and had one final question in regards to Reg Exp syntax. I modified the Pattern you included in the original code to grab a number from a different site. re.Pattern = "\b" & "nbsp;" & sYear & "[\s\S]+?" & sMonthTotal & "\D+(\d+)" Where the sMonthTotal is a monthly total number However the number includes commas and so the modified pattern only grabs the numbers before the first comma. Example: -The number is 13,456,876 then the next line is 457,754 -The code grabs 13 and 457 for the next one Thanks again! I am working with it and have no doubt I will benefit greatly from your expertise. |
Parse HMTL source to fill in cells?
Ron, Okay I got the range to accurately work. The problem was the first 2 rows had merged cells in them. If I delete the merged cells it works fine. Which I can make work. Is there a way to get the search for the first blank row to ignore merged cells or not start the search until Row 4? I tried changing the Range"A2" to Range "A4", Range "A10" etc, but no luck. Thanks! Jason |
Parse HMTL source to fill in cells?
On Wed, 1 Jul 2009 11:41:01 -0700, Jason wrote: Ron, Okay I got the range to accurately work. The problem was the first 2 rows had merged cells in them. If I delete the merged cells it works fine. Which I can make work. Is there a way to get the search for the first blank row to ignore merged cells or not start the search until Row 4? I tried changing the Range"A2" to Range "A4", Range "A10" etc, but no luck. Thanks! Jason I'll look into that. Merged cells, in general are a PITA. --ron |
Parse HMTL source to fill in cells?
On Wed, 1 Jul 2009 07:58:14 -0700, Jason wrote: Okay, I'll tinker with the range and hopefully get it to work. I believe the problem lies in some blank cells above the data where the header and title information are located. I've already managed to use the code to get ideas for other tasks and had one final question in regards to Reg Exp syntax. I modified the Pattern you included in the original code to grab a number from a different site. re.Pattern = "\b" & "nbsp;" & sYear & "[\s\S]+?" & sMonthTotal & "\D+(\d+)" Where the sMonthTotal is a monthly total number However the number includes commas and so the modified pattern only grabs the numbers before the first comma. Example: -The number is 13,456,876 then the next line is 457,754 -The code grabs 13 and 457 for the next one Thanks again! I am working with it and have no doubt I will benefit greatly from your expertise. I have not tested this, but you might try using "[\d,]+" in place of the "\d". It is not as robust as a regex that looks for proper comma placement, but after an adult beverage, it might be good enough. --ron |
Parse HMTL source to fill in cells?
On Wed, 1 Jul 2009 11:41:01 -0700, Jason wrote: Ron, Okay I got the range to accurately work. The problem was the first 2 rows had merged cells in them. If I delete the merged cells it works fine. Which I can make work. Is there a way to get the search for the first blank row to ignore merged cells or not start the search until Row 4? I tried changing the Range"A2" to Range "A4", Range "A10" etc, but no luck. Thanks! Jason Here's one way to start at Row 4, even if there is confusing stuff in rows 1-3: ========================== Sub SetRange() Dim rg As Range, c As Range Const FirstValidRow As Long = 4 Dim StartRow As Long Set rg = Range("A2").CurrentRegion Set rg = rg.Resize(rowsize:=rg.Rows.Count - _ FirstValidRow + rg.Row) Set rg = rg.Offset(rowoffset:=FirstValidRow - rg.Row) 'look for last filled in row Set c = rg.SpecialCells(xlCellTypeBlanks).Areas(1) Set c = c.Resize(1, 1) StartRow = c.Row - rg.Row Set rg = rg.Offset(rowoffset:=StartRow).Resize( _ rowsize:=rg.Rows.Count - StartRow, columnsize:=1) Debug.Print rg.Address End Sub ============================== So far as ignoring merged cells, and whether that would be effective, it could be done. How to do it would depend on your precise layout. But there are VBA properties and methods available to deal with merged cells. --ron |
All times are GMT +1. The time now is 11:22 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com