![]() |
How to scrape web page data in VBA
I need EXCEL to open a web address and parse out a few data items from that
web location. What EXCEL macro function(s) can be used to open a web URL and parse the page source? The web pages in question are not set up to allow the external date functions to find the needed data. |
How to scrape web page data in VBA
Use joels dump sub to ascertain the tag name or ID for the data you want:
with worksheets ("sheet1") RowCount = 1 for each itm in IE.Document.all Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.id 'comment out if errors Range("D" & RowCount) = itm.name 'comment out if errors Range("E" & RowCount) = left(itm.innertext,1024) RowCount = Rowcount + 1 next itm end with once you have this information you can reference the data in your sub look at the following piece of zip code sub as an example Set Form = IE.document.getElementsByTagname("Form") Set zip5 = IE.document.getElementById("zip5") zip5.Value = ZIPCODE Set ZipCodebutton = Form(0).onsubmit Form(0).submit Do While IE.busy = True DoEvents Loop Set Table = IE.document.getElementsByTagname("Table") Location = Table(0).Rows(2).innertext IE.Quit MsgBox ("Zip code = " & ZIPCODE & " City/State = " & Location) good luck "Mike" wrote: I need EXCEL to open a web address and parse out a few data items from that web location. What EXCEL macro function(s) can be used to open a web URL and parse the page source? The web pages in question are not set up to allow the external date functions to find the needed data. |
How to scrape web page data in VBA
Here is one I used to parse out some data. I never got it working as well as
I wanted; if you are working with simple pages it shouldn't be too hard, but I was trying to parse a page that the designer specifically didn't want parsed, and I was having trouble getting Excel to grab 100% of the page content for parsing. The code below grabbed the active page and used Regex to parse it. If you use Regex you will need to set a reference to it in the VBE. HTH, Keith Sub GetTheData() Dim strHTML As String Dim strPartURL As String Dim strPageHTML As String strPartURL = "providername" Set objShell = CreateObject("Shell.Application") Set objShellWindows = objShell.Windows If objShellWindows.Count = 0 Then Exit Sub End If For i = 0 To objShellWindows.Count - 1 Set objIE = objShellWindows.Item(i) If InStr(objIE.LocationURL, strPartURL) Then 'Creates a TextRange object for the element. Set rng = objIE.document.body.createTextRange ' Set rng2 = objIE.document.body.parentelement ' Str2 = rng2.Text strPageHTML = rng.Text End If Next i Dim re As RegExp Dim s As String Dim ObjID As String Dim matches As MatchCollection Dim mcmatch As Match s1 = strPageHTML 's = Replace(s, Chr(10), Chr(32)) 's = Replace(s, Chr(13), Chr(32)) 's = Replace(s, "Here to View More Results", "CustomDeletedString") 's = Replace(s, " View ", " ViewView") Debug.Print s1 'grab the long string to find LSVs Set re = New RegExp re.IgnoreCase = True re.Global = True re.MultiLine = True re.Pattern = "Family[\s\S]*?View" Set s2 = re.Execute(s) Debug.Print s2 Set re2 = New RegExp re.IgnoreCase = True re.Global = True re.MultiLine = True re.Pattern = "Family|View[\s\S]*?View" '(gives full string ) "Family|View[\s\S]*?View Set matches = re.Execute(s) For Each mcmatch In matches tempstr = mcmatch Trim (tempstr) tempstr = Left(tempstr, Len(tempstr) - 4) tempstr = Trim(Right(tempstr, Len(tempstr) - 4)) MsgBox (tempstr) 'MsgBox Asc(Mid(tempstr, 2, 1)) Next Set objShellWindows = Nothing Set objShell = Nothing End Sub "Mike" wrote: I need EXCEL to open a web address and parse out a few data items from that web location. What EXCEL macro function(s) can be used to open a web URL and parse the page source? The web pages in question are not set up to allow the external date functions to find the needed data. |
How to scrape web page data in VBA
On Mar 31, 1:47*pm, Mike wrote:
I need EXCEL to open a web address and *parse out a few data items from that web location. *What EXCEL macro function(s) can be used to open a web URL and parse the page source? The web pages in question are not set up to allow the external date functions to find the needed data. Here's another method. It doesn't involve opening and closing IE so it runs faster. All of the web page information can be extracted from the source code (using instr, mid, etc) which is contained in my_var...Ron Sub Test() my_url = "http://www.google.com" Set my_obj = CreateObject("MSXML2.XMLHTTP") my_obj.Open "GET", my_url, False my_obj.send my_var = my_obj.responsetext Set my_obj = Nothing End Sub |
How to scrape web page data in VBA
Ron's code is much better than mine if you have a static web address that you
can poll directly. Mine was more complicated because I had to follow a complicated login and navigation process to reach the page I needed to parse. If you have the option, use the simpler code :) "ker_01" wrote: Here is one I used to parse out some data. I never got it working as well as I wanted; if you are working with simple pages it shouldn't be too hard, but I was trying to parse a page that the designer specifically didn't want parsed, and I was having trouble getting Excel to grab 100% of the page content for parsing. The code below grabbed the active page and used Regex to parse it. If you use Regex you will need to set a reference to it in the VBE. HTH, Keith Sub GetTheData() Dim strHTML As String Dim strPartURL As String Dim strPageHTML As String strPartURL = "providername" Set objShell = CreateObject("Shell.Application") Set objShellWindows = objShell.Windows If objShellWindows.Count = 0 Then Exit Sub End If For i = 0 To objShellWindows.Count - 1 Set objIE = objShellWindows.Item(i) If InStr(objIE.LocationURL, strPartURL) Then 'Creates a TextRange object for the element. Set rng = objIE.document.body.createTextRange ' Set rng2 = objIE.document.body.parentelement ' Str2 = rng2.Text strPageHTML = rng.Text End If Next i Dim re As RegExp Dim s As String Dim ObjID As String Dim matches As MatchCollection Dim mcmatch As Match s1 = strPageHTML 's = Replace(s, Chr(10), Chr(32)) 's = Replace(s, Chr(13), Chr(32)) 's = Replace(s, "Here to View More Results", "CustomDeletedString") 's = Replace(s, " View ", " ViewView") Debug.Print s1 'grab the long string to find LSVs Set re = New RegExp re.IgnoreCase = True re.Global = True re.MultiLine = True re.Pattern = "Family[\s\S]*?View" Set s2 = re.Execute(s) Debug.Print s2 Set re2 = New RegExp re.IgnoreCase = True re.Global = True re.MultiLine = True re.Pattern = "Family|View[\s\S]*?View" '(gives full string ) "Family|View[\s\S]*?View Set matches = re.Execute(s) For Each mcmatch In matches tempstr = mcmatch Trim (tempstr) tempstr = Left(tempstr, Len(tempstr) - 4) tempstr = Trim(Right(tempstr, Len(tempstr) - 4)) MsgBox (tempstr) 'MsgBox Asc(Mid(tempstr, 2, 1)) Next Set objShellWindows = Nothing Set objShell = Nothing End Sub "Mike" wrote: I need EXCEL to open a web address and parse out a few data items from that web location. What EXCEL macro function(s) can be used to open a web URL and parse the page source? The web pages in question are not set up to allow the external date functions to find the needed data. |
All times are GMT +1. The time now is 09:47 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com