How to scrape web page data in VBA
Ron's code is much better than mine if you have a static web address that you
can poll directly. Mine was more complicated because I had to follow a
complicated login and navigation process to reach the page I needed to parse.
If you have the option, use the simpler code :)
"ker_01" wrote:
Here is one I used to parse out some data. I never got it working as well as
I wanted; if you are working with simple pages it shouldn't be too hard, but
I was trying to parse a page that the designer specifically didn't want
parsed, and I was having trouble getting Excel to grab 100% of the page
content for parsing. The code below grabbed the active page and used Regex to
parse it. If you use Regex you will need to set a reference to it in the VBE.
HTH,
Keith
Sub GetTheData()
Dim strHTML As String
Dim strPartURL As String
Dim strPageHTML As String
strPartURL = "providername"
Set objShell = CreateObject("Shell.Application")
Set objShellWindows = objShell.Windows
If objShellWindows.Count = 0 Then
Exit Sub
End If
For i = 0 To objShellWindows.Count - 1
Set objIE = objShellWindows.Item(i)
If InStr(objIE.LocationURL, strPartURL) Then
'Creates a TextRange object for the element.
Set rng = objIE.document.body.createTextRange
' Set rng2 = objIE.document.body.parentelement
' Str2 = rng2.Text
strPageHTML = rng.Text
End If
Next i
Dim re As RegExp
Dim s As String
Dim ObjID As String
Dim matches As MatchCollection
Dim mcmatch As Match
s1 = strPageHTML
's = Replace(s, Chr(10), Chr(32))
's = Replace(s, Chr(13), Chr(32))
's = Replace(s, "Here to View More Results", "CustomDeletedString")
's = Replace(s, " View ", " ViewView")
Debug.Print s1
'grab the long string to find LSVs
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.MultiLine = True
re.Pattern = "Family[\s\S]*?View"
Set s2 = re.Execute(s)
Debug.Print s2
Set re2 = New RegExp
re.IgnoreCase = True
re.Global = True
re.MultiLine = True
re.Pattern = "Family|View[\s\S]*?View" '(gives full string )
"Family|View[\s\S]*?View
Set matches = re.Execute(s)
For Each mcmatch In matches
tempstr = mcmatch
Trim (tempstr)
tempstr = Left(tempstr, Len(tempstr) - 4)
tempstr = Trim(Right(tempstr, Len(tempstr) - 4))
MsgBox (tempstr)
'MsgBox Asc(Mid(tempstr, 2, 1))
Next
Set objShellWindows = Nothing
Set objShell = Nothing
End Sub
"Mike" wrote:
I need EXCEL to open a web address and parse out a few data items from that
web location. What EXCEL macro function(s) can be used to open a web URL and
parse the page source?
The web pages in question are not set up to allow the external date
functions to find the needed data.
|