ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   How to scrape web page data in VBA (https://www.excelbanter.com/excel-programming/441204-how-scrape-web-page-data-vba.html)

Mike

How to scrape web page data in VBA
 
I need EXCEL to open a web address and parse out a few data items from that
web location. What EXCEL macro function(s) can be used to open a web URL and
parse the page source?

The web pages in question are not set up to allow the external date
functions to find the needed data.

Atishoo

How to scrape web page data in VBA
 
Use joels dump sub to ascertain the tag name or ID for the data you want:
with worksheets ("sheet1")
RowCount = 1
for each itm in IE.Document.all
Range("A" & RowCount) = itm.tagname
Range("B" & RowCount) = itm.classname
Range("C" & RowCount) = itm.id 'comment out if errors
Range("D" & RowCount) = itm.name 'comment out if errors
Range("E" & RowCount) = left(itm.innertext,1024)
RowCount = Rowcount + 1
next itm
end with
once you have this information you can reference the data in your sub
look at the following piece of zip code sub as an example

Set Form = IE.document.getElementsByTagname("Form")

Set zip5 = IE.document.getElementById("zip5")
zip5.Value = ZIPCODE


Set ZipCodebutton = Form(0).onsubmit

Form(0).submit
Do While IE.busy = True
DoEvents
Loop

Set Table = IE.document.getElementsByTagname("Table")
Location = Table(0).Rows(2).innertext
IE.Quit
MsgBox ("Zip code = " & ZIPCODE & " City/State = " & Location)

good luck

"Mike" wrote:

I need EXCEL to open a web address and parse out a few data items from that
web location. What EXCEL macro function(s) can be used to open a web URL and
parse the page source?

The web pages in question are not set up to allow the external date
functions to find the needed data.


ker_01

How to scrape web page data in VBA
 
Here is one I used to parse out some data. I never got it working as well as
I wanted; if you are working with simple pages it shouldn't be too hard, but
I was trying to parse a page that the designer specifically didn't want
parsed, and I was having trouble getting Excel to grab 100% of the page
content for parsing. The code below grabbed the active page and used Regex to
parse it. If you use Regex you will need to set a reference to it in the VBE.
HTH,
Keith

Sub GetTheData()

Dim strHTML As String
Dim strPartURL As String
Dim strPageHTML As String

strPartURL = "providername"

Set objShell = CreateObject("Shell.Application")
Set objShellWindows = objShell.Windows

If objShellWindows.Count = 0 Then
Exit Sub
End If

For i = 0 To objShellWindows.Count - 1
Set objIE = objShellWindows.Item(i)
If InStr(objIE.LocationURL, strPartURL) Then
'Creates a TextRange object for the element.
Set rng = objIE.document.body.createTextRange
' Set rng2 = objIE.document.body.parentelement
' Str2 = rng2.Text
strPageHTML = rng.Text
End If
Next i

Dim re As RegExp
Dim s As String
Dim ObjID As String
Dim matches As MatchCollection
Dim mcmatch As Match

s1 = strPageHTML
's = Replace(s, Chr(10), Chr(32))
's = Replace(s, Chr(13), Chr(32))
's = Replace(s, "Here to View More Results", "CustomDeletedString")
's = Replace(s, " View ", " ViewView")
Debug.Print s1

'grab the long string to find LSVs
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.MultiLine = True
re.Pattern = "Family[\s\S]*?View"

Set s2 = re.Execute(s)
Debug.Print s2

Set re2 = New RegExp
re.IgnoreCase = True
re.Global = True
re.MultiLine = True
re.Pattern = "Family|View[\s\S]*?View" '(gives full string )
"Family|View[\s\S]*?View

Set matches = re.Execute(s)
For Each mcmatch In matches
tempstr = mcmatch
Trim (tempstr)
tempstr = Left(tempstr, Len(tempstr) - 4)
tempstr = Trim(Right(tempstr, Len(tempstr) - 4))
MsgBox (tempstr)
'MsgBox Asc(Mid(tempstr, 2, 1))
Next

Set objShellWindows = Nothing
Set objShell = Nothing
End Sub





"Mike" wrote:

I need EXCEL to open a web address and parse out a few data items from that
web location. What EXCEL macro function(s) can be used to open a web URL and
parse the page source?

The web pages in question are not set up to allow the external date
functions to find the needed data.


ron

How to scrape web page data in VBA
 
On Mar 31, 1:47*pm, Mike wrote:
I need EXCEL to open a web address and *parse out a few data items from that
web location. *What EXCEL macro function(s) can be used to open a web URL and
parse the page source?

The web pages in question are not set up to allow the external date
functions to find the needed data.


Here's another method. It doesn't involve opening and closing IE so
it runs faster. All of the web page information can be extracted from
the source code (using instr, mid, etc) which is contained in
my_var...Ron

Sub Test()
my_url = "http://www.google.com"
Set my_obj = CreateObject("MSXML2.XMLHTTP")
my_obj.Open "GET", my_url, False
my_obj.send
my_var = my_obj.responsetext
Set my_obj = Nothing
End Sub

ker_01

How to scrape web page data in VBA
 
Ron's code is much better than mine if you have a static web address that you
can poll directly. Mine was more complicated because I had to follow a
complicated login and navigation process to reach the page I needed to parse.
If you have the option, use the simpler code :)

"ker_01" wrote:

Here is one I used to parse out some data. I never got it working as well as
I wanted; if you are working with simple pages it shouldn't be too hard, but
I was trying to parse a page that the designer specifically didn't want
parsed, and I was having trouble getting Excel to grab 100% of the page
content for parsing. The code below grabbed the active page and used Regex to
parse it. If you use Regex you will need to set a reference to it in the VBE.
HTH,
Keith

Sub GetTheData()

Dim strHTML As String
Dim strPartURL As String
Dim strPageHTML As String

strPartURL = "providername"

Set objShell = CreateObject("Shell.Application")
Set objShellWindows = objShell.Windows

If objShellWindows.Count = 0 Then
Exit Sub
End If

For i = 0 To objShellWindows.Count - 1
Set objIE = objShellWindows.Item(i)
If InStr(objIE.LocationURL, strPartURL) Then
'Creates a TextRange object for the element.
Set rng = objIE.document.body.createTextRange
' Set rng2 = objIE.document.body.parentelement
' Str2 = rng2.Text
strPageHTML = rng.Text
End If
Next i

Dim re As RegExp
Dim s As String
Dim ObjID As String
Dim matches As MatchCollection
Dim mcmatch As Match

s1 = strPageHTML
's = Replace(s, Chr(10), Chr(32))
's = Replace(s, Chr(13), Chr(32))
's = Replace(s, "Here to View More Results", "CustomDeletedString")
's = Replace(s, " View ", " ViewView")
Debug.Print s1

'grab the long string to find LSVs
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.MultiLine = True
re.Pattern = "Family[\s\S]*?View"

Set s2 = re.Execute(s)
Debug.Print s2

Set re2 = New RegExp
re.IgnoreCase = True
re.Global = True
re.MultiLine = True
re.Pattern = "Family|View[\s\S]*?View" '(gives full string )
"Family|View[\s\S]*?View

Set matches = re.Execute(s)
For Each mcmatch In matches
tempstr = mcmatch
Trim (tempstr)
tempstr = Left(tempstr, Len(tempstr) - 4)
tempstr = Trim(Right(tempstr, Len(tempstr) - 4))
MsgBox (tempstr)
'MsgBox Asc(Mid(tempstr, 2, 1))
Next

Set objShellWindows = Nothing
Set objShell = Nothing
End Sub





"Mike" wrote:

I need EXCEL to open a web address and parse out a few data items from that
web location. What EXCEL macro function(s) can be used to open a web URL and
parse the page source?

The web pages in question are not set up to allow the external date
functions to find the needed data.



All times are GMT +1. The time now is 09:47 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com