ExcelBanter


Alan[_8_]

Unable to Retrieve Complete Web Page
 
I am trying to completely retrieve a web page and search it using
VBScript regular expressions. However, I do not get the complete web
page.

Am I running into some VBA string length limit or what? Is there some
way around it?

My Sub may be found below.

Alan

Sub GetGoogleHomePage()

    Dim oIE As SHDocVw.InternetExplorer
    Dim sPage As String

    ' Create a new (hidden) instance of IE
    Set oIE = New SHDocVw.InternetExplorer

    ' Open the web page
    oIE.Navigate "http://www.google.com"

    ' Wait for the page to complete loading
    Do Until oIE.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Retrieve the text of the web page
    sPage = oIE.Document.body.InnerHTML

    ' Display the HTML
    Debug.Print sPage

End Sub
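
A quick way to tell whether the string itself is being cut short, or only its display, is to print its length and its tail instead of the whole thing. The sketch below is an illustration only, not part of the original Sub: the name CheckStringLength and the one-million-character test string are made up for the example. A VBA variable-length String can hold far more than a web page, while the Immediate window keeps only a limited amount of recent output, so Debug.Print on a long string can look truncated even when the variable is complete.

```vba
Sub CheckStringLength()

    Dim sBig As String

    ' Build a string far longer than a typical web page (hypothetical test data).
    sBig = String$(1000000, "x")

    ' Len reports the real length, whatever the Immediate window displays.
    Debug.Print Len(sBig)

    ' Print only the last few characters to confirm the end is intact.
    Debug.Print Right$(sBig, 20)

End Sub
```

The same two Debug.Print lines can be applied to sPage in the Sub above to see how much HTML was actually retrieved.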

Leith Ross[_787_]

Unable to Retrieve Complete Web Page
 

Hello Alan,

Here is another method that overcomes the string/character limitations.
This uses the WinHTTP COM object to retrieve and store the web page's
source code into a text file (.txt). The file created is "C:\temp
URL.txt". You can change the path and file name to what you want.

================================
'Written: September 04, 2009
'Author: Leith Ross
'Summary: Saves a web page's source code to a text file.

Sub SaveServerDataAsFile()

    'Create an array to hold the response data.
    Dim d() As Byte
    Dim objReq As Object
    Dim FileNum As Integer

    'Try WinHTTP 5.1 first, then fall back to 5.0.
    On Error Resume Next
    Set objReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    If objReq Is Nothing Then
        Set objReq = CreateObject("WinHttp.WinHttpRequest.5")
    End If
    Err.Clear
    On Error GoTo 0

    'Stop if neither version could be created.
    If objReq Is Nothing Then
        MsgBox "WinHTTP is not available on this machine."
        Exit Sub
    End If

    'Assemble an HTTP Request.
    objReq.Open "GET", "http://www.thecodecage.com/", False

    'Send the HTTP Request.
    objReq.Send

    'Show the status code and status text.
    MsgBox objReq.Status & " - " & objReq.StatusText

    'Put response data into a file, using the next free file number.
    FileNum = FreeFile
    Open "C:\temp URL.txt" For Binary As #FileNum
    d() = objReq.ResponseBody
    Put #FileNum, 1, d()
    Close #FileNum

End Sub
================================
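
If the goal is to search the page in memory rather than save it, the same WinHTTP object also exposes the response as text. The following is a minimal sketch under that assumption, not a replacement for the Sub above: the names GetPageText and DemoGetPageText are hypothetical, and it assumes WinHTTP 5.1 is installed and that the server answers with status 200.

```vba
Function GetPageText(ByVal sUrl As String) As String

    Dim objReq As Object

    ' Late-bound WinHTTP request (assumes version 5.1 is available).
    Set objReq = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' Synchronous GET: Send does not return until the response has arrived.
    objReq.Open "GET", sUrl, False
    objReq.Send

    ' Return the body as a string only when the request succeeded.
    If objReq.Status = 200 Then
        GetPageText = objReq.ResponseText
    End If

End Function

Sub DemoGetPageText()

    Dim sPage As String

    sPage = GetPageText("http://www.thecodecage.com/")

    ' Print the length rather than the whole page.
    Debug.Print Len(sPage)

End Sub
```

An empty return value means the request did not come back with status 200; the caller can then decide whether to retry or report the problem.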


--
Leith Ross

Sincerely,
Leith Ross

'The Code Cage' (http://www.thecodecage.com/)
------------------------------------------------------------------------
Leith Ross's Profile: http://www.thecodecage.com/forumz/member.php?userid=75
View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=131744


Alan[_8_]

Unable to Retrieve Complete Web Page
 
Leith,
It seems like I will run into the same problem (too long)
or have problems with text broken over multiple lines when I read the
data back from the file.

But, I'll give this a try.

Could you please explain the "WinHttp.WinHttpRequest.5.1" vs.
"WinHttp.WinHttpRequest.5"?

Thanks, Alan


Leith Ross[_789_]

Unable to Retrieve Complete Web Page
 

Hello Alan,

Here is a link to a page that explains the differences in detail:

'WinHTTP Versions (Windows)'
(http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx)




Alan[_8_]

Unable to Retrieve Complete Web Page
 

I do seem to have the same truncation problem when I read the file
back in with this code:

' Read each line of the file, looking for the description
Dim myFileName As String
Dim myLine As String
Dim FileNum As Long
Dim count As Long

myFileName = ThisWorkbook.Path & "\URL.txt"
FileNum = FreeFile
Open myFileName For Input As #FileNum
count = 0
Do While Not EOF(FileNum)
    count = count + 1
    Line Input #FileNum, myLine
    Debug.Print myLine
    Debug.Print "===========================================" & vbCrLf
    Debug.Print count & vbCrLf
    Debug.Print vbCrLf & "===========================================" & vbCrLf
    myLine = ExtractCoDescr(myLine)
    If Len(myLine) > 0 Then
        GetCoDescription = myLine
        Exit Do
    End If
Loop
Close #FileNum

Leith Ross[_790_]

Unable to Retrieve Complete Web Page
 

Hello Alan,

The code I wrote downloads the web page in binary format as an array of
unsigned bytes. The whole response is held in memory before being saved
to a disk file with a ".txt" extension, so web page size is limited only
by available memory. The advantage of binary is that all information is
brought into the file, not just text, and this could be what is causing
your truncation problems: the file reading method you are using expects
the file data to be in a specific format. What information are you
trying to locate or extract from the file?
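
One way to sidestep the line format question is to read the file back the same way it was written, as raw bytes, and convert it to a string in a single step. Line Input reads up to a carriage return or a carriage return and line feed pair, so a page that uses bare line feeds may not split where you expect. The sketch below is only an illustration: the name ReadWholeFile is hypothetical, and it assumes the page is in a single-byte (ANSI) encoding, so characters outside that range in a UTF-8 page may not convert correctly.

```vba
Function ReadWholeFile(ByVal sPath As String) As String

    Dim FileNum As Integer
    Dim b() As Byte

    FileNum = FreeFile
    Open sPath For Binary Access Read As #FileNum

    If LOF(FileNum) > 0 Then
        ' Size the buffer to the file and read every byte in one call.
        ReDim b(0 To LOF(FileNum) - 1)
        Get #FileNum, , b

        ' Convert the bytes to a VBA string using the system code page
        ' (assumption: the file holds single-byte text).
        ReadWholeFile = StrConv(b, vbUnicode)
    End If

    Close #FileNum

End Function
```

The returned string holds the whole page, line breaks included, so it can be searched in one pass instead of line by line.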




Alan[_8_]

Unable to Retrieve Complete Web Page
 
I am trying to extract text following a series of HTML tags and
keywords.

If you can explain how I get started on properly reading it, I
would appreciate it.

Alan


Leith Ross[_792_]

Unable to Retrieve Complete Web Page
 

Hello Alan,

The easiest method would be to use Word as the file editor. You can
view the file best by going to View > Web Layout and searching with the
Find option.




Alan[_8_]

Unable to Retrieve Complete Web Page
 
I used a VBScript TextStream object. That worked.

Alan
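
The code that worked was not posted in this thread. For the archives, here is a minimal sketch of what a TextStream approach could look like. It is an assumption, not the actual solution: the names ReadAllText, FirstMatch and DemoExtract, the file location, and the title pattern are all hypothetical. It reads the whole file with the FileSystemObject and then searches it with a VBScript regular expression.

```vba
Function ReadAllText(ByVal sPath As String) As String

    Dim fso As Object
    Dim ts As Object

    Set fso = CreateObject("Scripting.FileSystemObject")

    ' 1 = ForReading, False = do not create the file, 0 = open as ASCII/ANSI.
    Set ts = fso.OpenTextFile(sPath, 1, False, 0)

    ' ReadAll raises an error on an empty file, so check first.
    If Not ts.AtEndOfStream Then ReadAllText = ts.ReadAll
    ts.Close

End Function

Function FirstMatch(ByVal sText As String, ByVal sPattern As String) As String

    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = sPattern
    re.IgnoreCase = True
    re.Global = False

    Set matches = re.Execute(sText)
    If matches.Count > 0 Then
        ' Prefer the first capture group when the pattern has one.
        If matches(0).SubMatches.Count > 0 Then
            FirstMatch = matches(0).SubMatches(0)
        Else
            FirstMatch = matches(0).Value
        End If
    End If

End Function

Sub DemoExtract()

    Dim sPage As String

    ' Hypothetical location of the saved page.
    sPage = ReadAllText(ThisWorkbook.Path & "\URL.txt")

    ' Example pattern only: the text between the title tags.
    Debug.Print FirstMatch(sPage, "<title>([\s\S]*?)</title>")

End Sub
```

Because the pattern uses [\s\S] rather than a dot, a match can span line breaks, which avoids the problem of text broken over multiple lines.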

Don Guillett

Unable to Retrieve Complete Web Page
 
Care to post it for the archives?

--
Don Guillett
Microsoft MVP Excel
SalesAid Software



