#1 (Alan)
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 117
Unable to Retrieve Complete Web Page

I am trying to retrieve a complete web page and search it using
VBScript regular expressions. However, I do not get the whole web
page.

Am I running into some VBA string length limit or what? Is there some
way around it?

My Sub may be found below.

Alan

Sub GetGoogleHomePage()

' Requires a reference to "Microsoft Internet Controls" (SHDocVw)
Dim oIE As SHDocVw.InternetExplorer
Dim sPage As String

' Create a new (hidden) instance of IE
Set oIE = New SHDocVw.InternetExplorer

' Open the web page
oIE.Navigate "http://www.google.com"

' Wait until IE is no longer busy and the page has finished loading
Do While oIE.Busy Or oIE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop

' Retrieve the HTML inside the page's <body> element
sPage = oIE.Document.body.InnerHTML

' Display the HTML
Debug.Print sPage

' Close the hidden IE instance
oIE.Quit
Set oIE = Nothing

End Sub
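
A quick way to test the "string length limit" theory, offered as a sketch only: a variable-length VBA String can hold roughly 2 billion (2^31) characters, whereas the Immediate window keeps only a limited number of lines (commonly reported as about 200), and body.InnerHTML leaves out the page's <head>. The helper below writes the text to a file and returns its character count, so the two can be compared. The name SaveTextToFile and the output path in the usage note are hypothetical, not from the thread.

```vba
'Sketch: save a String to a file and return its length, so the full
'text can be inspected outside the Immediate window.
'SaveTextToFile is a hypothetical helper name.
Function SaveTextToFile(ByVal sText As String, ByVal sFile As String) As Long
    Dim iFile As Integer

    iFile = FreeFile
    'Output mode creates the file, or truncates it if it already exists.
    Open sFile For Output As #iFile
    'The trailing semicolon stops Print # from appending a CR/LF.
    Print #iFile, sText;
    Close #iFile

    'Return the number of characters the String actually held.
    SaveTextToFile = Len(sText)
End Function
```

For example, inside GetGoogleHomePage, `sPage = oIE.Document.documentElement.outerHTML` followed by `Debug.Print SaveTextToFile(sPage, Environ$("TEMP") & "\page.html")` prints the character count of the whole document (head and body) and leaves the complete markup in the file.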
#2 (Leith Ross)

Hello Alan,

Here is another method that overcomes the string/character limitations.
It uses the WinHTTP COM object to retrieve the web page's source code
and save it to a text file (.txt). The file created is
"C:\temp\URL.txt" (the C:\temp folder must already exist). You can
change the path and file name to whatever you want.

================================
'Written: September 04, 2009
'Author: Leith Ross
'Summary: Saves a web page's source code to a text file.

Sub SaveServerDataAsFile()

    'Byte array to hold the raw response body.
    Dim d() As Byte
    Dim objReq As Object
    Dim sFile As String
    Dim iFile As Integer

    sFile = "C:\temp\URL.txt"

    'Prefer WinHTTP 5.1; fall back to 5.0 if 5.1 is not registered.
    On Error Resume Next
    Set objReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    If objReq Is Nothing Then
        Set objReq = CreateObject("WinHttp.WinHttpRequest.5")
    End If
    Err.Clear
    On Error GoTo 0

    If objReq Is Nothing Then
        MsgBox "WinHTTP is not available on this computer."
        Exit Sub
    End If

    'Assemble a synchronous HTTP GET request.
    objReq.Open "GET", "http://www.thecodecage.com/", False

    'Send the HTTP request and wait for the response.
    objReq.Send

    'Show the HTTP status code and status text.
    MsgBox objReq.Status & " - " & objReq.StatusText

    'Put the response data into the file. Delete any old copy first,
    'because Binary mode does not shorten an existing file.
    If Len(Dir(sFile)) > 0 Then Kill sFile
    iFile = FreeFile
    Open sFile For Binary Access Write As #iFile
    d() = objReq.ResponseBody
    Put #iFile, 1, d()
    Close #iFile

End Sub
================================


--
Leith Ross

Sincerely,
Leith Ross

'The Code Cage' (http://www.thecodecage.com/)
------------------------------------------------------------------------
Leith Ross's Profile: http://www.thecodecage.com/forumz/member.php?userid=75
View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=131744

#3 (Alan)

Leith,
It seems like I will run into the same problem (too long)
or have problems with text broken over multiple lines when I read the
data back from the file.

But, I'll give this a try.

Could you please explain the "WinHttp.WinHttpRequest.5.1" vs.
"WinHttp.WinHttpRequest.5"?

Thanks, Alan

#4 (Leith Ross)

Hello Alan,

Here is a link to a page that explains the differences in detail:

'WinHTTP Versions (Windows)'
(http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx)


--
Leith Ross

#5 (Alan)

I do seem to have the same truncation problem when I read the file
back in with this code:

' Read each line of the file, looking for the description
Dim myFileName As String
Dim myLine As String
Dim FileNum As Long
Dim count As Long

myFileName = ThisWorkbook.Path & "\URL.txt"
FileNum = FreeFile
Open myFileName For Input As FileNum
count = 0
Do While Not EOF(FileNum)
count = count + 1
Line Input #FileNum, myLine
Debug.Print myLine
Debug.Print "===========================================" & vbCrLf
Debug.Print count & vbCrLf
Debug.Print vbCrLf & "===========================================" & vbCrLf
' ExtractCoDescr is my own helper function (not shown here)
myLine = ExtractCoDescr(myLine)
If Len(myLine) > 0 Then
' Return value of the enclosing GetCoDescription function
GetCoDescription = myLine
Exit Do
End If
Loop
Close FileNum


#6 (Leith Ross)

Hello Alan,

The code I wrote downloads the web page in binary format as unsigned
bytes. All of it is held in memory before being saved to a disk file
with a ".txt" extension, so the web page size is limited only by
available memory. The advantage of binary is that everything the server
sends is written to the file, not just text, and this could be what is
causing your truncation problems: the file-reading method you are using
expects the file data to be in a specific format. What information are
you trying to locate or extract from the file?
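
One likely reason for the "specific format" problem: VBA's Line Input # stops only at a carriage return (Chr(13)) or a CR+LF pair, so a page saved with LF-only line endings is read as one very long line, which is then hard to inspect in the Immediate window. A minimal sketch that sidesteps this, assuming the file written by SaveServerDataAsFile: read the whole file in Binary mode and split it on LF. The function name ReadFileLines is hypothetical.

```vba
'Sketch: read the whole saved file into one String, then split it into
'lines on LF so that files with LF-only line endings also work.
'ReadFileLines is a hypothetical helper name.
Function ReadFileLines(ByVal sFile As String) As Variant
    Dim iFile As Integer
    Dim sText As String

    iFile = FreeFile
    Open sFile For Binary Access Read As #iFile
    'Size the buffer to the file length, then read every byte into it.
    sText = Space$(LOF(iFile))
    Get #iFile, 1, sText
    Close #iFile

    'Drop CRs so CRLF and LF endings both become a single LF.
    sText = Replace(sText, vbCr, "")

    'Return a zero-based array with one element per line.
    ReadFileLines = Split(sText, vbLf)
End Function
```

For example, `vLines = ReadFileLines("C:\temp\URL.txt")` returns an array that can be looped over from LBound(vLines) to UBound(vLines). If the text being searched for can span several lines, searching the single String (sText) before splitting is usually simpler.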


--
Leith Ross

#7 (Alan)

I am trying to extract text following a series of HTML tags and
keywords.

If you can explain how to get started on reading it properly, I
would appreciate it.

Alan
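
Since the original goal was to search the page with VBScript regular expressions, here is a minimal sketch of that step, assuming the whole page is already in one String. The function name TextAfterTag, the "<td>" tag and the "Description:" keyword are hypothetical placeholders, not taken from the thread.

```vba
'Sketch: return the text that follows a given HTML tag and keyword.
'TextAfterTag, "<td>" and "Description:" are hypothetical examples.
Function TextAfterTag(ByVal sHtml As String) As String
    Dim oRE As Object
    Dim oMatches As Object

    'Late-bound VBScript regular expression object (no reference needed).
    Set oRE = CreateObject("VBScript.RegExp")
    'Match "<td>", optional whitespace, the keyword, optional whitespace,
    'then capture everything up to the next "<".
    oRE.Pattern = "<td>\s*Description:\s*([^<]*)"
    oRE.IgnoreCase = True
    oRE.Global = False

    Set oMatches = oRE.Execute(sHtml)
    If oMatches.Count > 0 Then
        'SubMatches(0) holds the text captured by the parentheses.
        TextAfterTag = Trim$(oMatches(0).SubMatches(0))
    End If
End Function
```

With no match the function returns an empty String, so `If Len(TextAfterTag(sPage)) > 0 Then` works as a found/not-found test.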

#8 (Leith Ross)

Hello Alan,

The easiest method would be to use Word as the file editor. You can
view the file best by going to View > Web Layout, and then search it
using the Find option.


--
Leith Ross

#9 (Alan)

I used a VBScript TextStream object. That worked.

Alan
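
Alan's TextStream code was not posted. As an illustration only, assuming the late-bound Scripting.FileSystemObject, reading the whole file with TextStream.ReadAll might look like this; ReadWholeFile is a hypothetical name.

```vba
'Sketch: read an entire text file with a Scripting.TextStream.
'This is not Alan's code; ReadWholeFile is a hypothetical helper name.
Function ReadWholeFile(ByVal sFile As String) As String
    Const ForReading As Long = 1
    Dim oFSO As Object
    Dim oStream As Object

    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oStream = oFSO.OpenTextFile(sFile, ForReading)

    'ReadAll returns the whole file as one String, whatever line endings
    'it uses. On an empty file ReadAll raises an error, so check first.
    If Not oStream.AtEndOfStream Then ReadWholeFile = oStream.ReadAll
    oStream.Close
End Function
```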
#10 (Don Guillett)

Care to post it for the archives?

--
Don Guillett
Microsoft MVP Excel
SalesAid Software

Similar Threads
Thread Thread Starter Forum Replies Last Post
How do I retrieve a page number in Excel? rambo1989 Excel Worksheet Functions 1 December 6th 07 01:10 PM
Cannot print a complete page of labels C in Cleveland Excel Discussion (Misc queries) 1 September 15th 06 10:55 PM
Retrieve data from company intranet page [email protected] Excel Programming 8 August 22nd 06 04:58 PM
programmatically retrieve links from web page Loane Sharp[_2_] Excel Programming 2 January 26th 06 03:15 PM
Unable to print a complete workbook Bradley Excel Discussion (Misc queries) 3 September 30th 05 02:57 PM

Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.