#1
Posted to microsoft.public.excel.programming
Unable to Retrieve Complete Web Page
I am trying to retrieve a complete web page and search it using VBScript regular expressions. However, I do not get the complete web page. Am I running into some VBA string length limit or what? Is there some way around it? My Sub may be found below.

Alan

Sub GetGoogleHomePage()
    Dim oIE As SHDocVw.InternetExplorer
    Dim sPage As String

    ' Create a new (hidden) instance of IE
    Set oIE = New SHDocVw.InternetExplorer

    ' Open the web page
    oIE.Navigate "http://www.google.com"

    ' Wait for the page to complete loading
    Do Until oIE.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Retrieve the text of the web page
    sPage = oIE.Document.body.InnerHTML

    ' Display the HTML
    Debug.Print sPage
End Sub
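One possibility worth ruling out, offered as an assumption rather than a diagnosis: a VBA String can hold far more text than a typical web page, while the Immediate window keeps only a limited number of recent lines, so Debug.Print output can look truncated even when sPage is complete. A minimal sketch that checks the length instead of relying on the printed output follows. The Sub name CheckPageLength is hypothetical, and it uses late binding so that no reference to Microsoft Internet Controls is required.

```vba
Sub CheckPageLength()
    Dim oIE As Object
    Dim sPage As String

    ' Late-bound Internet Explorer instance (hidden by default)
    Set oIE = CreateObject("InternetExplorer.Application")
    oIE.Navigate "http://www.google.com"

    ' 4 is the value of READYSTATE_COMPLETE
    Do Until oIE.ReadyState = 4
        DoEvents
    Loop

    sPage = oIE.Document.body.innerHTML

    ' Report how much text was captured, then print only the tail of it
    Debug.Print "Characters retrieved: " & Len(sPage)
    Debug.Print "Last 200 characters: " & Right$(sPage, 200)

    oIE.Quit
    Set oIE = Nothing
End Sub
```

Note also that body.innerHTML returns only the markup inside the body element. If the head section is needed as well, oIE.Document.documentElement.outerHTML may be closer to the "complete" page.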
#2
Hello Alan,

Here is another method that overcomes the string/character limitations. This uses the WinHTTP COM object to retrieve the web page's source code and store it in a text file (.txt). The file created is "C:\temp URL.txt". You can change the path and file name to what you want.

================================
'Written: September 04, 2009
'Author:  Leith Ross
'Summary: Saves a web page's source code to a text file.

Sub SaveServerDataAsFile()

    'Create an array to hold the response data.
    Dim d() As Byte
    Dim objReq As Object

    'Try version 5.1 first and fall back to version 5.
    On Error Resume Next
    Set objReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    If objReq Is Nothing Then
        Set objReq = CreateObject("WinHttp.WinHttpRequest.5")
    End If
    Err.Clear
    On Error GoTo 0

    'Assemble an HTTP Request.
    objReq.Open "GET", "http://www.thecodecage.com/", False

    'Send the HTTP Request.
    objReq.Send

    'Show the status code and status text.
    MsgBox objReq.Status & " - " & objReq.StatusText

    'Put response data into a file.
    Open "C:\temp URL.txt" For Binary As #1
    d() = objReq.ResponseBody
    Put #1, 1, d()
    Close #1

End Sub
================================

--
Leith Ross

Sincerely,
Leith Ross

'The Code Cage' (http://www.thecodecage.com/)
------------------------------------------------------------------------
Leith Ross's Profile: http://www.thecodecage.com/forumz/member.php?userid=75
View this thread: http://www.thecodecage.com/forumz/sh...d.php?t=131744
#3
Leith,

It seems like I will run into the same problem (too long) or have problems with text broken over multiple lines when I read the data back from the file. But I'll give this a try.

Could you please explain the "WinHttp.WinHttpRequest.5.1" vs. "WinHttp.WinHttpRequest.5"?

Thanks,
Alan
#4
Hello Alan,

Here is a link to a page that explains the differences in detail: 'WinHTTP Versions (Windows)' (http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx)

--
Leith Ross
#5
I do seem to have the same truncation problem when I read the file back in with this code:

    ' Read each line of the file, looking for the description
    Dim myFileName As String
    Dim myLine As String
    Dim FileNum As Long

    myFileName = ThisWorkbook.Path & "\URL.txt"
    FileNum = FreeFile
    Close FileNum
    Open myFileName For Input As FileNum

    count = 0
    Do While Not EOF(FileNum)
        count = count + 1
        Line Input #FileNum, myLine
        Debug.Print myLine
        Debug.Print "===========================================" & vbCrLf
        Debug.Print count & vbCrLf
        Debug.Print vbCrLf & "===========================================" & vbCrLf
        myLine = ExtractCoDescr(myLine)
        ' Stop at the first line that yields a description
        If Len(myLine) > 0 Then
            GetCoDescription = myLine
            Close FileNum
            Exit Do
        End If
    Loop
    Close FileNum
#6
Hello Alan,

The code I wrote downloads the web page in binary format using unsigned bytes. This is all stored in memory before being saved to a disk file with a ".txt" extension. Web page size is limited only by available memory.

The advantage of binary is that all information is brought into the file, not just text, and this could be what is causing your truncation problems. The file reading method you are using expects the file data to be in a specific format.

What information are you trying to locate or extract from the file?

--
Leith Ross
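To illustrate the point about the reading method: Line Input splits on carriage returns, so a page saved with other line endings may come back in unexpected pieces. A minimal sketch of reading the whole saved file into one string instead follows. The function name ReadWholeFile is hypothetical, and the sketch assumes a single-byte text encoding, so multi-byte characters would not be decoded correctly.

```vba
Function ReadWholeFile(ByVal sPath As String) As String
    Dim iFile As Integer
    Dim sText As String

    iFile = FreeFile
    Open sPath For Binary Access Read As #iFile

    ' Size the buffer to the file length, then fill it in a single read
    sText = Space$(LOF(iFile))
    Get #iFile, , sText

    Close #iFile
    ReadWholeFile = sText
End Function
```

It could be called as sPage = ReadWholeFile(ThisWorkbook.Path & "\URL.txt"), after which Len(sPage) shows whether the full file arrived.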
#7
I am trying to extract text following a series of HTML tags and keywords. If you can explain how I get started on properly reading it, I would appreciate it.

Alan
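Since the original question mentions VBScript regular expressions, here is a minimal sketch of pulling the text between an opening and closing tag out of a string that already holds the whole page. The function name ExtractTagText and its parameters are hypothetical, and sTag is assumed to be a plain tag name such as "title" with no regex metacharacters.

```vba
Function ExtractTagText(ByVal sHtml As String, ByVal sTag As String) As String
    Dim oRe As Object
    Dim oMatches As Object

    Set oRe = CreateObject("VBScript.RegExp")
    oRe.IgnoreCase = True
    oRe.Global = False

    ' [\s\S] matches any character including line breaks,
    ' and *? keeps the match as short as possible
    oRe.Pattern = "<" & sTag & "[^>]*>([\s\S]*?)</" & sTag & ">"

    Set oMatches = oRe.Execute(sHtml)
    If oMatches.Count > 0 Then
        ' SubMatches(0) is the text captured by the parentheses
        ExtractTagText = oMatches(0).SubMatches(0)
    End If
End Function
```

Running the pattern over the whole page at once, rather than line by line, avoids missing matches that span a line break. The function returns an empty string when nothing matches.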
#8
Hello Alan,

The easiest method would be to use Word as the file editor. You can view the file best by going to View > Web Layout, and search it using the Find option.

--
Leith Ross
#9
I used a VBScript TextStream object. That worked.

Alan
#10
Care to post it for the archives?

--
Don Guillett
Microsoft MVP Excel
SalesAid Software
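Alan's code was not posted in the thread. As a rough sketch of what a TextStream approach might look like (not Alan's actual solution), the following reads the saved file in one call through the Scripting.FileSystemObject. The function name ReadAllViaTextStream is hypothetical.

```vba
Function ReadAllViaTextStream(ByVal sPath As String) As String
    Dim oFSO As Object
    Dim oStream As Object

    Set oFSO = CreateObject("Scripting.FileSystemObject")

    ' 1 = ForReading
    Set oStream = oFSO.OpenTextFile(sPath, 1)

    ' ReadAll raises an error on an empty file, so check first
    If Not oStream.AtEndOfStream Then
        ReadAllViaTextStream = oStream.ReadAll
    End If

    oStream.Close
End Function
```

The returned string can then be searched as a whole, for example with a VBScript.RegExp object, instead of being processed one line at a time.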