#1
Posted to microsoft.public.excel.programming,microsoft.public.excel
Robert Baer wrote:
> Excel macros are SO... undocumented.

Sure they are. There is plenty of documentation for individual keywords. You're not looking for that; you're looking for a broader concept than individual keywords.

> Need a WORKING example for reading the HTML source of a URL (say
> http://www.oil4lessllc.org/gTX.htm)

Downloading a web page (or any URI, really) is easy:

Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Sub foo()
    Dim tmp, result, contents
    tmp = Environ("TEMP") & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"
    'download
    result = URLDownloadToFile(0, "http://www.oil4lessllc.org/gTX.htm", _
        tmp, 0, 0)
    If result < 0 Then
        'failed to download; error handler here
    Else
        'read from file
        Open tmp For Binary As 1
        contents = Space$(LOF(1))
        Get #1, 1, contents
        Close 1
        'parse file here
        '[...]
        'cleanup
    End If
    Kill tmp
End Sub

(Note that URLDownloadToFile must be declared PtrSafe on 64-bit versions of Excel.)

Dealing with the downloaded data is very dependent on what the page contains and what you want to extract from it. The page you mentioned contains 2 images and 2 Javascript arrays; assuming you want the data from the arrays, you could search for "awls[" or "aprd[" and get your data that way.

Rather than downloading to a file, it is possible to download straight to memory, but I find it simpler to use a temp file. Among other things, downloading to memory requires opening and closing the connection yourself; URLDownloadToFile handles that for you.

--
- We lose the ones we love; we cannot change it. Put it aside.
- How? How can I do what is needed, when all I feel is... hate?
#2
I wrote:
> tmp = Environ("TEMP") & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"

Damn, missed a backslash. It should be:

tmp = Environ("TEMP") & "\" & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"

--
I'm willing to table this discussion for now, and sneak out, if you are.
#3
Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say http://www.oil4lessllc.org/gTX.htm). Thanks.
#4
Auric__ wrote:
> [snip working URLDownloadToFile example]

Thanks. I know that the parsing is very dependent on the contents and what one wants. That is why i gave an example having at least one data array, similar to what i may be parsing. I too like temp files, because i can open them in random and/or binary mode if needed. A bit of fun to read such backwards (like BMP files). Thanks again.
#5
Auric__ wrote:
> I wrote:
>> tmp = Environ("TEMP") & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"
> Damn, missed a backslash.
> tmp = Environ("TEMP") & "\" & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"

Thanks for the correction to the "path" of destruction.
#6
Auric__ wrote:
> [snip working URLDownloadToFile example]

Well, i am in a pickle. Firstly, i did a bit of experimenting, and i discovered a few things.
1) The variable "tmp" is nice, but does not have to include the date and time; that would fill the HD, since i have thousands of files to process. The fix is easy: just use a constant name.
2) The file "tmp" is created ONLY if the value of "result" is zero.
3) The problem i have seems to be due to the fact that the online files have no filetype:
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5"
And i need to process page numbers from 5 to 999 (the reason for the program). I have processed page numbers from 1 to 4 by hand...PITA.
Ideas for a fix? Thanks.
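[Editor's note: the loop over page numbers with a single reused temp-file name could look like the following sketch. It is not from the thread; it assumes the URLDownloadToFile declaration given earlier, uses the query URL quoted in the post, and leaves the parsing step as a stub.]

```vb
'Sketch only: fetch search pages 5..999, reusing one constant
'temp-file name so thousands of files never accumulate on the HD.
Sub FetchAllPages()
    Dim page As Long, result As Long
    Dim tmp As String, contents As String
    tmp = Environ("TEMP") & "\nsnpage.htm"   'constant name, overwritten each pass
    For page = 5 To 999
        result = URLDownloadToFile(0, _
            "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=" _
            & page, tmp, 0, 0)
        If result = 0 Then                   '0 = success
            Open tmp For Binary As 1
            contents = Space$(LOF(1))
            Get #1, 1, contents
            Close 1
            'parse "contents" here
            Kill tmp                         'cleanup before the next page
        End If
    Next page
End Sub
```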
#8
Robert Baer wrote:
> [snip my earlier reply]

Well, i am in a pickle. Firstly, i did a bit of experimenting, and i discovered a few things.
1) The variable "tmp" is nice, but does not have to include the date and time; that would fill the HD, since i have thousands of files to process.
* Error: if i have umpteen files to process, then i must use their unique names.
The fix is easy: just use a constant name.
* Error: i can issue a KILL each time (a "KILL *.*" does not work).
2) The file "tmp" is created ONLY if the value of "result" is zero.
3) The problem i have seems to be due to the fact that the online files have no filetype:
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5"
And i need to process page numbers from 5 to 999 (the reason for the program). I have processed page numbers from 1 to 4 by hand...PITA.
Ideas for a fix? Thanks.
#9
Robert Baer wrote:
> [snip previous discussion]

Well, it is rather bad.. I created a file with no filetype and put it on the web. The process URLDownloadToFile worked, with "tmp" and "result" being correct. BUT... the line "Open tmp For Binary As 1" barfed with the error message "Bad filename or number" (remember, it worked with my gTX.htm file). So there is "something" about that site that seems to prevent read access.
Is there a way to get a clue (and maybe fix it)? And assuming a fix, what can i do about the OPEN command/syntax?

' What i did in Excel:
S$ = "D:\Website\Send .Hot\****"
tmp = Environ("TEMP") & "\" & S$
result = URLDownloadToFile(0, S$, tmp, 0, 0)
'The file is at "http://www.oil4lessllc.com/****" ; an Excel file
#10
> Well, i am in a pickle.
> Firstly, i did a bit of experimenting, and i discovered a few things.
> 1) The variable "tmp" is nice, but does not have to have the date and
> time; that would fill the HD since i have thousands of files to process.

So you did not 'catch' that the tmp file is deleted when this line...

  Kill tmp

...gets executed!

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
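[Editor's note: an unconditional Kill itself raises runtime error 53 when the download failed and the temp file was never created. A common guard, shown here as a sketch rather than anything from the thread, is to test Dir$ first.]

```vb
'Delete the temp file only if it actually exists; Dir$ returns ""
'when no file matches the path, so Kill is never called on a
'nonexistent file.
If Dir$(tmp) <> "" Then Kill tmp
```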
#12
Robert Baer wrote:
> And assuming a fix, what can i do about the OPEN command/syntax?
> ' What i did in Excel:
> S$ = "D:\Website\Send .Hot\****"
> tmp = Environ("TEMP") & "\" & S$

The contents of the variable tmp at this point:

  tmp = "C:\Users\auric\D:\Website\Send .Hot\****"

Do you see the problem?

Also, as Garry pointed out, cleanup should happen automatically. The "Kill" keyword deletes files. Try this code:

Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Function downloadFile(what As String) As String
    'returns tmp file's path on success or empty string on failure
    Dim tmp, result, contents, fnum
    tmp = Environ("TEMP") & "\" & Format$(Now, "yyyymmdd-hhmmss-") & _
        "downloaded.tmp"
    'download
    result = URLDownloadToFile(0, what, tmp, 0, 0)
    If result < 0 Then
        'failed to download; error handler here, if any
        On Error Resume Next 'can be avoided by checking if the file exists
        Kill tmp 'cleanup
        On Error GoTo 0
        downloadFile = ""
    Else
        'read from file
        fnum = FreeFile
        Open tmp For Binary As fnum
        contents = Space$(LOF(fnum))
        Get #fnum, 1, contents
        Close fnum
        downloadFile = tmp
    End If
End Function

Sub foo()
    Dim what As String, files_to_get As Variant, L0
    'specify files to download
    files_to_get = Array("http://www.oil4lessllc.com/****", _
        "http://www.oil4lessllc.org/gTX.htm")
    For L0 = LBound(files_to_get) To UBound(files_to_get)
        what = downloadFile(files_to_get(L0))
        If Len(what) Then
            '****parse file here*****
            Kill what 'cleanup
        End If
    Next
End Sub

--
THIS IS NOT AN APPROPRIATE ANSWER.
#13
Auric__ wrote:
> [snip downloadFile example]

Grumble.. do not understand it well enough to get it working. Now i do not know what i had that fully worked with the gTX.htm file. The following "almost" works; it fails on the Open:

Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Sub foo()
    Dim tmp, result, content$
    SRC1$ = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5"
    SRC3$ = "https://www.nsncenter.com/NSNSearch?q=5960"
    SRC2$ = "https://www.nsncenter.com/NSN/5960" 'example only
    SRCA$ = "https://www.nsncenter.com/"
    SRCB$ = "NSNSearch?q=5960%20regulator&PageNumber=5"
    FSC$ = "/NSN/5960"
    'Kill "C:\DOCUME~1\ROBERT~1\LOCALS~1\Temp\SRC1$"
    S$ = "D:\Website\Send .Hot\****"             ' "result" passes
    'S$ = "http://www.oil4lessllc.com/REACH.PDF" ' "result" passes
    'S$ = "http://www.oil4lessllc.org/gTX.htm"   ' "result" passes
    tmp = Environ("TEMP") & "\" & S$
    ' The file "tmp" exists ONLY if the value of "result" is zero
    'download
    result = URLDownloadToFile(0, S$, tmp, 0, 0)
    If result < 0 Then
        'failed to download; error handler here
        Stop
    Else
        'read from file
        Open tmp For Binary As 1    ' Bad file name or number
        content$ = Space$(LOF(1))
        lim = LOF(1)
        Get #1, 1, content$
        Close 1
        'parse file here
        For pick = 1 To lim Step 50
            L1$ = Mid$(content$, pick, 50)
            x = x
        Next pick
        '[...]
        'cleanup
    End If
    Kill tmp
End Sub
#14
GS wrote:
> So you did not 'catch' that the tmp file is deleted when this line...
>   Kill tmp
> ...gets executed!

I know about that; i have been stepping thru the execution (F8), and stop long before that. Then i go to the CMD prompt, check the files, and DEL *.* when appropriate for the next test.
#15
> but does not have to have the date and time; that would fill the HD
> since i have thousands of files to process.

Filenames have nothing to do with storage space; it's the file size! Given that Auric_'s suggestion creates text files, the size of 999 txt files would hardly be more than 1MB total! If you append each page to the 1st file then all pages could be in 1 file...

--
Garry
#16
After looking at your link to p5, I see what you mean by the amount of storage space, but the filename is not a factor. Parsing will certainly downsize each file considerably...

--
Garry
#17
Robert Baer wrote:
> Auric__ wrote:
>> [snip]
> Grumble.. do not understand it well enough to get it working. Now i do
> not know what i had that fully worked with the gTX.htm file. The
> following "almost" works; it fails on the Open.

You know, one of us is confused, and I'm not entirely sure it isn't me. I've given you (theoretically) working code twice now, and yet you insist on making some pretty radical changes that DON'T ****ING WORK!

So, let's step back from the coding for a moment, and let's have you explain ***EXACTLY*** what it is you want done. Give examples like, "given data X, I want to do Y, with result Z." Unless I get a clearer explanation of what you're trying to do, I'm done with this thread.

--
She died and we did nothing.
#19
> I have also found the following:
> 1) one does not have to use FreeFile for a file number (when all else
> is OK).

True, however it's not considered 'best practice'. FreeFile() ensures a unique ID is assigned to your var.

> 2) cannot use "contents" for string storage space.

Why not? It's not a VB[A] keyword and so qualifies for use as a var.

> 3) one cannot mix use of "/" and "\" in a string for a given file name.

Not sure why you'd use "/" in a path string! Forward slash is not a legal filename/path character. Backslash is the default Windows path delimiter. If choosing folders, the last backslash is not followed by a filename.

> 4) one cannot have a space in the file name, so that gives a serious
> problem for some web URLs (work-around anyone?)

Web paths 'pad' spaces so the string is contiguous. I believe the pad string is "%20" or "+".

> 5) method fails for "https:" (work-around anyone?)

What method?

--
Garry
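[Editor's note: the "%20" padding described above can be applied mechanically before calling the download function. A minimal sketch, not from the thread; Replace$ is standard VBA and the URL is the one quoted in the discussion.]

```vb
'Encode spaces in a URL before handing it to URLDownloadToFile.
'Replace$ swaps every " " for the web-safe "%20" escape.
Dim url As String
url = "https://www.nsncenter.com/NSNSearch?q=5960 regulator&PageNumber=5"
url = Replace$(url, " ", "%20")
'url now reads ".../NSNSearch?q=5960%20regulator&PageNumber=5"
```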
#20
GS wrote:
>> but does not have to have the date and time; that would fill the HD
>> since i have thousands of files to process.
> Filenames have nothing to do with storage space; it's the file size!
> Given that Auric_'s suggestion creates text files, the size of 999 txt
> files would hardly be more than 1MB total! If you append each page to
> the 1st file then all pages could be in 1 file...

True, BUT the files can be large: "result = URLDownloadToFile(0, S$, tmp, 0, 0)" creates a file in TEMP the size of the source, which can be multi-megabytes; 999 of them can eat the HD space fast. Hopefully a URL file size does not exceed the space limit allowed in Excel 2003 string space (anyone know what that might be?). I have found that the string value AKA TEMP filename can be fixed to anything reasonable, and does not have to include parts/substrings/subsets of the file one wants to download. It can be "a good thing" (quoting Martha Stewart) to delete the file when done.

I have also found the following:
1) one does not have to use FreeFile for a file number (when all else is OK).
2) cannot use "contents" for string storage space.
3) one cannot mix use of "/" and "\" in a string for a given file name.
4) one cannot have a space in the file name, so that gives a serious problem for some web URLs (work-around anyone?)
5) method fails for "https:" (work-around anyone?)
#21
> GS wrote:
>> After looking at your link to p5, I see what you mean by the amount of
>> storage space, but filename is not a factor. Parsing will certainly
>> downsize each file considerably...
> One has to open the file to parse, so that is not logical. The function
> URLDownloadToFile gives zero options - it copies ALL of the source into
> a TEMP file; one hopes that the source is not equal to or larger than
> 4GB in size! For pages on the web, that is extremely unlikely; webpage
> max size prolly is 10MB; maybe 300K worst case on average. So, once in
> TEMP, it can be opened for input (text), for random (may specify buffer
> size), or for binary (may specify buffer size). Here, one can optimize
> read speed VS string space used.

Your last statement contradicts your first statement. Sounds like you need to do some *extensive* research into standard VB file I/O procedures and general parsing techniques!

--
Garry
#22
> 5) method fails for "https:" (work-around anyone?)

These URLs usually require some kind of 'login' be done, which needs to be included in the URL string.

--
Garry
#24
GS wrote:
>> 1) one does not have to use FreeFile for a file number (when all else
>> is OK).
> True, however it's not considered 'best practice'. FreeFile() ensures a
> unique ID is assigned to your var.
>> 2) cannot use "contents" for string storage space.
> Why not? It's not a VB[A] keyword and so qualifies for use as a var.

* I get run-time error 458, "Variable uses an Automation type not supported in Visual Basic".

>> 3) one cannot mix use of "/" and "\" in a string for a given file name.
> Not sure why you'd use "/" in a path string! Forward slash is not a
> legal filename/path character. Backslash is the default Windows path
> delimiter.

I note that in Windoze, "\" is used, and on the web, "/" is used.

>> 4) one cannot have a space in the file name, so that gives a serious
>> problem for some web URLs (work-around anyone?)
> Web paths 'pad' spaces so the string is contiguous. I believe the pad
> string is "%20" or "+".

Yes; "%20" is used and seems to act like a space, and seems to kill the usefulness of the URLDownloadToFile function.

>> 5) method fails for "https:" (work-around anyone?)
> What method?

* The function URLDownloadToFile. Is "method" the wrong term?
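[Editor's note: the space/"%20" trouble in this exchange is in the local temp-file name built from the URL, not necessarily in the download itself. One workaround, sketched here with a hypothetical helper name, is to strip characters that Windows forbids in filenames before using the string as a temp name.]

```vb
'SafeTempName (hypothetical helper): build a legal local temp-file
'name from a URL by replacing characters that Windows forbids in
'filenames (\ / : * ? " < > |), plus spaces and "%" escapes.
Function SafeTempName(ByVal url As String) As String
    Dim bad As Variant, i As Long
    bad = Array("\", "/", ":", "*", "?", """", "<", ">", "|", " ", "%")
    For i = LBound(bad) To UBound(bad)
        url = Replace$(url, bad(i), "_")
    Next i
    SafeTempName = Environ("TEMP") & "\" & url & ".htm"
End Function
```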
#25
GS wrote:
> Your last statement contradicts your first statement. Sounds like you
> need to do some *extensive* research into standard VB file I/O
> procedures and general parsing techniques!

Perhaps you have some things confused. In that program, "tmp" is a string used for the name of a (hopefully) to-be-created file. THAT file can be large, as it MUST "hold" the contents of the URL being transferred. Like i said, the function URLDownloadToFile gives zero options - it copies ALL of the source into a TEMP file (named via "tmp"). Then and only then one can do I/O.

I am an expert on file I/O and parsing: reading backwards, reading (and parsing) TEXT files using the file in binary or random mode, "inserting" and/or "clipping" stuff into/out of the middle of a file, etc. When pressed, i could even write a file backwards, but i have yet to see any reason to try that.
#26
GS wrote:
>> 5) method fails for "https:" (work-around anyone?)
> These URLs usually require some kind of 'login' be done, which needs
> to be included in the URL string.

NO login required; try it.
#27
> I note that in Windoze, "\" is used, and on the web, "/" is used.

For clarity.., a Windows file path is not the same as a URL. Windows file paths allow spaces; URLs do not. The Windows path delimiter is "\"; the web path delimiter is "/".

The function URLDownloadToFile() downloads szURL to szFileName. Once downloaded, szFileName needs to be opened, parsed, and the result stored locally. Then the next file needs to be downloaded, parsed, and stored. And so on until all files have been downloaded and parsed.

Since the actual page contents comprise only a small portion of the files being downloaded, their size should be considerably smaller after parsing. If you extract the data (text) only (no images) and save it to a txt file, you should be able to 'append' to a single file, which would occupy far less disc space. (For example, pg5 is less than 1kb.)

I suspect, though, that you need the image to identify the item source (Raytheon, RCA, Lucent Tech, NAWC, MIL STD, etc) because this info is not stored in the image file metadata. Otherwise, the txt file after parsing pg5's text is the following 53 lines:

  NSN 5960-00-509-3171
  5960-00-509-3171
  ELECTRON TUBE
  NSN 5960-00-569-9531
  5960-00-569-9531
  ELECTRON TUBE
  NSN 5960-00-553-3770
  5960-00-553-3770
  ELECTRON TUBE
  NSN 5960-00-682-8624
  5960-00-682-8624
  ELECTRON TUBE
  NSN 5960-00-808-6928
  5960-00-808-6928
  ELECTRON TUBE
  NSN 5960-00-766-1953
  5960-00-766-1953
  ELECTRON TUBE
  NSN 5960-00-850-6169
  5960-00-850-6169
  ELECTRON TUBE
  NSN 5960-00-679-8153
  5960-00-679-8153
  ELECTRON TUBE
  NSN 5960-00-134-6884
  5960-00-134-6884
  ELECTRON TUBE
  NSN 5960-00-061-8610
  5960-00-061-8610
  ELECTRON TUBE
  5960-00-067-9636
  ELECTRON TUBE

The file size is 711 bytes, and it lists 11 items. Note the last item has no image and so no filler text (NSN line). This inconsistency makes parsing the contents difficult, since you don't know which items do not have images.

If you copy/paste pg5 into Excel you get both text and image. You could then do something to construct the info in a database fashion...

  Col Headers: Source :: PartNum :: Description

...and put the data in the respective columns. This seems very inefficient, but is probably less daunting than what you've been doing manually thus far. AutoComplete should be helpful with this, and you could sort the list by Source. Note that clicking the image or part# on the worksheet takes you to the same page as clicking it on the web page. In the case of pg5, the data will occupy 11 rows.

It seems like your approach is the long way; I'd find a better data source myself! Perhaps subscribe to an electronics database utility (such as my CAD software would use) that I can update by downloading a single db file<g>

--
Garry
#28
Posted to microsoft.public.excel.programming,microsoft.public.excel
Auric__ wrote:
> Robert Baer wrote:
>> Auric__ wrote:
>>> Robert Baer wrote:
>>>> And assuming a fix, what can i do about the OPEN command/syntax?
>>>> // What i did in Excel:
>>>> S$ = "D:\Website\Send .Hot\****"
>>>> tmp = Environ("TEMP") & "\" & S$
>>> The contents of the variable S$ at this point:
>>> S$ = "C:\Users\auric\D:\Website\Send .Hot\****"
>>> Do you see the problem?
>>> Also, as Garry pointed out, cleanup should happen automatically.
>>> The "Kill" keyword deletes files. Try this code:
>>> [snip]
>> Grumble.. do not understand well enough to get working. Now i do not
>> know what i had that fully worked with the gTX.htm file. The
>> following "almost" works; it fails on the open.
> You know, one of us is confused, and I'm not entirely sure it isn't
> me. I've given you (theoretically) working code twice now, and yet you
> insist on making some pretty radical changes that DON'T ****ING WORK!
> So, let's step back from the coding for a moment, and let's have you
> explain ***EXACTLY*** what it is you want done. Give examples like,
> "given data X, I want to do Y, with result Z." Unless I get a clearer
> explanation of what you're trying to do, I'm done with this thread.

I wish to read and parse every page of
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
where the page number goes from 5 to 999.

On each page, find "<a href="/NSN/5960 [it is longer, but that is the start]. Given the full number (eg: <a href="/NSN/5960-00-831-8683"), open a new related page "https://www.nsncenter.com/NSN/5960-00-831-8683" and find the line ending "(MCRL)". Read about 4 lines to <a href="/PartNumber/ which is <a href="/PartNumber/GV4S1400" in this case. Save/write that line plus the next three; close this secondary online URL and step to the next "<a href="/NSN/5960 to process the same way. Continue to the end of the page, close that URL and open the next page.

Crude code:

CLOSE
' PRS5960.BAS (QuickBasic)
' watch linewrap below..
SRC1$ = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
SRC2$ = "https://www.nsncenter.com/NSN/5960"   'example only
FSC$ = "/NSN/5960"
OPEN "FSC5960.TXT" FOR APPEND AS #9
' Let page number run from 05 to 39 to read existing files
FOR PG = 5 TO 39
  A$ = ""
  FPG$ = RIGHT$("0" + MID$(STR$(PG), 2), 2)
  ' These files, FPG$ + ".txt", are copies from the web
  OPEN FPG$ + ".txt" FOR INPUT AS #1
  ON ERROR GOTO END1
  PRINT FPG$ + ".txt",              'is screen note to me
  WHILE NOT EOF(1)
    WHILE INSTR(A$, FSC$) = 0      'skip 7765 lines of junk
      LINE INPUT #1, A$            'look for <a href="/NSN/5960-00-754-5782" Class= ETC
    WEND
    P = INSTR(A$, FSC$) + 9: FPG2$ = SRC2$ + MID$(A$, P, 12)
    NSN$ = "5960" + MID$(A$, P, 12)
    PRINT NSN$                     'is screen note to me
    AHREF$ = ".." + FSC$ + MID$(A$, P, 12)
    'Need URL FPG2$ or .. a href to get balance of data
    ' See comments above this program
    PRINT #9, NSN$
    LINE INPUT #1, A$
  WEND
END1: RESUME LAB
LAB: CLOSE #1
NEXT PG
CLOSE
SYSTEM

** Note the function URLDownloadToFile does not allow spaces; there is one in the "page" URL. Problem #2: the function URLDownloadToFile does not allow https website URLs. Other than those problems, i have everything else working fine.
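[Editorial sketch] Since the post above names two blockers (the space in the search URL and https), here is a minimal, hedged VBA sketch of one workaround: percent-encode the space before calling the API. It assumes the URLDownloadToFile Declare quoted earlier in the thread is in scope (declared PtrSafe on 64-bit Excel); EncodeSpaces is a hypothetical helper, not a built-in, and this is untested against the site itself.

```vb
' Hedged sketch: percent-encode spaces in a URL before handing it to
' URLDownloadToFile. Assumes the Declare statement quoted earlier in
' the thread; EncodeSpaces and FetchSearchPage are made-up names.
Function EncodeSpaces(ByVal url As String) As String
    EncodeSpaces = Replace(url, " ", "%20")
End Function

Sub FetchSearchPage(ByVal pageNum As Long)
    Dim url As String, tmp As String, result As Long
    url = "https://www.nsncenter.com/NSNSearch?q=5960 regulator" & _
          "&PageNumber=" & pageNum
    tmp = Environ("TEMP") & "\pg" & Format$(pageNum, "000") & ".htm"
    result = URLDownloadToFile(0, EncodeSpaces(url), tmp, 0, 0)
    If result <> 0 Then      'S_OK is 0; anything else is a failure
        MsgBox "Download failed for page " & pageNum
    End If
End Sub
```

If https still fails after encoding, the cause may be the machine's TLS support rather than the function itself.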
#29
Posted to microsoft.public.excel.programming,microsoft.public.excel
So what you also want is the linked file (web page) the image or part# links to! Here's what I got from https://www.nsncenter.com/NSN/5960-00-831-8683 (pg4):

1st occurrence of <a href="/NSN/5960 is at line 7878;
1st occurrence of (MCRL) is at line 7931;
1st occurrence after that of <a href="/PartNumber" is this, at line 7951:

<td align="center" style="vertical-align: middle;"><a href="/PartNumber/GV4S1400">GV4S1400</a></td>

and the next 3 lines:

<td style="width: 125px; height: 60px; vertical-align: middle;" align="center" nowrap><a href="/CAGE/63060">63060</a></td>
<td align="center" style="vertical-align: middle;"><a href="/CAGE/63060"><img class="img-thumbnail" src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45 width=90 /></a></td>
<td text-align="center" style="vertical-align: middle;"><a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a></td>

So you want to go to the next page linked to and repeat the process?

At this point my Excel sheet has been modified as follows:

Source | NSN Item# | Description | Part# | MCRL#
Tektronix | 5960-00-831-8683 | ELECTRON TUBE | GV4S1400 | 4932653
<a href="/CAGE/63060">63060</a>
<a href="/CAGE/63060"><img class="img-thumbnail" src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45 width=90 /></a>
<a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a>
General Dynamics | 5960-00-853-8207 | ELECTRON TUBE | 295-29434 | 5074477
line1
line2
line3

...and so on.

So far, I'm working with text files, and so I'm inclined to append each item to a file named "ElectronTube_NSN5960.txt". File contents for the 2 items above would be structured so the 1st line contains the headings (data fields) so it works properly with ADODB. (Note that I use a comma as delimiter, and the file does not contain any blank lines)...

Source,NSN Item#,Description,Part#,MCRL#
Tektronix,5960-00-831-8683,ELECTRON TUBE,GV4S1400,4932653
<a href="/CAGE/63060">63060</a>
<a href="/CAGE/63060"><img class="img-thumbnail" src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45 width=90 /></a>
<a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a>
General Dynamics,5960-00-853-8207,ELECTRON TUBE,295-29434,5074477
<a href="/CAGE/1VPW8">1VPW8</a>
<a href="/CAGE/1VPW8"><img class="img-thumbnail" src="https://az774353.vo.msecnd.net/cage/90/1vpw8.jpg" alt="CAGE 1VPW8" height=45 width=90 /></a>
<a title="CAGE 1VPW8" href="/CAGE/1VPW8">GENERAL DYNAMICS C4 SYSTEMS, INC.</a>

...where I have parsed off the CSS formatting text and html tags outside <a>...</a> from the 3 lines. I'd likely convert the UCase to proper case as well.

The file size is 653 bytes, meaning a full page would be about 4kb; 1000 pages being about 4mb. That's 44 lines per page after the fields line. A file this size can be easily handled via an ADO recordset or std VB file I/O functions/methods. Loading into an array (vData) puts fields in vData(0) and records starting at vData(1), and looping would Step 4.

I really don't have the time/energy (I have Lou Gehrig's) to get any more involved with your project due to current commitments. I just felt it might be worth explaining how I'd handle your task in the hopes it would be helpful to you reaching a viable solution. I bid you good wishes going forward...

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
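[Editorial sketch] The "parse off the CSS formatting text and html tags outside <a>...</a>" step Garry describes can be sketched in VBA. A hedged example, assuming each source line of interest carries one well-formed anchor pair; the function name AnchorText is made up for illustration:

```vb
' Hedged sketch: return the text between the first <a ...> tag and
' the matching </a> in one line of page source.
Function AnchorText(ByVal srcLine As String) As String
    Dim pA As Long, pGt As Long, pClose As Long
    pA = InStr(srcLine, "<a")                 'start of the opening tag
    If pA = 0 Then Exit Function
    pGt = InStr(pA, srcLine, ">")             'end of the opening tag
    pClose = InStr(pGt + 1, srcLine, "</a")   'start of the closing tag
    If pGt > 0 And pClose > pGt Then
        AnchorText = Trim$(Mid$(srcLine, pGt + 1, pClose - pGt - 1))
    End If
End Function
```

On the GV4S1400 table row quoted above, this would yield "GV4S1400".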
#30
Posted to microsoft.public.excel.programming,microsoft.public.excel
GS wrote:
>> I note that in Windoze, that "\" is used, and on the web, "/" is used.
> For clarity.., a Windows file path is not the same as a URL. Windows
> file paths allow spaces; URLs do not.

* Incorrect! See -------------------------vvv
https://www.nsncenter.com/NSNSearch?...r&PageNumber=5

> Windows path delimiter is "\"; Web path delimiter is "/". The function
> URLDownloadToFile() downloads szURL to szFileName.

* IF and when it works.

> Once downloaded, szFileName needs to be opened, parsed, and the result
> stored locally. Then the next file needs to be downloaded, parsed, and
> stored, and so on until all files have been downloaded and parsed.

* This i knew from the git-go; nice to be clarified.

> Since the actual page contents comprise only a small portion of the
> files being downloaded, their size should be considerably smaller
> after parsing. If you extract the data (text) only (no images) and
> save this to a txt file, you should be able to 'append' to a single
> file, which would occupy far less disc space. (For example, pg5 is
> less than 1kb.)
> I suspect, though, that you need the image to identify the item source
> (Raytheon, RCA, Lucent Tech, NAWC, MIL STD, etc) because this info is
> not stored in the image file metadata. Otherwise, the txt file after
> parsing pg5's text is the following 53 lines:
> <snip>
> The file size is 711 bytes, and it lists 11 items. Note the last item
> has no image and so no filler text (NSN line). This inconsistency
> makes parsing the contents difficult since you don't know which items
> do not have images.

* I think you may have pulled the info from what you saw on that page, and not from the source. In one of my responses, i gave QBASIC code for parsing, and as i remember, there were about 7760 lines of junk before one sees <a href="/NSN/5960, which gives the full NSN code. Use of that allows one to get the second URL, eg:
https://www.nsncenter.com/NSN/5960-00-754-5782
NO image reliance at all. There are 11 entries per page, and no inconsistencies with my method of search in the page.

> If you copy/paste pg5 into Excel you get both text and image. You
> could then do something to construct the info in a database fashion...

* That would only make things more difficult. A copy to a local file is sufficient for a simple parsing as described here and elsewhere in this thread.

> Col Headers: Source :: PartNum :: Description
> ..and put the data in the respective columns. This seems very
> inefficient but is probably less daunting than what you've been doing
> manually thus far. Auto Complete should be helpful with this, and you
> could sort the list by Source. Note that clicking the image or part#
> on the worksheet takes you to the same page as does clicking it on the
> web page. In the case of pg5, the data will occupy 11 rows.

* Manual: Right click, select View Page Source, Save As to HD by changing the filetype from HTM to TXT and changing the filename to add the page number (013 for example). Like i said, parsing of that file is simple and easy; getting 35 pages copied that way did not take long, but there are 999 of them...

> Seems like your approach is the long way; -I'd find a better data
> source myself! Perhaps subscribe to an electronics database utility
> (such as my CAD software would use) that I can update by downloading a
> single db file <g>

* I have asked, and received zero response.
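[Editorial sketch] The INSTR/MID$ arithmetic in the QBASIC code quoted earlier in the thread carries straight over to VBA. A hedged sketch (it assumes, as that code does, that the NSN suffix after "/NSN/5960" is always 12 characters, e.g. "-00-754-5782"; NsnFromLine is a made-up name):

```vb
' Hedged sketch: pull the full NSN out of a page-source line.
' Mirrors P = INSTR(A$, FSC$) + 9 : NSN$ = "5960" + MID$(A$, P, 12)
Function NsnFromLine(ByVal srcLine As String) As String
    Dim p As Long
    p = InStr(srcLine, "/NSN/5960")
    If p > 0 Then
        'skip the 9 characters of "/NSN/5960", take the 12-char suffix
        NsnFromLine = "5960" & Mid$(srcLine, p + 9, 12)
    End If
End Function
```

A line containing <a href="/NSN/5960-00-754-5782" would give "5960-00-754-5782", and the second URL is then "https://www.nsncenter.com/NSN/" & the returned value.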
#31
Posted to microsoft.public.excel.programming,microsoft.public.excel
GS wrote:
<snip>

* Thanks for the guide. You are getting all of the right stuff from what i would call the second file. The first file is
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=" & PageNum
where PageNum (in ASCII) goes from "1" to "999". Note the (implied?) space in the URL. I think that by now you have it all figured out.

In snooping around, i have just stumbled on the ADODB scheme, and what little i have found so far looks very promising. Only one example, which does not work (examples NEVER work), and zero explanations so far. It seems that with the proper code, ADODB would allow me to copy those first files to a HD.

Would you be so kind as to share your working ADODB code? Or did you hand-copy the source like i did?

Thanks again.
#32
Posted to microsoft.public.excel.programming,microsoft.public.excel
> You are getting all of the right stuff from what i would call the
> second file. The first file is
> "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=" & PageNum
> where PageNum (in ASCII) goes from "1" to "999". Note the (implied?)
> space in the URL.

I got Source, NSN Part#, Description from the 1st file. The NSN Item# links to the 2nd file.

<snip>

> Would you be so kind as to share your working ADODB code? Or did you
> hand-copy the source like i did?

I use std VB file I/O, not ADODB. Initial procedure was to copy/paste page source into Textpad and save as Tmp.txt, then load the file into an array and parse from there.

I thought I'd take a look at going with a userform and MS Web Browser control for more flexible programming opts, but haven't had the time. I assume this would definitely give you an advantage over trying to automate IE, but I need to research using it. I do have URL functions built into my fpSpread.ocx for doing this stuff, but that's an expensive 3rd party AX component. Otherwise, doing this from Excel isn't something I'm familiar with.

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
#33
Posted to microsoft.public.excel.programming,microsoft.public.excel
> I thought I'd take a look at going with a userform and MS Web Browser
> control for more flexible programming opts

While I'm on pause waiting to consult with my client on my current project...

This is doable; I have a userform with a web browser, a textbox, and some buttons. The Web Browser doesn't display the AddressBar/StatusBar for some reason, even though these props are set 'True'. (The initial URL (pg1) is hard-coded as a result.)

You navigate to parent pages using Next/Last buttons, starting with pg1 on load. Optionally, you can enter a page# in a GoTo box. The browser lets you select links, and you load its current document page source into txtViewSrc via btnViewSrc. This action also Splits() the page source into vPgSrc for locating search text selected in cboSearchTxt.

The cboSearchTxt_Change event auto-locates your desired lines at present, but I will have it appending them to file shortly. This file will be structured the same as illustrated earlier. I think this could be fully automated after I see how the links are defined in their innerHTML.

For now, I'll provide a button to write found lines because it gives an opportunity to preview the data going into your file. This will happen via loading found lines into vaLinesOut(), which is sized 0 to 3. This makes the search sequence important so the output file has its lines in the correct order (top-to-bottom in page source).

I use my own file read/write procedures because they're configured for large amounts of data in 1 shot to/from dbase.txt files, and so are included in the userform class.

While there's still a manual element to this, it's going to be orders of magnitude less daunting and more efficient than what you do now. It seems highly likely that, over time, this entire task can be fully automated just by entering the URL for pg1!

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
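[Editorial sketch] The navigate-and-read step described above can be sketched like this; wbPage is an assumed name for the Web Browser control, txtViewSrc is the textbox named in the post, and the form needs a reference to the Microsoft Web Browser control library:

```vb
' Hedged sketch: navigate the browser control and copy the loaded
' document's source into the textbox once the page has finished.
Private Sub LoadPage(ByVal url As String)
    wbPage.Navigate url
    Do While wbPage.Busy Or wbPage.ReadyState <> 4   '4 = READYSTATE_COMPLETE
        DoEvents
    Loop
    txtViewSrc.Text = wbPage.Document.body.innerHTML
End Sub
```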
#34
Posted to microsoft.public.excel.programming,microsoft.public.excel
GS wrote:
> <snip>
> I use std VB file I/O, not ADODB. Initial procedure was to copy/paste
> page source into Textpad and save as Tmp.txt. Then load the file into
> an array and parse from there.
> <snip>

Check. I know QBASIC fairly well, so a lot of that knowledge crosses over to VB.

Someone here was kind enough to give me a full working program that can be used to copy a URL source to a temp file on the HD. Once available, all else is very simple and straightforward. The rub is that the function (or something it uses) does not allow a space in the URL, AND also does not allow https. So, i need two work-arounds, and the https part would seem to be the worst. I do not know how it works or what DLLs/libraries it calls; no useful information is available. It is:

Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Only the well-known keywords can be found; 'urlmon', 'pCaller', 'szURL', and 'szFileName' are unknowns and not findable in the so-called VB help. And there are no examples; the few ranDUMB ones are incomplete and/or do not work..

I do not see how you use std VB file I/O; AFAIK one cannot open a web page as if it was a file.
#35
Posted to microsoft.public.excel.programming,microsoft.public.excel
GS wrote:
<snip>

Way beyond me. If in HTML one can copy an <a href="..." to the hard drive, then that is all i need.
#36
Posted to microsoft.public.excel.programming,microsoft.public.excel
GS wrote:
<snip>
> Way beyond me. If in HTML one can copy an <a href="..." to the hard
> drive, then that is all i need.

Just another approach, since you seem to be having difficulty getting URLDownloadToFile() to work. My approach reads the innerHTML of web pages and outputs to a txt file.

Not sure why you want to grab html and save to disc given that file size is a concern. My approach puts parsed data from all 999 pages into a txt file less than 4mb in size.

Once the individual steps have been optimized, automating the entire process will be easy. (I'll leave that part for you to do however you want it to work.)

I will post the contents of my fParseWebPages.frm file. You will need to set a ref to the Microsoft Web Browser to use it.

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
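[Editorial sketch] The append-to-file step can be sketched with std VB file I/O, which is what Garry says he uses; the output file name comes from his earlier post, and vaLinesOut() is assumed to hold one item's four output lines:

```vb
' Hedged sketch: append one parsed item (4 lines) to the single
' output file described earlier in the thread.
Sub AppendRecord(ByRef vaLinesOut() As String)
    Dim f As Integer, i As Long
    f = FreeFile
    Open "ElectronTube_NSN5960.txt" For Append As #f
    For i = LBound(vaLinesOut) To UBound(vaLinesOut)
        Print #f, vaLinesOut(i)
    Next i
    Close #f
End Sub
```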
#37
Posted to microsoft.public.excel.programming,microsoft.public.excel
I felt you were going about this task wrongly. That's why I did my own
approach; to illustrate an easier way to get the job done. And it's doing just that! I'll try to get the working solution to you over this long weekend.

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion
#38
Posted to microsoft.public.excel.programming,microsoft.public.excel
GS wrote:
<snip>
>> Way beyond me. If in HTML one can copy an <a href="..." to the hard
>> drive, then that is all i need.
> Just another approach, since you seem to be having difficulty getting
> URLDownloadToFile() to work.

* snooped around the Excel VB help to find alternate ways (i remember seeing other ways). Here was the first step:

SRC1$ = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5"
''test..
With Worksheets(1)
    .Hyperlinks.Add .Range("E5"), SRC1$
End With

** That places the URL in cell E5, and it looked good. Clicking once on it "to follow", i get the error message "Unable to open...Cannot download the information you requested." This may have the same 2 problems, the (implied) space and the https. So, to test those, i added a folder on one of my websites:
http://www.oil4lessllc.com/Try%20.space/
...and that works. Therefore, this scheme is one better than the function. Next, i tested the https with https://duckduckgo.com/ . I get that error message. So i am still dead in the water.

> My approach reads the innerHTML of web pages and outputs to a txt
> file.

* THAT is what i cannot do; AFAIK neither HTML5 nor javascript can write to a file.

> Not sure why you want to grab html and save to disc given the file
> size is a concern.

* Well, when i first started, i got a bit scared because of those 7750 (appx) lines; first time i saw a web site that huge. Now that i have processed 30 pages, i know better.

> My approach puts parsed data from all 999 pages into a txt file less
> than 4mb in size. Once the individual steps have been optimized,
> automating the entire process will be easy.
> (I'll leave that part for you to do however you want it to work)

* So far, in that direction, i have:

<header>
<a href="https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5" id="NSN5860">hello<BR></a>
<script>
var T = document.anchors;
document.getElementById("NSN5860").innerHTML = T;
// open an output stream in a new window
document.open("text/html", "replace");
document.writeln(T + "|<BR>");
// display in new window
document.close();
</script>
<!-- above gets [object HTMLCollection] on the new screen/window,
     where that object item is the href pointer -->

> I will post the contents of my fParseWebPages.frm file. You will need
> to set a ref to the Microsoft Web Browser to use it.
#39
Posted to microsoft.public.excel.programming,microsoft.public.excel
GS wrote:
> Well, i am in a pickle. Firstly, i did a bit of experimenting, and i
> discovered a few things.
> 1) The variable "tmp" is nice, but does not have to have the date and
> time; that would fill the HD since i have thousands of files to
> process.

So you did not 'catch' that the tmp file is deleted when this line...

Kill tmp

...gets executed! Been thru this already.. also makes no sense to have a name miles long.
#40
Posted to microsoft.public.excel.programming
What is with this a*hole posting this sh*t here?