Posted to microsoft.public.excel.programming,microsoft.public.excel
Robert Baer
Read (and parse) file on the web CORRECTION#2

Auric__ wrote:
Robert Baer wrote:

Auric__ wrote:
Robert Baer wrote:

And assuming a fix, what can i do about the OPEN command/syntax?
// What i did in Excel:
S$ = "D:\Website\Send .Hot\****"
tmp = Environ("TEMP")& "\"& S$

The contents of the variable S$ at this point:

S$ = "C:\Users\auric\D:\Website\Send .Hot\****"

Do you see the problem?
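
To spell the problem out: prefixing the TEMP folder onto a string that already starts with a drive letter produces a path Windows cannot open. A minimal sketch of the usual fix, assuming only a bare file name is appended to the TEMP folder (TempPathFor and "page.htm" are made-up names for illustration, not from the thread):

```vba
' Sketch only: TempPathFor and "page.htm" are hypothetical names.
Function TempPathFor(ByVal fileName As String) As String
    ' fileName must be a bare name: no drive letter, no folder part
    TempPathFor = Environ("TEMP") & "\" & fileName
End Function

Sub ShowTempPath()
    ' Prints the TEMP folder followed by \page.htm
    Debug.Print TempPathFor("page.htm")
End Sub
```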

Also, as Garry pointed out, cleanup should happen automatically. The "Kill"
keyword deletes files.

Try this code:


[snip]

Grumble..do not understand well enough to get working.
Now i do not know what i had that fully worked with the gTX.htm file.

The following "almost" works; it fails on the open.


You know, one of us is confused, and I'm not entirely sure it isn't me. I've
given you (theoretically) working code twice now, and yet you insist on
making some pretty radical changes that DON'T ****ING WORK!

So, let's step back from the coding for a moment, and let's have you explain
***EXACTLY*** what it is you want done. Give examples like, "given data X, I
want to do Y, with result Z."

Unless I get a clearer explanation of what you're trying to do, I'm done
with this thread.

I wish to read and parse every page of
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
where the page number goes from 5 to 999.
On each page, find "<a href="/NSN/5960 [it is longer, but that is the
start].
Given the full number (eg: <a href="/NSN/5960-00-831-8683"), open a
new related page "https://www.nsncenter.com/NSN/5960-00-831-8683" and
find the line ending "(MCRL)".
Read about 4 lines to <a href="/PartNumber/ which is <a
href="/PartNumber/GV4S1400" in this case.
Save/write that line plus the next three; close this secondary online
URL and step to the next "<a href="/NSN/5960 to process the same way.
Continue to end of the page, close that URL and open the next page.
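
The two search steps described above could be sketched in VBA roughly as follows, assuming each page has already been read into a single string. ExtractNsns and ExtractPartNumber are hypothetical helper names, and the sketch is simplified: it returns only the part number rather than "that line plus the next three", and it does not remove duplicates if a page links the same NSN more than once.

```vba
' Sketch only: both function names are hypothetical helpers.

' Collects every 5960 NSN that follows the marker <a href="/NSN/ in html.
Function ExtractNsns(ByVal html As String) As Collection
    Const MARKER As String = "<a href=""/NSN/"
    Const NSN_LEN As Long = 16            ' length of e.g. 5960-00-831-8683
    Dim found As New Collection
    Dim pos As Long
    Dim nsn As String

    pos = InStr(1, html, MARKER)
    Do While pos > 0
        nsn = Mid$(html, pos + Len(MARKER), NSN_LEN)
        ' Keep only text shaped like a 5960 NSN; # matches one digit
        If nsn Like "5960-##-###-####" Then found.Add nsn
        pos = InStr(pos + Len(MARKER), html, MARKER)
    Loop
    Set ExtractNsns = found
End Function

' Returns the first part number linked after "(MCRL)", or "" if none is found.
Function ExtractPartNumber(ByVal html As String) As String
    Const ANCHOR As String = "(MCRL)"
    Const MARKER As String = "<a href=""/PartNumber/"
    Dim p As Long
    Dim q As Long

    p = InStr(1, html, ANCHOR)
    If p = 0 Then Exit Function
    p = InStr(p, html, MARKER)
    If p = 0 Then Exit Function
    p = p + Len(MARKER)                   ' first character of the part number
    q = InStr(p, html, """")              ' closing quote of the href
    If q > p Then ExtractPartNumber = Mid$(html, p, q - p)
End Function
```

Each item in the returned collection can then be appended to "https://www.nsncenter.com/NSN/" to build the secondary URL, and the text of that page passed to ExtractPartNumber.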

Crude code:
CLOSE ' PRS5960.BAS (QuickBasic)
' watch linewrap below..
SRC1$ = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
SRC2$ = "https://www.nsncenter.com/NSN/5960" 'example only
FSC$ = "/NSN/5960"
OPEN "FSC5960.TXT" FOR APPEND AS #9
' Let page number run from 05 to 39 to read existing files
FOR PG = 5 TO 39
  A$ = ""
  FPG$ = RIGHT$("0" + MID$(STR$(PG), 2), 2)
  ' These files, FPG$ + ".txt" are copies from the web
  OPEN FPG$ + ".txt" FOR INPUT AS #1
  ON ERROR GOTO END1
  PRINT FPG$ + ".txt", 'is screen note to me
  WHILE NOT EOF(1)
    WHILE INSTR(A$, FSC$) = 0 'skip 7765 lines of junk
      LINE INPUT #1, A$ 'look for <a href="/NSN/5960-00-754-5782" Class= ETC
    WEND
    P = INSTR(A$, FSC$) + 9: FPG2$ = SRC2$ + MID$(A$, P, 12)
    NSN$ = "5960" + MID$(A$, P, 12)
    PRINT NSN$ 'is screen note to me
    AHREF$ = ".." + FSC$ + MID$(A$, P, 12)
    'Need URL FPG2$ or .. a href to get balance of data
    ' See comments above this program
    PRINT #9, NSN$
    LINE INPUT #1, A$
  WEND
END1:
  RESUME LAB
LAB:
  CLOSE #1
NEXT PG
CLOSE
SYSTEM
**
Note the Function URLDownloadToFile does not allow spaces; there is
one in the "page" URL.
Problem #2, the Function URLDownloadToFile does not allow https
website URLs.
Other than those problems, i have everything else working fine.
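
If URLDownloadToFile keeps failing on these URLs, one possible workaround is to swap in a different technique: request the page text directly with the MSXML2.XMLHTTP object and replace any literal space with %20 first. This is a sketch under assumptions, not a confirmed fix for this site: it assumes MSXML is installed (it normally is on Windows), that the Windows TLS settings are accepted by the server, and that the site allows automated requests. EncodeSpaces, FetchPage and DemoFetch are hypothetical names.

```vba
' Sketch only: EncodeSpaces, FetchPage and DemoFetch are hypothetical names.

' Replaces literal spaces so the URL contains no blanks.
Function EncodeSpaces(ByVal url As String) As String
    EncodeSpaces = Replace(url, " ", "%20")
End Function

' Returns the body of the page as text, or "" if the status is not 200.
Function FetchPage(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")     ' late bound, no reference needed
    http.Open "GET", EncodeSpaces(url), False     ' False = wait for the reply
    http.send
    If http.Status = 200 Then FetchPage = http.responseText
End Function

Sub DemoFetch()
    Const SRC1 As String = _
        "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
    Dim html As String
    html = FetchPage(SRC1 & "5")                  ' page 5, the first page wanted
    Debug.Print Len(html)                         ' 0 means the request did not return 200
End Sub
```

Because the page arrives as a string, no temporary file has to be opened or deleted afterwards.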