Posted to microsoft.public.excel.programming,microsoft.public.excel
Robert Baer
Read (and parse) file on the web CORRECTION#2

GS wrote:
You are correct, #1) do not need that line 3, and #2) do not need the
extended info.


Ok then, fieldnames will be: Item#,Part#,MCRL,CAGE,Source

For file names, I would use 5960_001.TXT for PageNumber=1 through
5960_999.TXT for PageNumber=999; that would preserve order.
*OR*
Reading & parsing from PageNumber=1 to PageNumber=999,one could append
to same file (name NSN_5960.TXT); might as well - makes it easier to
pour into a single Excel file.
Either way is fine.
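
A minimal sketch of the two naming options, written in Python purely for illustration (nothing in this thread uses it). The helper name page_filename and the constant COMBINED_NAME are hypothetical.

```python
def page_filename(page_number, prefix="5960", ext="TXT"):
    """Return a zero-padded per-page file name such as 5960_001.TXT."""
    if not 1 <= page_number <= 999:
        raise ValueError("page_number must be between 1 and 999")
    return f"{prefix}_{page_number:03d}.{ext}"


# Option 1: one file per page. Zero padding makes alphabetical order
# match page order, so 5960_002.TXT sorts before 5960_010.TXT.
names = [page_filename(n) for n in range(1, 1000)]

# Option 2: a single combined file that each parsed page is appended to.
COMBINED_NAME = "NSN_5960.TXT"
```

Without the padding, 5960_10.TXT would sort ahead of 5960_2.TXT, which is why the three-digit form matters for option 1.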


Ok, then output filename will be: NSN_5960.txt

I have found a way to get rid of items that are not strictly electron
tubes and/or not regulators; that way you do not have to parse out
these "unfit" items from first page description. Use:
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator%20and%20%22ELECTRON%20TUBE%22&PageNumber=1"

Naturally, PageNumber still goes from 1 to 999.
Note the implied "(", ")" and " "; human-readable "5960 regulator and
(ELECTRON TUBE)".
As far as i can tell, using that shows no undesirable parts.
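
A rough sketch of how that URL could be assembled for each page, in Python for illustration only. In the URL, %20 is an encoded space and %22 is an encoded double quote, so the sketch assumes the search text is 5960 regulator and "ELECTRON TUBE". The names BASE_URL, QUERY and search_url are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://www.nsncenter.com/NSNSearch"
# Assumed search text: %22 in the posted URL decodes to a double quote.
QUERY = '5960 regulator and "ELECTRON TUBE"'


def search_url(page_number):
    """Build the search URL for one result page."""
    # safe="" makes quote() encode every reserved character;
    # spaces become %20 and double quotes become %22.
    encoded = quote(QUERY, safe="")
    return f"{BASE_URL}?q={encoded}&PageNumber={page_number}"


# One URL per page, PageNumber running from 1 to 999.
urls = [search_url(n) for n in range(1, 1000)]
```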


Works nice! Now I get 11 5960 items per parent page.

Thanks!
PS: i found WGET to be non-useful: (a) it truncates the filename, and
(b) it buggers it to partial gibberish.


What is WGET?

WGET is a command-line program that copies the contents of a URL to
the hard drive; it has various options: for SSL, (i think) for some
processing, for giving the output file a specific name, for recursion, etc.
Was still trying to find ways to copy the online file to the hard drive.
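
For comparison, a minimal sketch of the same "copy a URL to the hard drive under a name you choose" idea using only the Python standard library. This is an illustration, not the method used in this thread, and the function name save_url is hypothetical.

```python
import shutil
import urllib.request


def save_url(url, out_path):
    """Copy the contents of a URL to a local file with an explicit name."""
    # The output name is passed in by the caller, so it is never
    # derived from the URL or its query string.
    with urllib.request.urlopen(url) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return out_path
```

Usage would look like save_url(some_url, "NSN_5960.txt"); wget itself offers the -O option for naming the output file.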

I still do not understand what magic you used.

Now, the nitty-gritty; in exchange for that nicely parsed file, what
do i owe you?