Posted to microsoft.public.excel.programming,microsoft.public.excel
Robert Baer
Read (and parse) file on the web CORRECTION#2

Auric__ wrote:
Robert Baer wrote:

Auric__ wrote:
Robert Baer wrote:

PS: I found WGET to be useless: (a) it truncates the filename, and (b) it
buggers it into partial gibberish.

Then you must be using a bad version, or perhaps have something wrong
with your .wgetrc. I've been using wget for around 10 years, and never
had anything like those issues unless I pass bad options.

I know nothing about .wgetrc;


Don't worry about it. It can be used to set default behaviors, but every entry
can be replicated with command-line switches.
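As an illustration of that equivalence: wget's `-e` (`--execute`) switch runs a command as if it were a line in .wgetrc, so a simple `name = value` file can be turned into switches mechanically. This is a minimal Python sketch that assumes only that simple form; the function name and the sample entries are hypothetical.

```python
def wgetrc_to_switches(wgetrc_text: str) -> list[str]:
    """Turn simple 'name = value' .wgetrc lines into wget '-e' arguments.

    Blank lines and lines starting with '#' are skipped.
    Illustrative sketch only; it does not handle every .wgetrc feature.
    """
    args: list[str] = []
    for raw in wgetrc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'name = value' line: {raw!r}")
        # '-e' tells wget to execute the command as if it were in .wgetrc
        args += ["-e", f"{name.strip()}={value.strip()}"]
    return args


if __name__ == "__main__":
    sample = """
    # hypothetical .wgetrc contents
    robots = off
    check_certificate = off
    """
    print(wgetrc_to_switches(sample))
    # ['-e', 'robots=off', '-e', 'check_certificate=off']
```

Here `-e check_certificate=off` has the same effect as the `--no-check-certificate` switch used in the batch files in this thread.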

I am at the Win2K command line, and the batch file
I use is:
H:
CD\Win2K_WORK\OIL4LESS\LLCDOCS\FED app\FBA stuff

wget --no-check-certificate --output-document=5960_002.TXT --output-file=log002.TXT https://www.nsncenter.com/NSNSearch?...d%20%22ELECTRON%20TUBE%22&PageNumber=2


That wget line performs as expected for me: 5960_002.TXT contains valid HTML
(although I haven't made any attempt to check the data; it looks like most of
the page is CSS) and log002.TXT is a typical wget log of a successful
transfer.

As for truncating the filenames, if I remove the --output-document switch,
the filename I get is

NSNSearch@q=5960%20regulator%20and%20%22ELECTRON%20TUBE%22&PageNumber=2
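That name is not truncated. It follows a simple rule, which this minimal Python sketch approximates: take the last path segment and append the query string, joined with `@` instead of `?`, because `?` is not allowed in Windows file names. The function name and the example.com URL are hypothetical. Real wget also escapes other reserved characters and may differ between versions, so treat this as an approximation, not wget's actual code.

```python
from urllib.parse import urlsplit


def windows_local_name(url: str) -> str:
    """Approximate the file name wget picks on Windows with no
    --output-document: last path segment, plus '@' + query string.

    Simplified sketch: percent-escapes such as %20 are kept as-is, and
    other reserved-character escaping done by wget is not modelled.
    """
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1] or "index.html"
    if parts.query:
        name += "@" + parts.query  # '?' is illegal in Windows file names
    return name


if __name__ == "__main__":
    print(windows_local_name("https://example.com/Search?q=a%20b&page=2"))
    # Search@q=a%20b&page=2
```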

PAUSE

The SourceForge site offered a Zip which was supposed to be complete,


If you're talking about GNUwin32, that version is years out of date.

but none of the created folders had an EXE (I tried Win2K, WinXP, and Win7).
I found Softonic offering only a plain-jane wget.exe, which I am using,
so that may be a buggered version.


Never even heard of them.

Suggestions?


I'm using 1.16.3. No idea where I got it.

The batch file that I use for downloading looks like this:

call wget --no-check-certificate -x -c -e robots=off -i new.txt %*

-x Always create directories (e.g. http://a.b.c/1/2.txt -> .\a.b.c\1\2.txt).
-c Continue interrupted downloads.
-e Execute a .wgetrc command (in this case, ignore the robots.txt file).
-i Read the list of URLs from the following file ("new.txt" because that's
the default name for a new file in my file manager).
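The `-x` directory mapping described above can be sketched in Python like this. It is a simplification under stated assumptions: the helper name is hypothetical, and query strings, default index.html names, and wget's character escaping are all ignored. It is not wget's own logic.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlsplit


def x_switch_path(url: str) -> PureWindowsPath:
    """Sketch of the -x layout: the host name becomes the top directory
    and each non-empty URL path segment becomes a nested component,
    relative to the current directory."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    segments = [s for s in parts.path.split("/") if s]
    return PureWindowsPath(parts.hostname, *segments)


if __name__ == "__main__":
    print(x_switch_path("http://a.b.c/1/2.txt"))
    # a.b.c\1\2.txt
```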

I use the -i switch so I don't have to worry about escaping characters or % vs.
%%. Whatever's in the text file is exactly what wget requests. (If you go
this route, it's one URL per line.)
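The same list-file approach can also be driven from a script. This hypothetical Python sketch writes the URLs verbatim, one per line, and builds the argument list used in the batch file above. Because the arguments are passed as a list without `shell=True`, cmd.exe never parses them. The helper names, the temporary file location, and the example URL are assumptions for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def write_url_list(urls: list[str], list_file: Path) -> None:
    """Write URLs verbatim, one per line (no %% doubling, no ^ escapes)."""
    list_file.write_text("".join(u + "\n" for u in urls), encoding="utf-8")


def wget_command(list_file: Path) -> list[str]:
    """Same switches as the batch file: -x -c -e robots=off -i <file>."""
    return ["wget", "--no-check-certificate", "-x", "-c",
            "-e", "robots=off", "-i", str(list_file)]


def run_wget(list_file: Path) -> None:
    """Run wget directly; a list with no shell bypasses cmd.exe parsing."""
    subprocess.run(wget_command(list_file), check=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lst = Path(tmp) / "new.txt"
        write_url_list(["https://example.com/Search?q=a%20b&page=2"], lst)
        print(lst.read_text(encoding="utf-8"), end="")
        print(wget_command(lst))
```

Calling `run_wget(lst)` would then start the download, assuming wget.exe is on the PATH.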

You must have a different version of Wget; whatever I do on the
command line, including the "trick" of restrict-file-names=nocontrol, I
get a buggered path name plus the response that &PageNumber is not recognized.
I get exactly the same results in Win2K, WinXP, and Win7.
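A likely cause of both symptoms, which is an inference and not something confirmed in this thread, is how cmd.exe parses a batch file. The `%2` in `%20` is read as the second batch parameter and substituted, usually with nothing. An unquoted `&` also ends the command, so `PageNumber=2` is then run as a separate command. Inside a .bat file, the usual remedy is to double every `%` and put the URL in double quotes. This hypothetical Python helper applies that rule. It is meant only for .bat files, because at the interactive prompt `%%` is not collapsed back to `%`.

```python
def quote_for_batch_file(arg: str) -> str:
    """Make an argument survive cmd.exe parsing inside a .bat file.

    - Double quotes stop '&' from acting as a command separator.
    - Every '%' is doubled, because in a batch file '%2' (as in '%20')
      is the second batch parameter and would be substituted away.
    Assumes the argument contains no double quotes of its own.
    """
    if '"' in arg:
        raise ValueError("embedded double quotes are not handled")
    return '"' + arg.replace("%", "%%") + '"'


if __name__ == "__main__":
    url = "https://example.com/Search?q=a%20b&page=2"
    print("wget --output-document=out.txt " + quote_for_batch_file(url))
    # wget --output-document=out.txt "https://example.com/Search?q=a%%20b&page=2"
```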

Yes, I used GNUwin32, as the SourceForge "complete" package of Wget had no EXE.
Is there some other (compiled, complete) source I should get?