Posted to microsoft.public.excel.programming,microsoft.public.excel
Robert Baer
Read (and parse) file on the web CORRECTION#2

GS wrote:
So what you also want is the linked file (web page) the image or part#
links to! Here's what I got from
https://www.nsncenter.com/NSN/5960-00-831-8683 (pg4):

1st occurrence of <a href="/NSN/5960 is at line 7878;

1st occurrence of (MCRL) is at line 7931;

1st occurrence after that of <a href="/PartNumber" is this, at line 7951:
<td align="center" style="vertical-align: middle;"><a
href="/PartNumber/GV4S1400">GV4S1400</a></td>

and the next 3 lines are:
<td style="width: 125px; height: 60px; vertical-align: middle;"
align="center" nowrap>&nbsp;&nbsp;<a
href="/CAGE/63060">63060</a>&nbsp;&nbsp;</td>
<td align="center" style="vertical-align: middle;">&nbsp;&nbsp;<a
href="/CAGE/63060"><img class="img-thumbnail"
src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45
width=90 /></a>&nbsp;&nbsp;</td>
<td text-align="center" style="vertical-align: middle;"><a title="CAGE
63060" href="/CAGE/63060">HEICO OHMITE LLC</a></td>
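
The "1st occurrence ... at line N" search above can be sketched roughly as follows. This is Python rather than VBA, and only an illustration: the helper name `first_line_with` and the four-line `page` list are made up for the example, and a real page would be read from the saved source file.

```python
def first_line_with(lines, marker, start=0):
    """Return the 1-based number of the first line containing marker.

    `start` is how many lines to skip first, so passing a previously
    found line number searches strictly after that line.
    Returns None when the marker is not found.
    """
    for number, text in enumerate(lines[start:], start=start + 1):
        if marker in text:
            return number
    return None


# Hypothetical miniature page, standing in for the saved page source.
page = [
    "<html>",
    '<a href="/NSN/5960-00-831-8683">5960-00-831-8683</a>',
    "<span>(MCRL)</span>",
    '<td><a href="/PartNumber/GV4S1400">GV4S1400</a></td>',
]

nsn_line = first_line_with(page, '<a href="/NSN/5960')
mcrl_line = first_line_with(page, "(MCRL)", start=nsn_line)
part_line = first_line_with(page, '<a href="/PartNumber', start=mcrl_line)
print(nsn_line, mcrl_line, part_line)
```

Chaining the calls this way keeps each search anchored after the previous hit, which is what "1st occurrence after that" relies on.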


So you want to go to the next page linked to and repeat the process?

At this point my Excel sheet has been modified as follows:

Source | NSN Item# | Description | Part# | MCRL#
Tektronix | 5960-00-831-8683 | ELECTRON TUBE | GV4S1400 | 4932653
<a href="/CAGE/63060">63060</a>
<a href="/CAGE/63060"><img class="img-thumbnail"
src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45
width=90 /></a>
<a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a>

General Dynamics | 5960-00-853-8207 | ELECTRON TUBE | 295-29434 | 5074477
line1
line2
line3

..and so on.

So far, I'm working with text files and so I'm inclined to append each
item to a file named "ElectronTube_NSN5960.txt". File contents for the 2
items above would be structured so the 1st line contains headings (data
fields) so it works properly with ADODB. (Note that I use a comma as
delimiter, and the file does not contain any blank lines)...

Source,NSN Item#,Description,Part#,MCRL#
Tektronix,5960-00-831-8683,ELECTRON TUBE,GV4S1400,4932653
<a href="/CAGE/63060">63060</a>
<a href="/CAGE/63060"><img class="img-thumbnail"
src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45
width=90 /></a>
<a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a>
General Dynamics,5960-00-853-8207,ELECTRON TUBE,295-29434,5074477
<a href="/CAGE/1VPW8">1VPW8</a>
<a href="/CAGE/1VPW8"><img class="img-thumbnail"
src="https://az774353.vo.msecnd.net/cage/90/1vpw8.jpg" alt="CAGE 1VPW8"
height=45 width=90 /></a>
<a title="CAGE 1VPW8" href="/CAGE/1VPW8">GENERAL DYNAMICS C4 SYSTEMS,
INC.</a>

..where I have parsed off the CSS formatting text and html tags outside
<a...</a> from the 3 lines. I'd likely convert the UCase to proper case
as well.
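
One possible way to do that parsing, sketched in Python rather than VBA: a regular expression that keeps only the complete anchor elements from a table cell. The names `ANCHOR` and `extract_anchors` are invented for this example, and it assumes each cell arrives as one string of well-formed markup.

```python
import re

# Matches one complete <a ...>...</a> element, including a nested <img>.
ANCHOR = re.compile(r"<a\b[^>]*>.*?</a>", re.DOTALL)


def extract_anchors(cell_html):
    """Return every <a>...</a> element found in one table cell, in order."""
    return ANCHOR.findall(cell_html)


cell = (
    '<td style="width: 125px; height: 60px; vertical-align: middle;" '
    'align="center" nowrap>&nbsp;&nbsp;'
    '<a href="/CAGE/63060">63060</a>&nbsp;&nbsp;</td>'
)
print(extract_anchors(cell))
```

For the proper-case step, Python's `str.title()` would be the quick option, though it also turns "LLC" into "Llc", so company suffixes would probably need special handling.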

The file size is 653 bytes, meaning a full page would be about 4 KB and
1000 pages about 4 MB. That's 44 lines per page (11 items at 4 lines
each) after the fields line.

A file this size can be easily handled via ADO recordset or std VB file
I/O functions/methods. Loading into an array (vData) puts fields in
vData(0) and records starting at vData(1), and looping would Step 4.
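
The array layout just described (fields in element 0, records from element 1, stepping by 4) might look like this in Python. It is a sketch, not the VBA itself: `load_records` and the `"links"` key are hypothetical names, and the sample lines are the two items shown earlier with each wrapped link line joined back into one.

```python
def load_records(lines):
    """Split the file lines into the field names and a list of records.

    lines[0] holds the comma-delimited field names; every record after
    it occupies 4 lines: one data line plus 3 link lines.
    """
    fields = lines[0].split(",")
    records = []
    for i in range(1, len(lines), 4):          # the "Step 4" loop
        block = lines[i:i + 4]
        if len(block) < 4:                     # ignore a truncated tail
            break
        record = dict(zip(fields, block[0].split(",")))
        record["links"] = block[1:]            # hypothetical key for the 3 link lines
        records.append(record)
    return fields, records


sample = [
    "Source,NSN Item#,Description,Part#,MCRL#",
    "Tektronix,5960-00-831-8683,ELECTRON TUBE,GV4S1400,4932653",
    '<a href="/CAGE/63060">63060</a>',
    '<a href="/CAGE/63060"><img class="img-thumbnail" '
    'src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45 width=90 /></a>',
    '<a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a>',
    "General Dynamics,5960-00-853-8207,ELECTRON TUBE,295-29434,5074477",
    '<a href="/CAGE/1VPW8">1VPW8</a>',
    '<a href="/CAGE/1VPW8"><img class="img-thumbnail" '
    'src="https://az774353.vo.msecnd.net/cage/90/1vpw8.jpg" alt="CAGE 1VPW8" '
    'height=45 width=90 /></a>',
    '<a title="CAGE 1VPW8" href="/CAGE/1VPW8">GENERAL DYNAMICS C4 SYSTEMS, INC.</a>',
]

fields, records = load_records(sample)
print(len(records), records[0]["Part#"])
```

Only the data line is split on commas; the link lines are kept whole, so a comma inside a company name such as "GENERAL DYNAMICS C4 SYSTEMS, INC." does no harm.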

I really don't have the time/energy (I have Lou Gehrig's) to get any
more involved with your project due to current commitments. I just felt
it might be worth explaining how I'd handle your task in the hopes it
would be helpful to you reaching a viable solution. I bid you good
wishes going forward...

* Thanks for the guide.


You are getting all of the right stuff from what I would call the
second file.
The first file is
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=" &
PageNum where PageNum (as an ASCII string) goes from "1" to "999".
Note the space in the search term, encoded as %20 in the URL.
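
For what it is worth, here is a small Python sketch (not VBA) of building that first-file URL for each page. The function name `search_url` is made up for the example; the base address and the `q` and `PageNumber` parameters are the ones in the URL above.

```python
from urllib.parse import quote

BASE = "https://www.nsncenter.com/NSNSearch"


def search_url(query, page):
    """Build the search URL for one results page.

    quote() percent-encodes the space in the search term as %20.
    """
    return f"{BASE}?q={quote(query)}&PageNumber={page}"


# Pages run from 1 to 999, so range(1, 1000) covers them all.
urls = [search_url("5960 regulator", page) for page in range(1, 1000)]
print(urls[0])
```

Encoding the search term in one place means the space never has to be typed by hand as %20.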

I think that by now you have it all figured out.

In snooping around, I have just stumbled on the ADODB scheme, and from
what little I have found so far it looks very promising.
I have found only one example, which does not work (examples NEVER
work), and zero explanations so far.
It seems that, with the proper code, ADODB would allow me to copy
those first files to a HD.
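
As a rough idea of what copying one of those first files to disk involves, here is a sketch in Python rather than VBA/ADODB. It only shows the general shape (download the bytes, write them to a file); `fetch_bytes`, `save_page`, and the file name in the comment are invented for illustration.

```python
import os
from urllib.request import urlopen


def fetch_bytes(url):
    """Download url and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def save_page(url, path, fetch=fetch_bytes):
    """Write the body of url to path; return the number of bytes written.

    `fetch` can be swapped out, e.g. to try the function without
    touching the network.
    """
    data = fetch(url)
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)


# Intended use (not run here, since it needs network access):
# url = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=1"
# save_page(url, os.path.join("pages", "page_001.html"))
```

Writing the bytes exactly as received keeps the saved file identical to the page source, so line numbers found later still match.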

Would you be so kind as to share your working ADODB code?
Or did you hand-copy the source like I did?

Thanks again.