#11
Posted to microsoft.public.excel.programming,microsoft.public.excel
GS wrote:
I note that in Windoze "\" is used, and on the web "/" is used.

For clarity.., a Windows file path is not the same as a URL. Windows file paths allow spaces; URLs do not.

* Incorrect! See -------------------------vvv
https://www.nsncenter.com/NSNSearch?...r&PageNumber=5

Windows path delimiter is "\"; Web path delimiter is "/". The function URLDownloadToFile() downloads szURL to szFilename.

* IF and when it works.

Once downloaded, szFilename needs to be opened, parsed, and the result stored locally. Then the next file needs to be downloaded, parsed, and stored. And so on until all files have been downloaded and parsed.

* This I knew from the get-go; nice to be clarified.

Since the actual page contents comprise only a small portion of the files being downloaded, their size should be considerably smaller after parsing. If you extract the data (text) only (no images) and save this to a txt file, you should be able to 'append' to a single file, which would occupy far less disc space. (For example, pg5 is less than 1kb.) I suspect, though, that you need the image to identify the item source (Raytheon, RCA, Lucent Tech, NAWC, MIL STD, etc) because this info is not stored in the image file metadata. Otherwise, the txt file after parsing pg5's text is the following 53 lines:

NSN 5960-00-509-3171 5960-00-509-3171 ELECTRON TUBE
NSN 5960-00-569-9531 5960-00-569-9531 ELECTRON TUBE
NSN 5960-00-553-3770 5960-00-553-3770 ELECTRON TUBE
NSN 5960-00-682-8624 5960-00-682-8624 ELECTRON TUBE
NSN 5960-00-808-6928 5960-00-808-6928 ELECTRON TUBE
NSN 5960-00-766-1953 5960-00-766-1953 ELECTRON TUBE
NSN 5960-00-850-6169 5960-00-850-6169 ELECTRON TUBE
NSN 5960-00-679-8153 5960-00-679-8153 ELECTRON TUBE
NSN 5960-00-134-6884 5960-00-134-6884 ELECTRON TUBE
NSN 5960-00-061-8610 5960-00-061-8610 ELECTRON TUBE
5960-00-067-9636 ELECTRON TUBE

The file size is 711 bytes, and it lists 11 items. Note the last item has no image and so no filler text (NSN line).
This inconsistency makes parsing the contents difficult since you don't know which items do not have images.

* I think you may have pulled the info from what you saw on that page, and not from the source. In one of my responses, I gave QBASIC code for parsing, and as I remember, there were about 7760 lines of junk before one sees <a href="/NSN/5960, which gives the full NSN code. Use of that allows one to get the second URL, eg:
https://www.nsncenter.com/NSN/5960-00-754-5782
NO image reliance at all. There are 11 entries per page, and no inconsistencies with my method of search in the page.

If you copy/paste pg5 into Excel you get both text and image. You could then do something to construct the info in a database fashion...

* That would only make things more difficult. A copy to a local file is sufficient for a simple parsing as described here and elsewhere in this thread.

Col Headers: Source :: PartNum :: Description
..and put the data in the respective columns. This seems very inefficient but is probably less daunting than what you've been doing manually thus far. Auto Complete should be helpful with this, and you could sort the list by Source. Note that clicking the image or part# on the worksheet takes you to the same page as does clicking it on the web page. In the case of pg5, the data will occupy 11 rows.

* Manual: Right click, select View Page Source, then Save As to HD, changing Filetype from HTM to TXT and changing the filename to add the page number (013 for example). Like I said, parsing of that file is simple and easy; getting 35 pages copied that way did not take long, but there are 999 of them...

Seems like your approach is the long way; -I'd find a better data source myself! Perhaps subscribe to an electronics database utility (such as my CAD software would use) that I can update by downloading a single db file<g>

* I have asked, and received zero response.
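For anyone following along, the href-based parse described above could be sketched roughly as follows. This is Python, not the QBASIC posted earlier in the thread, and it is only a sketch under assumptions: that each item link in the saved page source looks like <a href="/NSN/5960-00-754-5782, and that an NSN code is always digits grouped 4-2-3-4. The names extract_nsns and detail_urls are made up for illustration.

```python
import re
from typing import List

# Assumption: item links in the saved page source start with
#   <a href="/NSN/<4 digits>-<2 digits>-<3 digits>-<4 digits>
# which matches the example quoted in the post.
NSN_LINK = re.compile(r'<a\s+href="/NSN/(\d{4}-\d{2}-\d{3}-\d{4})', re.IGNORECASE)

# Base of the "second URL" given in the post.
BASE_URL = "https://www.nsncenter.com/NSN/"


def extract_nsns(page_source: str) -> List[str]:
    """Return NSN codes found in item links, in page order, without repeats.

    The image link and the text link of one item may both carry the same
    code, so duplicates are dropped. Items without an image are still
    found, because only the href is inspected.
    """
    seen = set()
    codes = []
    for code in NSN_LINK.findall(page_source):
        if code not in seen:
            seen.add(code)
            codes.append(code)
    return codes


def detail_urls(codes: List[str]) -> List[str]:
    """Build the per-item detail URL for each NSN code."""
    return [BASE_URL + code for code in codes]


if __name__ == "__main__":
    # Made-up snippet standing in for a saved page; real pages have
    # thousands of lines of other markup before the first item link.
    sample = (
        '<a href="/NSN/5960-00-509-3171"><img alt="NSN 5960-00-509-3171"></a>\n'
        '<a href="/NSN/5960-00-509-3171">5960-00-509-3171</a> ELECTRON TUBE\n'
        '<a href="/NSN/5960-00-067-9636">5960-00-067-9636</a> ELECTRON TUBE\n'
    )
    for url in detail_urls(extract_nsns(sample)):
        print(url)
```

Run over each saved page file in turn, the codes could be appended to a single text file, which keeps the stored result small as suggested earlier in the thread.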