Scraping/listing document URLs on a server that don't have web pages/existing links?
We have a server-side database that includes BLOBs of MS Word documents.
Some of those documents have always been available via URL hyperlinks on an intranet web page. We've asked the administrator to expose a new group of documents from that database, although the links to those newly exposed documents won't be built into the existing web interface (they should still be available via individual intranet URLs).

As the first step in a new project, I'd like to scrape all of those document URLs. I assume it would be something like a recursive tree search, since the documents in the existing web interface are arranged hierarchically, for example:

/WPSAC/1-1000/356/356.doc
/WPSAC/1-1000/781/781.doc
/WPSAC/1001-2000/2294/2294.doc
/WPSAC/1001-2000/2770-2790/specials/2776.doc
/WPSAC/Revised/single_entry/B438.doc
etc.

I haven't worked with web stuff at all (although I'm decent with VBA). Any pointers on where to start, to build a list in Excel where each sequential cell contains a link to the next Word document? My next step on the project will be to build a user interface so a user can select a document number and have it automatically load the link, but I need to get the links themselves first.

Thanks!
Keith
Sorry, I forgot to mention I'm using XL2003 on WinXP.
Keith,
From your description it sounds as though the docs are kept in a database (which flavour?) and not on the web server filesystem. If that's the case then you won't be able to "scrape" the URLs: there is no "Dir()" equivalent in this case. If web links aren't going to be created for the new docs, it's not clear how they're being exposed to you.

If they're in a database then potentially you could use something like ADO to search and index the docs from Excel. Even then, creating a hyperlink clickable in XL would require some type of scripting set up on the server to deliver the requested file from the DB table.

Tim
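
For the documents that are already linked from the existing web interface, the recursive search Keith describes could be sketched roughly as below. This is only an illustration, written in Python rather than VBA, and it assumes things the thread doesn't confirm: that the pages are plain HTML with ordinary <a href> links, that no login is needed, and that the tree starts at a URL such as http://intranet.example/WPSAC/ (a made-up host). The names LinkParser, fetch_html and collect_doc_urls are hypothetical. As Tim points out, a document that has no link anywhere can't be found this way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Download one page as text (assumption: no authentication needed)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_doc_urls(start_url, fetch=fetch_html):
    """Follow links under start_url and return the sorted .doc URLs found."""
    seen, docs, stack = set(), set(), [start_url]
    while stack:  # depth-first walk using an explicit stack
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            target = urljoin(url, href).split("#")[0]
            if not target.startswith(start_url):
                continue  # stay inside the document tree
            if urlparse(target).path.lower().endswith(".doc"):
                docs.add(target)
            elif target not in seen:
                stack.append(target)
    return sorted(docs)
```

Calling collect_doc_urls("http://intranet.example/WPSAC/") would return a plain list of URLs, which could then be saved one per line and opened in Excel. The fetch parameter is there so the walk can be tried against canned pages before pointing it at a real server.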
ExcelBanter.com