ExcelBanter


Keith R[_2_]

Scraping/listing document URLs on a server that don't have web pages/existing links?
 
We have a server-side database that includes BLOBs of MS Word documents.
Some of those documents have always been available via URL hyperlinks on an
intranet web page. We've asked the administrator to expose a new group of
documents from that database; the links to those newly exposed documents
won't be built into the existing web interface, but they should still be
reachable via individual intranet URLs.

As the first step in a new project, I'd like to scrape all of those document
URLs; a rough attempt at this is sketched after the example paths below. I
assume it would be something like a recursive tree search, since the documents
are arranged hierarchically in the existing web interface, for example:
/WPSAC/1-1000/356/356.doc
/WPSAC/1-1000/781/781.doc
/WPSAC/1001-2000/2294/2294.doc
/WPSAC/1001-2000/2770-2790/specials/2776.doc
/WPSAC/Revised/single_entry/B438.doc
etc.
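
Just to make the goal concrete, here's the sort of thing I'm picturing,
assuming the documents really are reachable at predictable intranet URLs and
that the numbering is as regular as it looks. The base address, folder
pattern and sheet name below are all made up for illustration:

Sub ProbeCandidateUrls()
    ' Rough sketch: test a range of guessed document URLs and list the
    ' ones that respond. Base address and folder pattern are placeholders.
    Const BASE_URL As String = "http://intranet.example/WPSAC/"  ' assumed
    Dim http As Object, n As Long, url As String, r As Long

    Set http = CreateObject("MSXML2.XMLHTTP")
    r = 1
    For n = 1 To 1000                        ' first block of document numbers
        url = BASE_URL & "1-1000/" & n & "/" & n & ".doc"
        On Error Resume Next
        http.Open "HEAD", url, False         ' HEAD asks for headers only
        http.send
        If Err.Number = 0 Then
            If http.Status = 200 Then        ' 200 = document found
                Worksheets("Links").Cells(r, 1).Value = url
                r = r + 1
            End If
        End If
        On Error GoTo 0
    Next n
End Sub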

I haven't worked with web stuff at all (although I'm decent with VBA). Any
pointers on where to start building a list in Excel where each successive
cell contains a link to the next Word document?
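
If I can get the URLs sitting in column A, I'm assuming turning them into
clickable cells is just a loop over Hyperlinks.Add, something like this
(sheet name assumed again):

Sub MakeLinksClickable()
    ' Turn plain-text URLs in column A into clickable hyperlinks.
    Dim ws As Worksheet, c As Range
    Set ws = Worksheets("Links")             ' assumed sheet name
    For Each c In ws.Range("A1", ws.Cells(ws.Rows.Count, 1).End(xlUp))
        If Len(c.Value) > 0 Then
            ws.Hyperlinks.Add Anchor:=c, Address:=c.Value, TextToDisplay:=c.Value
        End If
    Next c
End Sub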

My next step on the project will be to build a user interface so a user can
select a document number and have the corresponding link open automatically,
but I need to get the links themselves first.
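
For that later step I'm imagining something as simple as this, just as a
sketch against the list built above (the lookup-by-filename pattern is an
assumption):

Sub OpenDocumentByNumber()
    ' Ask for a document number, find its URL in column A, and open it.
    Dim num As String, hit As Range
    num = InputBox("Document number (e.g. 2776):")
    If num = "" Then Exit Sub
    Set hit = Worksheets("Links").Columns(1).Find( _
        What:="/" & num & ".doc", LookAt:=xlPart)
    If hit Is Nothing Then
        MsgBox "No URL found for document " & num
    Else
        ThisWorkbook.FollowHyperlink Address:=hit.Value
    End If
End Sub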

Thanks!
Keith



Keith R[_2_]

Scraping/listing document URLs on a server that don't have web pages/existing links?
 
Sorry, I forgot to mention I'm using XL2003 on WinXP.

"Keith R" wrote in message
...
We have a server-side database that includes BLOBs of MS word documents.
Some of those documents have always been available via URL hyperlinks on
an intranet web page. We've asked the administrator to expose a new group
of documents from that database, although the links to those newly exposed
documents won't be built into the existing web interface (they should
still be available via individual intranet URLs).

As the first step in a new project, I'd like to scrape all of those
document URLs. I assume it would be something like a recursive tree
search, since there is a heirarchical order of the documents in the
existing web interface- so for example,
/WPSAC/1-1000/356/356.doc
/WPSAC/1-1000/781/781.doc
/WPSAC/1001-2000/2294/2294.doc
/WPSAC/1001-2000/2770-2790/specials/2776.doc
/WPSAC/Revised/single_entry/B438.doc
etc.

I haven't worked with web stuff at all (although I'm decent with VBA)- any
pointers on where to start, to build a list in Excel where each sequential
cell contains a link to the next word document?

My next step on the project will be to build a user interface so a user
can select a document number and have it automatically load the link, but
I need to get the links themselves first.

Thanks!
Keith




Tim Williams

Scraping/listing document URLs on a server that don't have web pages/existing links?
 
Keith,

From your description it sounds as though the docs are kept in a database
(which flavour?) rather than on the web server's filesystem. If that's the
case then you won't be able to "scrape" the URLs: there is no Dir()
equivalent for a database.

If web links aren't going to be created for the new docs, it's not clear how
they're being exposed to you. If they're in a database then you could
potentially use something like ADO to search and index the docs from Excel.
Even then, creating a clickable hyperlink in XL would require some kind of
script set up on the server to deliver the requested file from the DB table.
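
Purely as a sketch of the ADO route, something like the following could pull
a document list into a sheet. The connection string, table name and column
names are all placeholders: they depend entirely on the actual database and
on read access being granted.

Sub ListDocsViaADO()
    ' Late-bound ADO sketch: query a (hypothetical) documents table and
    ' write the results to a worksheet. All names below are placeholders.
    Dim cn As Object, rs As Object, r As Long

    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=YourServer;" & _
            "Initial Catalog=YourDb;Integrated Security=SSPI;"

    Set rs = CreateObject("ADODB.Recordset")
    rs.Open "SELECT DocNumber, DocName FROM Documents ORDER BY DocNumber", cn

    r = 1
    Do Until rs.EOF
        Worksheets(1).Cells(r, 1).Value = rs.Fields("DocNumber").Value
        Worksheets(1).Cells(r, 2).Value = rs.Fields("DocName").Value
        r = r + 1
        rs.MoveNext
    Loop

    rs.Close
    cn.Close
End Sub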

Tim


"Keith R" wrote in message
...
We have a server-side database that includes BLOBs of MS word documents.
Some of those documents have always been available via URL hyperlinks on
an intranet web page. We've asked the administrator to expose a new group
of documents from that database, although the links to those newly exposed
documents won't be built into the existing web interface (they should
still be available via individual intranet URLs).

As the first step in a new project, I'd like to scrape all of those
document URLs. I assume it would be something like a recursive tree
search, since there is a heirarchical order of the documents in the
existing web interface- so for example,
/WPSAC/1-1000/356/356.doc
/WPSAC/1-1000/781/781.doc
/WPSAC/1001-2000/2294/2294.doc
/WPSAC/1001-2000/2770-2790/specials/2776.doc
/WPSAC/Revised/single_entry/B438.doc
etc.

I haven't worked with web stuff at all (although I'm decent with VBA)- any
pointers on where to start, to build a list in Excel where each sequential
cell contains a link to the next word document?

My next step on the project will be to build a user interface so a user
can select a document number and have it automatically load the link, but
I need to get the links themselves first.

Thanks!
Keith




