Posted to microsoft.public.excel.programming
This is a followup to a post from yesterday (thanks to Tim Williams for responding). I have more information now, and felt it warranted a second try to see if there is a way to do this now that we've gotten the documents exposed via the web interface.

Using XL2003 on WinXP. We have a corporate web application that exposes various documents in multiple levels of subdirectories. My belief is that these are stored in a database, but they are now directly accessible via web links through this web application, so hopefully where they come from doesn't affect what I am trying to accomplish.

Starting from the main page of the web application, I need to scrape the entire directory tree and capture some of the details (javascript links to .doc and .pdf files that can be opened through IE6 via 'dedicated' URLs for each document). I'm sure I'll have more questions once I start dissecting the HTML, but for starters I need to understand how to scrape multiple levels within the directory tree of a website at all.

I've copied in some of the URLs (changed slightly for corporate security) to give a sense of what I'm working with.

Top of tree:
http://ourserver.com/rtsa-bin/PermaS...=M%20S%20-%20L

I can click a link to go to the next level of subfolder:
http://ourserver.com/rtsa-bin/PermaS...ne&pagetitle=M

Third level of folder:
http://ourserver.com/rtsa-bin/PermaS...ec&pagetitle=M

and so on. A sample link for a single document within one of the pages in the web tree/directory is:
javascript:openDocument('0900043d802b3528');

where clicking that link ultimately opens:
http://ourserver.com/Documentation/03451TRs142.pdf

Ultimately I need to recreate all the links in an Excel workbook so users can click on a hyperlink and access the relevant document. An Excel hyperlink that uses the javascript:openDocument command is totally fine with me, but first I need to collect them all. Alternatively, I'll have to figure out how to cycle through each javascript command and then identify the URL it opened (which sounds harder).

Any advice or code snippets greatly appreciated. I haven't done anything with HTML at all.

Thanks,
Keith
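
For illustration only, here is a minimal sketch of the multi-level scrape described above, written in Python rather than VBA to keep it short. It rests on several assumptions that the post does not confirm: that subfolder pages are ordinary `<a href>` links on the same host as the start page, that document links look exactly like the `javascript:openDocument('...')` sample, and that pages can be fetched without a login. The names `LinkCollector`, `fetch`, and `crawl` are hypothetical helpers, not part of any existing tool.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Assumed link shape, taken from the single sample in the post.
DOC_LINK = re.compile(r"javascript:\s*openDocument\('([^']+)'\)", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download a page as text (assumes no authentication is needed)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=500):
    """Breadth-first walk of the folder pages reachable from start_url.

    Returns a list of (page_url, document_id) pairs, one per
    openDocument link found. Only links on the start page's host
    are followed, and each page is visited once.
    """
    host = urlparse(start_url).netloc
    seen = set()
    queue = [start_url]
    documents = []
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        parser = LinkCollector()
        parser.feed(fetch_page(url))
        for href in parser.hrefs:
            href = href.strip()
            match = DOC_LINK.match(href)
            if match:
                documents.append((url, match.group(1)))
            elif not href.lower().startswith(("javascript:", "mailto:", "#")):
                target = urljoin(url, href)  # resolve relative links
                if urlparse(target).netloc == host and target not in seen:
                    queue.append(target)
    return documents
```

Calling `crawl()` with the top-of-tree URL would return the document IDs grouped by the page they were found on, which could then be written out as rows for a workbook. The `seen` set is what keeps "up one level" links from sending the walk in circles, and `max_pages` is just a safety cap. Mapping each ID to its final `.pdf` or `.doc` URL is not covered here, since the post does not show what `openDocument` does internally.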