Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Automated Web Query / IQY / organizing results
Hello
I am new to this forum. I am also new on VBA. Thank you for you indications that will help me find my way through. I want to extract data from .htm files. All the files are in the sam folder. Their names are "index-1.htm" to "index-6500.htm". (There ar about 4500 files, meaning that some numbers are not used.) Th interesting data are located in the middle of the html file, inside tw tables. Ultimately, I need the data inside a single excel spreadsheet, dat from each page into a single line. How can I do this? Until now, I have try to use the query tool from, but I don't know ho to configure it precisely. The Data are held in two tables, 36 and 37. My iqy files looks like this: ____________________________ WEB 1 file:///E:/JDN/index-23.htm Selection=36,37 Formatting=None PreFormattedTextToColumns=True ConsecutiveDelimitersAsOne=True SingleBlockTextImport=False DisableDateRecognition=False DisableRedirections=False ________________________________ I don't really know where to go from there. Could you help me validat the different steps that I have planned and give me some hints to fin my way? 1- From the final worksheet, launch a macro that opens the index-1.ht file. If the record does not exist, it opens index-2.htm, and so on. 2- Import the data into a temp worksheet. (Where data can be process) 3- Erase unnecessary rows and cells. (i.e. the last rows of the secon table needs to be erased) 4- Identify the data form the table depending on the title of the row next to it (city, country, etc...). Typically, the second table ma change in length: some records contain more fields than others (business phone / home phone / Cell phone, etc.). When not used, the don't appear on the .htm page. At that point the data are more or less held in a table, 2 columns X 1 to 25 rows. The first column contains the type of data held in th second column. 5- Moved the cells to a second table depending of the title next to it Leave blank cells when the field has not been used. 6- Transpose the second column into a rows on the final sheet, clea the temp sheet. 7- Close the index-2.htm files and try to open the index-3.htm file. Done. Is this a correct approach to the problem? Is there any other easie way? Also, the .htm files used to be on a web site. I used Teleport Pro t download all of them, so it could be opened faster. Each .htm file weight about 75Ko as there are many recursive info (top menu, lef menu, roght menu, bottom menu) on each file. Do you know of an automated solution to remove all the recursive lines ? Your help is apreciated. Best regards, Bertran -- Message posted from http://www.ExcelForum.com |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Reading results for automated Myers Briggs Type Indicator? | Excel Worksheet Functions | |||
Web Query Sleep for until results come | Excel Worksheet Functions | |||
Set a PivotCache as the results of a query | Excel Programming | |||
MS Query Limit results | Excel Programming | |||
Query Results | Excel Programming |