![]() |
Automating Web Query import
I use hyperlinks for chemicals that access a webpage for the individual
compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
Ther are tow basic mathods that can be used.
1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
Hi Joel,
Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
Thhe code below was very simple. I did it in about 15 minutes. It gets the
names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
I had to switch to method 2 to get each chemical webpage. Try this code.
Which data do you need? try using method one manually (Data - Import External Data - New Web Query) with one of the webpages from the code below and select one or more tables and see if the results are usable. I will help as required. Sub Getchemicals2() Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub "Joel" wrote: Thhe code below was very simple. I did it in about 15 minutes. It gets the names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
Joel,
This is excellent. Thanks for your advice. How would I download other information from other tables - for example I would like to input further data from the tables into further adjacent columns so one can use the table as a vlookup source? I see you use references like itm.innertext - does this select individual line items from the tables on the web page? Thanks, Roger "Joel" wrote: I had to switch to method 2 to get each chemical webpage. Try this code. Which data do you need? try using method one manually (Data - Import External Data - New Web Query) with one of the webpages from the code below and select one or more tables and see if the results are usable. I will help as required. Sub Getchemicals2() Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub "Joel" wrote: Thhe code below was very simple. I did it in about 15 minutes. It gets the names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
Right now I'm not sure which is the better method to use to get the data from
the tables. It depends on which items from the tables you ae looking for. I used method 1 (web query) to download everything from the table. Some of the data was hard to manipulate into ccolumn format on a spreadsheet. We would have to do something like I did to get the chemical names. first perform a query into a temporary page and then move the data to a summary page. When I work with the Internet Explorer it is harder to figure out how to get the data, but the coding usually is simplier. I use a few tricks to get the data. 1) I look at the source code for the webpage by using the Internet Explorer menu View - Source. I first look for the data I'm trying to extract. I look in particular for tags. a tag with look like the following <abc .................................................. ..................... /abc The tag would be "abc". The tags are usually nested and there is always an opening (<) and closing () character. The closing marker. The closing marker may not have the tag name and may just lmay be a forward slah followed by an angle bracket (/). I also look for classname which are in the source id="msviRegionId" The Clssname is always id= followed by the name in double quotes. Using code you can get these items using the folowwing two statements Set ClassB = IE.document.getelementsbytagname("abc") Set RegionID = IE.document.getElementById("msviRegionId") Each will return multiple items and to get each item you can use a look like this for each itm in ClassB 'add code here next itm 2) The data is always in the innertext property. It depends how the data is organized which method(s) I use to get the data. I usually dump the data first to a spreadsheet using the following code RowCount = 1 for each itm in IE.Document.All Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm Note: innertext can sometimes by very long a cause memory errors. I sometimes have to limit the data using left Range("B" & RowCount) = left(itm.classname,1024) or RowCount = 1 for each itm in ClassB Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm 3) I also you break point and watch items. I will put a break in one of the FOR loop above by left click with mouse on the line of code to bring the cursor to the line. Then Pressing F9 to add Break Point. Then I add a WATCH item for debugging. I will highlight ITM with the mouse and then right click. Next I will select ADD WATCH and press OK on dialog window to add the watch. Finally, I will press the plus (+) sign on the watch window to see the data. Sometimes you can find the data in linked lists under "children". There are other properties that are sometimes useful like href (I used thsi to get the URL addresses of the chemicals). "Roger on Excel" wrote: Joel, This is excellent. Thanks for your advice. How would I download other information from other tables - for example I would like to input further data from the tables into further adjacent columns so one can use the table as a vlookup source? I see you use references like itm.innertext - does this select individual line items from the tables on the web page? Thanks, Roger "Joel" wrote: I had to switch to method 2 to get each chemical webpage. Try this code. Which data do you need? try using method one manually (Data - Import External Data - New Web Query) with one of the webpages from the code below and select one or more tables and see if the results are usable. I will help as required. Sub Getchemicals2() Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub "Joel" wrote: Thhe code below was very simple. I did it in about 15 minutes. It gets the names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
Thanks for looking at this Joel,
I will take a look at what you suggest later today. It sounds quite complicated, but I am keen to learn more so will work through your suggestions and see what I get. Best regards, Roger "Joel" wrote: Right now I'm not sure which is the better method to use to get the data from the tables. It depends on which items from the tables you ae looking for. I used method 1 (web query) to download everything from the table. Some of the data was hard to manipulate into ccolumn format on a spreadsheet. We would have to do something like I did to get the chemical names. first perform a query into a temporary page and then move the data to a summary page. When I work with the Internet Explorer it is harder to figure out how to get the data, but the coding usually is simplier. I use a few tricks to get the data. 1) I look at the source code for the webpage by using the Internet Explorer menu View - Source. I first look for the data I'm trying to extract. I look in particular for tags. a tag with look like the following <abc .................................................. .................... /abc The tag would be "abc". The tags are usually nested and there is always an opening (<) and closing () character. The closing marker. The closing marker may not have the tag name and may just lmay be a forward slah followed by an angle bracket (/). I also look for classname which are in the source id="msviRegionId" The Clssname is always id= followed by the name in double quotes. Using code you can get these items using the folowwing two statements Set ClassB = IE.document.getelementsbytagname("abc") Set RegionID = IE.document.getElementById("msviRegionId") Each will return multiple items and to get each item you can use a look like this for each itm in ClassB 'add code here next itm 2) The data is always in the innertext property. It depends how the data is organized which method(s) I use to get the data. I usually dump the data first to a spreadsheet using the following code RowCount = 1 for each itm in IE.Document.All Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm Note: innertext can sometimes by very long a cause memory errors. I sometimes have to limit the data using left Range("B" & RowCount) = left(itm.classname,1024) or RowCount = 1 for each itm in ClassB Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm 3) I also you break point and watch items. I will put a break in one of the FOR loop above by left click with mouse on the line of code to bring the cursor to the line. Then Pressing F9 to add Break Point. Then I add a WATCH item for debugging. I will highlight ITM with the mouse and then right click. Next I will select ADD WATCH and press OK on dialog window to add the watch. Finally, I will press the plus (+) sign on the watch window to see the data. Sometimes you can find the data in linked lists under "children". There are other properties that are sometimes useful like href (I used thsi to get the URL addresses of the chemicals). "Roger on Excel" wrote: Joel, This is excellent. Thanks for your advice. How would I download other information from other tables - for example I would like to input further data from the tables into further adjacent columns so one can use the table as a vlookup source? I see you use references like itm.innertext - does this select individual line items from the tables on the web page? Thanks, Roger "Joel" wrote: I had to switch to method 2 to get each chemical webpage. Try this code. Which data do you need? try using method one manually (Data - Import External Data - New Web Query) with one of the webpages from the code below and select one or more tables and see if the results are usable. I will help as required. Sub Getchemicals2() Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub "Joel" wrote: Thhe code below was very simple. I did it in about 15 minutes. It gets the names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
I'm working on it. Send you something around lunch time
"Roger on Excel" wrote: Thanks for looking at this Joel, I will take a look at what you suggest later today. It sounds quite complicated, but I am keen to learn more so will work through your suggestions and see what I get. Best regards, Roger "Joel" wrote: Right now I'm not sure which is the better method to use to get the data from the tables. It depends on which items from the tables you ae looking for. I used method 1 (web query) to download everything from the table. Some of the data was hard to manipulate into ccolumn format on a spreadsheet. We would have to do something like I did to get the chemical names. first perform a query into a temporary page and then move the data to a summary page. When I work with the Internet Explorer it is harder to figure out how to get the data, but the coding usually is simplier. I use a few tricks to get the data. 1) I look at the source code for the webpage by using the Internet Explorer menu View - Source. I first look for the data I'm trying to extract. I look in particular for tags. a tag with look like the following <abc .................................................. .................... /abc The tag would be "abc". The tags are usually nested and there is always an opening (<) and closing () character. The closing marker. The closing marker may not have the tag name and may just lmay be a forward slah followed by an angle bracket (/). I also look for classname which are in the source id="msviRegionId" The Clssname is always id= followed by the name in double quotes. Using code you can get these items using the folowwing two statements Set ClassB = IE.document.getelementsbytagname("abc") Set RegionID = IE.document.getElementById("msviRegionId") Each will return multiple items and to get each item you can use a look like this for each itm in ClassB 'add code here next itm 2) The data is always in the innertext property. It depends how the data is organized which method(s) I use to get the data. I usually dump the data first to a spreadsheet using the following code RowCount = 1 for each itm in IE.Document.All Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm Note: innertext can sometimes by very long a cause memory errors. I sometimes have to limit the data using left Range("B" & RowCount) = left(itm.classname,1024) or RowCount = 1 for each itm in ClassB Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm 3) I also you break point and watch items. I will put a break in one of the FOR loop above by left click with mouse on the line of code to bring the cursor to the line. Then Pressing F9 to add Break Point. Then I add a WATCH item for debugging. I will highlight ITM with the mouse and then right click. Next I will select ADD WATCH and press OK on dialog window to add the watch. Finally, I will press the plus (+) sign on the watch window to see the data. Sometimes you can find the data in linked lists under "children". There are other properties that are sometimes useful like href (I used thsi to get the URL addresses of the chemicals). "Roger on Excel" wrote: Joel, This is excellent. Thanks for your advice. How would I download other information from other tables - for example I would like to input further data from the tables into further adjacent columns so one can use the table as a vlookup source? I see you use references like itm.innertext - does this select individual line items from the tables on the web page? Thanks, Roger "Joel" wrote: I had to switch to method 2 to get each chemical webpage. Try this code. Which data do you need? try using method one manually (Data - Import External Data - New Web Query) with one of the webpages from the code below and select one or more tables and see if the results are usable. I will help as required. Sub Getchemicals2() Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub "Joel" wrote: Thhe code below was very simple. I did it in about 15 minutes. It gets the names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
The best way of doing this way to use Internet Explorer to get the chemical
names and webpages. It was better to use web query to get the actual data. Here is all the code. It was mostly copying and modifying the same code over and over and over again. I have an extra line of code "Exit For" to allow only the first chemical to run. Make sure you verify what I did before removing line and running 5600 chemicals Sub Getchemicals2() Found = False For Each sht In Sheets If sht.Name = "Chemicals" Then Found = True Exit For End If Next sht If Found = False Then Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Else Sheets("Chemicals").Cells.ClearContents End If Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub Sub GetData() Found = False For Each sht In Sheets If sht.Name = "Temp" Then Found = True Exit For End If Next sht If Found = False Then Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Else Set TempSht = Sheets("Temp") TempSht.Cells.ClearContents End If Found = False For Each sht In Sheets If sht.Name = "Data" Then Found = True Exit For End If Next sht If Found = False Then Set DataSht = Sheets.Add(after:=Sheets(Sheets.Count)) DataSht.Name = "Data" Else Set DataSht = Sheets("Data") DataSht.Cells.ClearContents End If Call MakeHeaders Set ChemicalSht = Sheets("Chemicals") With ChemicalSht LastRow = .Range("A" & Rows.Count).End(xlUp).Row Set Chemicals = .Range("A1:A" & LastRow) NewRowCount = 4 For Each Chemical In Chemicals TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ "URL;" & Chemical.Offset(0, 1), _ Destination:=TempSht.Range("A1")) .Name = "Temp" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With Call MoveData(Chemical, NewRowCount) NewRowCount = NewRowCount + 1 '------------------------------------- Remove Line ------------------------------------------------ Exit For '-------------------------------------------------------------------------------------------------- Next Chemical End With DataSht.Columns("A:AQ").AutoFit DataSht.Columns("AG").ColumnWidth = 50 DataSht.Rows("1:" & NewRowCount).VerticalAlignment = xlTop End Sub Sub MakeHeaders() With Sheets("Data") .Range("A1") = "Chemical Name" .Range("B1") = "Generic Name(s)" .Range("C1") = "CAS No" .Range("D1") = "RTECS No" .Range("E1") = "UN No" .Range("F1") = "EC No" .Range("G1") = "Alternate Names" .Columns("G").WrapText = True .Range("H1") = "Molecular Mass" .Range("I1:K1").MergeCells = True .Range("I1") = "Fire Hazard" .Range("I1").HorizontalAlignment = xlCenter .Range("I2") = "Acute Hazard/Symptoms" .Range("J2") = "Prevention" .Range("K2") = "First Aid/Fire Fighting" .Range("L1:N1").MergeCells = True .Range("L1") = "Explosion Hazard" .Range("L1").HorizontalAlignment = xlCenter .Range("L2") = "Acute Hazard/Symptoms" .Range("M2") = "Prevention" .Range("N2") = "First Aid/Fire Fighting" .Range("O1:Q1").MergeCells = True .Range("O1") = "Exposure" .Range("O1").HorizontalAlignment = xlCenter .Range("O2") = "Acute Hazard/Symptoms" .Range("P2") = "Prevention" .Range("Q2") = "First Aid/Fire Fighting" .Range("R1:T1").MergeCells = True .Range("R1") = "Inhalation Exposure" .Range("R1").HorizontalAlignment = xlCenter .Range("R2") = "Acute Hazard/Symptoms" .Range("S2") = "Prevention" .Range("T2") = "First Aid/Fire Fighting" .Range("U1:W1").MergeCells = True .Range("U1") = "Skin Exposure" .Range("U1").HorizontalAlignment = xlCenter .Range("U2") = "Acute Hazard/Symptoms" .Range("V2") = "Prevention" .Range("W2") = "First Aid/Fire Fighting" .Range("X1:Z1").MergeCells = True .Range("X1") = "Eyes Exposure" .Range("X1").HorizontalAlignment = xlCenter .Range("X2") = "Acute Hazard/Symptoms" .Range("Y2") = "Prevention" .Range("Z2") = "First Aid/Fire Fighting" .Range("AA1:AC1").MergeCells = True .Range("AA1") = "Ingestion Exposure" .Range("AA1").HorizontalAlignment = xlCenter .Range("AA2") = "Acute Hazard/Symptoms" .Range("AB2") = "Prevention" .Range("AC2") = "First Aid/Fire Fighting" .Range("AD1") = "Spillage Disposal" .Range("AE1") = "Packaging and Labelling" .Columns("AE").WrapText = True .Range("AF1") = "Emergency Response" .Range("AG1") = "Safe Storage" .Columns("AG").WrapText = True .Range("AH1") = "Physical State; Appearance" .Range("AI1") = "Routes of exposure" .Range("AJ1") = "Chemical dangers" .Range("AK1") = "Inhalation risk" .Range("AL1") = "Occupational exposure limits" .Range("AM1") = "Effects of short-term exposure" .Range("AN1") = "Effects of long-term or repeated exposure" .Range("AO1") = "PHYSICAL PROPERTIES" .Range("AP1") = "ENVIRONMENTAL DATA" .Range("AQ1") = "NOTES" .Columns("A:AQ").AutoFit End With End Sub Sub MoveData(Chemical, RowCount) Set DataSht = Sheets("Data") 'Use ICSC: to get chemical names With Sheets("Temp") DataSht.Range("A" & RowCount) = Chemical Set c = .Columns("B").Find(what:="ICSC:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find ISCS for Chemical : " & Chemical) Stop Else 'Move Generic Name DataSht.Range("B" & RowCount) = c.Offset(0, -1).Value End If 'Use ISCS Number to find first Row of Alternate Names FirstAlternateRow = c.Row + 2 Set c = .Columns("A").Find(what:="CAS No:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find CAS Number for Chemical : " & Chemical) Stop Else 'Move Generic Name DataSht.Range("C" & RowCount) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) End If 'Use CAS Number to find Last Row of Alternate Names LastAlternateRow = c.Row - 1 'Get Alternate Names Alternate = "" For TempRowCount = FirstAlternateRow To LastAlternateRow If Alternate = "" Then Alternate = .Range("A" & TempRowCount) Else Alternate = Alternate & Chr(10) & .Range("A" & TempRowCount) End If Next TempRowCount 'Move Alternate Name DataSht.Range("G" & RowCount) = Alternate Set c = .Columns("A").Find(what:="RTECS No:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find RTECS Number for Chemical : " & Chemical) Stop Else 'Move RTECS DataSht.Range("D" & RowCount) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) End If Set c = .Columns("A").Find(what:="UN No:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find UN Number for Chemical : " & Chemical) Stop Else 'Move UN No DataSht.Range("E" & RowCount) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) End If Set c = .Columns("A").Find(what:="EC No:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find EC Number for Chemical : " & Chemical) Stop Else 'Move EC No DataSht.Range("F" & RowCount) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) End If Set c = .Columns("C").Find(what:="Molecular mass:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Molecular for Chemical : " & Chemical) Stop Else 'Move molecular mass DataSht.Range("G" & RowCount) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) End If Set c = .Columns("A").Find(what:="FIRE", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find FIRE Hazard for Chemical : " & Chemical) Stop Else 'Move Fire hazard DataSht.Range("I" & RowCount) = c.Offset(0, 1).Value DataSht.Range("J" & RowCount) = c.Offset(0, 2).Value DataSht.Range("K" & RowCount) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="EXPLOSION", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Explosion Hazard for Chemical : " & Chemical) Stop Else 'Move Explosion Hazard DataSht.Range("L" & RowCount) = c.Offset(0, 1).Value DataSht.Range("M" & RowCount) = c.Offset(0, 2).Value DataSht.Range("N" & RowCount) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="EXPOSURE", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Exposure for Chemical : " & Chemical) Stop Else 'find 2nd occurance Set c = .Columns("A").FindNext(after:=c) 'Move Exposure DataSht.Range("O" & RowCount) = c.Offset(0, 1).Value DataSht.Range("P" & RowCount) = c.Offset(0, 2).Value DataSht.Range("Q" & RowCount) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="Inhalation", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Inhalation Exposure for Chemical : " & Chemical) Stop Else 'Move Inhalation Exposure DataSht.Range("R" & RowCount) = c.Offset(0, 1).Value DataSht.Range("S" & RowCount) = c.Offset(0, 2).Value DataSht.Range("T" & RowCount) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="Skin", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Skin Exposure for Chemical : " & Chemical) Stop Else 'Move Skin Exposure DataSht.Range("U" & RowCount) = c.Offset(0, 1).Value DataSht.Range("V" & RowCount) = c.Offset(0, 2).Value DataSht.Range("W" & RowCount) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="Eyes", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Eyes Exposure for Chemical : " & Chemical) Stop Else 'Move Eyes Exposure DataSht.Range("X" & RowCount) = c.Offset(0, 1).Value DataSht.Range("Y" & RowCount) = c.Offset(0, 2).Value DataSht.Range("Z" & RowCount) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="Ingestion", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Ingestion Exposure for Chemical : " & Chemical) Stop Else 'Move Ingestion Exposure DataSht.Range("AA" & RowCount) = c.Offset(0, 1).Value DataSht.Range("AB" & RowCount) = c.Offset(0, 2).Value DataSht.Range("AC" & RowCount) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="SPILLAGE DISPOSAL", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find SPILLAGE DISPOSAL for Chemical : " & Chemical) Stop Else 'Move SPILLAGE DISPOSAL DataSht.Range("AD" & RowCount) = c.Offset(1, 0).Value End If Set c = .Columns("B").Find(what:="PACKAGING & LABELLING", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find PACKAGING & LABELLING for Chemical : " & Chemical) Stop Else 'Move PACKAGING & LABELLING Packaging = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If Packaging = "" Then Packaging = .Range("B" & TempRowCount) Else Packaging = Packaging & Chr(10) & .Range("B" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Range("AE" & RowCount) = Packaging End If Set c = .Columns("A").Find(what:="EMERGENCY RESPONSE", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find EMERGENCY RESPONSE for Chemical : " & Chemical) Stop Else 'Move EMERGENCY RESPONSE DataSht.Range("AF" & RowCount) = c.Offset(1, 0).Value End If Set c = .Columns("B").Find(what:="SAFE STORAGE", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Safe Storage for Chemical : " & Chemical) Stop Else 'Move Safe Storage DataSht.Range("AG" & RowCount) = c.Offset(1, 0).Value End If Set c = .Columns("A").Find(what:="Physical State; Appearance", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Physical State; Appearance for Chemical : " & Chemical) Stop Else 'Move Physical State; Appearance Appearance = "" TempRowCount = c.Row + 1 Do While .Range("A" & TempRowCount) < "" If Appearance = "" Then Appearance = .Range("A" & TempRowCount) Else Appearance = Appearance & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Range("AH" & RowCount) = Appearance End If Set c = .Columns("B").Find(what:="Routes of exposure", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Routes of exposure for Chemical : " & Chemical) Stop Else 'Move Routes of exposure Routes = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If Routes = "" Then Routes = .Range("B" & TempRowCount) Else Routes = Routes & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Range("AI" & RowCount) = Routes End If Set c = .Columns("A").Find(what:="Chemical dangers", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Chemical dangers for Chemical : " & Chemical) Stop Else 'Move Chemical dangers Dangers = "" TempRowCount = c.Row + 1 Do While .Range("A" & TempRowCount) < "" If Dangers = "" Then Dangers = .Range("A" & TempRowCount) Else Dangers = Dangers & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Range("AJ" & RowCount) = Dangers End If Set c = .Columns("B").Find(what:="Inhalation risk", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Inhalation risk for Chemical : " & Chemical) Stop Else 'Move Inhalation risk Inhalation = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If Inhalation = "" Then Inhalation = .Range("B" & TempRowCount) Else Inhalation = Inhalation & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Range("AK" & RowCount) = Inhalation End If Set c = .Columns("A").Find(what:="Occupational exposure limits", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Occupational exposure limits for Chemical : " & Chemical) Stop Else 'Move Occupational exposure limits Occupational = "" TempRowCount = c.Row + 1 Do While .Range("A" & TempRowCount) < "" If Occupational = "" Then Occupational = .Range("A" & TempRowCount) Else Occupational = Occupational & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Range("AL" & RowCount) = Occupational End If Set c = .Columns("B").Find(what:="Effects of short-term exposure", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Effects of short-term exposure for Chemical : " & Chemical) Stop Else 'Move Effects of short-term exposure ShortTerm = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If ShortTerm = "" Then ShortTerm = .Range("B" & TempRowCount) Else ShortTerm = ShortTerm & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Range("AM" & RowCount) = ShortTerm End If Set c = .Columns("B").Find(what:="Effects of long-term or repeated exposure", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Effects of long-term or repeated exposure for Chemical : " & Chemical) Stop Else 'Move Effects of long-term or repeated exposure LongTerm = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If LongTerm = "" Then LongTerm = .Range("B" & TempRowCount) Else LongTerm = LongTerm & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Range("AN" & RowCount) = LongTerm End If Set c = .Columns("A").Find(what:="PHYSICAL PROPERTIES", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find PHYSICAL PROPERTIES for Chemical : " & Chemical) Stop Else 'Move PHYSICAL PROPERTIES Physical = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If Physical = "" Then Physical = .Range("B" & TempRowCount) Else Physical = Physical & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Range("AO" & RowCount) = Physical End If Set c = .Columns("B").Find(what:="ENVIRONMENTAL DATA", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find ENVIRONMENTAL DATA for Chemical : " & Chemical) Stop Else 'Move ENVIRONMENTAL DATA DataSht.Range("AP" & RowCount) = c.Offset(1, 0).Value End If Set c = .Columns("A").Find(what:="NOTES", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find NOTES for Chemical : " & Chemical) Stop Else 'Move NOTES DataSht.Range("AQ" & RowCount) = c.Offset(1, 0).Value End If End With End Sub |
Automating Web Query import
I found one error. the molecular Mass was in the wrong column
from 'Move molecular mass DataSht.Range("g" & RowCount) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) to 'Move molecular mass DataSht.Range("H" & RowCount) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) "Roger on Excel" wrote: Thanks for looking at this Joel, I will take a look at what you suggest later today. It sounds quite complicated, but I am keen to learn more so will work through your suggestions and see what I get. Best regards, Roger "Joel" wrote: Right now I'm not sure which is the better method to use to get the data from the tables. It depends on which items from the tables you ae looking for. I used method 1 (web query) to download everything from the table. Some of the data was hard to manipulate into ccolumn format on a spreadsheet. We would have to do something like I did to get the chemical names. first perform a query into a temporary page and then move the data to a summary page. When I work with the Internet Explorer it is harder to figure out how to get the data, but the coding usually is simplier. I use a few tricks to get the data. 1) I look at the source code for the webpage by using the Internet Explorer menu View - Source. I first look for the data I'm trying to extract. I look in particular for tags. a tag with look like the following <abc .................................................. .................... /abc The tag would be "abc". The tags are usually nested and there is always an opening (<) and closing () character. The closing marker. The closing marker may not have the tag name and may just lmay be a forward slah followed by an angle bracket (/). I also look for classname which are in the source id="msviRegionId" The Clssname is always id= followed by the name in double quotes. Using code you can get these items using the folowwing two statements Set ClassB = IE.document.getelementsbytagname("abc") Set RegionID = IE.document.getElementById("msviRegionId") Each will return multiple items and to get each item you can use a look like this for each itm in ClassB 'add code here next itm 2) The data is always in the innertext property. It depends how the data is organized which method(s) I use to get the data. I usually dump the data first to a spreadsheet using the following code RowCount = 1 for each itm in IE.Document.All Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm Note: innertext can sometimes by very long a cause memory errors. I sometimes have to limit the data using left Range("B" & RowCount) = left(itm.classname,1024) or RowCount = 1 for each itm in ClassB Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm 3) I also you break point and watch items. I will put a break in one of the FOR loop above by left click with mouse on the line of code to bring the cursor to the line. Then Pressing F9 to add Break Point. Then I add a WATCH item for debugging. I will highlight ITM with the mouse and then right click. Next I will select ADD WATCH and press OK on dialog window to add the watch. Finally, I will press the plus (+) sign on the watch window to see the data. Sometimes you can find the data in linked lists under "children". There are other properties that are sometimes useful like href (I used thsi to get the URL addresses of the chemicals). "Roger on Excel" wrote: Joel, This is excellent. Thanks for your advice. How would I download other information from other tables - for example I would like to input further data from the tables into further adjacent columns so one can use the table as a vlookup source? I see you use references like itm.innertext - does this select individual line items from the tables on the web page? Thanks, Roger "Joel" wrote: I had to switch to method 2 to get each chemical webpage. Try this code. Which data do you need? try using method one manually (Data - Import External Data - New Web Query) with one of the webpages from the code below and select one or more tables and see if the results are usable. I will help as required. Sub Getchemicals2() Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub "Joel" wrote: Thhe code below was very simple. I did it in about 15 minutes. It gets the names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
I tested for a few more chemicals and found that not all chemicals have the
same properties. You have to remove some of the STOP and Error messages. Also you may have to add more columns for properties I do not have coded. "Roger on Excel" wrote: Thanks for looking at this Joel, I will take a look at what you suggest later today. It sounds quite complicated, but I am keen to learn more so will work through your suggestions and see what I get. Best regards, Roger "Joel" wrote: Right now I'm not sure which is the better method to use to get the data from the tables. It depends on which items from the tables you ae looking for. I used method 1 (web query) to download everything from the table. Some of the data was hard to manipulate into ccolumn format on a spreadsheet. We would have to do something like I did to get the chemical names. first perform a query into a temporary page and then move the data to a summary page. When I work with the Internet Explorer it is harder to figure out how to get the data, but the coding usually is simplier. I use a few tricks to get the data. 1) I look at the source code for the webpage by using the Internet Explorer menu View - Source. I first look for the data I'm trying to extract. I look in particular for tags. a tag with look like the following <abc .................................................. .................... /abc The tag would be "abc". The tags are usually nested and there is always an opening (<) and closing () character. The closing marker. The closing marker may not have the tag name and may just lmay be a forward slah followed by an angle bracket (/). I also look for classname which are in the source id="msviRegionId" The Clssname is always id= followed by the name in double quotes. Using code you can get these items using the folowwing two statements Set ClassB = IE.document.getelementsbytagname("abc") Set RegionID = IE.document.getElementById("msviRegionId") Each will return multiple items and to get each item you can use a look like this for each itm in ClassB 'add code here next itm 2) The data is always in the innertext property. It depends how the data is organized which method(s) I use to get the data. I usually dump the data first to a spreadsheet using the following code RowCount = 1 for each itm in IE.Document.All Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm Note: innertext can sometimes by very long a cause memory errors. I sometimes have to limit the data using left Range("B" & RowCount) = left(itm.classname,1024) or RowCount = 1 for each itm in ClassB Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm 3) I also you break point and watch items. I will put a break in one of the FOR loop above by left click with mouse on the line of code to bring the cursor to the line. Then Pressing F9 to add Break Point. Then I add a WATCH item for debugging. I will highlight ITM with the mouse and then right click. Next I will select ADD WATCH and press OK on dialog window to add the watch. Finally, I will press the plus (+) sign on the watch window to see the data. Sometimes you can find the data in linked lists under "children". There are other properties that are sometimes useful like href (I used thsi to get the URL addresses of the chemicals). "Roger on Excel" wrote: Joel, This is excellent. Thanks for your advice. How would I download other information from other tables - for example I would like to input further data from the tables into further adjacent columns so one can use the table as a vlookup source? I see you use references like itm.innertext - does this select individual line items from the tables on the web page? Thanks, Roger "Joel" wrote: I had to switch to method 2 to get each chemical webpage. Try this code. Which data do you need? try using method one manually (Data - Import External Data - New Web Query) with one of the webpages from the code below and select one or more tables and see if the results are usable. I will help as required. Sub Getchemicals2() Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub "Joel" wrote: Thhe code below was very simple. I did it in about 15 minutes. It gets the names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger |
Automating Web Query import
Thanks so much for this Joel,
I will check it out tomorrow as im not feeling well this evening. I noticed that some of the fields have more data than others - a tricky problem for sure. All the best, Roger "Joel" wrote: I tested for a few more chemicals and found that not all chemicals have the same properties. You have to remove some of the STOP and Error messages. Also you may have to add more columns for properties I do not have coded. "Roger on Excel" wrote: Thanks for looking at this Joel, I will take a look at what you suggest later today. It sounds quite complicated, but I am keen to learn more so will work through your suggestions and see what I get. Best regards, Roger "Joel" wrote: Right now I'm not sure which is the better method to use to get the data from the tables. It depends on which items from the tables you ae looking for. I used method 1 (web query) to download everything from the table. Some of the data was hard to manipulate into ccolumn format on a spreadsheet. We would have to do something like I did to get the chemical names. first perform a query into a temporary page and then move the data to a summary page. When I work with the Internet Explorer it is harder to figure out how to get the data, but the coding usually is simplier. I use a few tricks to get the data. 1) I look at the source code for the webpage by using the Internet Explorer menu View - Source. I first look for the data I'm trying to extract. I look in particular for tags. a tag with look like the following <abc .................................................. .................... /abc The tag would be "abc". The tags are usually nested and there is always an opening (<) and closing () character. The closing marker. The closing marker may not have the tag name and may just lmay be a forward slah followed by an angle bracket (/). I also look for classname which are in the source id="msviRegionId" The Clssname is always id= followed by the name in double quotes. Using code you can get these items using the folowwing two statements Set ClassB = IE.document.getelementsbytagname("abc") Set RegionID = IE.document.getElementById("msviRegionId") Each will return multiple items and to get each item you can use a look like this for each itm in ClassB 'add code here next itm 2) The data is always in the innertext property. It depends how the data is organized which method(s) I use to get the data. I usually dump the data first to a spreadsheet using the following code RowCount = 1 for each itm in IE.Document.All Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm Note: innertext can sometimes by very long a cause memory errors. I sometimes have to limit the data using left Range("B" & RowCount) = left(itm.classname,1024) or RowCount = 1 for each itm in ClassB Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm 3) I also you break point and watch items. I will put a break in one of the FOR loop above by left click with mouse on the line of code to bring the cursor to the line. Then Pressing F9 to add Break Point. Then I add a WATCH item for debugging. I will highlight ITM with the mouse and then right click. Next I will select ADD WATCH and press OK on dialog window to add the watch. Finally, I will press the plus (+) sign on the watch window to see the data. Sometimes you can find the data in linked lists under "children". There are other properties that are sometimes useful like href (I used thsi to get the URL addresses of the chemicals). "Roger on Excel" wrote: Joel, This is excellent. Thanks for your advice. How would I download other information from other tables - for example I would like to input further data from the tables into further adjacent columns so one can use the table as a vlookup source? I see you use references like itm.innertext - does this select individual line items from the tables on the web page? Thanks, Roger "Joel" wrote: I had to switch to method 2 to get each chemical webpage. Try this code. Which data do you need? try using method one manually (Data - Import External Data - New Web Query) with one of the webpages from the code below and select one or more tables and see if the results are usable. I will help as required. Sub Getchemicals2() Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub "Joel" wrote: Thhe code below was very simple. I did it in about 15 minutes. It gets the names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. Let me know which approach you want to use. I can help with both approaches. "Roger on Excel" wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. |
Automating Web Query import
The macro I provided should be carefully checked. I used only a couple of
chemicals to check my results. I have noticed a few problems shown below For debugging and to run the code in pieces you can change these statement in getData(). when you run the code on the full set of data I would run it in sections. Maybe 500 to 1000 chemicals at a time. If the code ran 10 chemicals a minute, it would take about 10 hours to do all the chemicals. from LastRow = .Range("A" & Rows.Count).End(xlUp).Row Set Chemicals = .Range("A1:A" & LastRow) NewRowCount = 4 To FirstRow = 1 LastRow = .Range("A" & Rows.Count).End(xlUp).Row LastRow = 5 Set Chemicals = .Range("A" & FirstRow & ":A" & LastRow) NewRowCount = FirstRow + 3 'where the first chemical is put on the "Data" worksheet Note: Remeber to take out the EXIT FOR statement which only allows one chemical to get processed. To run the Code first get chemicals running the macro Getchemicals2(). You only need to run this macro once. Then run the Macro GetData() which will use the webpages from Getchemicals2() to get the rest of the data. 1) Some items I'm look at only one row of data on the Temp sheet and others I collected multiple rows by using a Do Loop and looking for the 1st blank row to terminate. I noticed that on the "Notes" section I'm looking a only one row where are some chemicals there are multiple rows for Notes. Also Emergency response I'm only looking at 1 row and missing the NFPA information. 2) I didn't include the chemical composition like C2H1O3. It is in the row directly above the Molecular Mass 3) I don't know if I included all the data because I used only a couple of chemicals as examples. There may be other properties (especially in the Important Data) that I may have missed. 4) The PDF printout of the chemicals has a section on Additional Information but I didn't see any chemicals with this information. Again I only looked at a few chemicals. "Roger on Excel" wrote: Thanks so much for this Joel, I will check it out tomorrow as im not feeling well this evening. I noticed that some of the fields have more data than others - a tricky problem for sure. All the best, Roger "Joel" wrote: I tested for a few more chemicals and found that not all chemicals have the same properties. You have to remove some of the STOP and Error messages. Also you may have to add more columns for properties I do not have coded. "Roger on Excel" wrote: Thanks for looking at this Joel, I will take a look at what you suggest later today. It sounds quite complicated, but I am keen to learn more so will work through your suggestions and see what I get. Best regards, Roger "Joel" wrote: Right now I'm not sure which is the better method to use to get the data from the tables. It depends on which items from the tables you ae looking for. I used method 1 (web query) to download everything from the table. Some of the data was hard to manipulate into ccolumn format on a spreadsheet. We would have to do something like I did to get the chemical names. first perform a query into a temporary page and then move the data to a summary page. When I work with the Internet Explorer it is harder to figure out how to get the data, but the coding usually is simplier. I use a few tricks to get the data. 1) I look at the source code for the webpage by using the Internet Explorer menu View - Source. I first look for the data I'm trying to extract. I look in particular for tags. a tag with look like the following <abc .................................................. .................... /abc The tag would be "abc". The tags are usually nested and there is always an opening (<) and closing () character. The closing marker. The closing marker may not have the tag name and may just lmay be a forward slah followed by an angle bracket (/). I also look for classname which are in the source id="msviRegionId" The Clssname is always id= followed by the name in double quotes. Using code you can get these items using the folowwing two statements Set ClassB = IE.document.getelementsbytagname("abc") Set RegionID = IE.document.getElementById("msviRegionId") Each will return multiple items and to get each item you can use a look like this for each itm in ClassB 'add code here next itm 2) The data is always in the innertext property. It depends how the data is organized which method(s) I use to get the data. I usually dump the data first to a spreadsheet using the following code RowCount = 1 for each itm in IE.Document.All Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm Note: innertext can sometimes by very long a cause memory errors. I sometimes have to limit the data using left Range("B" & RowCount) = left(itm.classname,1024) or RowCount = 1 for each itm in ClassB Range("A" & RowCount) = itm.tagname Range("B" & RowCount) = itm.classname Range("C" & RowCount) = itm.innertext RowCount = RowCount + 1 next itm 3) I also you break point and watch items. I will put a break in one of the FOR loop above by left click with mouse on the line of code to bring the cursor to the line. Then Pressing F9 to add Break Point. Then I add a WATCH item for debugging. I will highlight ITM with the mouse and then right click. Next I will select ADD WATCH and press OK on dialog window to add the watch. Finally, I will press the plus (+) sign on the watch window to see the data. Sometimes you can find the data in linked lists under "children". There are other properties that are sometimes useful like href (I used thsi to get the URL addresses of the chemicals). "Roger on Excel" wrote: Joel, This is excellent. Thanks for your advice. How would I download other information from other tables - for example I would like to input further data from the tables into further adjacent columns so one can use the table as a vlookup source? I see you use references like itm.innertext - does this select individual line items from the tables on the web page? Thanks, Roger "Joel" wrote: I had to switch to method 2 to get each chemical webpage. Try this code. Which data do you need? try using method one manually (Data - Import External Data - New Web Query) with one of the webpages from the code below and select one or more tables and see if the results are usable. I will help as required. Sub Getchemicals2() Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub "Joel" wrote: Thhe code below was very simple. I did it in about 15 minutes. It gets the names of all the chemicals. I will work on the rest later. Sub Getchemicals() Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Summary" URLFolder = "URL;http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For i = 0 To 25 AlphaLetter = Chr(Asc("a") + i) TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ URLFolder & AlphaLetter & "_index.htm", _ Destination:=TempSht.Range("A1")) .Name = "a_index" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With 'move data from tempory sheet to chemical sheet TempRowCount = 17 Do While Range("C" & TempRowCount) < "" ChemicalSht.Range("A" & ChemicalRowCount) = _ TempSht.Range("C" & TempRowCount) ChemicalRowCount = ChemicalRowCount + 1 TempRowCount = TempRowCount + 1 Loop Next i TempSht.Cells.ClearContents End Sub "Roger on Excel" wrote: Hi Joel, Thanks for responding. The first approach would work best for me - i have over 1000 chemicals in my list. here is an example of the website i would like to gather specific data tables from : http://www.ilo.org/public/english/pr...5/icsc0553.htm each chemical has a hyperlink to its own web page like this one. Let me know what you think Best regards, Roger ---------------------------------- "Joel" wrote: Ther are tow basic mathods that can be used. 1) Perform a web query. First setup a Reord Macro (Tools - Macro - Start Recording).. then perform one query manually by going to Data - Import External Data - New Web Query. Next modify the the recorded macro as required to add a loop changing the chemicals and the destination location so the data doesn't over-write each other. If one above doesn't work 2) Open an Internet Explorer application in Excel. through the Internet Explorer request each chemical and extract each results through the Internet Explorer Application. |
Automating Web Query import
I found some errors and didn't like how I was specifying the column. I
converted the column Leeter to constants so it would be easier to add, delete and move columns. I also the following: 1) Molecular formula 2) Added to Emergeny Response getting multiple rows of data 3) Added to Notes getting multiple rows of data Sub Getchemicals2() Found = False For Each sht In Sheets If sht.Name = "Chemicals" Then Found = True Exit For End If Next sht If Found = False Then Set ChemicalSht = Sheets.Add(after:=Sheets(Sheets.Count)) ChemicalSht.Name = "Chemicals" Else Sheets("Chemicals").Cells.ClearContents End If Set IE = CreateObject("InternetExplorer.Application") IE.Visible = True URLFolder = _ "http://www.ilo.org/public/english/protection/safework/cis/products/icsc/dtasht/" ChemicalRowCount = 1 For Letters = 0 To 25 AlphaLetter = Chr(Asc("a") + Letters) URL = URLFolder & AlphaLetter & "_index.htm" 'get web page IE.Navigate2 URL Do While IE.readyState < 4 DoEvents Loop Do While IE.busy = True DoEvents Loop H2Found = False For Each itm In IE.document.all If H2Found = False Then If itm.tagname = "H2" Then H2Found = True End If Else If itm.tagname = "A" Then If itm.innertext = "" Then Exit For 'chemical name ChemicalSht.Range("A" & ChemicalRowCount) = itm.innertext 'webpage ChemicalSht.Range("B" & ChemicalRowCount) = itm.href ChemicalRowCount = ChemicalRowCount + 1 End If End If Next itm Next Letters End Sub Const ChemNameCol = 1 Const GenericNameCol = ChemNameCol + 1 Const CASNoCol = GenericNameCol + 1 Const RTECSNoCol = CASNoCol + 1 Const UNNoCol = RTECSNoCol + 1 Const ECNoCol = UNNoCol + 1 Const MolecularFormCol = ECNoCol + 1 Const AltNameCol = MolecularFormCol + 1 Const MoleMassCol = AltNameCol + 1 'Group of 3 Columns Const FireHazCol = MoleMassCol + 1 Const ExplosHazCol = FireHazCol + 3 Const ExposureCol = ExplosHazCol + 3 Const InhalCol = ExposureCol + 3 Const SkinCol = InhalCol + 3 Const EyesCol = SkinCol + 3 Const IngestCol = EyesCol + 3 Const SpillDisposCol = IngestCol + 3 Const PackCol = SpillDisposCol + 1 Const EmergRespCol = PackCol + 1 Const SafeStorCol = EmergRespCol + 1 Const PhysStateCol = SafeStorCol + 1 Const RoutesCol = PhysStateCol + 1 Const ChemDangCol = RoutesCol + 1 Const InhalRiskCol = ChemDangCol + 1 Const OccupatCol = InhalRiskCol + 1 Const ShortTermCol = OccupatCol + 1 Const LongTermCol = ShortTermCol + 1 Const PhysicPropCol = LongTermCol + 1 Const EnvironCol = PhysicPropCol + 1 Const NoteCol = EnvironCol + 1 Const LastCol = NoteCol Sub GetData() Found = False For Each sht In Sheets If sht.Name = "Temp" Then Found = True Exit For End If Next sht If Found = False Then Set TempSht = Sheets.Add(after:=Sheets(Sheets.Count)) TempSht.Name = "Temp" Else Set TempSht = Sheets("Temp") TempSht.Cells.ClearContents End If Found = False For Each sht In Sheets If sht.Name = "Data" Then Found = True Exit For End If Next sht If Found = False Then Set DataSht = Sheets.Add(after:=Sheets(Sheets.Count)) DataSht.Name = "Data" Else Set DataSht = Sheets("Data") DataSht.Cells.ClearContents End If Call MakeHeaders Set ChemicalSht = Sheets("Chemicals") With ChemicalSht FirstRow = 1 LastRow = .Range("A" & Rows.Count).End(xlUp).Row LastRow = 5 Set Chemicals = .Range("A" & FirstRow & ":A" & LastRow) NewRowCount = FirstRow + 3 For Each Chemical In Chemicals TempSht.Cells.ClearContents With TempSht.QueryTables.Add(Connection:= _ "URL;" & Chemical.Offset(0, 1), _ Destination:=TempSht.Range("A1")) .Name = "Temp" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With Call MoveData(Chemical, NewRowCount) NewRowCount = NewRowCount + 1 Next Chemical End With DataSht.Columns("A:AQ").AutoFit DataSht.Columns("AG").ColumnWidth = 50 DataSht.Rows("1:" & NewRowCount).VerticalAlignment = xlTop End Sub Sub MakeHeaders() With Sheets("Data") .Cells(1, ChemNameCol) = "Chemical Name" .Cells(1, GenericNameCol) = "Generic Name(s)" .Cells(1, CASNoCol) = "CAS No" .Cells(1, RTECSNoCol) = "RTECS No" .Cells(1, UNNoCol) = "UN No" .Cells(1, ECNoCol) = "EC No" .Cells(1, MolecularFormCol) = "Molucular Formula" .Cells(1, AltNameCol) = "Alternate Names" .Columns(AltNameCol).WrapText = True .Cells(1, MoleMassCol) = "Molecular Mass" .Range(.Cells(1, FireHazCol), .Cells(1, FireHazCol + 2)).MergeCells = True .Cells(1, FireHazCol) = "Fire Hazard" .Cells(1, FireHazCol).HorizontalAlignment = xlCenter .Cells(2, FireHazCol) = "Acute Hazard/Symptoms" .Cells(2, FireHazCol + 1) = "Prevention" .Cells(2, FireHazCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, ExplosHazCol), .Cells(1, ExplosHazCol + 2)).MergeCells = True .Cells(1, ExplosHazCol) = "Explosion Hazard" .Cells(1, ExplosHazCol).HorizontalAlignment = xlCenter .Cells(2, ExplosHazCol) = "Acute Hazard/Symptoms" .Cells(2, ExplosHazCol + 1) = "Prevention" .Cells(2, ExplosHazCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, ExposureCol), .Cells(1, ExposureCol + 2)).MergeCells = True .Cells(1, ExposureCol) = "Exposure" .Cells(1, ExposureCol).HorizontalAlignment = xlCenter .Cells(2, ExposureCol) = "Acute Hazard/Symptoms" .Cells(2, ExposureCol + 1) = "Prevention" .Cells(2, ExposureCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, InhalCol), .Cells(1, InhalCol + 2)).MergeCells = True .Cells(1, InhalCol) = "Inhalation Exposure" .Cells(1, InhalCol).HorizontalAlignment = xlCenter .Cells(2, InhalCol) = "Acute Hazard/Symptoms" .Cells(2, InhalCol + 1) = "Prevention" .Cells(2, InhalCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, SkinCol), .Cells(1, SkinCol + 2)).MergeCells = True .Cells(1, SkinCol) = "Skin Exposure" .Cells(1, SkinCol).HorizontalAlignment = xlCenter .Cells(2, SkinCol) = "Acute Hazard/Symptoms" .Cells(2, SkinCol + 1) = "Prevention" .Cells(2, SkinCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, EyesCol), .Cells(1, EyesCol + 2)).MergeCells = True .Cells(1, EyesCol) = "Eyes Exposure" .Cells(1, EyesCol).HorizontalAlignment = xlCenter .Cells(2, EyesCol) = "Acute Hazard/Symptoms" .Cells(2, EyesCol + 1) = "Prevention" .Cells(2, EyesCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, IngestCol), .Cells(1, IngestCol + 2)).MergeCells = True .Cells(1, IngestCol) = "Ingestion Exposure" .Cells(1, IngestCol).HorizontalAlignment = xlCenter .Cells(2, IngestCol) = "Acute Hazard/Symptoms" .Cells(2, IngestCol + 1) = "Prevention" .Cells(2, IngestCol + 2) = "First Aid/Fire Fighting" .Cells(1, SpillDisposCol) = "Spillage Disposal" .Cells(1, PackCol) = "Packaging and Labelling" .Columns(PackCol).WrapText = True .Cells(1, EmergRespCol) = "Emergency Response" .Cells(1, SafeStorCol) = "Safe Storage" .Columns(SafeStorCol).WrapText = True .Cells(1, PhysStateCol) = "Physical State; Appearance" .Cells(1, RoutesCol) = "Routes of Exposure" .Cells(1, ChemDangCol) = "Chemical Dangers" .Cells(1, InhalRiskCol) = "Inhalation Risk" .Cells(1, OccupatCol) = "Occupational exposure limits" .Cells(1, ShortTermCol) = "Effects of short-term exposure" .Cells(1, LongTermCol) = "Effects of long-term or repeated exposure" .Cells(1, PhysicPropCol) = "PHYSICAL PROPERTIES" .Cells(1, EnvironCol) = "ENVIRONMENTAL DATA" .Cells(1, NoteCol) = "NOTES" Range("A1:A" & LastCol).EntireColumn.AutoFit End With End Sub Sub MoveData(Chemical, RowCount) Set DataSht = Sheets("Data") 'Use ICSC: to get chemical names With Sheets("Temp") DataSht.Cells(RowCount, ChemNameCol) = Chemical Set c = .Columns("B").Find(what:="ICSC:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find ISCS for Chemical : " & Chemical) Stop Else 'Move Generic Name DataSht.Cells(RowCount, GenericNameCol) = c.Offset(0, -1).Value End If 'Use ISCS Number to find first Row of Alternate Names FirstAlternateRow = c.Row + 2 Set c = .Columns("A").Find(what:="CAS No:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find CAS Number for Chemical : " & Chemical) Stop Else 'Move Generic Name DataSht.Cells(RowCount, CASNoCol) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) End If 'Use CAS Number to find Last Row of Alternate Names LastAlternateRow = c.Row - 1 'Get Alternate Names Alternate = "" For TempRowCount = FirstAlternateRow To LastAlternateRow If Alternate = "" Then Alternate = .Range("A" & TempRowCount) Else Alternate = Alternate & Chr(10) & .Range("A" & TempRowCount) End If Next TempRowCount 'Move Alternate Name DataSht.Cells(RowCount, AltNameCol) = Alternate Set c = .Columns("A").Find(what:="RTECS No:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find RTECS Number for Chemical : " & Chemical) Stop Else 'Move RTECS DataSht.Cells(RowCount, RTECSNoCol) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) End If Set c = .Columns("A").Find(what:="UN No:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find UN Number for Chemical : " & Chemical) 'Stop Else 'Move UN No DataSht.Cells(RowCount, UNNoCol) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) End If Set c = .Columns("A").Find(what:="EC No:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find EC Number for Chemical : " & Chemical) Stop Else 'Move EC No DataSht.Cells(RowCount, ECNoCol) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) End If Set c = .Columns("C").Find(what:="Molecular mass:", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Molecular for Chemical : " & Chemical) 'Stop Else 'Move molecular mass DataSht.Cells(RowCount, MoleMassCol) = Trim(Mid(c.Value, InStr(c.Value, ":") + 1)) 'Move Molecular Formula DataSht.Cells(RowCount, MolecularFormCol) = c.Offset(-1, 0).Value End If Set c = .Columns("A").Find(what:="FIRE", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find FIRE Hazard for Chemical : " & Chemical) Stop Else 'Move Fire hazard DataSht.Cells(RowCount, FireHazCol) = c.Offset(0, 1).Value DataSht.Cells(RowCount, FireHazCol + 1) = c.Offset(0, 2).Value DataSht.Cells(RowCount, FireHazCol + 2) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="EXPLOSION", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Explosion Hazard for Chemical : " & Chemical) Stop Else 'Move Explosion Hazard DataSht.Cells(RowCount, ExplosHazCol) = c.Offset(0, 1).Value DataSht.Cells(RowCount, ExplosHazCol + 1) = c.Offset(0, 2).Value DataSht.Cells(RowCount, ExplosHazCol + 2) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="EXPOSURE", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Exposure for Chemical : " & Chemical) Stop Else 'find 2nd occurance Set c = .Columns("A").FindNext(after:=c) 'Move Exposure DataSht.Cells(RowCount, ExposureCol) = c.Offset(0, 1).Value DataSht.Cells(RowCount, ExposureCol + 1) = c.Offset(0, 2).Value DataSht.Cells(RowCount, ExposureCol + 2) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="Inhalation", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Inhalation Exposure for Chemical : " & Chemical) Stop Else 'Move Inhalation Exposure DataSht.Cells(RowCount, InhalCol) = c.Offset(0, 1).Value DataSht.Cells(RowCount, InhalCol + 1) = c.Offset(0, 2).Value DataSht.Cells(RowCount, InhalCol + 2) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="Skin", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Skin Exposure for Chemical : " & Chemical) Stop Else 'Move Skin Exposure DataSht.Cells(RowCount, SkinCol) = c.Offset(0, 1).Value DataSht.Cells(RowCount, SkinCol + 1) = c.Offset(0, 2).Value DataSht.Cells(RowCount, SkinCol + 2) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="Eyes", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Eyes Exposure for Chemical : " & Chemical) Stop Else 'Move Eyes Exposure DataSht.Cells(RowCount, EyesCol) = c.Offset(0, 1).Value DataSht.Cells(RowCount, EyesCol + 1) = c.Offset(0, 2).Value DataSht.Cells(RowCount, EyesCol + 2) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="Ingestion", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Ingestion Exposure for Chemical : " & Chemical) Stop Else 'Move Ingestion Exposure DataSht.Cells(RowCount, IngestCol) = c.Offset(0, 1).Value DataSht.Cells(RowCount, IngestCol + 1) = c.Offset(0, 2).Value DataSht.Cells(RowCount, IngestCol + 2) = c.Offset(0, 3).Value End If Set c = .Columns("A").Find(what:="SPILLAGE DISPOSAL", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find SPILLAGE DISPOSAL for Chemical : " & Chemical) Stop Else 'Move SPILLAGE DISPOSAL DataSht.Cells(RowCount, SpillDisposCol) = c.Offset(1, 0).Value End If Set c = .Columns("B").Find(what:="PACKAGING & LABELLING", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find PACKAGING & LABELLING for Chemical : " & Chemical) Stop Else 'Move PACKAGING & LABELLING Packaging = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If Packaging = "" Then Packaging = .Range("B" & TempRowCount) Else Packaging = Packaging & Chr(10) & .Range("B" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, PackCol) = Packaging End If Set c = .Columns("A").Find(what:="EMERGENCY RESPONSE", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find EMERGENCY RESPONSE for Chemical : " & Chemical) Stop Else 'Move EMERGENCY RESPONSE Emergency = "" TempRowCount = c.Row + 1 Do While .Range("A" & TempRowCount) < "" If Emergency = "" Then Emergency = .Range("A" & TempRowCount) Else Emergency = Emergency & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, EmergRespCol) = Emergency End If Set c = .Columns("B").Find(what:="SAFE STORAGE", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Safe Storage for Chemical : " & Chemical) Stop Else 'Move Safe Storage DataSht.Cells(RowCount, SafeStorCol) = c.Offset(1, 0).Value End If Set c = .Columns("A").Find(what:="Physical State; Appearance", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Physical State; Appearance for Chemical : " & Chemical) Stop Else 'Move Physical State; Appearance Appearance = "" TempRowCount = c.Row + 1 Do While .Range("A" & TempRowCount) < "" If Appearance = "" Then Appearance = .Range("A" & TempRowCount) Else Appearance = Appearance & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, PhysStateCol) = Appearance End If Set c = .Columns("B").Find(what:="Routes of exposure", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Routes of exposure for Chemical : " & Chemical) 'Stop Else 'Move Routes of exposure Routes = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If Routes = "" Then Routes = .Range("B" & TempRowCount) Else Routes = Routes & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, RoutesCol) = Routes End If Set c = .Columns("A").Find(what:="Chemical dangers", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Chemical dangers for Chemical : " & Chemical) Stop Else 'Move Chemical dangers Dangers = "" TempRowCount = c.Row + 1 Do While .Range("A" & TempRowCount) < "" If Dangers = "" Then Dangers = .Range("A" & TempRowCount) Else Dangers = Dangers & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, ChemDangCol) = Dangers End If Set c = .Columns("B").Find(what:="Inhalation risk", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Inhalation risk for Chemical : " & Chemical) Stop Else 'Move Inhalation risk Inhalation = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If Inhalation = "" Then Inhalation = .Range("B" & TempRowCount) Else Inhalation = Inhalation & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, InhalRiskCol) = Inhalation End If Set c = .Columns("A").Find(what:="Occupational exposure limits", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Occupational exposure limits for Chemical : " & Chemical) Stop Else 'Move Occupational exposure limits Occupational = "" TempRowCount = c.Row + 1 Do While .Range("A" & TempRowCount) < "" If Occupational = "" Then Occupational = .Range("A" & TempRowCount) Else Occupational = Occupational & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, OccupatCol) = Occupational End If Set c = .Columns("B").Find(what:="Effects of short-term exposure", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Effects of short-term exposure for Chemical : " & Chemical) Stop Else 'Move Effects of short-term exposure ShortTerm = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If ShortTerm = "" Then ShortTerm = .Range("B" & TempRowCount) Else ShortTerm = ShortTerm & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, ShortTermCol) = ShortTerm End If Set c = .Columns("B").Find(what:="Effects of long-term or repeated exposure", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find Effects of long-term or repeated exposure for Chemical : " & Chemical) ' Stop Else 'Move Effects of long-term or repeated exposure LongTerm = "" TempRowCount = c.Row + 1 Do While .Range("B" & TempRowCount) < "" If LongTerm = "" Then LongTerm = .Range("B" & TempRowCount) Else LongTerm = LongTerm & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, LongTermCol) = LongTerm End If Set c = .Columns("A").Find(what:="PHYSICAL PROPERTIES", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find PHYSICAL PROPERTIES for Chemical : " & Chemical) Stop Else 'Move PHYSICAL PROPERTIES Physical = "" TempRowCount = c.Row + 1 Do While .Range("A" & TempRowCount) < "" If Physical = "" Then Physical = .Range("A" & TempRowCount) Else Physical = Physical & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, PhysicPropCol) = Physical End If Set c = .Columns("B").Find(what:="ENVIRONMENTAL DATA", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find ENVIRONMENTAL DATA for Chemical : " & Chemical) Stop Else 'Move ENVIRONMENTAL DATA DataSht.Cells(RowCount, EnvironCol) = c.Offset(1, 0).Value End If Set c = .Columns("A").Find(what:="NOTES", LookIn:=xlValues, lookat:=xlPart) If c Is Nothing Then MsgBox ("Error - Could not find NOTES for Chemical : " & Chemical) Stop Else 'Move NOTES Notes = "" TempRowCount = c.Row + 1 Do While .Range("A" & TempRowCount) < "" If Notes = "" Then Notes = .Range("A" & TempRowCount) Else Notes = Notes & Chr(10) & .Range("A" & TempRowCount) End If TempRowCount = TempRowCount + 1 Loop DataSht.Cells(RowCount, NoteCol) = Notes End If End With End Sub |
Automating Web Query import
On Oct 13, 5:36*pm, Roger on Excel
wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger Do you always want to obtain the same info, just for a diffrerent chemical? If so, specifically what info do you want?..ron |
Automating Web Query import
Ron: Look at the webpage below. There are over 5000 chemicals each with
different properties and warnings! http://www.ilo.org/public/english/pr...asht/index.htm "ron" wrote: On Oct 13, 5:36 pm, Roger on Excel wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger Do you always want to obtain the same info, just for a diffrerent chemical? If so, specifically what info do you want?..ron |
Automating Web Query import
On Oct 16, 1:14*pm, Joel wrote:
Ron: Look at the webpage below. *There are over 5000 chemicals each with different properties and warnings! http://www.ilo.org/public/english/pr...is/products/ic... "ron" wrote: On Oct 13, 5:36 pm, Roger on Excel wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger Do you always want to obtain the same info, just for a diffrerent chemical? *If so, specifically what info do you want?..ron- Hide quoted text - - Show quoted text - Yes, but depending what information is required, perhaps capturing it a) from the source code or b) by obtaining specific tables might be a reasonable solution. For example, Table 10 contains all of the "Important Data" information. Set cTables = ie.Document.getElementsByTagname("table") s = cTables(10).innertext Debug.Print s and "s" then contains IMPORTANT DATA Physical State; Appearance COLOURLESS LIQUID, WITH CHARACTERISTIC ODOUR. Physical dangers The vapour is heavier than air and may travel along the ground; distant ignition possible. Chemical dangers The substance can form explosive peroxides on contact with strong oxidants such as acetic acid, nitric acid, hydrogen peroxide. Reacts with chloroform and bromoform under basic conditions, causing fire and explosion hazard. Attacks plastic. Occupational exposure limits TLV: 500 ppm as TWA, 750 ppm as STEL; A4 (not classifiable as a human carcinogen); BEI issued; (ACGIH 2004). MAK: 500 ppm 1200 mg/m³ Peak limitation category: I(2); Pregnancy risk group: D; (DFG 2006).Routes of exposure The substance can be absorbed into the body by inhalation and through the skin. Inhalation risk A harmful contamination of the air can be reached rather quickly on evaporation of this substance at 20°C ; on spraying or dispersing, however, much faster. Effects of short-term exposure The vapour irritates the eyes and the respiratory tract. The substance may cause effects on the central nervous system, liver, kidneys and gastrointestinal tract. Effects of long-term or repeated exposure Repeated or prolonged contact with skin may cause dermatitis. The substance may have effects on the blood and bone marrow. Again, depending upon which pieces of data for each chemical are required, perhaps one of these approaches might prove useful...ron |
Automating Web Query import
On Oct 16, 2:39*pm, ron wrote:
On Oct 16, 1:14*pm, Joel wrote: Ron: Look at the webpage below. *There are over 5000 chemicals each with different properties and warnings! http://www.ilo.org/public/english/pr...is/products/ic... "ron" wrote: On Oct 13, 5:36 pm, Roger on Excel wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger Do you always want to obtain the same info, just for a diffrerent chemical? *If so, specifically what info do you want?..ron- Hide quoted text - - Show quoted text - Yes, but depending what information is required, perhaps capturing it a) from the source code or b) by obtaining specific tables might be a reasonable solution. *For example, Table 10 contains all of the "Important Data" information. * * Set cTables = ie.Document.getElementsByTagname("table") * * s = cTables(10).innertext * * Debug.Print s and "s" then contains IMPORTANT DATA Physical State; Appearance COLOURLESS LIQUID, WITH CHARACTERISTIC ODOUR. Physical dangers The vapour is heavier than air and may travel along the ground; distant ignition possible. Chemical dangers The substance can form explosive peroxides on contact with strong oxidants such as acetic acid, nitric acid, hydrogen peroxide. Reacts with chloroform and bromoform under basic conditions, causing fire and explosion hazard. Attacks plastic. Occupational exposure limits TLV: 500 ppm as TWA, 750 ppm as STEL; A4 (not classifiable as a human carcinogen); BEI issued; (ACGIH 2004). MAK: 500 ppm 1200 mg/m³ Peak limitation category: I(2); Pregnancy risk group: D; (DFG 2006).Routes of exposure The substance can be absorbed into the body by inhalation and through the skin. Inhalation risk A harmful contamination of the air can be reached rather quickly on evaporation of this substance at 20°C ; on spraying or dispersing, however, much faster. Effects of short-term exposure The vapour irritates the eyes and the respiratory tract. The substance may cause effects on the central nervous system, liver, kidneys and gastrointestinal tract. Effects of long-term or repeated exposure Repeated or prolonged contact with skin may cause dermatitis. The substance may have effects on the blood and bone marrow. Again, depending upon which pieces of data for each chemical are required, perhaps one of these approaches might prove useful...ron- Hide quoted text - - Show quoted text - Whoops, I checked a few more chemicals and found that Table 10 is not always the "Important Data" table. Still, I'd like to see an answer to my original question - what specific data fields does Roger want to capture?..ron |
Automating Web Query import
Ron: It looks a little easier to get the data from the webpage your way than
mine. Your has an advantage of getting the formated data from the innerhtml property. "ron" wrote: On Oct 16, 1:14 pm, Joel wrote: Ron: Look at the webpage below. There are over 5000 chemicals each with different properties and warnings! http://www.ilo.org/public/english/pr...is/products/ic... "ron" wrote: On Oct 13, 5:36 pm, Roger on Excel wrote: I use hyperlinks for chemicals that access a webpage for the individual compounds showing tables of data for that chemical. I have a large list of chemicals and I want to automate the downloading of specific data in the tables from the web pages. The chemical data is stored in the same format of tables on each webpage. Can anyone help? Thanks, Roger Do you always want to obtain the same info, just for a diffrerent chemical? If so, specifically what info do you want?..ron- Hide quoted text - - Show quoted text - Yes, but depending what information is required, perhaps capturing it a) from the source code or b) by obtaining specific tables might be a reasonable solution. For example, Table 10 contains all of the "Important Data" information. Set cTables = ie.Document.getElementsByTagname("table") s = cTables(10).innertext Debug.Print s and "s" then contains IMPORTANT DATA Physical State; Appearance COLOURLESS LIQUID, WITH CHARACTERISTIC ODOUR. Physical dangers The vapour is heavier than air and may travel along the ground; distant ignition possible. Chemical dangers The substance can form explosive peroxides on contact with strong oxidants such as acetic acid, nitric acid, hydrogen peroxide. Reacts with chloroform and bromoform under basic conditions, causing fire and explosion hazard. Attacks plastic. Occupational exposure limits TLV: 500 ppm as TWA, 750 ppm as STEL; A4 (not classifiable as a human carcinogen); BEI issued; (ACGIH 2004). MAK: 500 ppm 1200 mg/m³ Peak limitation category: I(2); Pregnancy risk group: D; (DFG 2006).Routes of exposure The substance can be absorbed into the body by inhalation and through the skin. Inhalation risk A harmful contamination of the air can be reached rather quickly on evaporation of this substance at 20°C ; on spraying or dispersing, however, much faster. Effects of short-term exposure The vapour irritates the eyes and the respiratory tract. The substance may cause effects on the central nervous system, liver, kidneys and gastrointestinal tract. Effects of long-term or repeated exposure Repeated or prolonged contact with skin may cause dermatitis. The substance may have effects on the blood and bone marrow. Again, depending upon which pieces of data for each chemical are required, perhaps one of these approaches might prove useful...ron |
Automating Web Query import
Thanks guys for all your help.
I am incredibly thankful for all your time trying to help me. I wish you all the best. Have a great weekend, Roger |
Automating Web Query import
I wanted to do try to get the data using Ron's method to compare techniques.
I have had problems with using the getElementsByTagname() method. I always seem to have problems finding all the data. Thought I would try again. The IE method started working out easier than the Query method. The Internet Method was even better at doing some error checks and finding missing properties. Got everything working with 5 chemicals. Then increase to 10 and found some missing properties in Important Items. Made Some minor changes. Then went to 20 chemicals and got stuck. It took me 8 hours to figure out the correct methods to get the last chemical working properly. It seems there is minor differences between the chemical webpages that weren't obvious. Here is the new code Const ChemNameCol = 1 Const GenericNameCol = ChemNameCol + 1 Const ISCSCol = GenericNameCol + 1 Const CASNoCol = ISCSCol + 1 Const RTECSNoCol = CASNoCol + 1 Const UNNoCol = RTECSNoCol + 1 Const ECNoCol = UNNoCol + 1 Const MolecularFormCol = ECNoCol + 1 Const AltNameCol = MolecularFormCol + 1 Const MoleMassCol = AltNameCol + 1 'Group of 3 Columns Const FireHazCol = MoleMassCol + 1 Const ExplosHazCol = FireHazCol + 3 Const ExposureCol = ExplosHazCol + 3 Const InhalCol = ExposureCol + 3 Const SkinCol = InhalCol + 3 Const EyesCol = SkinCol + 3 Const IngestCol = EyesCol + 3 Const SpillDisposCol = IngestCol + 3 Const PackCol = SpillDisposCol + 1 Const EmergRespCol = PackCol + 1 Const SafeStorCol = EmergRespCol + 1 Const PhysStateCol = SafeStorCol + 1 Const RoutesCol = PhysStateCol + 1 Const ChemDangCol = RoutesCol + 1 Const InhalRiskCol = ChemDangCol + 1 Const OccupatCol = InhalRiskCol + 1 Const ShortTermCol = OccupatCol + 1 Const LongTermCol = ShortTermCol + 1 Const PhysDangerCol = LongTermCol + 1 Const PhysicPropCol = PhysDangerCol + 1 Const EnvironCol = PhysicPropCol + 1 Const NoteCol = EnvironCol + 1 Const LastCol = NoteCol Sub MakeHeaders() With Sheets("Data") .Cells(1, ChemNameCol) = "Chemical Name" .Cells(1, GenericNameCol) = "Generic Name(s)" .Cells(1, ISCSCol) = "ISCS No" .Cells(1, CASNoCol) = "CAS No" .Cells(1, RTECSNoCol) = "RTECS No" .Cells(1, UNNoCol) = "UN No" .Cells(1, ECNoCol) = "EC No" .Cells(1, MolecularFormCol) = "Molucular Formula" .Cells(1, AltNameCol) = "Alternate Names" .Columns(AltNameCol).WrapText = True .Cells(1, MoleMassCol) = "Molecular Mass" .Range(.Cells(1, FireHazCol), .Cells(1, FireHazCol + 2)).MergeCells = True .Cells(1, FireHazCol) = "Fire Hazard" .Cells(1, FireHazCol).HorizontalAlignment = xlCenter .Cells(2, FireHazCol) = "Acute Hazard/Symptoms" .Cells(2, FireHazCol + 1) = "Prevention" .Cells(2, FireHazCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, ExplosHazCol), _ .Cells(1, ExplosHazCol + 2)).MergeCells = True .Cells(1, ExplosHazCol) = "Explosion Hazard" .Cells(1, ExplosHazCol).HorizontalAlignment = xlCenter .Cells(2, ExplosHazCol) = "Acute Hazard/Symptoms" .Cells(2, ExplosHazCol + 1) = "Prevention" .Cells(2, ExplosHazCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, ExposureCol), _ .Cells(1, ExposureCol + 2)).MergeCells = True .Cells(1, ExposureCol) = "Exposure" .Cells(1, ExposureCol).HorizontalAlignment = xlCenter .Cells(2, ExposureCol) = "Acute Hazard/Symptoms" .Cells(2, ExposureCol + 1) = "Prevention" .Cells(2, ExposureCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, InhalCol), .Cells(1, InhalCol + 2)).MergeCells = True .Cells(1, InhalCol) = "Inhalation Exposure" .Cells(1, InhalCol).HorizontalAlignment = xlCenter .Cells(2, InhalCol) = "Acute Hazard/Symptoms" .Cells(2, InhalCol + 1) = "Prevention" .Cells(2, InhalCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, SkinCol), .Cells(1, SkinCol + 2)).MergeCells = True .Cells(1, SkinCol) = "Skin Exposure" .Cells(1, SkinCol).HorizontalAlignment = xlCenter .Cells(2, SkinCol) = "Acute Hazard/Symptoms" .Cells(2, SkinCol + 1) = "Prevention" .Cells(2, SkinCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, EyesCol), .Cells(1, EyesCol + 2)).MergeCells = True .Cells(1, EyesCol) = "Eyes Exposure" .Cells(1, EyesCol).HorizontalAlignment = xlCenter .Cells(2, EyesCol) = "Acute Hazard/Symptoms" .Cells(2, EyesCol + 1) = "Prevention" .Cells(2, EyesCol + 2) = "First Aid/Fire Fighting" .Range(.Cells(1, IngestCol), .Cells(1, IngestCol + 2)).MergeCells = True .Cells(1, IngestCol) = "Ingestion Exposure" .Cells(1, IngestCol).HorizontalAlignment = xlCenter .Cells(2, IngestCol) = "Acute Hazard/Symptoms" .Cells(2, IngestCol + 1) = "Prevention" .Cells(2, IngestCol + 2) = "First Aid/Fire Fighting" .Cells(1, SpillDisposCol) = "Spillage Disposal" .Cells(1, PackCol) = "Packaging and Labelling" .Columns(PackCol).WrapText = True .Cells(1, EmergRespCol) = "Emergency Response" .Cells(1, SafeStorCol) = "Safe Storage" .Columns(SafeStorCol).WrapText = True .Cells(1, PhysStateCol) = "Physical State; Appearance" .Cells(1, RoutesCol) = "Routes of Exposure" .Cells(1, ChemDangCol) = "Chemical Dangers" .Cells(1, InhalRiskCol) = "Inhalation Risk" .Cells(1, OccupatCol) = "Occupational exposure limits" .Cells(1, ShortTermCol) = "Effects of short-term exposure" .Cells(1, LongTermCol) = "Effects of long-term or repeated exposure" .Cells(1, PhysDangerCol) = "Physical Dangers" .Cells(1, PhysicPropCol) = "PHYSICAL PROPERTIES" .Cells(1, EnvironCol) = "ENVIRONMENTAL DATA" .Cells(1, NoteCol) = "NOTES" Range("A1:A" & LastCol).EntireColumn.AutoFit End With End Sub Sub GetData2() Found = False For Each sht In Sheets If sht.Name = "Data" Then Found = True Exit For End If Next sht If Found = False Then Set DataSht = Sheets.Add(after:=Sheets(Sheets.Count)) DataSht.Name = "Data" Else Set DataSht = Sheets("Data") DataSht.Cells.ClearContents End If Call MakeHeaders Set ie = CreateObject("InternetExplorer.Application") ie.Visible = True Set ChemicalSht = Sheets("Chemicals") With ChemicalSht FirstRow = 1 LastRow = .Range("A" & Rows.Count).End(xlUp).Row LastRow = 20 Set Chemicals = .Range("A" & FirstRow & ":A" & LastRow) NewRowCount = FirstRow + 3 For Each Chemical In Chemicals URL = Chemical.Offset(0, 1) 'get web page ie.Navigate2 URL Do While ie.readyState < 4 DoEvents Loop Do While ie.busy = True DoEvents Loop Set cTables = ie.Document.getElementsByTagname("table") Call MoveData2(cTables, Chemical, NewRowCount) NewRowCount = NewRowCount + 1 Next Chemical .Range("A1:A" & LastCol).EntireColumn.AutoFit End With DataSht.Columns(NoteCol).ColumnWidth = 50 DataSht.Rows("1:" & NewRowCount).VerticalAlignment = xlTop End Sub Sub MoveData2(cTables, Chemical, RowCount) Dim StrNumber As String LF = Chr(10) CR = Chr(13) 'With Sheets("Test") ' .Cells.ClearContents ' RowCount = 1 ' For Each itm In cTables ' .Range("A" & RowCount) = itm.classname ' .Range("B" & RowCount) = itm.innertext ' RowCount = RowCount + 1 ' Next itm 'End With Set DataSht = Sheets("Data") 'Use ICSC: to get chemical names With DataSht .Cells(RowCount, ChemNameCol) = Chemical GenericText = cTables.Item(3).innertext GenericText = Replace(GenericText, CR, "") 'Move Generic Name .Cells(RowCount, GenericNameCol) = _ Trim(Left(GenericText, InStr(GenericText, ":") - 1)) 'Move ISCS .Cells(1, ISCSCol).NumberFormat = "@" .Cells(RowCount, ISCSCol) = _ Trim(Mid(GenericText, InStr(GenericText, ":") + 1)) Alternate = cTables.Item(4).innertext Alternate = Replace(Alternate, CR, "") 'Move Alternate Name .Cells(RowCount, AltNameCol) = Alternate 'Get Additional ID Numbers ID = cTables.Item(5).innertext SplitData = Split(ID, CR) For i = LBound(SplitData) To UBound(SplitData) 'get each line line itm = Trim(SplitData(i)) itm = Replace(itm, LF, "") 'if no colon sign then molecular mass If InStr(itm, ":") 0 Then 'split name and number using colon sign StrNumber = Trim(Mid(itm, InStr(itm, ":") + 1)) itm = Trim(Left(itm, InStr(itm, ":") - 1)) ID = Trim(Mid(ID, InStr(ID, CR) + 1)) Select Case itm Case "CAS No": 'Move CASAN Name .Cells(RowCount, CASNoCol).NumberFormat = "@" .Cells(RowCount, CASNoCol) = StrNumber Case "RTECS No": 'Move RTECS .Cells(RowCount, RTECSNoCol).NumberFormat = "@" .Cells(RowCount, RTECSNoCol) = StrNumber Case "UN No": 'Move UN No .Cells(RowCount, UNNoCol).NumberFormat = "@" .Cells(RowCount, UNNoCol) = StrNumber Case "EC No": .Cells(RowCount, ECNoCol).NumberFormat = "@" 'split string number from molecular formula StrNumber = _ Trim(Left(StrNumber, InStr(StrNumber, " ") - 1)) 'Move EC No .Cells(RowCount, ECNoCol) = StrNumber Case "Molecular mass": .Cells(RowCount, MoleMassCol) = StrNumber End Select End If Next i 'Get molecular formula Molecular = cTables.Item(5).Cells.Item(2).innertext 'Remove extra data in front of chemical formula If Left(Molecular, 1) = "(" Then Molecular = Mid(Molecular, InStr(Molecular, CR) + 1) End If If InStr(Molecular, CR) 0 Then Molecular = _ Trim(Left(Molecular, InStr(Molecular, CR) - 1)) Else Molecular = Trim(Molecular) End If 'Move Molecular Formula Molecular = Molecular For Each TableRow In cTables.Item(6).Rows Select Case UCase(TableRow.Cells(0).innertext) Case "FIRE": 'Move Fire hazard .Cells(RowCount, FireHazCol) = _ TableRow.Cells(1).innertext .Cells(RowCount, FireHazCol + 1) = _ TableRow.Cells(2).innertext .Cells(RowCount, FireHazCol + 2) = _ TableRow.Cells(3).innertext Case "EXPLOSION": 'Move Explosion Hazard .Cells(RowCount, ExplosHazCol) = _ TableRow.Cells(1).innertext .Cells(RowCount, ExplosHazCol + 1) = _ TableRow.Cells(2).innertext .Cells(RowCount, ExplosHazCol + 2) = _ TableRow.Cells(3).innertext Case "EXPOSURE": 'Move Exposure .Cells(RowCount, ExposureCol) = _ TableRow.Cells(1).innertext .Cells(RowCount, ExposureCol + 1) = _ TableRow.Cells(2).innertext .Cells(RowCount, ExposureCol + 2) = _ TableRow.Cells(3).innertext Case "INHALATION": 'Move Inhalation Exposure .Cells(RowCount, InhalCol) = _ TableRow.Cells(1).innertext .Cells(RowCount, InhalCol + 1) = _ TableRow.Cells(2).innertext .Cells(RowCount, InhalCol + 2) = _ TableRow.Cells(3).innertext Case "SKIN": 'Move Skin Exposure .Cells(RowCount, SkinCol) = _ TableRow.Cells(1).innertext .Cells(RowCount, SkinCol + 1) = _ TableRow.Cells(2).innertext .Cells(RowCount, SkinCol + 2) = _ TableRow.Cells(3).innertext Case "EYES": 'Move Eyes Exposure .Cells(RowCount, EyesCol) = _ TableRow.Cells(1).innertext .Cells(RowCount, EyesCol + 1) = _ TableRow.Cells(2).innertext .Cells(RowCount, EyesCol + 2) = _ TableRow.Cells(3).innertext Case "INGESTION": 'Move Ingestion Exposure .Cells(RowCount, IngestCol) = _ TableRow.Cells(1).innertext .Cells(RowCount, IngestCol + 1) = _ TableRow.Cells(2).innertext .Cells(RowCount, IngestCol + 2) = _ TableRow.Cells(3).innertext End Select Next TableRow Set SpillRow = cTables.Item(7).Rows(1) 'Move SPILLAGE DISPOSAL .Cells(RowCount, SpillDisposCol) = _ SpillRow.Cells(0).innertext 'Move PACKAGING & LABELING 'Combine columns 2 & 3 together If SpillRow.Cells.Length = 2 Then Pack = SpillRow.Cells(1).innertext Else Pack = SpillRow.Cells(1).innertext & _ LF & SpillRow.Cells(2).innertext End If Pack = Replace(Pack, CR, "") .Cells(RowCount, PackCol) = Pack Set EmergencyRow = cTables.Item(8).Rows(1) 'Move Emergency Response Emergency = EmergencyRow.Cells(0).innertext Emergency = Replace(Emergency, CR, "") .Cells(RowCount, EmergRespCol) = Emergency 'Move Safe Storage .Cells(RowCount, SafeStorCol) = _ EmergencyRow.Cells(1).innertext Set ImportantRow = cTables.Item(9).Rows(1) Cols = ImportantRow.Cells.Length For ColCount = 0 To (Cols - 1) Set cell = ImportantRow.Cells(ColCount) Set B = cell.getElementsByTagname("B") Set P = cell.getElementsByTagname("P") First = True Done = False Do If First = True Then Title = B(0).innertext Title = Trim(Replace(Title, ":", "")) Title = Trim(Replace(Title, LF, "")) Detail = B(0).nextsibling.nextsibling.data PCount = 0 First = False Else Item = P(PCount).innertext If Item < "" Then Title = Left(Item, InStr(Item, CR) - 1) Title = Trim(Replace(Title, ":", "")) Title = Trim(Replace(Title, LF, "")) Detail = Mid(Item, InStr(Item, CR) + 1) If Left(Detail, 1) = LF Then Detail = Mid(Detail, 2) End If Else Title = "" End If PCount = PCount + 1 End If If Title < "" Then Select Case UCase(Title) Case "PHYSICAL STATE; APPEARANCE": 'Move Physical State; Appearance .Cells(RowCount, PhysStateCol) = Detail Case "ROUTES OF EXPOSURE": 'Move Routes of exposure .Cells(RowCount, RoutesCol) = Detail Case "CHEMICAL DANGERS": 'Move Chemical dangers .Cells(RowCount, ChemDangCol) = Detail Case "INHALATION RISK": 'Move Inhalation risk .Cells(RowCount, InhalRiskCol) = Detail Case "OCCUPATIONAL EXPOSURE LIMITS": 'Move Occupational exposure limits .Cells(RowCount, OccupatCol) = Detail Case "EFFECTS OF SHORT-TERM EXPOSURE": 'Move Effects of short-term exposure .Cells(RowCount, ShortTermCol) = Detail Case "EFFECTS OF LONG-TERM OR REPEATED EXPOSURE": 'Move Effects of long-term or repeated exposure .Cells(RowCount, LongTermCol) = Detail Case "PHYSICAL DANGERS": 'Move Physical Danagers .Cells(RowCount, PhysDangerCol) = Detail Case Else: MsgBox ("Unknown Detail : " & _ Title & " : Stop") Stop End Select End If If PCount = P.Length Then Done = True End If Loop While Done = False Next ColCount 'Move Physical Properties Physical = cTables.Item(10).Rows(1).Cells(0).innertext Physical = Replace(Physical, CR, "") .Cells(RowCount, PhysicPropCol) = Physical 'Move ENVIRONMENTAL DATA Environmental = cTables.Item(10).Rows(1).Cells(1).innertext Environmental = Replace(Environmental, CR, "") .Cells(RowCount, EnvironCol) = Environmental 'Move NOTES Notes = cTables.Item(11).Rows(1).innertext Notes = Replace(Notes, CR, "") .Cells(RowCount, NoteCol) = Notes End With End Sub |
All times are GMT +1. The time now is 05:39 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com