ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   HTML / web page processing (https://www.excelbanter.com/excel-programming/356981-html-web-page-processing.html)

donbowyer

HTML / web page processing
 
I need to extract specific data from a web page I have pasted into Excel.
The page contains many hyperlinks and objects like pictures, controls,
embedded controls related to 'flash player' as well as the required data.
I've tried the following simplistic "code". The hyperlinks remove OK but the
rest of the code results in a lock up and enforced shutdown.
Since I have no way of referencing the required data eg by name, I need to
remove all the "clutter" then I can manipulate what's left to get what I need.

On Error Resume Next
ActiveSheet.Hyperlinks.Delete

For Each Control In ActiveSheet.Controls
Control.Select
Control.Delete
Next Control

For Each Picture In ActiveSheet.Pictures
Picture.Select
Selection.Delete
Next Picture

For Each Shape In ActiveSheet.Shapes
Shape.Select
Selection.Delete
Next Shape
End Sub
--
donwb

[email protected]

HTML / web page processing
 
Don't paste it if you can help it, use the WebQuery option in;
Data Import External Data New Web Query

If you do this and use the default options, no pictures or other
elements will be imported, just data.

Post back if you have any more questions.


Gene Mills

HTML / web page processing
 
Try pasting special with text only
"donbowyer" wrote in message
...
I need to extract specific data from a web page I have pasted into Excel.
The page contains many hyperlinks and objects like pictures, controls,
embedded controls related to 'flash player' as well as the required data.
I've tried the following simplistic "code". The hyperlinks remove OK but
the
rest of the code results in a lock up and enforced shutdown.
Since I have no way of referencing the required data eg by name, I need to
remove all the "clutter" then I can manipulate what's left to get what I
need.

On Error Resume Next
ActiveSheet.Hyperlinks.Delete

For Each Control In ActiveSheet.Controls
Control.Select
Control.Delete
Next Control

For Each Picture In ActiveSheet.Pictures
Picture.Select
Selection.Delete
Next Picture

For Each Shape In ActiveSheet.Shapes
Shape.Select
Selection.Delete
Next Shape
End Sub
--
donwb




markbpdx

HTML / web page processing
 

if you get any help ... I am serching for something similar - I need to
get the value currently selected in a html "select" control

the Html snipit looks like this"

<form
<pSelect Item<br
<SELECT NAME="Item" onChange="go()"
<OPTION VALUE="3" Item-3-description
<OPTION VALUE="1" Item-1-description
<OPTION VALUE="5" Item-5-description
<OPTION VALUE="4" Item-4-description
<OPTION SELECTED VALUE="product-2-description"
<table<tr<tdproduct-2-description
</SELECT
</p
</form

? Word.Documents(1).Fields(1).Code.Text
HTMLCONTROL Forms.HTML:Select.1

The following has the entire html text
word.Documents(2).HTMLProject.HTMLProjectItems(1). Text
but I was hoping not to have to parse this since word seems to know its
a control.

Using excel :
?ActiveSheet.OLEObjects(1).progid
Forms.HTML:Select.1

ActiveSheet.OLEObjects(1).oletype = xlOLEControl

(documentation only referes to xlOLEType constants: xlOLELink or
xlOLEEmbed)

documentation for "OLE Programmatic Identifiers" includes information
for

ActiveX Controls, Microsoft Access, Microsoft Excel, Microsoft Graph
Microsoft Office Web Components, Microsoft Outlook, Microsoft
PowerPoint
Microsoft Word, ActiveX Controls

ActiveX Controls
To create the ActiveX controls listed in the following table, use the
corresponding OLE programmatic identifier.


To create this control Use this identifier
CheckBox Forms.CheckBox.1
ComboBox Forms.ComboBox.1

etc.


--
markbpdx
------------------------------------------------------------------------
markbpdx's Profile: http://www.excelforum.com/member.php...o&userid=37610
View this thread: http://www.excelforum.com/showthread...hreadid=525928


markbpdx[_2_]

HTML / web page processing
 

if you get any help ... I am serching for something similar - I need to
get the value currently selected in a html "select" control

the Html snipit looks like this"

<form
<pSelect Item<br
<SELECT NAME="Item" onChange="go()"
<OPTION VALUE="3" Item-3-description
<OPTION VALUE="1" Item-1-description
<OPTION VALUE="5" Item-5-description
<OPTION VALUE="4" Item-4-description
<OPTION SELECTED VALUE="2"product-2-description
</SELECT
</p
</form

? Word.Documents(1).Fields(1).Code.Text
HTMLCONTROL Forms.HTML:Select.1

The following has the entire html text
word.Documents(2).HTMLProject.HTMLProjectItems(1). Text
but I was hoping not to have to parse this since word seems to know its
a control.

Using excel :
?ActiveSheet.OLEObjects(1).progid
Forms.HTML:Select.1

ActiveSheet.OLEObjects(1).oletype = xlOLEControl

(documentation only referes to xlOLEType constants: xlOLELink or
xlOLEEmbed)

documentation for "OLE Programmatic Identifiers" includes information
for

ActiveX Controls, Microsoft Access, Microsoft Excel, Microsoft Graph
Microsoft Office Web Components, Microsoft Outlook, Microsoft
PowerPoint
Microsoft Word, ActiveX Controls

ActiveX Controls
To create the ActiveX controls listed in the following table, use the
corresponding OLE programmatic identifier.


To create this control Use this identifier
CheckBox Forms.CheckBox.1
ComboBox Forms.ComboBox.1

etc.

Idealy I would like to be able to determine the current selected item
and its associated Value(s)


--
markbpdx
------------------------------------------------------------------------
markbpdx's Profile: http://www.excelforum.com/member.php...o&userid=37610
View this thread: http://www.excelforum.com/showthread...hreadid=525928


Tim Williams

HTML / web page processing
 
Where is the HTML page loaded? In IE ?
The HTML portion is clear, but the relevance of all the stuff below is not.

....and the thread you're tagging on to is long gone from my newsgroup server
so no clues there either.

Tim


"markbpdx" wrote in
message ...

if you get any help ... I am serching for something similar - I need to
get the value currently selected in a html "select" control

the Html snipit looks like this"

<form
<pSelect Item<br
<SELECT NAME="Item" onChange="go()"
<OPTION VALUE="3" Item-3-description
<OPTION VALUE="1" Item-1-description
<OPTION VALUE="5" Item-5-description
<OPTION VALUE="4" Item-4-description
<OPTION SELECTED VALUE="product-2-description"
<table<tr<tdproduct-2-description
</SELECT
</p
</form

? Word.Documents(1).Fields(1).Code.Text
HTMLCONTROL Forms.HTML:Select.1

The following has the entire html text
word.Documents(2).HTMLProject.HTMLProjectItems(1). Text
but I was hoping not to have to parse this since word seems to know its
a control.

Using excel :
?ActiveSheet.OLEObjects(1).progid
Forms.HTML:Select.1

ActiveSheet.OLEObjects(1).oletype = xlOLEControl

(documentation only referes to xlOLEType constants: xlOLELink or
xlOLEEmbed)

documentation for "OLE Programmatic Identifiers" includes information
for

ActiveX Controls, Microsoft Access, Microsoft Excel, Microsoft Graph
Microsoft Office Web Components, Microsoft Outlook, Microsoft
PowerPoint
Microsoft Word, ActiveX Controls

ActiveX Controls
To create the ActiveX controls listed in the following table, use the
corresponding OLE programmatic identifier.


To create this control Use this identifier
CheckBox Forms.CheckBox.1
ComboBox Forms.ComboBox.1

etc.


--
markbpdx
------------------------------------------------------------------------
markbpdx's Profile:
http://www.excelforum.com/member.php...o&userid=37610
View this thread: http://www.excelforum.com/showthread...hreadid=525928




markbpdx[_5_]

HTML / web page processing
 

I am serching for something that will tell the value currently selected
in a html "select" control using VBA from Excel or Word (or for that
matter any ole controls that have an oletype=xlOLEControl and a
progid=forms.HTML:[xxxxx] where [xxxxx] is some html form tag)

the Html snipit looks like this"

<form
<pSelect Item<br
<SELECT NAME="Item" onChange="go()"
<OPTION VALUE="3" Item-3-description
<OPTION VALUE="1" Item-1-description
<OPTION VALUE="5" Item-5-description
<OPTION VALUE="4" Item-4-description
<OPTION SELECTED VALUE="2"product-2-description
</SELECT
</p
</form

the web page is opened in word (or excel) using an open method similar
to the following

Word.Documents.Open "http://xxx.yyy.com/page.php?" & paramater_list

the web page is displayed in the Word or Excel Document/worksheet

------------------------------------------
In the immediate window ?Word.Documents(1).Fields(1).Code.Text
returns this as a value without the quotes "HTMLCONTROL
Forms.HTML:Select.1"

(the fields property seems to be the one you would use if it was a word
or excel form)

The following property returns the entire html text:
word.Documents(1).HTMLProject.HTMLProjectItems(1). Text

But I was hoping not to have to parse this since word seems to know its
a control.

Using excel in a similar fasion:
?ActiveSheet.OLEObjects(1).progid
returns "Forms.HTML:Select.1"

Excel also has an oleObjects collection which returns oletype
xlOLEControl
(documentation only referes to xlOLEType constants: xlOLELink or
xlOLEEmbed as being returned )

ActiveSheet.OLEObjects(1).oletype = xlOLEControl

--------------------------------------------------------------------
The following is excerpted from vba documentation for
"OLE Programmatic Identifiers"
-----------------------------------------------------------------------

ActiveX Controls, Microsoft Access, Microsoft Excel, Microsoft Graph
Microsoft Office Web Components, Microsoft Outlook, Microsoft
PowerPoint
Microsoft Word, ActiveX Controls

ActiveX Controls
To create the ActiveX controls listed in the following table, use the
corresponding OLE programmatic identifier.


_To_create_this_contro_l _Use_this_identifier__
CheckBox Forms.CheckBox.1
ComboBox Forms.ComboBox.1

etc.
------------------------------------------------------------------------
that section doesn't seem to refer to any identifiers with HTML in them
but it seems like it would be the likely place to find them.

Idealy I would like to be able to determine the current selected item
and its associated Value(s)


--
markbpdx
------------------------------------------------------------------------
markbpdx's Profile: http://www.excelforum.com/member.php...o&userid=37610
View this thread: http://www.excelforum.com/showthread...hreadid=525928


Tim Williams

HTML / web page processing
 
Have you tried automating IE to load the page and then reading the values
from there?
Navigating a page loaded in Word/Excel doesn't seem to expose the controls
in a predictable way, and the "native" HTML methods/properties are not
available.

Lots of past posts in this group on using IE.

Tim


"markbpdx" wrote in
message ...

I am serching for something that will tell the value currently selected
in a html "select" control using VBA from Excel or Word (or for that
matter any ole controls that have an oletype=xlOLEControl and a
progid=forms.HTML:[xxxxx] where [xxxxx] is some html form tag)

the Html snipit looks like this"

<form
<pSelect Item<br
<SELECT NAME="Item" onChange="go()"
<OPTION VALUE="3" Item-3-description
<OPTION VALUE="1" Item-1-description
<OPTION VALUE="5" Item-5-description
<OPTION VALUE="4" Item-4-description
<OPTION SELECTED VALUE="2"product-2-description
</SELECT
</p
</form

the web page is opened in word (or excel) using an open method similar
to the following

Word.Documents.Open "http://xxx.yyy.com/page.php?" & paramater_list

the web page is displayed in the Word or Excel Document/worksheet

------------------------------------------
In the immediate window ?Word.Documents(1).Fields(1).Code.Text
returns this as a value without the quotes "HTMLCONTROL
Forms.HTML:Select.1"

(the fields property seems to be the one you would use if it was a word
or excel form)

The following property returns the entire html text:
word.Documents(1).HTMLProject.HTMLProjectItems(1). Text

But I was hoping not to have to parse this since word seems to know its
a control.

Using excel in a similar fasion:
?ActiveSheet.OLEObjects(1).progid
returns "Forms.HTML:Select.1"

Excel also has an oleObjects collection which returns oletype
xlOLEControl
(documentation only referes to xlOLEType constants: xlOLELink or
xlOLEEmbed as being returned )

ActiveSheet.OLEObjects(1).oletype = xlOLEControl

--------------------------------------------------------------------
The following is excerpted from vba documentation for
"OLE Programmatic Identifiers"
-----------------------------------------------------------------------

ActiveX Controls, Microsoft Access, Microsoft Excel, Microsoft Graph
Microsoft Office Web Components, Microsoft Outlook, Microsoft
PowerPoint
Microsoft Word, ActiveX Controls

ActiveX Controls
To create the ActiveX controls listed in the following table, use the
corresponding OLE programmatic identifier.


_To_create_this_contro_l _Use_this_identifier__
CheckBox Forms.CheckBox.1
ComboBox Forms.ComboBox.1

etc.
------------------------------------------------------------------------
that section doesn't seem to refer to any identifiers with HTML in them
but it seems like it would be the likely place to find them.

Idealy I would like to be able to determine the current selected item
and its associated Value(s)


--
markbpdx
------------------------------------------------------------------------
markbpdx's Profile:
http://www.excelforum.com/member.php...o&userid=37610
View this thread: http://www.excelforum.com/showthread...hreadid=525928





All times are GMT +1. The time now is 01:58 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com