#1
Posted to microsoft.public.excel.programming
taking "strip html" to the next level

I'm really glad I found this, and glad you took the time to write the
code. Not being a 'coder', I was wondering if my request could be
added to the script.

I have various spreadsheets totaling tens of thousands of cells that
contain HTML tags I need to remove. This script
definitely does that and it helps me quite a bit.

My request is the following:

What I envision is that when I double-click a cell with HTML tags, the
script executes just as it does now, but instead of my having to manually
copy the text from within the user form, click the 'command'
button, and CTRL-V back into the original cell, those steps would be
automated.

In other words, the user double-clicks the cell and "bingo", the
cell's HTML markup is replaced with the non-HTML content.

If you could select an entire column and put all that code in a
"For...Next" loop (I'm sure For...Next isn't correct) so that the script
executes all the way down the column with one double-click, that
would *really* be cool. But if somebody could show me how to do the
simplest automation, I'd greatly appreciate it.

Making my way through about 15,000 rows and 3 columns is going to be a
lot of double-click, CTRL-C, click, CTRL-V, Arrow
Down, repeat. Don't get me wrong, though: I'm very appreciative
to have what you've already provided!
#2

Sorry, I meant to post this as a reply to the thread below.

The original thread I meant to reply to is here:
http://groups.google.com/group/micro...cd8f0a1a 71de
#3

On Fri, 13 Jun 2008 07:33:03 -0700 (PDT), Steve127 wrote:

[snip]

Steve,

Since this is for readability, is it correct to assume that you'd want the
document "collapsed" after it has been processed? Or do you just want to leave
blank lines?

For example,

==============================
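Option Explicit

Sub StripHTML()
    Dim c As Range
    Dim re As Object

    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    ' Match one complete tag, from "<" through its closing ">"
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    'not sure how you want to set the range to act on
    ' but this is quick and easy
    For Each c In Selection
        ' Only touch text cells; numbers, blanks and errors are left alone
        If VarType(c.Value) = vbString Then
            c.Value = re.Replace(c.Value, "")
        End If
    Next c
End Sub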
=================================

This removes all the HTML tags in the Selection, except for comments and the
document type declaration.
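
If comments and the document type declaration should go as well, one possible extension is to add two more alternatives to the pattern. This is only a sketch: the function name StripTagsAndComments is made up for illustration and is not part of the macro above, and it relies on the VBScript regex engine accepting the lazy quantifier `*?`.

```vba
' Hypothetical helper (not from the original thread): strips element tags,
' HTML comments and <!DOCTYPE ...> style declarations from a string.
Function StripTagsAndComments(ByVal s As String) As String
    Dim re As Object

    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    ' Three alternatives, tried in this order:
    '   <!-- ... -->   comments (lazy match, may span several lines)
    '   <! ... >       declarations such as DOCTYPE
    '   <tag ...> and </tag>   ordinary opening and closing tags
    re.Pattern = "<!--[\s\S]*?-->|<![^>]*>|</?[a-z][a-z0-9]*[^<>]*>"

    StripTagsAndComments = re.Replace(s, "")
End Function
```

Placed in a standard module, it could be called from another macro, or tried out on a single cell from the worksheet with a formula such as =StripTagsAndComments(A1).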


If you also want to remove the blank lines, then perhaps:

=======================================
Option Explicit

Sub StripHTML()
    Dim c As Range
    Dim i As Long, lFirstRow As Long, lLastRow As Long, lColumn As Long
    Dim re As Object

    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    ' Match one complete tag, from "<" through its closing ">"
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    Application.ScreenUpdating = False

    'not sure how you want to set the range to act on
    ' but using Selection is quick and easy
    For Each c In Selection
        ' Only touch text cells; numbers, blanks and errors are left alone
        If VarType(c.Value) = vbString Then
            c.Value = re.Replace(c.Value, "")
        End If
    Next c

    lFirstRow = Selection.Row
    lLastRow = Selection.Rows.Count + lFirstRow - 1
    lColumn = Selection.Column

    ' Work bottom-up so a deletion never shifts cells still to be checked.
    ' Only the first column of the Selection is collapsed.
    For i = lLastRow To lFirstRow Step -1
        If Application.WorksheetFunction.Trim(Cells(i, lColumn).Value) = "" Then
            Cells(i, lColumn).Delete shift:=xlUp
        End If
    Next i

    Application.ScreenUpdating = True
End Sub
=====================================
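
For the double-click behaviour asked about, a minimal sketch might hook the worksheet's BeforeDoubleClick event instead of running a macro on the Selection. The assumptions here: the code goes in the code module of the sheet that holds the data (not a standard module), and it replaces any double-click handler already installed by the original user-form script, which is not shown in this thread.

```vba
' Sheet-module event handler: double-clicking a text cell strips its tags in place.
Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
    Dim re As Object

    ' Ignore blanks, numbers, errors and merged areas (whose Value is an array)
    If VarType(Target.Value) <> vbString Then Exit Sub

    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    Target.Value = re.Replace(Target.Value, "")

    ' Suppress Excel's default action of entering in-cell edit mode
    Cancel = True
End Sub
```

With this in place there is nothing to copy or paste: the cleaned text is written straight back to the cell that was double-clicked.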

--ron
#4

Thank you Ron - I'll give both a try and let you know.

To clarify:

I'm working on an export from a MySQL table. The database is part of
a shopping cart system. I inherited the database from person(s) who
input the product data with a lot of deprecated and non-validating
HTML. I am trying to remove all those tags.

As an example:

Suppose column D cells contain 'product_desc' data which are the cells
that have the bad HTML. Using the script from the original poster,
you double click the cell (say D3). In the popup text box you see the
text that is in D3, except the HTML tags are gone. What I do then is
CTRL-A, then CTRL-C, click the command button, and paste back into
D3. That gives me what I'm looking for - same product description
without HTML tags and database/table integrity.

One table alone has over 15,000 rows and 3 fields (or columns) with
bad HTML so you can imagine the routine will take me a very long time
to finish.

There might be a way to do this same thing inside MySQL, but I'm less
proficient at it than I am at Excel! :) I can write basic data
queries, but writing something to remove HTML tags would be way over
my head.

Anyway, hope that gives some insight into my problem (roadblock
really).

BTW, I messed around with the original script and managed to get it
to auto-paste the 'good' text into the cell after clicking the command
button, but I still have to do CTRL-A and CTRL-C. I gave both of those
a shot but kept running into runtime errors and so forth, and it
quickly got past my skill level.

Thank you
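
For an export like the one described above, where every row has to stay lined up with its record, a variant that strips tags in place and never deletes cells may be closer to what is needed. The following is a sketch under stated assumptions, not a tested solution for this data: the macro name StripHTMLKeepRows is hypothetical, the three description columns are assumed to be selected before running it, and cells holding formulas are skipped.

```vba
' Hypothetical macro: strips tags from every text cell in the Selection
' without deleting or shifting anything, so rows keep their positions.
Sub StripHTMLKeepRows()
    Dim re As Object
    Dim rng As Range, c As Range

    If TypeName(Selection) <> "Range" Then Exit Sub

    ' Restrict the work to the used part of the sheet, so selecting
    ' whole columns does not walk through a million empty cells.
    Set rng = Intersect(Selection, ActiveSheet.UsedRange)
    If rng Is Nothing Then Exit Sub

    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    Application.ScreenUpdating = False
    For Each c In rng.Cells
        ' Leave formulas, numbers, blanks and errors untouched
        If Not c.HasFormula Then
            If VarType(c.Value) = vbString Then
                c.Value = re.Replace(c.Value, "")
            End If
        End If
    Next c
    Application.ScreenUpdating = True
End Sub
```

Trying it on a copy of the workbook first would be sensible, since the replacement overwrites the original cell contents.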
