Posted to microsoft.public.excel.programming
Ron Rosenfeld
taking "strip html" to the next level

On Fri, 13 Jun 2008 07:33:03 -0700 (PDT), Steve127 wrote:

I'm really glad I found this - and glad you took the time to write the
code. Not being a 'coder' I was wondering if my request would be
possible to add into the script -

I have various spreadsheets totaling tens of thousands of cells that
contain HTML tags that I need to remove the HTML from. This script
definitely does that and it helps me quite a bit...

My request is the following:

What I envision is when I double click on a cell with HTML tags the
script executes just like it does, but instead of having to manually
copy the text from within the user form, then click the 'command'
button, and CTRL-V back into the original cell....it would be cool if
those steps were automated.

In other words, the user double-clicks the cell and "bingo" the HTML
markup cell contents are replaced with the non-HTML content.

If you could select an entire column and put all that code in
"For...Next" (I'm sure for...next isn't correct) loop and the script
executes all the way down the column with one double-click - that
would *really* be cool...but if somebody could show me how to do the
simplest automation I'd greatly appreciate it....

making my way through about 15,000 rows and 3 columns is going to be a
lot of double-clicks, CTRL-C, click, CTRL-V, Arrow
Down....repeat.....don't get me wrong though...I'm very appreciative
to have what you've already provided!


Steve,

Since this is for readability, is it correct to assume that you'd want the
document "collapsed" after it has been processed? Or do you just want to leave
the blank lines in place?

For example,

==============================
Option Explicit

Sub StripHTML()
    Dim c As Range
    Dim re As Object

    ' Late-bound VBScript regular expression object
    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    ' Match an opening or closing tag, up to and including its closing ">"
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    'not sure how you want to set the range to act on
    ' but this is quick and easy

    For Each c In Selection
        c.Value = re.Replace(c.Value, "")
    Next c
End Sub
=================================

This removes all the HTML tags in the Selection, except for comments and the
document type (DOCTYPE) declaration.
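
To see why comments survive, here is a minimal sketch that runs the pattern on
one made-up string. It assumes the pattern ends at the closing ">", that is,
"</?[a-z][a-z0-9]*[^<>]*>". The name DemoStripPattern and the sample text are
hypothetical, for illustration only.

```vba
Option Explicit

' Hypothetical helper, not part of the macro above: prints the result
' of the tag-stripping pattern for one sample string.
Sub DemoStripPattern()
    Dim re As Object
    Dim sample As String

    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    ' A tag must start with "<" or "</" followed by a letter
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    ' Made-up sample text containing a comment and three tags
    sample = "<!-- note --><p class=""x"">Hello <b>world</b></p>"

    ' Output appears in the Immediate window (Ctrl+G in the VBA editor)
    Debug.Print re.Replace(sample, "")
End Sub
```

This prints <!-- note -->Hello world. The comment is left alone because the
character after "<" is "!" rather than a letter, and a DOCTYPE declaration is
skipped for the same reason.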


If you also want to remove the blank lines, then perhaps:

=======================================
Option Explicit

Sub StripHTML()
    Dim c As Range
    Dim i As Long, lFirstRow As Long, lLastRow As Long, lColumn As Long
    Dim re As Object

    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    ' Match an opening or closing tag, up to and including its closing ">"
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    Application.ScreenUpdating = False

    'not sure how you want to set the range to act on
    ' but using Selection is quick and easy
    For Each c In Selection
        c.Value = re.Replace(c.Value, "")
    Next c

    lFirstRow = Selection.Row
    lLastRow = Selection.Rows.Count + lFirstRow - 1
    lColumn = Selection.Column

    ' Work from the bottom up so a deletion does not shift rows still to be checked.
    ' Only the first column of the Selection is tested and collapsed.
    For i = lLastRow To lFirstRow Step -1
        If Application.WorksheetFunction.Trim(Cells(i, lColumn).Value) = "" Then
            Cells(i, lColumn).Delete shift:=xlUp
        End If
    Next i

    Application.ScreenUpdating = True
End Sub
=====================================
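
If you would rather trigger this with a double-click, as you described,
something along these lines might work. It is only a sketch, and it makes
several assumptions: the code goes in the worksheet's own code module rather
than a standard module, a double-click should clean the whole used part of
that column, formula cells and non-text cells are left untouched, and the
pattern ends at the closing ">". The variable rng is just an illustrative
name.

```vba
Option Explicit

' Place in the code module of the worksheet that holds the HTML text.
Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
    Dim re As Object
    Dim c As Range
    Dim rng As Range

    ' Assumption: act on the used part of the double-clicked column
    Set rng = Intersect(Target.EntireColumn, Me.UsedRange)
    If rng Is Nothing Then Exit Sub

    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    ' Stop Excel from entering in-cell edit mode after the double-click
    Cancel = True

    Application.ScreenUpdating = False
    For Each c In rng.Cells
        ' Skip formulas, numbers, errors and empty cells
        If Not c.HasFormula And VarType(c.Value) = vbString Then
            c.Value = re.Replace(c.Value, "")
        End If
    Next c
    Application.ScreenUpdating = True
End Sub
```

Because this overwrites the cells in place and a macro cannot be undone with
Ctrl+Z, it is probably wise to try it on a copy of the workbook first. To
clean only the cell that was double-clicked, setting rng to Target instead of
the column intersection should be enough.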

--ron