View Single Post
  #22   Report Post  
Posted to microsoft.public.excel.worksheet.functions
AFSSkier AFSSkier is offline
external usenet poster
 
Posts: 94
Default With a Function, how can I get rid of formating codes like <di

Ron,

The amp and nbsp are HTML amper codes. How do you remove them?

Original Data (my comments):
<div<font face=Arial size=2 color="#006666"THE (5/31)6-3 FOR VERSION 2 & 3
ARE READY FOR ON & amp ;(<=no spaces after amper) OFF REPORTS </font</div
<div & nbsp ;(<=no spaces after amper) </div

UDF Result (it's only stripping the none printable characters):
THE (5/31)6-3 FOR VERSION 2 amp; 3 ARE READY FOR ON amp; OFF REPORTS nbsp;

Result needed:
THE (5/31)6-3 FOR VERSION 2 and 3 ARE READY FOR ON and OFF REPORTS.

--
Thanks, Kevin


"Ron Rosenfeld" wrote:

On Mon, 1 Jun 2009 10:36:01 -0700, AFSSkier
wrote:

Ron,
Your UDF code works great, however there are still additional formatting
codes like (hard return), and &

--
Thanks, Kevin


That's just a matter of adding those codes to the Pattern.

A code like nbsp has to be added as the hexadecimal ascii code for that which
is A0.

\r and \n are the codes for <CR and <LF

and the & stands alone -- but I don't know how to tell the difference between
an ampersand used as part of a text string, and one being used as a formatting
code.

But try this for the pattern line in the UDF.

======================
re.Pattern = "<[^<]+|[&\xA0\r\n]"
======================


Or, all together:

==============================
Option Explicit
Function StripFormat(S As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<]+|[&\xA0\r\n]"
StripFormat = re.Replace(S, "")
End Function
=================================
--ron