Posted to microsoft.public.excel.programming
Claus Busch
Eliminate duplicate words from string

Hi,

On Sun, 29 Nov 2020 23:16:59 -0800 (PST), Tatsujin wrote:

BTW, I just now realized that my string input will sometimes include punctuation marks at the end of words. The punctuation marks that I am most concerned about are periods, commas, and semi-colons. For example:

sWords = "person woman and man. TV, man, woman; but no person."

What's a good way to remove punctuation marks? That way, my strings only contain words without the punctuation marks to the right of words.


The following stores the unique words, without punctuation marks, in sWords:

Sub Test()
    Dim myDic As Object, re As Object
    Dim Matches As Object, Match As Object
    Dim sWords As String
    Dim varText() As Variant, varOut As Variant
    Dim i As Long

    Set myDic = CreateObject("Scripting.Dictionary")
    Set re = CreateObject("vbscript.regexp")

    sWords = "person woman and man. TV, man, woman; but no person."

    'Separate all "words"; \w+ skips spaces and punctuation marks
    re.Pattern = "\w+"
    re.IgnoreCase = False
    re.Global = True
    Set Matches = re.Execute(sWords)
    If Matches.Count = 0 Then Exit Sub 'no words found, nothing to do

    'Copy the matched words into an array
    ReDim varText(Matches.Count - 1)
    For Each Match In Matches
        varText(i) = Match.Value
        i = i + 1
    Next

    'Create unique words; assigning to an existing key only overwrites it
    For i = LBound(varText) To UBound(varText)
        myDic(varText(i)) = varText(i)
    Next
    varOut = myDic.Items
    sWords = Join(varOut, " ")
    Debug.Print sWords 'person woman and man TV but no
End Sub
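
If you need this in more than one place, the same idea can be wrapped in a function. This is only a sketch: the name UniqueWords and the optional bIgnoreCase argument are my own additions for illustration and not part of the macro above. It writes each match straight into the dictionary, so the intermediate array is not needed.

```vba
'Hypothetical helper: returns the unique words of sText in first-seen
'order, separated by single spaces. Punctuation is dropped by \w+.
Function UniqueWords(ByVal sText As String, _
                     Optional ByVal bIgnoreCase As Boolean = False) As String
    Dim dic As Object, re As Object
    Dim m As Object

    Set dic = CreateObject("Scripting.Dictionary")
    'CompareMode can only be changed while the dictionary is still empty
    If bIgnoreCase Then
        dic.CompareMode = vbTextCompare   '"Man" and "man" count as one word
    Else
        dic.CompareMode = vbBinaryCompare '"Man" and "man" are different words
    End If

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\w+"
    re.Global = True

    'Keep only the first occurrence of each word
    For Each m In re.Execute(sText)
        If Not dic.Exists(m.Value) Then dic.Add m.Value, Empty
    Next

    UniqueWords = Join(dic.Keys, " ")
End Function
```

Call it with sWords = UniqueWords(sWords), or with sWords = UniqueWords(sWords, True) if "TV" and "tv" should be treated as the same word. Note that \w+ also splits at apostrophes and hyphens, so "don't" becomes "don" and "t"; if that matters for your data, the pattern would have to be adjusted.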


Regards
Claus B.
--
Windows10
Office 2016