ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   Eliminate duplicate words from string (https://www.excelbanter.com/excel-programming/454999-eliminate-duplicate-words-string.html)

Tatsujin

Eliminate duplicate words from string
 
If I have a string of words, such as:

sWords = "person person woman man camera TV TV person man man"

What is an efficient way to remove duplicate words? The variable sWords should contain "person woman man camera TV".

Thanks!

Claus Busch

Eliminate duplicate words from string
 
Hi,

Am Sun, 22 Nov 2020 01:03:30 -0800 (PST) schrieb Tatsujin:

If I have a string of words, such as:

sWords = "person person woman man camera TV TV person man man"

What is an efficient way to remove duplicate words? The variable sWords should contain "person woman man camera TV".


try:

Sub Test()
Dim sWords As String
Dim myDic As Object
Dim varWords As Variant, varTmp As Variant
Dim i As Integer

sWords = "person person woman man camera TV TV person man man"
varWords = Split(sWords, " ")
Set myDic = CreateObject("scripting.dictionary")
For i = LBound(varWords) To UBound(varWords)
myDic(varWords(i)) = varWords(i)
Next i
varTmp = myDic.items
sWords = Join(varTmp, " ")
End Sub


Regards
Claus B.
--
Windows10
Office 2016

Tatsujin

Eliminate duplicate words from string
 
Thanks Claus, that works perfectly...

BTW, I just now realized that my string input will sometimes include punctuation marks at the end of words. The punctuation marks that I am most concerned about are periods, commas, and semi-colons. For example:

sWords = "person woman and man. TV, man, woman; but no person."

What's a good way to remove punctuation marks? That way, my strings only contain words without the punctuation marks to the right of words.

Thanks!

Claus Busch

Eliminate duplicate words from string
 
Hi,

Am Sun, 29 Nov 2020 23:16:59 -0800 (PST) schrieb Tatsujin:

BTW, I just now realized that my string input will sometimes include punctuation marks at the end of words. The punctuation marks that I am most concerned about are periods, commas, and semi-colons. For example:

sWords = "person woman and man. TV, man, woman; but no person."

What's a good way to remove punctuation marks? That way, my strings only contain words without the punctuation marks to the right of words.


store the unique words without punctuation marks in sWords:

Sub Test()
Dim myDic As Object, re As Object
Dim sWords As String
Dim varText() As Variant, varOut As Variant
Dim i As Long
Dim ptrn, Match, Matches

Set myDic = CreateObject("Scripting.Dictionary")
Set re = CreateObject("vbscript.regexp")

sWords = "person woman and man. TV, man, woman; but no person."

'Separate all "words"
ptrn = "\w+"
re.Pattern = ptrn
re.IgnoreCase = False
re.Global = True
Set Matches = re.Execute(sWords)

ReDim Preserve varText(Matches.Count - 1)
For Each Match In Matches
varText(i) = Match.Value
i = i + 1
Next

'Create unique words
For i = LBound(varText) To UBound(varText)
myDic(varText(i)) = varText(i)
Next
varOut = myDic.items
sWords = Join(varOut, " ")
End Sub


Regards
Claus B.
--
Windows10
Office 2016

jerry_maguire

Try this simply
 
To filter for unique values, click Data Sort & Filter Advanced. To remove duplicate values, click Data Data Tools Remove Duplicates. To highlight unique or duplicate values, use the Conditional Formatting command in the Style group on the Home tab.

Regards,
Jerry


All times are GMT +1. The time now is 02:36 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
ExcelBanter.com