Posted to microsoft.public.excel.programming
|
|
Developing TEXT scrambler kind of FUNCTIONS in Excel
http://www.source-code.biz/snippets/vbasic/4.htm
has some soundex code.
--
Regards,
Tom Ogilvy
"Robin Hammond" wrote in message
...
Hari,
Here's a thought on something you might have a look at, although I have
only
seen it described in SQL's books online rather than trying to use it
myself.
Not sure if it will work for you but it might be worth investigating.
There are two functions: SoundEx and Difference, defined in T-SQL that
might
help. This is from BOL:
SOUNDEX converts an alpha string to a four-character code to find
similar-sounding words or names. The first character of the code is the
first character of character_expression and the second through fourth
characters of the code are numbers. Vowels in character_expression are
ignored unless they are the first letter of the string. String functions
can
be nested.
e.g. in SQL
select soundex('visual basic'), soundex('viswl basic')
returns an identical result
select difference('visual basic','viswl basic')
returns 4, which apparently means they are as similar as possible.
There's a function on John Walkenbach's site written by Richard Yanco that
purports to convert text to a Soundex value that you might be able to test
for use in Excel.
And, strangely enough, if you do a google groups search on Soundex for the
*Excel* groups, there appear to be a few discussions of less complex, but
similar challenges to the one you are trying to crack.
HTH, and let us know if it works,
Robin Hammond
www.enhanceddatasystems.com
"Tim Williams" <saxifrax at pacbell dot net wrote in message
...
Hari,
I'd second the opinions of the other posters: it seems a better approch
to
maintain a list of common mis-spellings and use replace() on your
original
data.
Tim
"Peter T" <peter_t@discussions wrote in message
.. .
Hi Hari,
You may well eventually develop some amazing routine that appears to do
what
you want. But I find it difficult to imagine how it will ever be
foolproof,
leading to a false sense of confidence and false results.
Have you tried working with the Spell checker. Even manually I don't
suppose
it would take too long to run through 1,000 rows, particularly once it
has
been "trained" to your topic (can be automated to some extent). There
is
also AutoCorrect, I notice it does nothing with pasted cells until you
F2/enter. Not sure if it can be automated - haven't tried.
Not the answer you are looking for - just a thought.
Regards,
Peter T
"Hari Prasadh" wrote in message
...
Hi Tom,
2 aspects to it
a) While generating a unique list, I get the frequency count of each
word's
appearance also. Please note a single person / response in column A
could
mention more than 1 tool. Now suppose in column A I have responses
from
3000
people and if I see the frequency then for the software tools
question
I
might get "Visual basic" having let's say 100 appearances, which is
reasonable number for it to be added to the unique list. Now, since
there
will be typos so I might get a count of "Visul basic" being 2 and
count
of
"Viswl Basic" being 1 and similarly .... lots and lots of such FALSE
instances of "Visual Basic" which have very low counts. Now this
happens
for
Each software tool. I cannot use these false instances as part of
unique
list as it wouldnt serve my purpose. As, once am through with the
mapping
I
would assign a numeric code to "Visual Basic" and load the data in
SPSS
(Stats software) and run some statistics on it. If I upload the false
instances of visual basic also then they would play havoc with my
stats.
b) The point made in a) gets compounded because every time its a new
market
research study. Like if am tracking software tools today, tomorrow I
might
be on to tracking responses to a question like " How would you
describe
the
Denim area at this Superstore". So every time it will be preparation
of
a
new and unique list and accordingly the column A also changes. So, I
cannot
invest my time in manipulating column I by having a "messy kind of
list".
That might offset the whole point of automation.
--
Thanks a lot,
Hari
India
"Tom Ogilvy" wrote in message
...
It seems to me, if you build your list of interest from the text of
the
entries, you will already have a list of misspellings. It would be
far
easier to construct a cross walk table from that than what you are
asking
(I
would think).
--
Regards,
Tom Ogilvy
"Hari Prasadh" wrote in message
...
Hi,
This is the code which I have written for automatic mapping of
data.
If some sample data (10 rows and 2 columns ) is required, please
tell
me
and
I would be happy to paste that as well in the future post.
Option Explicit
Public basearray() As String
Public arrWords As Variant
Sub readingarrayofuniquewords()
Dim p As Integer
Dim BaseArrLength As Integer
Range("i65536").Select
Selection.End(xlUp).Select
p = Selection.Row - 2
ReDim basearray(p)
For BaseArrLength = 0 To p
basearray(BaseArrLength) = Cells(BaseArrLength + 2, "i")
Next BaseArrLength
End Sub
Sub Upcoding()
Dim R As String
Dim z As Integer
Dim g As Integer
Dim hu As Integer
Dim hg As Integer
Dim msgboxresult As String
Dim tempwithspace As String
Dim tempwithoutspace As String
Dim flag As Integer
Application.ScreenUpdating = False
msgboxresult = MsgBox("Columns B through F will be cleared" &
vbLf
&
"
Press
no if you want to exit out of the macro", vbYesNo, " Warning")
If msgboxresult = vbNo Then Exit Sub
Range("B2:G65536").Select
Selection.ClearContents
Range("A65536").Select
Selection.End(xlUp).Select
R = ActiveCell.Row
Call readingarrayofuniquewords
For z = 2 To R
flag = 0
Splitwords ActiveSheet.Range("A" & z).Value
Range("B" & z).Select
'here put the function for combining elements of array
For hu = UBound(arrWords) To LBound(arrWords) Step -1
For hg = hu To UBound(arrWords) Step 1
If ActiveCell.Column 7 Then
flag = 1
Exit For
End If
tempwithspace =
MergingElementsOfArrayWithSpace(arrWords,
hu, hg)
tempwithoutspace =
MergingElementsOfArrayWithoutSpace(arrWords, hu, hg)
For g = LBound(basearray) To UBound(basearray)
If UCase(tempwithspace) = UCase(basearray(g))
Then
ActiveCell.Value = basearray(g)
' u have to put the logic for more than
4
then
exit
loop below
ActiveCell.Offset(0,
1).Range("A1").Select
Exit For
ElseIf UCase(tempwithoutspace) =
UCase(basearray(g))
Then
ActiveCell.Value = basearray(g)
' u have to put the logic for more than
4
then
exit
loop below
ActiveCell.Offset(0,
1).Range("A1").Select
Exit For
End If
Next g
Next hg
If flag = 1 Then
Exit For
End If
Next hu
Next z
Application.ScreenUpdating = True
End Sub
Sub Splitwords(sText As String)
Dim x As Integer
Dim arrReplace As Variant
arrReplace = Array(vbTab, ":", ";", ".", ",", "-", Chr(10),
Chr(13))
For x = LBound(arrReplace) To UBound(arrReplace)
sText = Replace(sText, arrReplace(x), " ")
Next x
arrWords = Split(Application.WorksheetFunction.Trim(sText), "
")
End Sub
Function MergingElementsOfArrayWithoutSpace(concatarray As
Variant,
hi
As
Integer, ti As Integer) As Variant
Dim tmp As String
Dim f As Integer
tmp = ""
'see whether the range ti - hi to ti is correct or if it is _
to be increased by 1.
For f = ti - hi To ti
tmp = tmp & concatarray(f)
Next f
MergingElementsOfArrayWithoutSpace =
Application.WorksheetFunction.Trim(tmp)
End Function
Function MergingElementsOfArrayWithSpace(concatarray As Variant,
hi
As
Integer, ti As Integer) As Variant
Dim tmp As String
Dim f As Integer
tmp = ""
'see whether the range ti - hi to ti is correct or if it is _
to be increased by 1.
For f = ti - hi To ti
tmp = tmp & concatarray(f) & " "
Next f
MergingElementsOfArrayWithSpace =
Application.WorksheetFunction.Trim(tmp)
End Function
--
Thanks a lot,
Hari
India
|