Posted to microsoft.public.excel.programming
Hari Prasadh
Developing TEXT scrambler kind of FUNCTIONS in Excel

Hi,

To put things in perspective, I analyse market research data.

Let's say I have some string data in column A, from cell A2 down to the last
used cell. Let's also say I have some string data in column I, from cell I2
down to the last used cell.

The data in a sample cell of column A (let's say cell A2) would be something
like "I use C++, Visual Basic and Win2K Server at my workplace. At home I
dabble with C++ and Qualcomm". Basically, column A contains complete
sentences, and out of each sentence I am interested only in some of the
words. For example, if I'm tracking usage of software tools (and am not
interested in operating systems), then only C++ and Visual Basic (VB) would
be my points of interest. This is where column I plays its part.

With a great deal of help from the newsgroup (Tim Williams, in the thread
"Generating count of unique words in a cell or cells"), I have been able to
get a nice module that gives me a count of unique words (the frequency of
each word in column A). After running the module, I scan the results and
remove the words that are not of interest in my study. Based on the example
above, words such as "I", "use", "Win2K Server" and "Qualcomm" would be
removed. I then take the remaining list of unique words and paste it into
column I (starting from row 2). Hence, column I holds a list of RELEVANT
words only.
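For readers without the original thread, the counting step can be sketched as follows. This is a hypothetical Python illustration of the idea, not Tim Williams's VBA module: it splits each cell on whitespace, trims common punctuation from the ends of each word (but not "+" or "#", so "C++" survives) and counts the words case-sensitively with `collections.Counter`. The name `word_frequencies` is made up for this sketch.

```python
from collections import Counter
from typing import Iterable

# Punctuation trimmed from the ends of each word; "+" and "#" are kept
# so that tokens such as "C++" or "C#" are not damaged.
TRIM_CHARS = ",.;:!?\"'()"


def word_frequencies(cells: Iterable[str]) -> Counter:
    """Count how often each word occurs across all cells (case-sensitive)."""
    counts = Counter()
    for text in cells:
        for token in text.split():
            word = token.strip(TRIM_CHARS)
            if word:  # a lone "," becomes "" after trimming; skip it
                counts[word] += 1
    return counts


if __name__ == "__main__":
    column_a = [
        "I use C++ , Visual Basic and Win2K Server at my workplace. "
        "At home I dabble with C++ and Qualcomm",
    ]
    for word, n in word_frequencies(column_a).most_common(5):
        print(word, n)
```

Multi-word terms such as "Visual Basic" or "Win2K Server" come out of this sketch as separate words, so they still have to be re-joined by hand when building column I.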

I naively refer to the part explained above as Text Mining.

After this I developed a macro (by copying snippets of syntax from a variety
of sources and using the macro recorder). This macro compares the CELLS in
column A with column I and displays the matching words in columns B through
E. What I mean is that cells in columns B through E display a list of words
which appear in the corresponding cell of column A AND also appear in some
cell of column I.

Taking the above example, cell B2 would say "C++" and cell C2 would say
"Visual Basic", because column I would not contain the rest of the words in
cell A2. (Cells D2 and E2 would be left blank. Please note that if there were
no matches, then B through E would all be left blank.)
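The comparison described above can be sketched as a small function. This is a hypothetical Python illustration, not the macro mentioned in the post: `find_matches` (a made-up name) returns the column-I terms found in one column-A cell, ordered by where they first appear in the sentence and capped at four so they fit columns B through E. It uses a case-insensitive regular expression with look-arounds instead of `\b`, because `\b` does not match after a trailing "+" as in "C++".

```python
import re
from typing import List


def find_matches(sentence: str, terms: List[str], max_columns: int = 4) -> List[str]:
    """Return the terms (from column I) that occur in `sentence` (a column-A
    cell), in order of first appearance, capped at `max_columns` (B..E)."""
    hits = []
    for term in terms:
        # (?<!\w) and (?!\w) stop "C" from matching inside a word like "Cobol"
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        found = re.search(pattern, sentence, flags=re.IGNORECASE)
        if found:
            hits.append((found.start(), term))
    hits.sort()  # order by position in the sentence
    return [term for _, term in hits[:max_columns]]


if __name__ == "__main__":
    cell_a2 = ("I use C++ , Visual Basic and Win2K Server at my workplace. "
               "At home I dabble with C++ and Qualcomm")
    row = find_matches(cell_a2, ["Visual Basic", "C++", "Java"])
    print(row + [""] * (4 - len(row)))  # blanks stand for unused columns
```

Note that an exact match like this finds nothing for "Visula Basic", which is the typo problem described next.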

The present problem is that the text in column A contains TYPOS. For
example, somebody may write in cell A2 "I use Visula Basic" and another
person may write in cell A3 "I use Visul Basic". Now, I won't get any matches
in column B, because column I contains "Visual Basic" but not "Visula Basic"
or "Visul Basic".

So, I want to develop TEXT scrambler function(s) which can:

a) First function - SCRAMBLE a word from column I by swapping two adjacent
letters. That is, if column I has "Visual Basic", then any 2 adjacent
non-empty letters are swapped, so the function should be capable of giving
results like "Visula Basic", "Visual Baisc" and similar transpositions of
adjacent letters only. I hope that only "one" transposition at a time would
be sufficient. (The first letter need not be permuted, as my understanding
is that people don't make typing errors in their first letter.) I don't want
to swap the "space" between 2 words; that is, in a particular transformation
I would just swap 2 adjacent LETTERS of a particular WORD within the
STRING.

b) Second function - MISS, i.e. remove, a single letter of the word in
column I. That is, if a particular cell in column I has "Visual Basic", then
it could give me variants like "Viual Basic", "Visual Baic" etc.

c) Third function - SUBSTITUTE a single letter of the word in column I with
any of the other 25 letters of the English alphabet. That is, if column I has
"Visual Basic", then it would be able to give me "Vidual Basic", "Visual
Nasic" and similar variants.

d) Fourth function - I am being too ambitious, but... I would like to have a
function which can combine the effects of a), b) and c) simultaneously,
though each of them is applied only once. (Would doing this be disastrous
from a computing-resources point of view?)
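The four functions requested above can be sketched together, since they share the same "edit one word, rebuild the phrase" loop. This is a hypothetical Python illustration of the logic, not an Excel implementation; every name in it (`transpositions`, `deletions`, `substitutions`, `combined_variants` and the `_`-prefixed helpers) is made up for the sketch. Assumptions: words are separated by single spaces; (a) and (b) never touch the first letter of a word; (c) does, because "Visual Nasic" in the example changes the "B"; only ASCII letters are substituted, keeping their case; and (d) applies (a), (b) and (c) at most once each, in that fixed order.

```python
import string
from typing import Callable, Iterable, List, Set


def _apply_per_word(phrase: str, edit_word: Callable[[str], Iterable[str]]) -> List[str]:
    """Apply `edit_word` to one word of `phrase` at a time (spaces are never
    touched) and return the distinct new phrases, in generation order."""
    words = phrase.split(" ")
    seen, out = {phrase}, []
    for idx, word in enumerate(words):
        for new_word in edit_word(word):
            candidate = " ".join(words[:idx] + [new_word] + words[idx + 1:])
            if candidate not in seen:  # e.g. swapping "ss" changes nothing
                seen.add(candidate)
                out.append(candidate)
    return out


def _swap(word: str) -> Iterable[str]:
    # (a) swap characters i and i+1; i starts at 1 so the first letter stays put
    for i in range(1, len(word) - 1):
        yield word[:i] + word[i + 1] + word[i] + word[i + 2:]


def _drop(word: str) -> Iterable[str]:
    # (b) remove character i; first letter kept (assumption carried over from (a))
    for i in range(1, len(word)):
        yield word[:i] + word[i + 1:]


def _sub(word: str) -> Iterable[str]:
    # (c) replace one ASCII letter with each of the other 25 in the same case;
    # the first letter is included because "Visual Nasic" changes it
    for i, ch in enumerate(word):
        if ch not in string.ascii_letters:
            continue
        alphabet = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
        for letter in alphabet:
            if letter != ch:
                yield word[:i] + letter + word[i + 1:]


def transpositions(phrase: str) -> List[str]:   # function (a)
    return _apply_per_word(phrase, _swap)


def deletions(phrase: str) -> List[str]:        # function (b)
    return _apply_per_word(phrase, _drop)


def substitutions(phrase: str) -> List[str]:    # function (c)
    return _apply_per_word(phrase, _sub)


def combined_variants(phrase: str) -> Set[str]:  # function (d)
    """Apply (a), (b) and (c) at most once each, in that fixed order."""
    current = {phrase}
    for edit in (_swap, _drop, _sub):
        expanded = set(current)  # keeping unedited strings = "at most once"
        for s in current:
            expanded.update(_apply_per_word(s, edit))
        current = expanded
    current.discard(phrase)
    return current


if __name__ == "__main__":
    term = "Visual Basic"
    print(transpositions(term))  # includes "Visula Basic", "Visual Baisc"
    print(deletions(term))       # includes "Viual Basic", "Visul Basic"
    print(len(substitutions(term)), len(combined_variants(term)))
```

On the resource question in (d): the combined set grows roughly as the product of the three individual counts, so a two-word term such as "Visual Basic" already produces thousands of strings. Building the variants once per column-I term and keeping them in a lookup table (a Python `set` or `dict` here; for instance a `Scripting.Dictionary` in VBA) is likely far cheaper than regenerating them for every cell in column A.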

I want all of the above to be FUNCTIONS and not macros. I'm aware of the
difference between the 2 only to the extent that with a function I can
write a statement like:
    If StringSubsetFromColumnA = ScrambledCellofColumnI(..,...) Then
        CellinColumnB = UnscrambledcellofColumnI
    End If
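The check in the pseudocode above can also be done without generating any variants. The sketch below swaps in a different, standard technique that the post does not mention: the optimal string alignment (restricted Damerau-Levenshtein) distance, which counts one adjacent transposition, deletion, insertion or substitution as a single edit. A column-A fragment within `max_edits` of a column-I term is treated as that term, mirroring `CellinColumnB = UnscrambledcellofColumnI`. The names `osa_distance` and `unscramble` are hypothetical, and unlike function (a) this check does not protect the first letter.

```python
from typing import List, Optional


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b, case-insensitive."""
    a, b = a.lower(), b.lower()
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def unscramble(sentence: str, terms: List[str], max_edits: int = 1) -> Optional[str]:
    """Return the first column-I term (in list order) that some run of words
    in `sentence` matches within `max_edits` edits, or None."""
    words = [w.strip(",.;:!?\"'()") for w in sentence.split()]
    words = [w for w in words if w]
    for term in terms:
        n = len(term.split())  # compare windows with the same number of words
        for start in range(len(words) - n + 1):
            window = " ".join(words[start:start + n])
            if osa_distance(window, term) <= max_edits:
                return term
    return None


if __name__ == "__main__":
    print(unscramble("I use Visula Basic", ["C++", "Visual Basic"]))
    print(unscramble("I use Visul Basic", ["C++", "Visual Basic"]))
```

With very short terms (two or three characters) a one-edit tolerance can produce false positives, so a smaller `max_edits`, or exact matching, may be safer for those.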

I hope I have been able to express my needs correctly. I'm posting my present
unscrambled macro in a follow-up to this post, as I didn't want to make this
post too big. (Is splitting things across several messages, rather than
posting everything in one, the correct practice in newsgroups?)


Thanks a lot,
Hari
India