View Single Post
  #2   Report Post  
Posted to microsoft.public.excel.programming
John Coleman John Coleman is offline
external usenet poster
 
Posts: 274
Default How can I delete 'fuzzy' duplicated rows?


ani_unicorn wrote:
I have approximately 45,000 entries that are a mix of text and numerics. I
need to set a formula, macro or something that will help me delete 'fuzzy'or
non-identical rows that are duplicated. ie some rows have typos, or other
similar mistakes.
I can use the unique filter and other formulas to find identical cells but
need to know how to find and delete the rows with odd characters in them.
Please help!
Thanks
Ani


It would be relatively easy to write a function that takes as input two
row indices and returns the percentage of columns in which the two rows
agree. You would need to look at your data and decide how much
agreement constitutes a virtual duplicate. Once you have this function,
then a nested loop can be written which deletes rows if they match
agree with a previous row beyond a certain threshhold. I got to do some
last minute christmas shopping now so don't have time to write any
actual code.

Hope that helps

-John Coleman