Just another option to tinker around with ..
which also extracts both the list of uniques
and the list of duplicates for reference ..
Meanings:
--------------
Uniques = 1st instance of the item in the list,
Duplicates = 2nd, 3rd, etc instances of the item in the list
There's no case sensitivity distinction for uniques,
i.e. HVT = HvT = hvt (all are considered identical, not unique)
Assume the data is in Sheet1, in A2 down
HVT88947
HVT88947
HVT88948
HVT88948
9784268
712589
DDKLM
DDKLM
HVT88946
712589
etc
Put in B2:
=IF(OR(COUNTIF($A$2:A2,A2)1,A2=""),"",ROW())
Put in C2:
=IF(OR(COUNTIF($A$2:A2,A2)<2,A2=""),"",ROW())
Select B2:C2, fill down to say, C100
(Let's take a small list length of ~100 items)
In Sheet2
-------------
Select A2:A100
(# of rows similar to cols B and C in Sheet1)
Put in the formula bar:
=IF(ISERROR(MATCH(SMALL(Sheet1!B:B,ROW(A1:A100)),S heet1!B:B,0)),"",INDEX(She
et1!A:A,MATCH(SMALL(Sheet1!B:B,ROW(A1:A100)),Sheet 1!B:B,0)))
Array-enter the formula with CTRL+SHIFT+ENTER
instead of just pressing ENTER
The above extracts the list of uniques from Sheet1's col A
Select B2:B100
(# of rows similar to cols B and C in Sheet1)
Put in the formula bar:
=IF(ISERROR(MATCH(SMALL(Sheet1!C:C,ROW(A1:A100)),S heet1!C:C,0)),"",INDEX(She
et1!A:A,MATCH(SMALL(Sheet1!C:C,ROW(A1:A100)),Sheet 1!C:C,0)))
Array-enter the formula with CTRL+SHIFT+ENTER
The above extracts the list of duplicates from Sheet1's col A
Put in A1: =SUMPRODUCT(--(A2:A100<""))
A1 returns the # of uniques
Put in B1: =COUNTA(Sheet1!A:A)-A1
B1 returns the # of duplicates
For the sample data in A2:A11 in Sheet1
(as per your post,
with A1:B1 assumed empty, no headers)
you'll get in col A:
6 (# of uniques)
HVT88947
HVT88948
9784268
712589
DDKLM
HVT88946
and in col B:
4 (# of duplicates)
HVT88947
HVT88948
DDKLM
71258
Test out the returns above
by adding some new unique & duplicate items
in Sheet1's col A
(you can even leave blank cells in-between entries)
--
The drawback of the above set-up
is the performance hit [calc speed]
as the range involved increases
The formulas could all be adapted to suit
say, a big input range of A2:A5000 in Sheet1
but the response time would suffer quite a bit
One way to mitigate the performance hit would be to
switch the calc mode to "Manual", via:
Tools Options Calculation tab Check "Manual" OK
And then press F9 to recalc after entries in col A
are done per input session
--
Rgds
Max
xl 97
---
GMT+8, 1° 22' N 103° 45' E
xdemechanik <atyahoo<dotcom
----
"Duplicateman" wrote in message
...
Let's say you have a list of hundreds of confirmation numbers, they can be
text, numeric or alpa-numeric, how can I know how many of the entries are
exact duplicates vs how many are different. Example:
HVT88947
HVT88947
HVT88948
HVT88948
9784268
712589
DDKLM
DDKLM
HVT88946
712589
Ok, we have 10 entries above. Four of them are duplicate entries, six of
them are different entries, right? Ok, so lets say you now have thousands
of
these entries and you know there are lots of duplicates somewhere in the
list. How can you calculate how many are duplicate entries and how many
are
different (or original entries)? And you need to have the calc work so
that
as people add more entries the calc always updates the number of
duplicates
there are in the growing list?
|