Explanation of SUMPRODUCT
"Bob Phillips" wrote:
In the example that you show, which should be (at least)
=SUMPRODUCT(1/(COUNTIF(A1:A10,A1:A10)))
In this version of the formula, I don't think you need SUMPRODUCT. SUM will
do, since you're dealing with a single array and you aren't doing any
dot-producting. I don't even think you need to enter it as an array formula.
But neither the SUMPRODUCT or SUM version will work if one of the cells is
blank.
For the purposes of exploring this SUMPRODUCT construct a little further,
let's say we take Bill's example and erase the contents of the cell that has
"Jon" in it. The data in A1:A10 is then Bob,John,Bob,Bob,John,
,Bob,Bill,Bill,Max
Here's a parsing of another popular version of this formula, one that
ignores blank cells:
=SUM((A1:A10<"")/COUNTIF(A1:A10,A1:A10&""))
Piece by piece, here's what that says:
COUNTIF(A1:A10,A1:A10), as Bob explained, produces a 10-element array where
each element is the number of times the corresponding cell value appears in
the range A1:A10.
The extra &"" at the end is a string concatenation. You're adding a
zero-length string onto whatever was in each cell. That converts empty
cells, which Excel interprets as the number 0, to empty strings. Why do you
want to do that? Because if you represent empty cells as 0, then you'll end
up with zero in the denominator of the 1/COUNTIF() expression. And you know
what happens when you do that. So the COUNTIF(A1:A10.A1:A10&"") part of the
expression evaluates to: {4;2;4;4;2;1;4;2;2;1}. If we didn't have the &"",
the array would have been {4;2;4;4;2;0;4;2;2;1}. Notice the zero.
Why is that zero there? Because COUNTIF promotes an empty cell to 0 if the
cell is in the 'criteria' argument, but not if the empty cell is in the
'range' argument position. Try this: put the number 0 in B1 and B2. Put
COUNTIF(B1,B2) in cell B3. Then try deleting the zeros in B1 and B2 one at a
time. Watch how it affects B3. (Let me know if you find the documentation
on this.)
Now, back to the numerator of our array-over-array fraction: we've got
(A1:A10<"") We're checking whether each cell in A1:A10 is not an empty
cell. That evaluates to an array of Booleans:
{TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRU E}
In arithmetic expressions, Excel interprets TRUE as 1 and FALSE as 0. You
will sometimes see Booleans preceded by a double minus, --, which will
explicitly force a Boolean-to-integer conversion.
So for each entry representing a non-empty cell, we've got
1/(the number of cells with that value in them)
and for empty cells we've got
0/(the number of empty cells).
Add up the resulting array of fractions and you end up with a count of
unique non-empty values in the range A1:A10.
I find that I can get away without using an array formula if I use
SUMPRODUCT to add the array, but not if I use SUM. If I use SUM, I have to
enter the formula as an array formula. I suppose SUMPRODUCT knows how to
handle an array divided by an array piecewise, while SUM doesn't.
|