ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Worksheet Functions (https://www.excelbanter.com/excel-worksheet-functions/)
-   -   Shannon diversity index formula (https://www.excelbanter.com/excel-worksheet-functions/138279-shannon-diversity-index-formula.html)

peter

Shannon diversity index formula
 
I'm trying to create a single formula that evaluates to the Shannon
diversity index (SDI) on a single array. An explanation of SDI can be
found on http://en.wikipedia.org/wiki/Diversity_index.

SDI = -E { p(i)*log(p(i)) }

In words, that's the negative sum over the (proportion times the log
of the proportion)

It's a simple thing to do if all the values in the array are 0, but
if one is zero, that screws up the LOG function, because LOG(0) is
undefined and so it returns an error.

This is my formula at present: =-SUMPRODUCT((B2:B20/SUM(B$2:B
$20)),LOG(B2:B20/SUM(B$2:B$20)))
And below is the sample data...

A B
1 Species TOTAL
2 Species 1 2
3 Species 2 2
4 Species 3 2
5 Species 4 2
6 Species 5 2
7 Species 6 2
8 Species 7 2
9 Species 8 2
10 Species 9 2
11 Species 10 2
12 Species 11 2
13 Species 12 2
14 Species 13 2
15 Species 14 2
16 Species 15 2
17 Species 16 2
18 Species 17 2
19 Species 18 2
20 Species 19 2

The TOTAL column is the sum of all the following columns. However,
there will always be a number of zeros in the TOTAL column, so the
function returns an error. I've tried using conditional tests as
explained on http://www.xldynamic.com/source/xld.SUMPRODUCT.html, but
I can't figure out how to get those to work inside the LOG function
which is the source of the error.

The SDI is also easy to calculate if I have a couple of columns in
which to put the proportions and the LOG(proportions), but that's not
what I'm looking for.

I have seen a few functions made by various folks around on the
internet, but 1) some have this same problem, and 2) some require an
add-in, neither of which are acceptable.

What say you all?
Peter


ExcelBanter AI

Answer: Shannon diversity index formula
 
Hi Peter,

I understand that you are trying to create a single formula that evaluates to the Shannon diversity index (SDI) on a single array. The formula you have provided is correct, but it returns an error when there are zeros in the array. To solve this problem, you can use the IF function to check if the value is zero and replace it with a very small number, such as
Code:

0.0000000001
. Here's the modified formula:
  1. =-SUMPRODUCT((B2:B20/SUM(B$2:B$20)),IF(B2:B20=0,
    Code:

    0.0000000001
    ,B2:B20/SUM(B$2:B$20))*LOG(IF(B2:B20=0,
    Code:

    0.0000000001
    ,B2:B20/SUM(B$2:B$20)]))

This formula first checks if the value in B2:B20 is zero using the IF function. If it is zero, it replaces it with
Code:

0.0000000001
, otherwise, it uses the original value. Then it calculates the proportion and multiplies it by the log of the proportion. Finally, it sums up all the values to get the SDI.

Barb Reinhardt

Shannon diversity index formula
 
Try this

=-SUMPRODUCT(--(C2:C200),(C2:C20/SUM(C2:C20)),(LOG(C2:C20/SUM(C2:C20))))



"peter" wrote:

I'm trying to create a single formula that evaluates to the Shannon
diversity index (SDI) on a single array. An explanation of SDI can be
found on http://en.wikipedia.org/wiki/Diversity_index.

SDI = -E { p(i)*log(p(i)) }

In words, that's the negative sum over the (proportion times the log
of the proportion)

It's a simple thing to do if all the values in the array are 0, but
if one is zero, that screws up the LOG function, because LOG(0) is
undefined and so it returns an error.

This is my formula at present: =-SUMPRODUCT((B2:B20/SUM(B$2:B
$20)),LOG(B2:B20/SUM(B$2:B$20)))
And below is the sample data...

A B
1 Species TOTAL
2 Species 1 2
3 Species 2 2
4 Species 3 2
5 Species 4 2
6 Species 5 2
7 Species 6 2
8 Species 7 2
9 Species 8 2
10 Species 9 2
11 Species 10 2
12 Species 11 2
13 Species 12 2
14 Species 13 2
15 Species 14 2
16 Species 15 2
17 Species 16 2
18 Species 17 2
19 Species 18 2
20 Species 19 2

The TOTAL column is the sum of all the following columns. However,
there will always be a number of zeros in the TOTAL column, so the
function returns an error. I've tried using conditional tests as
explained on http://www.xldynamic.com/source/xld.SUMPRODUCT.html, but
I can't figure out how to get those to work inside the LOG function
which is the source of the error.

The SDI is also easy to calculate if I have a couple of columns in
which to put the proportions and the LOG(proportions), but that's not
what I'm looking for.

I have seen a few functions made by various folks around on the
internet, but 1) some have this same problem, and 2) some require an
add-in, neither of which are acceptable.

What say you all?
Peter



peter

Shannon diversity index formula
 
This function still evaluates to a #NUM error.

On Apr 8, 2:22 pm, Barb Reinhardt
wrote:
Try this

=-SUMPRODUCT(--(C2:C200),(C2:C20/SUM(C2:C20)),(LOG(C2:C20/SUM(C2:C20))))



Gary''s Student

Shannon diversity index formula
 
p should never be zero. Because pi is the fraction of individuals belonging
to the i-th species, by definition it can't be zero. Including a zero is
like including the number of whales in the Amazon river basin - there aren't
any. Just exclude zeros, polar bears, kangaroos, etc.
--
Gary''s Student - gsnu200714


"peter" wrote:

I'm trying to create a single formula that evaluates to the Shannon
diversity index (SDI) on a single array. An explanation of SDI can be
found on http://en.wikipedia.org/wiki/Diversity_index.

SDI = -E { p(i)*log(p(i)) }

In words, that's the negative sum over the (proportion times the log
of the proportion)

It's a simple thing to do if all the values in the array are 0, but
if one is zero, that screws up the LOG function, because LOG(0) is
undefined and so it returns an error.

This is my formula at present: =-SUMPRODUCT((B2:B20/SUM(B$2:B
$20)),LOG(B2:B20/SUM(B$2:B$20)))
And below is the sample data...

A B
1 Species TOTAL
2 Species 1 2
3 Species 2 2
4 Species 3 2
5 Species 4 2
6 Species 5 2
7 Species 6 2
8 Species 7 2
9 Species 8 2
10 Species 9 2
11 Species 10 2
12 Species 11 2
13 Species 12 2
14 Species 13 2
15 Species 14 2
16 Species 15 2
17 Species 16 2
18 Species 17 2
19 Species 18 2
20 Species 19 2

The TOTAL column is the sum of all the following columns. However,
there will always be a number of zeros in the TOTAL column, so the
function returns an error. I've tried using conditional tests as
explained on http://www.xldynamic.com/source/xld.SUMPRODUCT.html, but
I can't figure out how to get those to work inside the LOG function
which is the source of the error.

The SDI is also easy to calculate if I have a couple of columns in
which to put the proportions and the LOG(proportions), but that's not
what I'm looking for.

I have seen a few functions made by various folks around on the
internet, but 1) some have this same problem, and 2) some require an
add-in, neither of which are acceptable.

What say you all?
Peter



peter

Shannon diversity index formula
 
Well, when you've got a datasheet with a list of 400 species on it,
and the polar bear doesn't happen to show up for the survey that day,
then the polar bear gets a zero. So what you're saying is what I'm
trying to do... exclude them without having to do lots of
manipulations.

And I'm quite happy to say that there are, in fact, whales in the
Amazon River basin. The Amazon River dolphin: http://www.acsonline.org/factpack/Boto.htm.
You're correct that the polar bears and kangaroos are elsewhere.

However, I'm not asking a theoretical question here, I'm asking for a
formula. So, back on track... anyone with anything helpful?

On Apr 8, 3:36 pm, Gary''s Student
wrote:
p should never be zero. Because pi is the fraction of individuals belonging
to the i-th species, by definition it can't be zero. Including a zero is
like including the number of whales in the Amazon river basin - there aren't
any. Just exclude zeros, polar bears, kangaroos, etc.
--
Gary''s Student - gsnu200714

"peter" wrote:
I'm trying to create a single formula that evaluates to the Shannon
diversity index (SDI) on a single array. An explanation of SDI can be
found onhttp://en.wikipedia.org/wiki/Diversity_index.


SDI = -E { p(i)*log(p(i)) }


In words, that's the negative sum over the (proportion times the log
of the proportion)


It's a simple thing to do if all the values in the array are 0, but
if one is zero, that screws up the LOG function, because LOG(0) is
undefined and so it returns an error.


This is my formula at present: =-SUMPRODUCT((B2:B20/SUM(B$2:B
$20)),LOG(B2:B20/SUM(B$2:B$20)))
And below is the sample data...


A B
1 Species TOTAL
2 Species 1 2
3 Species 2 2
4 Species 3 2
5 Species 4 2
6 Species 5 2
7 Species 6 2
8 Species 7 2
9 Species 8 2
10 Species 9 2
11 Species 10 2
12 Species 11 2
13 Species 12 2
14 Species 13 2
15 Species 14 2
16 Species 15 2
17 Species 16 2
18 Species 17 2
19 Species 18 2
20 Species 19 2


The TOTAL column is the sum of all the following columns. However,
there will always be a number of zeros in the TOTAL column, so the
function returns an error. I've tried using conditional tests as
explained onhttp://www.xldynamic.com/source/xld.SUMPRODUCT.html, but
I can't figure out how to get those to work inside the LOG function
which is the source of the error.


The SDI is also easy to calculate if I have a couple of columns in
which to put the proportions and the LOG(proportions), but that's not
what I'm looking for.


I have seen a few functions made by various folks around on the
internet, but 1) some have this same problem, and 2) some require an
add-in, neither of which are acceptable.


What say you all?
Peter




Gary''s Student

Shannon diversity index formula
 
Sorry about the whales thing... I have nothing against whales, some of my
best friend are cetaceans.


In any event in C2 thru C20 enter:

=-(B2/SUM($B$2:$B$20))*IF(B2=0,0,LOG(B2/SUM(B$2:B$20)))
and in another cell:

=sum(C2:C20)
--
Gary''s Student - gsnu200714


"peter" wrote:

Well, when you've got a datasheet with a list of 400 species on it,
and the polar bear doesn't happen to show up for the survey that day,
then the polar bear gets a zero. So what you're saying is what I'm
trying to do... exclude them without having to do lots of
manipulations.

And I'm quite happy to say that there are, in fact, whales in the
Amazon River basin. The Amazon River dolphin: http://www.acsonline.org/factpack/Boto.htm.
You're correct that the polar bears and kangaroos are elsewhere.

However, I'm not asking a theoretical question here, I'm asking for a
formula. So, back on track... anyone with anything helpful?

On Apr 8, 3:36 pm, Gary''s Student
wrote:
p should never be zero. Because pi is the fraction of individuals belonging
to the i-th species, by definition it can't be zero. Including a zero is
like including the number of whales in the Amazon river basin - there aren't
any. Just exclude zeros, polar bears, kangaroos, etc.
--
Gary''s Student - gsnu200714

"peter" wrote:
I'm trying to create a single formula that evaluates to the Shannon
diversity index (SDI) on a single array. An explanation of SDI can be
found onhttp://en.wikipedia.org/wiki/Diversity_index.


SDI = -E { p(i)*log(p(i)) }


In words, that's the negative sum over the (proportion times the log
of the proportion)


It's a simple thing to do if all the values in the array are 0, but
if one is zero, that screws up the LOG function, because LOG(0) is
undefined and so it returns an error.


This is my formula at present: =-SUMPRODUCT((B2:B20/SUM(B$2:B
$20)),LOG(B2:B20/SUM(B$2:B$20)))
And below is the sample data...


A B
1 Species TOTAL
2 Species 1 2
3 Species 2 2
4 Species 3 2
5 Species 4 2
6 Species 5 2
7 Species 6 2
8 Species 7 2
9 Species 8 2
10 Species 9 2
11 Species 10 2
12 Species 11 2
13 Species 12 2
14 Species 13 2
15 Species 14 2
16 Species 15 2
17 Species 16 2
18 Species 17 2
19 Species 18 2
20 Species 19 2


The TOTAL column is the sum of all the following columns. However,
there will always be a number of zeros in the TOTAL column, so the
function returns an error. I've tried using conditional tests as
explained onhttp://www.xldynamic.com/source/xld.SUMPRODUCT.html, but
I can't figure out how to get those to work inside the LOG function
which is the source of the error.


The SDI is also easy to calculate if I have a couple of columns in
which to put the proportions and the LOG(proportions), but that's not
what I'm looking for.


I have seen a few functions made by various folks around on the
internet, but 1) some have this same problem, and 2) some require an
add-in, neither of which are acceptable.


What say you all?
Peter





peter

Shannon diversity index formula
 
Thanks, but I'm really trying to consolidate the whole thing into a
single formula in a single cell.

On Apr 8, 4:40 pm, Gary''s Student
wrote:
Sorry about the whales thing... I have nothing against whales, some of my
best friend are cetaceans.

In any event in C2 thru C20 enter:

=-(B2/SUM($B$2:$B$20))*IF(B2=0,0,LOG(B2/SUM(B$2:B$20)))
and in another cell:

=sum(C2:C20)
--
Gary''s Student - gsnu200714

"peter" wrote:
Well, when you've got a datasheet with a list of 400 species on it,
and the polar bear doesn't happen to show up for the survey that day,
then the polar bear gets a zero. So what you're saying is what I'm
trying to do... exclude them without having to do lots of
manipulations.


And I'm quite happy to say that there are, in fact, whales in the
Amazon River basin. The Amazon River dolphin: http://www.acsonline.org/factpack/Boto.htm.
You're correct that the polar bears and kangaroos are elsewhere.


However, I'm not asking a theoretical question here, I'm asking for a
formula. So, back on track... anyone with anything helpful?


On Apr 8, 3:36 pm, Gary''s Student
wrote:
p should never be zero. Because pi is the fraction of individuals belonging
to the i-th species, by definition it can't be zero. Including a zero is
like including the number of whales in the Amazon river basin - there aren't
any. Just exclude zeros, polar bears, kangaroos, etc.
--
Gary''s Student - gsnu200714


"peter" wrote:
I'm trying to create a single formula that evaluates to the Shannon
diversity index (SDI) on a single array. An explanation of SDI can be
found onhttp://en.wikipedia.org/wiki/Diversity_index.


SDI = -E { p(i)*log(p(i)) }


In words, that's the negative sum over the (proportion times the log
of the proportion)


It's a simple thing to do if all the values in the array are 0, but
if one is zero, that screws up the LOG function, because LOG(0) is
undefined and so it returns an error.


This is my formula at present: =-SUMPRODUCT((B2:B20/SUM(B$2:B
$20)),LOG(B2:B20/SUM(B$2:B$20)))
And below is the sample data...


A B
1 Species TOTAL
2 Species 1 2
3 Species 2 2
4 Species 3 2
5 Species 4 2
6 Species 5 2
7 Species 6 2
8 Species 7 2
9 Species 8 2
10 Species 9 2
11 Species 10 2
12 Species 11 2
13 Species 12 2
14 Species 13 2
15 Species 14 2
16 Species 15 2
17 Species 16 2
18 Species 17 2
19 Species 18 2
20 Species 19 2


The TOTAL column is the sum of all the following columns. However,
there will always be a number of zeros in the TOTAL column, so the
function returns an error. I've tried using conditional tests as
explained onhttp://www.xldynamic.com/source/xld.SUMPRODUCT.html, but
I can't figure out how to get those to work inside the LOG function
which is the source of the error.


The SDI is also easy to calculate if I have a couple of columns in
which to put the proportions and the LOG(proportions), but that's not
what I'm looking for.


I have seen a few functions made by various folks around on the
internet, but 1) some have this same problem, and 2) some require an
add-in, neither of which are acceptable.


What say you all?
Peter




Barb Reinhardt

Shannon diversity index formula
 
Try this then:

=-SUMPRODUCT((C$2:C$20/SUM(C$2:C$20)),(IF(C$2:C$200,(LOG(C$2:C$20/SUM(C$2:C$20))))))
Activate with CTRL SHIFT ENTER
You should see {} around the formula when you are done.

"peter" wrote:

This function still evaluates to a #NUM error.

On Apr 8, 2:22 pm, Barb Reinhardt
wrote:
Try this

=-SUMPRODUCT(--(C2:C200),(C2:C20/SUM(C2:C20)),(LOG(C2:C20/SUM(C2:C20))))




Barb Reinhardt

Shannon diversity index formula
 
Try this then:

=-SUMPRODUCT((C$2:C$20/SUM(C$2:C$20)),(IF(C$2:C$200,(LOG(C$2:C$20/SUM(C$2:C$20))))))
Activate with CTRL SHIFT ENTER
You should see {} around the formula when you are done.

"peter" wrote:

This function still evaluates to a #NUM error.

On Apr 8, 2:22 pm, Barb Reinhardt
wrote:
Try this

=-SUMPRODUCT(--(C2:C200),(C2:C20/SUM(C2:C20)),(LOG(C2:C20/SUM(C2:C20))))




Harlan Grove[_2_]

Shannon diversity index formula
 
"peter" wrote...
This function still evaluates to a #NUM error.

wrote:
=-SUMPRODUCT(--(C2:C200),(C2:C20/SUM(C2:C20)),
(LOG(C2:C20/SUM(C2:C20))))


Barb's formula doesn't work because --(C2:C200) doesn't exclude
calculating LOG(C2:C20/SUM(C2:C20)) for zero values in C2:C20. There's
a fix for that. I've also rearranged terms to take advantage of the
distributive law.

=-SUMPRODUCT(C2:C20,LOG(C2:C20+(C2:C20=0)))/SUM(C2:C20)
+LOG(SUM(C2:C20))


ckblu

Hello!

So it's 2 years later, but I just had the same excel shannon diversity index challenge. I really needed a single cell function because I had 400 separate surveys with lots of zero counts and 400 SDIs to calculate! (so I really really wanted to be able to drag a single function down. After a couple hours of forum hunting, I managed to get this to work. It's a hybrid of things people tried here.

My input data were a little different; organized horizontally. I used a pivot table to autocalculate plot totals (used to calculate Pi), and put them in column AG.

=-SUMPRODUCT((P9:AF9/AG9),LN(P9:AF9/(AG9)+(P9:AF9=0)))


Hope this saves someone some ln(0) misery! :)


All times are GMT +1. The time now is 12:15 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
ExcelBanter.com