| Title: | Diversity Indices with Statistical Inference |
|---|---|
| Description: | Provides a comprehensive framework for analyzing diversity from frequency/abundance count data. Implements a wide range of classical and entropy-based diversity indices, including Berger-Parker, Simpson (and related variants), Shannon, Brillouin, McIntosh, Margalef, Menhinick and Smith-Wilson. Supports permutation-based hypothesis tests for comparing groups with respect to diversity (global and pairwise comparisons), as well as confidence interval estimation using multiple bootstrap methods. Includes functionality for generating diversity profiles based on parametric families such as Hill numbers, Rényi entropy, and Tsallis entropy. The methods are applicable to ecological community data (species abundance counts) and genetic or phenotypic class frequency data. |
| Authors: | J. Aravind [aut, cre] (ORCID: <https://orcid.org/0000-0002-4791-442X>), Suman Roy [aut] (ORCID: <https://orcid.org/0000-0001-5612-8415>), Anju Mahendru Singh [aut] (ORCID: <https://orcid.org/0000-0001-6958-1630>), ICAR-NBGPR [cph] (ROR: <https://ror.org/00scbd467>, url: https://nbpgr.org.in/) |
| Maintainer: | J. Aravind <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.1.0.9000 |
| Built: | 2026-05-15 14:01:53 UTC |
| Source: | https://github.com/aravind-j/diversitystats |
This function generates bootstrap resamples using boot
and computes confidence intervals using several standard bootstrap methods
via boot.ci. The indexing for the statistic function is
handled internally.
bootstrap.ci( x, fun, R = 1000, conf = 0.95, na.omit = TRUE, type = c("norm", "basic", "stud", "perc", "bca"), parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus", 1L), cl = NULL, seed = 123, ... )bootstrap.ci( x, fun, R = 1000, conf = 0.95, na.omit = TRUE, type = c("norm", "basic", "stud", "perc", "bca"), parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus", 1L), cl = NULL, seed = 123, ... )
x |
A numeric or factor vector of observations. |
fun |
A function to summarize the observations. |
R |
Integer specifying the number of permutations. Default is 1000. |
conf |
Confidence level of the interval. Default is 0.95. |
na.omit |
logical. If |
type |
A vector of character strings representing the type of intervals
required. The value should be any subset of the values |
parallel |
The type of parallel operation to be used (if any). If missing, the
default is taken from the option |
ncpus |
integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs. |
cl |
An optional parallel or snow cluster for use if
|
seed |
Integer. Random seed used to ensure reproducibility of bootstrap. Default is 123. |
... |
Additional arguments passed to |
Supported interval types include normal approximation, basic, studentized (bootstrap-t), percentile, and bias-corrected and accelerated (BCa) intervals. If a requested interval type cannot be computed (for example, studentized or BCa intervals), the function falls back to percentile intervals.
A a named list of confidence intervals, each containing lower and upper bounds, with additional attributes storing the observed statistic and the mean of the bootstrap replicates.
The default number of bootstrap replicates R = 1000 is
provided for quick exploratory analysis. For more reliable (but slower)
confidence intervals or standard error estimates it is strongly recommended
to increase R to at 5000-10000, depending on your data and required
precision.
library(EvaluateCore) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Conver '#t qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) str(pdata) # NOTE: Increase R to 10000 for more reliable (but slower) estimates. # Bootstrap CIs ---- bootstrap.ci(pdata$NMSR, mean, type = "norm", R = 100) bootstrap.ci(pdata$NMSR, mean, type = "basic", R = 100) bootstrap.ci(pdata$NMSR, mean, type = "perc", R = 100) bootstrap.ci(pdata$NMSR, mean, type = "bca", R = 100) bootstrap.ci(pdata$NMSR, mean, type = c("norm", "basic", "perc", "bca"), R = 100) bootstrap.ci(pdata$LNGS, shannon, type = "norm", R = 100) bootstrap.ci(pdata$PTLC, simpson, type = "basic", R = 100) bootstrap.ci(pdata$LFRT, mcintosh_evenness, type = "perc", R = 100) bootstrap.ci(pdata$LBTEF, mcintosh_diversity, type = "bca", R = 100) bootstrap.ci(pdata$LNGS, shannon, type = c("norm", "basic", "perc", "bca"), R = 100, base = 2) # Studentised intervals require a `fun` returning # variances in addition to an estimate bootstrap.ci(pdata$NMSR, mean, type = "stud", R = 100) stat_fun_mean <- function(x) { est <- mean(x) se <- sd(x) / sqrt(length(x)) out <- c(est, se) # Important : Tells bootstrap.ci to consider second output as SE attr(out, "se") <- TRUE return(out) } bootstrap.ci(pdata$NMSR, stat_fun_mean, type = "stud", R = 100) bootstrap.ci(pdata$DSTA, shannon, type = "stud", R = 100) stat_fun_shannon <- function(x, base = 2) { tab <- tabulate(x) p <- tab / length(x) # Only keep p > 0 to avoid log(0) p <- p[p > 0] est <- -sum(p * log(p, base = base)) # Approximate SE using sqrt(Var(p * log(p))) se <- sqrt(sum((p * log(p, base = base))^2) / length(x)) out <- c(est, se) # Important : Tells bootstrap.ci to consider second output as SE attr(out, "se") <- TRUE return(out) } bootstrap.ci(pdata$DSTA, stat_fun_shannon, type = "stud", R = 100)library(EvaluateCore) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Conver '#t qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) str(pdata) # NOTE: Increase R to 10000 for more reliable (but slower) estimates. # Bootstrap CIs ---- bootstrap.ci(pdata$NMSR, mean, type = "norm", R = 100) bootstrap.ci(pdata$NMSR, mean, type = "basic", R = 100) bootstrap.ci(pdata$NMSR, mean, type = "perc", R = 100) bootstrap.ci(pdata$NMSR, mean, type = "bca", R = 100) bootstrap.ci(pdata$NMSR, mean, type = c("norm", "basic", "perc", "bca"), R = 100) bootstrap.ci(pdata$LNGS, shannon, type = "norm", R = 100) bootstrap.ci(pdata$PTLC, simpson, type = "basic", R = 100) bootstrap.ci(pdata$LFRT, mcintosh_evenness, type = "perc", R = 100) bootstrap.ci(pdata$LBTEF, mcintosh_diversity, type = "bca", R = 100) bootstrap.ci(pdata$LNGS, shannon, type = c("norm", "basic", "perc", "bca"), R = 100, base = 2) # Studentised intervals require a `fun` returning # variances in addition to an estimate bootstrap.ci(pdata$NMSR, mean, type = "stud", R = 100) stat_fun_mean <- function(x) { est <- mean(x) se <- sd(x) / sqrt(length(x)) out <- c(est, se) # Important : Tells bootstrap.ci to consider second output as SE attr(out, "se") <- TRUE return(out) } bootstrap.ci(pdata$NMSR, stat_fun_mean, type = "stud", R = 100) bootstrap.ci(pdata$DSTA, shannon, type = "stud", R = 100) stat_fun_shannon <- function(x, base = 2) { tab <- tabulate(x) p <- tab / length(x) # Only keep p > 0 to avoid log(0) p <- p[p > 0] est <- -sum(p * log(p, base = base)) # Approximate SE using sqrt(Var(p * log(p))) se <- sqrt(sum((p * log(p, base = base))^2) / length(x)) out <- c(est, se) # Important : Tells bootstrap.ci to consider second output as SE attr(out, "se") <- TRUE return(out) } bootstrap.ci(pdata$DSTA, stat_fun_shannon, type = "stud", R = 100)
These are core low-level routines for the computation of multiple diversity
indices, designed to be used by diversity.calc
and other functions.
berger_parker(x) berger_parker_reciprocal(x) simpson(x) gini_simpson(x) simpson_max(x) simpson_reciprocal(x) simpson_relative(x) simpson_evenness(x) shannon(x, base = 2) shannon_max(x, base = 2) shannon_relative(x, base = 2) shannon_ens(x, base = 2) mcintosh_diversity(x) mcintosh_evenness(x) smith_wilson(x, warn = TRUE) heip_evenness(x) margalef_index(x) menhinick_index(x) brillouin_index(x) hill_number(x, q = 1) renyi_entropy(x, q = 1) tsallis_entropy(x, q = 1) hill_evenness(x, q = 1)berger_parker(x) berger_parker_reciprocal(x) simpson(x) gini_simpson(x) simpson_max(x) simpson_reciprocal(x) simpson_relative(x) simpson_evenness(x) shannon(x, base = 2) shannon_max(x, base = 2) shannon_relative(x, base = 2) shannon_ens(x, base = 2) mcintosh_diversity(x) mcintosh_evenness(x) smith_wilson(x, warn = TRUE) heip_evenness(x) margalef_index(x) menhinick_index(x) brillouin_index(x) hill_number(x, q = 1) renyi_entropy(x, q = 1) tsallis_entropy(x, q = 1) hill_evenness(x, q = 1)
x |
A factor vector of categories (e.g., species, traits). The frequency of each level is treated as the abundance of that category. |
base |
The logarithm base to be used for computation of shannon family
of diversity indices. Default is |
warn |
logical. If |
q |
The order of the parametric index. |
The calculated diversity index value.
The function diversity.calc calculates the following diversity
measures
Margalef's Richness Index (\(D_{Margalef}\)) (Margalef 1973)
Menhinick's Index (\(D_{Menhinick}\)) (Menhinick 1964)
Berger–Parker Index (\(D_{BP}\)) (Berger and Parker 1970)
Reciprocal Berger–Parker Index (\(D_{BP_{R}}\)) (Magurran 2011)
Simpson's Index (\(d\)) (Simpson 1949; Peet 1974)
Simpson's Index of Diversity or Gini's Diversity Index or Gini-Simpson Index or Nei's Diversity Index or Nei's Variation Index (\(D\)) or Hurlbert’s probability of interspecific encounter (\(PIE\)) (Gini 1912, 1912; Greenberg 1956; Berger and Parker 1970; Hurlbert 1971; Nei 1973; Peet 1974)
Maximum Simpson's Index of Diversity or Maximum Nei's Diversity/Variation Index (\(D_{max}\)) (Hennink and Zeven 1990)
Simpson's Reciprocal Index or Hill's \(N_{2}\) (\(D_{R}\)) or Effective number of Species (\(ENS_{d}\)) (Williams 1964; Hill 1973)
Relative Simpson's Index of Diversity or Relative Nei's Diversity/Variation Index (\(D'\)) (Hennink and Zeven 1990)
Simpson’s evenness or equitability (\(D_{e}\)) (Pielou 1966; Hill 1973)
Shannon or Shannon-Weaver or Shannon-Wiener Diversity Index or Shannon entropy (\(H\)) (Shannon and Weaver 1949; Peet 1974)
Maximum Shannon-Weaver Diversity Index (\(H_{max}\)) (Hennink and Zeven 1990)
Relative Shannon-Weaver Diversity Index or Shannon Equitability Index (\(H'\)) or Pielou's Evenness (\(J\)) (Pielou 1966; Hennink and Zeven 1990)
Effective number of species for the Shannon - Weaver Diversity Index (\(ENS_{H}\)) or Hill's \(N_{1}\) (Macarthur 1965; Hill 1973)
Heip's Evenness Index (\(E_{Heip}\)) (Heip 1974)
McIntosh Diversity Index (\(D_{Mc}\)) (McIntosh 1967; Peet 1974)
McIntosh Evenness Index (\(E_{Mc}\)) (Pielou 1975)
Smith & Wilson's Evenness Index (\(E_{var}\)) (Smith and Wilson 1996)
Brillouin Diversity Index (\(D_{Brillouin}\)) (Brillouin 2013)
Rényi Entropy (\({}^q H_{Rényi}\)) (Renyi 1960)
Tsallis or HCDT Entropy (\({}^q H_{Tsallis}\)) (Havrda and Charvat 1967; Daroczy 1970; Tsallis 1988)
Hill Numbers (\({}^q D\)) (Hill 1973)
diversity.calc(x, base = exp(1), na.omit = TRUE)diversity.calc(x, base = exp(1), na.omit = TRUE)
x |
A factor vector of categories (e.g., species, traits). The frequency of each level is treated as the abundance of that category. |
base |
The logarithm base to be used for computation of shannon family
of diversity indices. Default is |
na.omit |
logical. If |
A list of the different diversity measures computed.
The number of classes in x or the richness.
Margalef's Richness Index (\(D_{Margalef}\)) (Margalef 1973)
Menhinick's Index (\(D_{Menhinick}\)) (Menhinick 1964)
Berger–Parker Index (\(D_{BP}\)) (Berger and Parker 1970)
Reciprocal Berger–Parker Index (\(D_{BP_{R}}\)) (Magurran 2011)
Simpson's index (\(d\)) (Simpson 1949; Peet 1974)
Simpson's Index of Diversity or Gini's Diversity Index or Gini-Simpson Index or Nei's Diversity Index or Nei's Variation Index (\(D\)) or Hurlbert’s probability of interspecific encounter (\(PIE\)) (Gini 1912, 1912; Greenberg 1956; Berger and Parker 1970; Hurlbert 1971; Nei 1973; Peet 1974)
Maximum Simpson's Index of Diversity or Maximum Nei's Diversity/Variation Index (\(D_{max}\)) (Hennink and Zeven 1990)
Simpson's Reciprocal Index or Hill's \(N_{2}\) (\(D_{R}\)) or Effective number of Species (\(ENS_{d}\)) (Williams 1964; Hill 1973)
Relative Simpson's Index of Diversity or Relative Nei's Diversity/Variation Index (\(D'\)) (Hennink and Zeven 1990)
Simpson’s evenness or equitability (\(D_{e}\)) (Pielou 1966; Hill 1973)
Shannon or Shannon-Weaver or Shannon-Wiener Diversity Index or Shannon entropy (\(H\)) (Shannon and Weaver 1949; Peet 1974)
Maximum Shannon-Weaver Diversity Index (\(H_{max}\)) (Hennink and Zeven 1990)
Relative Shannon-Weaver Diversity Index or Shannon Equitability Index (\(H'\)) or Pielou's Evenness (\(J\)) (Pielou 1966; Hennink and Zeven 1990)
Effective number of species for the Shannon - Weaver Diversity Index (\(ENS_{H}\)) or Hill's \(N_{1}\) (Macarthur 1965; Hill 1973)
Heip's Evenness Index (\(E_{Heip}\)) (Heip 1974)
McIntosh Diversity Index (\(D_{Mc}\)) (McIntosh 1967; Peet 1974)
McIntosh Evenness Index (\(E_{Mc}\)) (Pielou 1975)
Smith & Wilson's Evenness Index (\(E_{var}\)) (Smith and Wilson 1996)
Brillouin Diversity Index (Brillouin 2013)
Rényi Entropy of order 0 (Renyi 1960)
Rényi Entropy of order 1
Rényi Entropy of order 2
Tsallis Entropy of order 0 (Havrda and Charvat 1967; Daroczy 1970; Tsallis 1988)
Tsallis Entropy of order 1
Tsallis Entropy of order 2
Hill Number of order 0 (\({}^0 D\)) (Hill 1973)
Hill Number of order 1 (\({}^1 D\))
Hill Number of order 2 (\({}^2 D\))
The diversity indices implemented in diversity.calc
are computed as follows.
The number of classes of a phenotypic trait (or species richness) (\(k\)) can be described by adjusting for sample size (\(N\)) in Margalef's Richness Index (\(D_{Margalef}\)) (Margalef 1973) and Menhinick's Index (\(D_{Menhinick}\)) (Menhinick 1964)
These are computed as follows.
\[D_{Margalef} = \frac{k - 1}{\ln(N)}\] \[D_{Menhinick} = \frac{k}{\sqrt{N}}\]This is the is the proportion of individuals belonging to the most abundant class in a trait (or species in a community) and is computed as below.
\[D_{BP} = \max(p_i)\]Where, \(p_{i}\) denotes the proportion/fraction/frequency of accessions in the \(i\)th phenotypic class for a trait (or number of individuals in the \(i\)th species).
It's reciprocal estimates the relative diversity of this class.
\[D_{BP_{R}} = \frac{1}{D_{BP}}\]Simpson's index (\(d\)) which estimates the probability that two accessions randomly selected will belong to the same phenotypic class of a trait (or species in a community), is computed as follows (Simpson 1949; Peet 1974).
\[d = \sum_{i = 1}^{k}p_{i}^{2}\]Where, \(k\) is the number of phenotypic classes for the trait (or number of species in the community).
The value of \(d\) can range from 0 to 1 with 0 representing maximum diversity and 1, no diversity.
\(d\) is subtracted from 1 to give Simpson's index of diversity (\(D\)) (Greenberg 1956; Berger and Parker 1970; Hurlbert 1971; Peet 1974; Hennink and Zeven 1990) originally suggested by Gini (1912, 1912) and described in literature as Gini's diversity index or Gini-Simpson index. It is the same as Nei's diversity index or Nei's variation index (Nei 1973; Hennink and Zeven 1990). Greater the value of \(D\), greater the diversity with a range from 0 to 1.
\[D = 1 - d\]The maximum value of \(D\), \(D_{max}\) occurs when accessions are uniformly distributed across the phenotypic classes (or individuals are uniformly distributed across species in a community) and is computed as follows (Hennink and Zeven 1990).
\[D_{max} = 1 - \frac{1}{k}\]Reciprocal of \(d\) gives the Simpson's reciprocal index (\(D_{R}\)) (Williams 1964; Hennink and Zeven 1990) and can range from 1 to \(k\). This was also described in Hill (1973) as \(N_{2}\) or as Effective number of Species (or classes) (\(ENS_{d}\)).
\[D_{R} = \frac{1}{d}\]Relative Simpson's index of diversity or Relative Nei's diversity/variation index (\(H'\)) (Hennink and Zeven 1990) is defined as follows (Peet 1974).
\[D' = \frac{D}{D_{max}}\]Simpson’s evenness or equitability (\(D_{e}\) is described as follows (Pielou 1966; Hill 1973).
\[D_{e} = \frac{1}{d \cdot k}\]An index of information \(H\), was described by Shannon and Weaver (1949) as follows.
\[H = -\sum_{i=1}^{k}p_{i} \log_{2}(p_{i})\]\(H\) is described as Shannon or Shannon-Weaver or Shannon-Wiener diversity index or Shannon entropy in literature (Shannon and Weaver 1949; Peet 1974).
Alternatively, \(H\) is also computed using natural logarithm instead of logarithm to base 2.
\[H = -\sum_{i=1}^{k}p_{i} \ln(p_{i})\]The maximum value of \(H\) (\(H_{max}\)) is \(\ln(k)\). This value occurs when each phenotypic class for a trait has the same proportion of accessions (or each species in a community has the same proportion of individuals) (Hennink and Zeven 1990).
\[H_{max} = \log_{2}(k)\;\; \textrm{OR} \;\; H_{max} = \ln(k)\]The relative Shannon-Weaver diversity index or Shannon equitability index (\(H'\)) or Pielou's Evenness (\(J\)) is the Shannon diversity index (\(I\)) divided by the maximum diversity (\(H_{max}\)) (Pielou 1966; Hennink and Zeven 1990).
\[H' = \frac{H}{H_{max}}\]Macarthur (1965) described the Effective number of species (or classes) for the Shannon index (\(ENS_{H}\)) as follows.
\[ENS_{H} = e^{H}\]Heip’s index or Heip's Evenness index is a transformation of the Shannon–Wiener diversity index that standardizes it relative to number of classes in the trait (or species richness) and is computed as follows (Heip 1974).
\[E_{Heip} = \frac{e^{H} - 1}{k - 1}\]A similar index of diversity was described by McIntosh (1967) as follows (\(D_{Mc}\)) (Peet 1974).
\[D_{Mc} = \frac{N - \sqrt{\sum_{i=1}^{k}n_{i}^2}}{N - \sqrt{N}}\]Where, \(n_{i}\) denotes the number of accessions in the \(i\)th phenotypic class for a trait (or number of individuals in the \(i\)th species in the community) and \(N\) is the total number of accessions so that \(p_{i} = {n_{i}}/{N}\).
An additional measure of evenness was proposed by Pielou (1975) as follows.
\[E_{Mc} = \frac{N - \sqrt{\sum_{i=1}^{k}n_{i}^2}}{N - \frac{N}{\sqrt{S}}}\]This index measures how equally are accessions/genotypes distributed among different trait classes (or individuals individuals are distributed among different species in a community). This is less sensitive to rare classes or species and is computed as follows.
\[E_{var} = 1 - \frac{2}{\pi} \arctan{\left ( \frac{1}{k} \sum_{i=1}^{k}(\ln{n_{i} - \overline{\ln{n}}})^{2} \right )}\]This is an information-theoretic measure appropriate for complete censuses and is computed as follows (Brillouin 2013).
\[H_{B} = \frac{\ln(N!) - \sum \ln(n_i!)}{N}\]Parametric indices, also known as multivariate or compound indices, use a sensitivity parameter (\(q\)) to weigh frequent and rare classes within a trait (or common or rare species within a community).
The Rényi entropy extends several entropy measures, including Shannon entropy, and is computed as follows (Renyi 1960).
\[{}^q H_{Rényi} = \frac{1}{1-q} \ln \sum_{i=1}^{k} p_{i}^{q} , \quad q \ge 0, q \neq 1\]It is more frequently computed using natural logarithm instead of logarithm to base 2. The index is undefined for (\(q = 1\)), but Shannon entropy is as a limiting case.
Tsallis proposed a similar measure, the HCDT or Tsallis entropy (Havrda and Charvat 1967; Daroczy 1970; Tsallis 1988), which matches species richness for \(q = 0\), Shannon entropy \(q = 1\), and the Gini-Simpson index for \(q = 2\).
\[{}^q H_{Tsallis} = \frac{1}{q - 1} \left ( 1 - \sum_{i=1}^{k} p_i^q \right ), \quad q \ge 0, q \neq 1\]Hill showed that species richness, Shannon entropy, and Simpson's index are all related diversity indices, collectively known as Hill numbers which is defined as below (Hill 1973).
\[{}^q D = {\left ( \sum_{i=1}^{k} p_{i}^{q} \right )}^{\frac{1}{1-q}} , \quad q \ge 0, q \neq 1\]Where,
\[{}^0 D = k\] \[{}^1 D = e^{H}\] \[{}^2 D = D_{R}\]Berger WH, Parker FL (1970).
“Diversity of planktonic foraminifera in deep-sea sediments.”
Science, 168(3937), 1345–1347.
Brillouin L (2013).
Science and information theory, Dover edition.
Dover Publications, Inc., Mineola, New York.
ISBN 978-0-486-31641-3.
Daroczy Z (1970).
“Generalized information functions.”
Information and Control, 16(1), 36–51.
Gini C (1912).
Variabilita e Mutabilita. Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche. [Fasc. I.].
Tipogr. di P. Cuppini, Bologna.
Gini C (1912).
“Variabilita e mutabilita.”
In Pizetti E, Salvemini T (eds.), Memorie di Metodologica Statistica.
Liberia Eredi Virgilio Veschi, Roma, Italy.
Greenberg JH (1956).
“The measurement of linguistic diversity.”
Language, 32(1), 109.
Havrda J, Charvat F (1967).
“Quantification method of classification processes. Concept of structural α-entropy.”
Kybernetika, 3(1), (30)–35.
Heip C (1974).
“A new index measuring evenness.”
Journal of the Marine Biological Association of the United Kingdom, 54(3), 555–557.
Hennink S, Zeven AC (1990).
“The interpretation of Nei and Shannon-Weaver within population variation indices.”
Euphytica, 51(3), 235–240.
Hill MO (1973).
“Diversity and evenness: A unifying notation and its consequences.”
Ecology, 54(2), 427–432.
Hurlbert SH (1971).
“The nonconcept of species diversity: a critique and alternative parameters.”
Ecology, 52(4), 577–586.
Macarthur RH (1965).
“Patterns of species diversity.”
Biological Reviews, 40(4), 510–533.
Magurran AE (2011).
Measuring biological diversity, 9 [Nachdr.] edition.
Blackwell, Malden, Mass.
ISBN 978-0-632-05633-0.
Margalef R (1973).
“Information theory in ecology.”
International Journal of General Systems, 3, 36–71.
McIntosh RP (1967).
“An index of diversity and the relation of certain concepts to diversity.”
Ecology, 48(3), 392–404.
Menhinick EF (1964).
“A comparison of some species-individuals diversity indices applied to samples of field insects.”
Ecology, 45(4), 859–861.
Nei M (1973).
“Analysis of gene diversity in subdivided populations.”
Proceedings of the National Academy of Sciences, 70(12), 3321–3323.
Peet RK (1974).
“The measurement of species diversity.”
Annual Review of Ecology and Systematics, 5(1), 285–307.
Pielou EC (1966).
“The measurement of diversity in different types of biological collections.”
Journal of Theoretical Biology, 13, 131–144.
Pielou EC (1975).
Ecological diversity.
New York : Wiley.
ISBN 978-0-471-68925-6.
Renyi A (1960).
“On measures of entropy and information.”
In Neyman J (ed.), Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability (June 20-July 30 1960), Volume I: Contributions to the Theory of Statistics, 547–561.
University of California Press.
Shannon CE, Weaver W (1949).
The Mathematical Theory of Communication, number v. 2 in The Mathematical Theory of Communication.
University of Illinois Press.
Simpson EH (1949).
“Measurement of diversity.”
Nature, 163(4148), 688–688.
Smith B, Wilson JB (1996).
“A consumer's guide to evenness indices.”
Oikos, 76(1), 70.
Tsallis C (1988).
“Possible generalization of Boltzmann-Gibbs statistics.”
Journal of Statistical Physics, 52(1-2), 479–487.
Williams CB (1964).
Patterns in the Balance of Nature and Related Problems in Quantitative Ecology.
Academic Press.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Qualitative trait data ---- #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ library(EvaluateCore) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Convert qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) # Get diversity measures diversity.calc(x = pdata$CUAL) # Get diversity measures for multiple qualitative traits div_out1 <- lapply(pdata[, qual], diversity.calc) do.call(rbind, div_out1) #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Species abundance data ---- #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ library(vegan) data(dune) abundance_site1 <- unlist(dune[1, ]) abundance_site1[abundance_site1 != 0] # Convert to raw counts for use in diversity.calc using rep abundance_site1_raw <- factor(rep(names(abundance_site1), times = abundance_site1)) # Get diversity measures diversity.calc(x = abundance_site1_raw) # Convert multiple site abundance data to raw counts abundance_site_raw <- apply(dune, 1, function(x) { factor(rep(names(x), times = x)) }) # Get diversity measures for multiple sites div_out2 <- lapply(abundance_site_raw, diversity.calc) do.call(rbind, div_out2)#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Qualitative trait data ---- #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ library(EvaluateCore) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Convert qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) # Get diversity measures diversity.calc(x = pdata$CUAL) # Get diversity measures for multiple qualitative traits div_out1 <- lapply(pdata[, qual], diversity.calc) do.call(rbind, div_out1) #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Species abundance data ---- #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ library(vegan) data(dune) abundance_site1 <- unlist(dune[1, ]) abundance_site1[abundance_site1 != 0] # Convert to raw counts for use in diversity.calc using rep abundance_site1_raw <- factor(rep(names(abundance_site1), times = abundance_site1)) # Get diversity measures diversity.calc(x = abundance_site1_raw) # Convert multiple site abundance data to raw counts abundance_site_raw <- apply(dune, 1, function(x) { factor(rep(names(x), times = x)) }) # Get diversity measures for multiple sites div_out2 <- lapply(abundance_site_raw, diversity.calc) do.call(rbind, div_out2)
The function diversity.compare compares diversity indices between
different groups using the following approaches.
Global permutation test across all groups simultaneously.
Pairwise tests between all combinations of groups.
Bootstrap confidence intervals (CI) for each group.
Diversity profiles (Hill, Renyi and Tsallis) for each group.
diversity.compare( x, group, R = 1000, base = exp(1), na.omit = TRUE, global.test = TRUE, pairwise.test = TRUE, bootstrap.ci = TRUE, diversity.profile = TRUE, p.adjust.method = c("bonferroni", "holm"), ci.conf = 0.95, ci.type = c("perc", "bca"), q = seq(0, 3, 0.1), parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, seed = 123 )diversity.compare( x, group, R = 1000, base = exp(1), na.omit = TRUE, global.test = TRUE, pairwise.test = TRUE, bootstrap.ci = TRUE, diversity.profile = TRUE, p.adjust.method = c("bonferroni", "holm"), ci.conf = 0.95, ci.type = c("perc", "bca"), q = seq(0, 3, 0.1), parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, seed = 123 )
x |
A factor vector of categories (e.g., species, traits). The frequency of each level is treated as the abundance of that category. |
group |
A factor vector indicating the group of each observation. Must
have the same length as |
R |
Integer specifying the number of permutations. Default is 1000. |
base |
The logarithm base to be used for computation of shannon family
of diversity indices. Default is |
na.omit |
logical. If |
global.test |
logical. If |
pairwise.test |
logical. If |
bootstrap.ci |
logical. If |
diversity.profile |
logical. If |
p.adjust.method |
(perm.test.pairwise only) Method for adjusting
p-values for multiple comparisons. Options include |
ci.conf |
Confidence level of the bootstrap interval. Default is 0.95. |
ci.type |
A vector of character strings representing the type of
intervals required. The options are |
q |
The order of the parametric index. |
parallel |
The type of parallel operation to be used (if any). If missing, the
default is taken from the option |
ncpus |
integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs. |
cl |
An optional parallel or snow cluster for use if
|
seed |
Integer. Random seed used to ensure reproducibility of permutations and bootstrap. Default is 123. |
A list with the following elements.
A data frame of the different diversity indices computed for each group.
A data frame of results of global permutation test including the test statistic (weighted sum of squares between group summary indices) and the p value for the different diversity indices.
A list of the following data frames.
A data frame of p values for each between group comparison for different diversity measures.
A data frame of compact letter displays of significant differences among groups for different diversity measures.
A data frame of lower and upper bootstrap confidence intervals computed for each group in different diversity measures.
A list of data frames of Hill, Renyi and Tsallis diversity profiles computed for each group.
In small samples with bounded statistics like Shannon Diversity Index and Menhinick index, the bootstrap upper CI can equal the observed value because resamples cannot exceed the theoretical maximum.
Similarly in small samples, the lower confidence bound can be zero because bootstrap resamples occasionally can contain only a single category (class or species), due to sampling uncertainty and the natural lower bound of the diversity index like Shannon Diversity Index.
The BCa bootstrap can produce negative lower confidence limits due to boundary effects and skewness in the resampled distribution.
library(EvaluateCore) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Convert qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) str(pdata) diversity.compare(x = pdata$CUAL, group = pdata$LNGS, R = 100, base = exp(1), na.omit = TRUE) diversity.compare(x = pdata$ANGB, group = pdata$LNGS, R = 100, base = exp(1), na.omit = TRUE)library(EvaluateCore) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Convert qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) str(pdata) diversity.compare(x = pdata$CUAL, group = pdata$LNGS, R = 100, base = exp(1), na.omit = TRUE) diversity.compare(x = pdata$ANGB, group = pdata$LNGS, R = 100, base = exp(1), na.omit = TRUE)
Computes diversity profiles across a continuous range of sensitivity
parameter (The order q) using parametric diversity indices such as
Hill numbers, Rényi entropy, and Tsallis entropy. The function also supports
the generation of multiple profiles to enable comparisons among groups. It
provides flexible options for generation of bootstrap confidence intervals
for the values.
diversity.profile( x, group, q = seq(0, 3, 0.1), na.omit = TRUE, ci.conf = 0.95, R = 1000, parameter = c("hill", "renyi", "tsallis"), ci.type = c("perc", "bca"), parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus", 1L), cl = NULL, seed = 123 )diversity.profile( x, group, q = seq(0, 3, 0.1), na.omit = TRUE, ci.conf = 0.95, R = 1000, parameter = c("hill", "renyi", "tsallis"), ci.type = c("perc", "bca"), parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus", 1L), cl = NULL, seed = 123 )
x |
A numeric or factor vector of observations. |
group |
A factor vector indicating the group of each observation. Must
have the same length as |
q |
The order of the parametric index. |
na.omit |
logical. If |
ci.conf |
Confidence level of the bootstrap interval. Default is 0.95. |
R |
Integer specifying the number of permutations. Default is 1000. |
parameter |
The parametric index. Options include |
ci.type |
A vector of character strings representing the type of
intervals required. The options are |
parallel |
The type of parallel operation to be used (if any). If missing, the
default is taken from the option |
ncpus |
integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs. |
cl |
An optional parallel or snow cluster for use if
|
seed |
Integer. Random seed used to ensure reproducibility of bootstrap. Default is 123. |
See Parametric Indices in diversity.calc for
theoretical details.
A list of data frames with the following columns for each factor
level in group.
The default number of bootstrap replicates R = 1000 is
provided for quick exploratory analysis. For more reliable (but slower)
confidence intervals or standard error estimates it is strongly recommended
to increase R to at 5000-10000, depending on your data and required
precision.
library(EvaluateCore) library(dplyr) library(ggplot2) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Convert qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) str(pdata) important_q <- c(0, 1, 2) important_labels <- c("0D", "1D", "2D") # NOTE: Increase R to 10000 for more reliable (but slower) estimates. # Hill profile - Percentile CIs ---- hill_profile1 <- diversity.profile(x = pdata$CUAL, group = pdata$LNGS, parameter = "hill", ci.type = "perc", R = 100) hill_profile1 hill_profile1_df <- dplyr::bind_rows(hill_profile1, .id = "group") hill_points1_df <- hill_profile1_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(hill_profile1_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = hill_points1_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(hill_profile1_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Rényi profile - Percentile CIs ---- renyi_profile1 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "renyi", ci.type = "perc", R = 100) renyi_profile1 renyi_profile1_df <- dplyr::bind_rows(renyi_profile1, .id = "group") renyi_points1_df <- renyi_profile1_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(renyi_profile1_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = renyi_points1_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(renyi_profile1_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Tsallis profile - Percentile CIs ---- tsallis_profile1 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "tsallis", ci.type = "perc", R = 100) tsallis_profile1 tsallis_profile1_df <- dplyr::bind_rows(tsallis_profile1, .id = "group") tsallis_points1_df <- tsallis_profile1_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(tsallis_profile1_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = tsallis_points1_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(tsallis_profile1_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Hill profile - BCa CIs ---- hill_profile2 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "hill", ci.type = "bca", R = 100) hill_profile2 hill_profile2_df <- dplyr::bind_rows(hill_profile2, .id = "group") hill_points2_df <- hill_profile2_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(hill_profile2_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = hill_points2_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(hill_profile2_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Rényi profile - BCa CIs ---- renyi_profile2 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "renyi", ci.type = "bca", R = 100) renyi_profile2 renyi_profile2_df <- dplyr::bind_rows(renyi_profile2, .id = "group") renyi_points2_df <- renyi_profile2_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(renyi_profile2_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = renyi_points2_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(renyi_profile2_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Tsallis profile - BCa CIs ---- tsallis_profile2 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "tsallis", ci.type = "bca", R = 100) tsallis_profile2 tsallis_profile2_df <- dplyr::bind_rows(tsallis_profile2, .id = "group") tsallis_points2_df <- tsallis_profile2_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(tsallis_profile2_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = tsallis_points2_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(tsallis_profile2_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw()library(EvaluateCore) library(dplyr) library(ggplot2) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Convert qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) str(pdata) important_q <- c(0, 1, 2) important_labels <- c("0D", "1D", "2D") # NOTE: Increase R to 10000 for more reliable (but slower) estimates. # Hill profile - Percentile CIs ---- hill_profile1 <- diversity.profile(x = pdata$CUAL, group = pdata$LNGS, parameter = "hill", ci.type = "perc", R = 100) hill_profile1 hill_profile1_df <- dplyr::bind_rows(hill_profile1, .id = "group") hill_points1_df <- hill_profile1_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(hill_profile1_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = hill_points1_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(hill_profile1_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Rényi profile - Percentile CIs ---- renyi_profile1 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "renyi", ci.type = "perc", R = 100) renyi_profile1 renyi_profile1_df <- dplyr::bind_rows(renyi_profile1, .id = "group") renyi_points1_df <- renyi_profile1_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(renyi_profile1_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = renyi_points1_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(renyi_profile1_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Tsallis profile - Percentile CIs ---- tsallis_profile1 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "tsallis", ci.type = "perc", R = 100) tsallis_profile1 tsallis_profile1_df <- dplyr::bind_rows(tsallis_profile1, .id = "group") tsallis_points1_df <- tsallis_profile1_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(tsallis_profile1_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = tsallis_points1_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(tsallis_profile1_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Hill profile - BCa CIs ---- hill_profile2 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "hill", ci.type = "bca", R = 100) hill_profile2 hill_profile2_df <- dplyr::bind_rows(hill_profile2, .id = "group") hill_points2_df <- hill_profile2_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(hill_profile2_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = hill_points2_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(hill_profile2_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Rényi profile - BCa CIs ---- renyi_profile2 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "renyi", ci.type = "bca", R = 100) renyi_profile2 renyi_profile2_df <- dplyr::bind_rows(renyi_profile2, .id = "group") renyi_points2_df <- renyi_profile2_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(renyi_profile2_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = renyi_points2_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(renyi_profile2_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw() # Tsallis profile - BCa CIs ---- tsallis_profile2 <- diversity.profile(pdata$CUAL, group = pdata$LNGS, parameter = "tsallis", ci.type = "bca", R = 100) tsallis_profile2 tsallis_profile2_df <- dplyr::bind_rows(tsallis_profile2, .id = "group") tsallis_points2_df <- tsallis_profile2_df %>% filter(q %in% important_q) %>% mutate(order_label = factor(q, levels = important_q, labels = important_labels)) ggplot(tsallis_profile2_df, aes(x = q, y = observed, color = group, fill = group)) + geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) + geom_line(linewidth = 1) + geom_vline(xintercept = c(0, 1, 2), linetype = "dashed", color = "grey60") + geom_point(data = tsallis_points2_df, aes(shape = order_label), size = 3, stroke = 1, inherit.aes = TRUE) + scale_shape_manual(values = c(17, 19, 15), name = "Important q") + labs(x = "Order (q)", y = "Hill number", color = "Group", fill = "Group") + theme_bw() ggplot(tsallis_profile2_df, aes(x = q, y = observed)) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") + geom_line(color = "black", linewidth = 1) + facet_wrap(~ group, scales = "free_y") + labs(x = "Order (q)", y = "Hill number") + theme_bw()
These functions perform permutation-based hypothesis tests to compare groups with respect to a summary statistic (e.g., mean, diversity index).
perm.test.global performs a global test across all
groups simultaneously, using a weighted sum of squares between group summary
indices as the test statistic
perm.test.pairwise performs
pairwise tests between all combinations of groups, comparing the absolute
difference in summary statistics and optionally adjusting p-values for
multiple comparisons.
perm.test.global( x, group, fun, R = 1000, na.omit = TRUE, fun.args = list(), max.invalid = 0.25, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, seed = 123 ) perm.test.pairwise( x, group, fun, R = 1000, na.omit = TRUE, fun.args = list(), p.adjust.method = c("bonferroni", "holm"), max.invalid = 0.25, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, seed = 123 )perm.test.global( x, group, fun, R = 1000, na.omit = TRUE, fun.args = list(), max.invalid = 0.25, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, seed = 123 ) perm.test.pairwise( x, group, fun, R = 1000, na.omit = TRUE, fun.args = list(), p.adjust.method = c("bonferroni", "holm"), max.invalid = 0.25, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, seed = 123 )
x |
A numeric or factor vector of observations. |
group |
A factor vector indicating the group of each observation. Must
have the same length as |
fun |
A function to summarize values within each group. |
R |
Integer specifying the number of permutations. Default is 1000. |
na.omit |
logical. If |
fun.args |
Named list of additional arguments forwarded to 'fun'. |
max.invalid |
Numeric between 0 and 1. Maximum allowed proportion of invalid permutations (i.e., permutations for which the test statistic is non-finite). If the proportion of invalid permutations exceeds this threshold, the function stops execution with an error, indicating that the statistic function is not permutation-stable. |
parallel |
The type of parallel operation to be used (if any). If missing, the
default is taken from the option |
ncpus |
integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs. |
cl |
An optional parallel or snow cluster for use if
|
seed |
Integer. Random seed used to ensure reproducibility of permutations. Default is 123. |
p.adjust.method |
(perm.test.pairwise only) Method for adjusting
p-values for multiple comparisons. Options include |
perm.test.globalA list of the following elements.
The test statistic value.
The observed values for each group.
The p value.
perm.test.pairwiseA data frame of the following columns.
The comparison.
The p value.
The adjusted p value.
library(EvaluateCore) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Conver '#t qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) str(pdata) # NOTE: Increase R to 10000 for more reliable (but slower) estimates. # Global tests ---- perm.test.global(x = pdata$NMSR, group = pdata$CUAL, fun = mean, R = 100) perm.test.global(x = pdata$LNGS, group = pdata$CUAL, fun = shannon, R = 100) perm.test.global(x = pdata$PTLC, group = pdata$CUAL, fun = simpson, R = 100) # Pairwise tests ---- perm.test.pairwise(x = pdata$NMSR, group = pdata$CUAL, fun = mean, R = 100) perm.test.pairwise(x = pdata$LNGS, group = pdata$CUAL, fun = shannon, R = 100) perm.test.pairwise(x = pdata$PTLC, group = pdata$CUAL, fun = simpson, R = 100)library(EvaluateCore) pdata <- cassava_CC qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB", "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR") # Conver '#t qualitative data columns to factor pdata[, qual] <- lapply(pdata[, qual], as.factor) str(pdata) # NOTE: Increase R to 10000 for more reliable (but slower) estimates. # Global tests ---- perm.test.global(x = pdata$NMSR, group = pdata$CUAL, fun = mean, R = 100) perm.test.global(x = pdata$LNGS, group = pdata$CUAL, fun = shannon, R = 100) perm.test.global(x = pdata$PTLC, group = pdata$CUAL, fun = simpson, R = 100) # Pairwise tests ---- perm.test.pairwise(x = pdata$NMSR, group = pdata$CUAL, fun = mean, R = 100) perm.test.pairwise(x = pdata$LNGS, group = pdata$CUAL, fun = shannon, R = 100) perm.test.pairwise(x = pdata$PTLC, group = pdata$CUAL, fun = simpson, R = 100)