Title: | Quality Evaluation of Core Collections |
---|---|
Description: | Implements various quality evaluation statistics to assess the value of plant germplasm core collections using qualitative and quantitative phenotypic trait data according to Odong et al. (2013) <doi:10.1007/s00122-012-1971-y>. |
Authors: | J. Aravind [aut, cre], Vikender Kaur [aut], Dhammaprakash Pandhari Wankhede [aut], Joghee Nanjundan [aut], ICAR-NBPGR [cph] (www.nbpgr.ernet.in) |
Maintainer: | J. Aravind <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 0.1.3.9000 |
Built: | 2024-11-16 05:36:27 UTC |
Source: | https://github.com/aravind-j/evaluatecore |
Plot Bar plots to graphically compare the frequency distributions of qualitative traits between entire collection (EC) and core set (CS).
bar.evaluate.core(data, names, qualitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in the core collection and present in the names column. |
A list with the ggplot objects of relative frequency bar plots of CS and EC for each trait specified as qualitative.
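As a rough illustration of the quantities such bar plots compare, the following base R sketch tabulates the relative frequencies of one qualitative trait in EC and CS. The toy data frame, the trait name colour and the core set are hypothetical, and base graphics stand in for the ggplot objects that the function returns.

```r
# Hypothetical toy data: one qualitative trait for an EC of 8 accessions
ec <- data.frame(
  genotypes = paste0("G", 1:8),
  colour = c("red", "red", "red", "red", "white", "white", "pink", "pink"),
  stringsAsFactors = FALSE
)
core <- c("G1", "G5", "G7", "G8")  # hypothetical core set

# Use the EC classes as factor levels so that classes missing in CS count as zero
lev <- sort(unique(ec$colour))
ec_rf <- prop.table(table(factor(ec$colour, levels = lev)))
cs_rf <- prop.table(table(factor(ec$colour[ec$genotypes %in% core],
                                 levels = lev)))

# Relative frequencies of each class, one row per collection
rel_freq <- rbind(EC = ec_rf, CS = cs_rf)
rel_freq

# Side-by-side bars per class (base graphics, not ggplot2)
barplot(rel_freq, beside = TRUE, legend.text = TRUE,
        ylab = "Relative frequency")
```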
data("cassava_CC")
data("cassava_EC")

# Move the accession names from the row names into a column
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Convert the qualitative traits to factors
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))

bar.evaluate.core(data = ec, names = "genotypes",
                  qualitative = qual, selected = core)
Plot Box-and-Whisker plots (Tukey 1970; McGill et al. 1978) to graphically compare the probability distributions of quantitative traits between entire collection (EC) and core set (CS).
box.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in the core collection and present in the names column. |
A list with the ggplot objects of box plots of CS and EC for each trait specified as quantitative.
McGill R, Tukey JW, Larsen WA (1978). “Variations of box plots.” The American Statistician, 32(1), 12.
Tukey JW (1970). Exploratory Data Analysis. Preliminary edition. Addison-Wesley.
data("cassava_CC")
data("cassava_EC")

# Move the accession names from the row names into a column
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Convert the qualitative traits to factors
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))

box.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)
Example germplasm characterisation data of a core collection generated from 1591 accessions of the IITA Cassava collection (International Institute of Tropical Agriculture et al. 2019) using 10 quantitative and 48 qualitative trait data with CoreHunter3 (corehunter). The core set was generated using distance-based measures giving equal weightage to Average entry-to-nearest-entry distance (EN) and Average accession-to-nearest-entry distance (AN). Includes data on 26 descriptors for 168 (10 % of cassava_EC) accessions. It is used to demonstrate the various functions of the EvaluateCore package.
cassava_CC
A data frame with 26 columns:
CUAL | Colour of unexpanded apical leaves |
LNGS | Length of stipules |
PTLC | Petiole colour |
DSTA | Distribution of anthocyanin |
LFRT | Leaf retention |
LBTEF | Level of branching at the end of flowering |
CBTR | Colour of boiled tuberous root |
NMLB | Number of levels of branching |
ANGB | Angle of branching |
CUAL9M | Colours of unexpanded apical leaves at 9 months |
LVC9M | Leaf vein colour at 9 months |
TNPR9M | Total number of plants remaining per accession at 9 months |
PL9M | Petiole length at 9 months |
STRP | Storage root peduncle |
STRC | Storage root constrictions |
PSTR | Position of root |
NMSR | Number of storage root per plant |
TTRN | Total root number per plant |
TFWSR | Total fresh weight of storage root per plant |
TTRW | Total root weight per plant |
TFWSS | Total fresh weight of storage shoot per plant |
TTSW | Total shoot weight per plant |
TTPW | Total plant weight |
AVPW | Average plant weight |
ARSR | Amount of rotted storage root per plant |
SRDM | Storage root dry matter |
Further details on how the example dataset was built from the original data are available online.
International Institute of Tropical Agriculture, Benjamin F, Marimagne T (2019). “Cassava morphological characterization. Version 2018.1.” www.genesys-pgr.org.
data(cassava_CC)
summary(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Bar plots of the class frequencies of the qualitative traits
lapply(seq_along(cassava_CC[, qual]),
       function(i) barplot(table(cassava_CC[, qual][, i]),
                           xlab = names(cassava_CC[, qual])[i]))

# Histograms of the quantitative traits (trait values, not a frequency table)
lapply(seq_along(cassava_CC[, quant]),
       function(i) hist(cassava_CC[, quant][, i],
                        xlab = names(cassava_CC[, quant])[i],
                        main = ""))
Example germplasm characterisation data of a subset of the IITA Cassava collection (International Institute of Tropical Agriculture et al. 2019). Includes data on 26 (out of 62) descriptors for 1684 (out of 2170) accessions. It is used to demonstrate the various functions of the EvaluateCore package.
cassava_EC
A data frame with 26 columns:
CUAL | Colour of unexpanded apical leaves |
LNGS | Length of stipules |
PTLC | Petiole colour |
DSTA | Distribution of anthocyanin |
LFRT | Leaf retention |
LBTEF | Level of branching at the end of flowering |
CBTR | Colour of boiled tuberous root |
NMLB | Number of levels of branching |
ANGB | Angle of branching |
CUAL9M | Colours of unexpanded apical leaves at 9 months |
LVC9M | Leaf vein colour at 9 months |
TNPR9M | Total number of plants remaining per accession at 9 months |
PL9M | Petiole length at 9 months |
STRP | Storage root peduncle |
STRC | Storage root constrictions |
PSTR | Position of root |
NMSR | Number of storage root per plant |
TTRN | Total root number per plant |
TFWSR | Total fresh weight of storage root per plant |
TTRW | Total root weight per plant |
TFWSS | Total fresh weight of storage shoot per plant |
TTSW | Total shoot weight per plant |
TTPW | Total plant weight |
AVPW | Average plant weight |
ARSR | Amount of rotted storage root per plant |
SRDM | Storage root dry matter |
Further details on how the example dataset was built from the original data are available online.
International Institute of Tropical Agriculture, Benjamin F, Marimagne T (2019). “Cassava morphological characterization. Version 2018.1.” www.genesys-pgr.org.
data(cassava_EC)
summary(cassava_EC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Bar plots of the class frequencies of the qualitative traits
lapply(seq_along(cassava_EC[, qual]),
       function(i) barplot(table(cassava_EC[, qual][, i]),
                           xlab = names(cassava_EC[, qual])[i]))

# Histograms of the quantitative traits (trait values, not a frequency table)
lapply(seq_along(cassava_EC[, quant]),
       function(i) hist(cassava_EC[, quant][, i],
                        xlab = names(cassava_EC[, quant])[i],
                        main = ""))
Compare the distribution frequencies of qualitative traits between entire collection (EC) and core set (CS) by Chi-squared test for homogeneity (Pearson 1900; Snedecor and Irwin 1933).
chisquare.evaluate.core(data, names, qualitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in the core collection and present in the names column. |
A data frame with the following columns.
Trait |
The qualitative trait. |
EC_No.Classes |
The number of classes in the trait for EC. |
EC_Classes |
The frequency of the classes in the trait for EC. |
CS_No.Classes |
The number of classes in the trait for CS. |
CS_Classes |
The frequency of the classes in the trait for CS. |
chisq_statistic |
The \(\chi^{2}\) test statistic. |
chisq_pvalue |
The p value for the test statistic. |
chisq_significance |
The significance of the test statistic (*: p \(\leq\) 0.05; **: p \(\leq\) 0.01; ns: p \( > \) 0.05). |
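The comparison behind this table can be sketched in base R with chisq.test on a 2 x k contingency table of class counts. This is a minimal sketch on hypothetical toy data; whether the package builds the test in exactly this way is an assumption.

```r
# Hypothetical toy data: one qualitative trait for an EC of 12 accessions
ec <- data.frame(
  genotypes = paste0("G", 1:12),
  colour = factor(rep(c("red", "white", "pink"), each = 4)),
  stringsAsFactors = FALSE
)
core <- c("G1", "G5", "G9", "G10")  # hypothetical core set
cs <- ec[ec$genotypes %in% core, ]

# Class counts in EC and CS; CS keeps the EC levels so empty classes count as 0
ec_freq <- table(ec$colour)
cs_freq <- table(factor(cs$colour, levels = levels(ec$colour)))

# 2 x k contingency table (rows: collections, columns: phenotypic classes)
tab <- rbind(EC = ec_freq, CS = cs_freq)

# Chi-squared test for homogeneity; warnings about small expected counts
# are suppressed because the toy data set is tiny
res <- suppressWarnings(chisq.test(tab))

data.frame(Trait = "colour",
           EC_No.Classes = sum(ec_freq > 0),
           CS_No.Classes = sum(cs_freq > 0),
           chisq_statistic = unname(res$statistic),
           chisq_pvalue = res$p.value)
```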
Pearson K (1900). “X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling.” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50(302), 157–175.
Snedecor G, Irwin MR (1933). “On the chi-square test for homogeneity.” Iowa State College Journal of Science, 8, 75–81.
data("cassava_CC")
data("cassava_EC")

# Move the accession names from the row names into a column
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Convert the qualitative traits to factors
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))

chisquare.evaluate.core(data = ec, names = "genotypes",
                        qualitative = qual, selected = core)
Compute phenotypic correlations (Pearson 1895) between traits, plot the correlation matrices as correlograms (Friendly 2002) and calculate the Mantel correlation (Legendre and Legendre 2012) between them to compare entire collection (EC) and core set (CS).
corr.evaluate.core(data, names, quantitative, qualitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in the core collection and present in the names column. |
A list with the following components.
Correlation Matrix |
The matrix with phenotypic correlations between traits in EC (below diagonal) and CS (above diagonal). |
Correologram |
A correlogram of phenotypic correlations between traits in EC (below diagonal) and CS (above diagonal) as a ggplot object. |
Mantel Correlation |
A data frame with the Mantel correlation coefficient (\(r\)) between the EC and CS phenotypic correlation matrices, its p value and significance (*: p \(\leq\) 0.05; **: p \(\leq\) 0.01; ns: p \( > \) 0.05). |
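The layout of the combined correlation matrix (EC below the diagonal, CS above it) can be sketched in base R as follows. The simulated traits x, y and z and the core set are hypothetical, only quantitative traits are used, and the Mantel test is left out because it needs an additional package.

```r
set.seed(1)

# Hypothetical toy data: three quantitative traits for an EC of 30 accessions
ec <- data.frame(genotypes = paste0("G", 1:30),
                 x = rnorm(30), y = rnorm(30), z = rnorm(30),
                 stringsAsFactors = FALSE)
core <- paste0("G", seq(1, 30, by = 3))  # hypothetical core set
traits <- c("x", "y", "z")

# Pearson correlation matrices for EC and CS
ec_cor <- cor(ec[, traits])
cs_cor <- cor(ec[ec$genotypes %in% core, traits])

# Combined matrix: EC values below the diagonal, CS values above it
combined <- ec_cor
combined[upper.tri(combined)] <- cs_cor[upper.tri(cs_cor)]
combined
```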
Friendly M (2002). “Corrgrams.” The American Statistician, 56(4), 316–324.
Legendre P, Legendre L (2012). “Interpretation of ecological structures.” In Developments in Environmental Modelling, volume 24, 521–624. Elsevier.
Pearson K (1895). “Note on regression and inheritance in the case of two parents.” Proceedings of the Royal Society of London, 58, 240–242.
cor, cor_pmat, ggcorrplot, mantel
data("cassava_CC")
data("cassava_EC")

# Move the accession names from the row names into a column
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Convert the qualitative traits to factors
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))

corr.evaluate.core(data = ec, names = "genotypes", quantitative = quant,
                   qualitative = qual, selected = core)
Compute the Class Coverage (Kim et al. 2007) to compare the distribution frequencies of qualitative traits between entire collection (EC) and core set (CS).
coverage.evaluate.core(data, names, qualitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in the core collection and present in the names column. |
Class Coverage (Kim et al. 2007) is computed as follows.
\[Class\, Coverage = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{k_{CS_{i}}}{k_{EC_{i}}} \right ) \times 100\]
Where, \(k_{CS_{i}}\) is the number of phenotypic classes in CS for the \(i\)th trait, \(k_{EC_{i}}\) is the number of phenotypic classes in EC for the \(i\)th trait and \(n\) is the total number of traits.
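The formula can be worked through by hand on a small case. The sketch below uses hypothetical toy data with two qualitative traits; it illustrates the arithmetic only and is not the package's implementation.

```r
# Hypothetical toy data: two qualitative traits for an EC of 8 accessions
ec <- data.frame(
  genotypes = paste0("G", 1:8),
  colour = c("red", "red", "white", "white", "pink", "pink", "red", "white"),
  shape = c("round", "oval", "round", "oval", "round", "oval", "round", "oval"),
  stringsAsFactors = FALSE
)
core <- c("G1", "G2", "G3")  # hypothetical core set
qual <- c("colour", "shape")
cs <- ec[ec$genotypes %in% core, ]

# Number of phenotypic classes per trait in EC and in CS
k_ec <- sapply(ec[, qual], function(x) length(unique(x)))
k_cs <- sapply(cs[, qual], function(x) length(unique(x)))

# Mean proportion of EC classes retained in CS, as a percentage:
# colour keeps 2 of 3 classes, shape keeps 2 of 2
coverage <- mean(k_cs / k_ec) * 100
coverage
```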
The Class Coverage value.
Kim K, Chung H, Cho G, Ma K, Chandrabalan D, Gwag J, Kim T, Cho E, Park Y (2007). “PowerCore: A program applying the advanced M strategy with a heuristic search for establishing core sets.” Bioinformatics, 23(16), 2155–2162.
data("cassava_CC")
data("cassava_EC")

# Move the accession names from the row names into a column
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Convert the qualitative traits to factors
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))

coverage.evaluate.core(data = ec, names = "genotypes",
                       qualitative = qual, selected = core)
Compute the following metrics to compare quantitative traits of the entire collection (EC) and core set (CS).
Coincidence Rate of Range (\(CR\)) (Hu et al. 2000), originally described by Diwan et al. (1995) as Mean range ratio
Changeable Rate of Maximum (\(CR_{\max}\)) (Wang et al. 2007)
Changeable Rate of Minimum (\(CR_{\min}\)) (Wang et al. 2007)
Changeable Rate of Mean (\(CR_{\mu}\)) (Wang et al. 2007)
cr.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in the core collection and present in the names column. |
The Coincidence Rate of Range (\(CR\)) is computed as follows.
\[CR = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{R_{CS_{i}}}{R_{EC_{i}}} \right ) \times 100\]
Where, \(R_{CS_{i}}\) is the range of the \(i\)th trait in the CS, \(R_{EC_{i}}\) is the range of the \(i\)th trait in the EC and \(n\) is the total number of traits.
A representative CS should have a \(CR\) value no less than 70% (Diwan et al. 1995) or 80% (Hu et al. 2000).
The Changeable Rate of Maximum (\(CR_{\max}\)) is computed as follows.
\[CR_{\max} = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{\max_{CS_{i}}}{\max_{EC_{i}}} \right ) \times 100\]
Where, \(\max_{CS_{i}}\) is the maximum value of the \(i\)th trait in the CS, \(\max_{EC_{i}}\) is the maximum value of the \(i\)th trait in the EC and \(n\) is the total number of traits.
The Changeable Rate of Minimum (\(CR_{\min}\)) is computed as follows.
\[CR_{\min} = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{\min_{CS_{i}}}{\min_{EC_{i}}} \right ) \times 100\]
Where, \(\min_{CS_{i}}\) is the minimum value of the \(i\)th trait in the CS, \(\min_{EC_{i}}\) is the minimum value of the \(i\)th trait in the EC and \(n\) is the total number of traits.
The Changeable Rate of Mean (\(CR_{\mu}\)) is computed as follows.
\[CR_{\mu} = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{\mu_{CS_{i}}}{\mu_{EC_{i}}} \right ) \times 100\]
Where, \(\mu_{CS_{i}}\) is the mean value of the \(i\)th trait in the CS, \(\mu_{EC_{i}}\) is the mean value of the \(i\)th trait in the EC and \(n\) is the total number of traits.
The \(CR\) value.
NaN or Inf values for \(CR_{\min}\) occur when the minimum values for some of the traits are zero.
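The four ratios, and the way a zero minimum propagates into \(CR_{\min}\), can be illustrated with a small base R sketch. The toy data, the trait names height and rotted, and the helper names ratio_mean and rng are hypothetical; this follows the formulas above rather than the package's source.

```r
# Hypothetical toy data: two quantitative traits for an EC of 6 accessions;
# the trait "rotted" has a minimum of zero in both EC and CS
ec <- data.frame(
  genotypes = paste0("G", 1:6),
  height = c(10, 20, 30, 40, 50, 60),
  rotted = c(0, 0, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)
core <- c("G2", "G4", "G6")  # hypothetical core set
quant <- c("height", "rotted")
cs <- ec[ec$genotypes %in% core, ]

# Mean over traits of the CS/EC ratio of a summary statistic f, in percent
ratio_mean <- function(f) {
  mean(sapply(quant, function(tr) f(cs[[tr]]) / f(ec[[tr]]))) * 100
}
rng <- function(x) diff(range(x))  # range of a trait

CR <- ratio_mean(rng)      # ranges: 40/50 and 4/4
CR_max <- ratio_mean(max)  # maxima: 60/60 and 4/4
CR_min <- ratio_mean(min)  # minima: 20/10 and 0/0, and 0/0 is NaN
CR_mu <- ratio_mean(mean)  # means: 40/35 and 2/(10/6)

c(CR = CR, CR_max = CR_max, CR_min = CR_min, CR_mu = CR_mu)
```

A zero EC minimum paired with a positive CS minimum would give Inf instead of NaN for that trait, and either value carries through the mean over traits.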
Diwan N, McIntosh MS, Bauchan GR (1995). “Methods of developing a core collection of annual Medicago species.” Theoretical and Applied Genetics, 90(6), 755–761.
Hu J, Zhu J, Xu HM (2000). “Methods of constructing core collections by stepwise clustering with three sampling strategies based on the genotypic values of crops.” Theoretical and Applied Genetics, 101(1), 264–268.
Wang J, Hu J, Zhang C, Zhang S (2007). “Assessment on evaluating parameters of rice core collections constructed by genotypic values and molecular marker information.” Rice Science, 14(2), 101–110.
data("cassava_CC")
data("cassava_EC")

# Move the accession names from the row names into a column
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Convert the qualitative traits to factors
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))

cr.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core)
Compute average Entry-to-nearest-entry distance (\(E\text{-}EN\)), Accession-to-nearest-entry distance (\(A\text{-}EN\)) and Entry-to-entry distance (\(E\text{-}E\)) (Odong et al. 2013) to evaluate a core set (CS) selected from an entire collection (EC).
dist.evaluate.core(data, names, quantitative, qualitative, selected, d = NULL)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in the core collection and present in the names column. |
d |
A distance matrix of class " |
A data frame with the average values of \(E\text{-}EN\), \(A\text{-}EN\) and \(E\text{-}E\).
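The three averages can be sketched from any distance matrix. In the toy example below, a single hypothetical trait x and Euclidean distance from base R dist stand in for a real multi-trait distance such as Gower's, and the accession-to-nearest-entry average is assumed to include the entries themselves at distance zero.

```r
# Hypothetical toy data: one quantitative trait for an EC of 6 accessions
ec <- data.frame(genotypes = paste0("G", 1:6),
                 x = c(0, 1, 2, 4, 7, 11),
                 stringsAsFactors = FALSE)
core <- c("G1", "G4", "G6")  # hypothetical core set (entries)

# Full distance matrix, labelled by accession name
d <- as.matrix(dist(ec["x"]))
dimnames(d) <- list(ec$genotypes, ec$genotypes)

# Distances among entries only; self-distances are excluded
d_cc <- d[core, core]
diag(d_cc) <- NA

# Entry-to-nearest-entry: each entry to its closest other entry
e_en <- mean(apply(d_cc, 1, min, na.rm = TRUE))

# Accession-to-nearest-entry: each accession to its closest entry
a_en <- mean(apply(d[, core, drop = FALSE], 1, min))

# Entry-to-entry: mean of all pairwise distances among entries
e_e <- mean(d_cc[upper.tri(d_cc)])

data.frame(E_EN = e_en, A_EN = a_en, E_E = e_e)
```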
Gower JC (1971). “A general coefficient of similarity and some of its properties.” Biometrics, 27(4), 857–871.
Odong TL, Jansen J, van Eeuwijk FA, van Hintum TJL (2013). “Quality of core collections for effective utilisation of genetic resources review, discussion and interpretation.” Theoretical and Applied Genetics, 126(2), 289–305.
data("cassava_CC")
data("cassava_EC")

# Move the accession names from the row names into a column
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

# Convert the qualitative traits to factors
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))

dist.evaluate.core(data = ec, names = "genotypes", quantitative = quant,
                   qualitative = qual, selected = core)

####################################
# Compare with corehunter
####################################

library(corehunter)

# Prepare phenotype dataset
dtype <- c(rep("RD", length(quant)),
           rep("NS", length(qual)))
rownames(ec) <- ec[, "genotypes"]
ecdata <- corehunter::phenotypes(data = ec[, c(quant, qual)],
                                 types = dtype)

# Compute average distances
EN <- evaluateCore(core = rownames(cassava_CC), data = ecdata,
                   objective = objective("EN", "GD"))
AN <- evaluateCore(core = rownames(cassava_CC), data = ecdata,
                   objective = objective("AN", "GD"))
EE <- evaluateCore(core = rownames(cassava_CC), data = ecdata,
                   objective = objective("EE", "GD"))
EN
AN
EE
Compute the following diversity indices and perform corresponding statistical tests to compare the phenotypic diversity for qualitative traits between entire collection (EC) and core set (CS).
Simpson's and related indices
Simpson's Index (\(d\)) (Simpson 1949; Peet 1974)
Simpson's Index of Diversity or Gini's Diversity Index or Gini-Simpson Index or Nei's Diversity Index or Nei's Variation Index (\(D\)) (Gini 1912, 1912; Greenberg 1956; Berger and Parker 1970; Nei 1973; Peet 1974)
Maximum Simpson's Index of Diversity or Maximum Nei's Diversity/Variation Index (\(D_{max}\)) (Hennink and Zeven 1990)
Simpson's Reciprocal Index or Hill's \(N_{2}\) (\(D_{R}\)) (Williams 1964; Hill 1973)
Relative Simpson's Index of Diversity or Relative Nei's Diversity/Variation Index (\(D'\)) (Hennink and Zeven 1990)
Shannon-Weaver and related indices
Shannon or Shannon-Weaver or Shannon-Wiener Diversity Index (\(H\)) (Shannon and Weaver 1949; Peet 1974)
Maximum Shannon-Weaver Diversity Index (\(H_{max}\)) (Hennink and Zeven 1990)
Relative Shannon-Weaver Diversity Index or Shannon Equitability Index (\(H'\)) (Hennink and Zeven 1990)
McIntosh Diversity Index
McIntosh Diversity Index (\(D_{Mc}\)) (McIntosh 1967; Peet 1974)
diversity.evaluate.core(data, names, qualitative, selected, base = 2, R = 1000)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in the core collection and present in the names column. |
base |
The logarithm base to be used for computation of Shannon-Weaver Diversity Index (\(H\)). Default is 2. |
R |
The number of bootstrap replicates. Default is 1000. |
A list with three data frames as follows.
simpson |
|
shannon |
|
mcintosh |
|
The diversity indices and the corresponding statistical tests implemented in diversity.evaluate.core are as follows.
Simpson's index (\(d\)) which estimates the probability that two accessions randomly selected will belong to the same phenotypic class of a trait, is computed as follows (Simpson 1949; Peet 1974).
\[d = \sum_{i = 1}^{k}p_{i}^{2}\]
Where, \(p_{i}\) denotes the proportion/fraction/frequency of accessions in the \(i\)th phenotypic class for a trait and \(k\) is the number of phenotypic classes for the trait.
The value of \(d\) can range from 0 to 1 with 0 representing maximum diversity and 1, no diversity.
\(d\) is subtracted from 1 to give Simpson's index of diversity (\(D\)) (Greenberg 1956; Berger and Parker 1970; Peet 1974; Hennink and Zeven 1990), originally suggested by Gini (1912, 1912) and described in the literature as Gini's diversity index or the Gini-Simpson index. It is the same as Nei's diversity index or Nei's variation index (Nei 1973; Hennink and Zeven 1990). The greater the value of \(D\), the greater the diversity, with a range from 0 to 1.
\[D = 1 - d\]
The maximum value of \(D\), \(D_{max}\), occurs when accessions are uniformly distributed across the phenotypic classes and is computed as follows (Hennink and Zeven 1990).
\[D_{max} = 1 - \frac{1}{k}\]
The reciprocal of \(d\) gives Simpson's reciprocal index (\(D_{R}\)) (Williams 1964; Hennink and Zeven 1990), which can range from 1 to \(k\). This was also described in Hill (1973) as \(N_{2}\).
\[D_{R} = \frac{1}{d}\]
Relative Simpson's index of diversity or Relative Nei's diversity/variation index (\(D'\)) (Hennink and Zeven 1990) is defined as follows (Peet 1974).
\[D' = \frac{D}{D_{max}}\]
Differences in Simpson's diversity index for qualitative traits of EC and CS can be tested by a t-test using the associated variance estimate described in Simpson (1949) (Lyons and Hutcheson 1978).
The t statistic is computed as follows.
\[t = \frac{d_{EC} - d_{CS}}{\sqrt{V_{d_{EC}} + V_{d_{CS}}}}\]
Where, the variance of \(d\) (\(V_{d}\)) is,
\[V_{d} = \frac{4N(N-1)(N-2)\sum_{i=1}^{k}(p_{i})^{3} + 2N(N-1)\sum_{i=1}^{k}(p_{i})^{2} - 2N(N-1)(2N-3) \left( \sum_{i=1}^{k}(p_{i})^{2} \right)^{2}}{[N(N-1)]^{2}}\]
The associated degrees of freedom is computed as follows.
\[df = (k_{EC} - 1) + (k_{CS} - 1)\]
Where, \(k_{EC}\) and \(k_{CS}\) are the number of phenotypic classes in the trait for EC and CS respectively.
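The Simpson family of indices and the t statistic can be traced on a small case with the base R sketch below. The helper simpson_stats and the toy class counts are hypothetical, and the two-sided p value is an assumption about how the test is read.

```r
# Simpson's index, the related indices and the variance of d for one trait
simpson_stats <- function(x) {
  n <- as.numeric(table(x))
  n <- n[n > 0]
  N <- sum(n)     # total number of accessions
  p <- n / N      # class proportions
  k <- length(p)  # number of phenotypic classes
  d <- sum(p^2)
  D <- 1 - d
  D_max <- 1 - 1 / k
  V_d <- (4 * N * (N - 1) * (N - 2) * sum(p^3) +
            2 * N * (N - 1) * sum(p^2) -
            2 * N * (N - 1) * (2 * N - 3) * sum(p^2)^2) / (N * (N - 1))^2
  list(d = d, D = D, D_max = D_max, D_R = 1 / d, D_rel = D / D_max,
       V_d = V_d, k = k)
}

# Hypothetical class memberships for one trait in EC (12) and CS (4)
ec_trait <- rep(c("red", "white", "pink"), times = c(6, 4, 2))
cs_trait <- rep(c("red", "white", "pink"), times = c(2, 1, 1))

ec_s <- simpson_stats(ec_trait)
cs_s <- simpson_stats(cs_trait)

# t statistic and degrees of freedom as defined above
t_stat <- (ec_s$d - cs_s$d) / sqrt(ec_s$V_d + cs_s$V_d)
df <- (ec_s$k - 1) + (cs_s$k - 1)

# Two-sided p value (assumption)
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

c(d_EC = ec_s$d, d_CS = cs_s$d, t = t_stat, df = df, p = p_value)
```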
An index of information \(H\), was described by Shannon and Weaver (1949) as follows.
\[H = -\sum_{i=1}^{k}p_{i} \log_{2}(p_{i})\]\(H\) is described as Shannon or Shannon-Weaver or Shannon-Weiner diversity index in literature.
Alternatively, \(H\) is also computed using natural logarithm instead of logarithm to base 2.
\[H = -\sum_{i=1}^{k}p_{i} \ln(p_{i})\]The maximum value of \(H\) (\(H_{max}\)) is \(\ln(k)\). This value occurs when each phenotypic class for a trait has the same proportion of accessions.
\[H_{max} = \log_{2}(k)\;\; \textrm{OR} \;\; H_{max} = \ln(k)\]The relative Shannon-Weaver diversity index or Shannon equitability index (\(H'\)) is the Shannon diversity index (\(I\)) divided by the maximum diversity (\(H_{max}\)).
\[H' = \frac{H}{H_{max}}\]Differences in Shannon-Weaver diversity index for qualitative traits of EC and CS can be tested by Hutcheson t-test (Hutcheson 1970).
The Hutcheson t statistic is computed as follows.
\[t = \frac{H_{EC} - H_{CS}}{\sqrt{V_{H_{EC}} + V_{H_{CS}}}}\]Where, the variance of \(H\) (\(V_{H}\)) is,
\[V_{H} = \frac{\sum_{i=1}^{k}n_{i}(\log_{2}{n_{i}})^{2} - \frac{(\sum_{i=1}^{k}n_{i}\log_{2}{n_{i}})^2}{N}}{N^{2}}\] \[\textrm{OR}\] \[V_{H} = \frac{\sum_{i=1}^{k}n_{i}(\ln{n_{i}})^{2} - \frac{(\sum_{i=1}^{k}n_{i}\ln{n_{i}})^2}{N}}{N^{2}}\]The associated degrees of freedom is approximated as follows.
\[df = \frac{(V_{H_{EC}} + V_{H_{CS}})^{2}}{\frac{V_{H_{EC}}^{2}}{N_{EC}} + \frac{V_{H_{CS}}^{2}}{N_{CS}}}\]A similar index of diversity was described by McIntosh (1967) as follows (\(D_{Mc}\)) (Peet 1974).
\[D_{Mc} = \frac{N - \sqrt{\sum_{i=1}^{k}n_{i}^2}}{N - \sqrt{N}}\]Where, \(n_{i}\) denotes the number of accessions in the \(i\)th phenotypic class for a trait and \(N\) is the total number of accessions so that \(p_{i} = {n_{i}}/{N}\).
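A minimal base R sketch of the Shannon-Weaver index, the Hutcheson t-test and the McIntosh index is given below. The variance is taken in its count form, \([\sum n_{i}(\ln n_{i})^{2} - (\sum n_{i}\ln n_{i})^{2}/N]/N^{2}\), as in Hutcheson (1970). The function names and the toy class counts are hypothetical and are not the package's own implementation.

```r
# Sketch only: function names and toy counts are illustrative assumptions.
shannon_stats <- function(n, base = exp(1)) {
  n <- n[n > 0]        # empty classes contribute nothing to H
  N <- sum(n)
  k <- length(n)
  p <- n / N
  H <- -sum(p * log(p, base = base))
  H_max <- log(k, base = base)
  # Hutcheson (1970) variance written in terms of class counts
  V_H <- (sum(n * log(n, base = base)^2) -
            sum(n * log(n, base = base))^2 / N) / N^2
  list(H = H, H_max = H_max, H_rel = H / H_max, V_H = V_H, N = N)
}

hutcheson_t <- function(n_ec, n_cs, base = exp(1)) {
  ec <- shannon_stats(n_ec, base)
  cs <- shannon_stats(n_cs, base)
  t_stat <- (ec$H - cs$H) / sqrt(ec$V_H + cs$V_H)
  # Approximate degrees of freedom
  df <- (ec$V_H + cs$V_H)^2 / (ec$V_H^2 / ec$N + cs$V_H^2 / cs$N)
  c(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df = df))
}

mcintosh <- function(n) {
  N <- sum(n)
  (N - sqrt(sum(n^2))) / (N - sqrt(N))
}

# Toy class counts for one qualitative trait (hypothetical data)
n_ec <- c(120, 60, 15, 5)
n_cs <- c(20, 15, 10, 5)
shannon_stats(n_ec)$H_rel
hutcheson_t(n_ec, n_cs)
mcintosh(n_cs)
```

Note that the variance is zero when all classes have identical counts, in which case the t statistic is undefined.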
Bootstrap statistics are employed to test the difference between the Simpson, Shannon-Weaver and McIntosh indices for qualitative traits of EC and CS (Solow 1993).
If \(I_{EC}\) and \(I_{CS}\) are the diversity indices with the original number of accessions, then random samples of the same size as the original are repeatedly generated (with replacement) \(R\) times and the corresponding diversity index is computed for each sample.
\[I_{EC}^{*} = \lbrace I_{EC_{1}}, I_{EC_{2}}, \cdots, I_{EC_{R}} \rbrace\] \[I_{CS}^{*} = \lbrace I_{CS_{1}}, I_{CS_{2}}, \cdots, I_{CS_{R}} \rbrace\]Then the bootstrap null sample \(I_{0}\) is computed as follows.
\[\Delta^{*} = I_{EC}^{*} - I_{CS}^{*}\] \[I_{0} = \Delta^{*} - \overline{\Delta^{*}}\]Where, \(\overline{\Delta^{*}}\) is the mean of \(\Delta^{*}\).
Now the original difference in diversity indices (\(\Delta_{0} = I_{EC} - I_{CS}\)) is tested against the mean of the bootstrap null sample (\(I_{0}\)) by a z test. The z score test statistic is computed as follows.
\[z = \frac{\Delta_{0} - \overline{I_{0}}}{\sqrt{V_{I_{0}}}}\]Where, \(\overline{I_{0}}\) and \(V_{I_{0}}\) are the mean and variance of the bootstrap null sample \(I_{0}\).
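The bootstrap procedure described above might be sketched in base R roughly as follows. The function names, the toy data, the choice of the Shannon index and of \(R = 1000\) resamples are all assumptions made for illustration. The p value here is taken from the standard normal distribution, which may differ from what the package reports.

```r
# Sketch only: names, toy data and R = 1000 are illustrative assumptions.
shannon <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

boot_diversity_test <- function(ec, cs, index = shannon, R = 1000) {
  # Resample each set with replacement, keeping the original sizes
  I_ec_star <- replicate(R, index(sample(ec, replace = TRUE)))
  I_cs_star <- replicate(R, index(sample(cs, replace = TRUE)))
  delta_star <- I_ec_star - I_cs_star
  I_0 <- delta_star - mean(delta_star)      # bootstrap null sample
  delta_0 <- index(ec) - index(cs)          # observed difference
  z <- (delta_0 - mean(I_0)) / sqrt(var(I_0))
  c(delta_0 = delta_0, z = z, p = 2 * pnorm(-abs(z)))
}

set.seed(1)
# Phenotypic classes of one trait for EC; CS is a random subset (toy data)
ec <- sample(c("a", "b", "c", "d"), 200, replace = TRUE,
             prob = c(0.6, 0.3, 0.07, 0.03))
cs <- sample(ec, 40)
boot_diversity_test(ec, cs)
```

Any other index function that takes a vector of class labels could be passed as the index argument of this hypothetical helper.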
The corresponding degrees of freedom is estimated as follows.
\[df = (k_{EC} - 1) + (k_{CS} - 1)\]
Berger WH, Parker FL (1970).
“Diversity of planktonic foraminifera in deep-sea sediments.”
Science, 168(3937), 1345–1347.
Gini C (1912).
Variabilita e Mutabilita. Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche. [Fasc. I.].
Tipogr. di P. Cuppini, Bologna.
Gini C (1912).
“Variabilita e mutabilita.”
In Pizetti E, Salvemini T (eds.), Memorie di Metodologica Statistica.
Libreria Eredi Virgilio Veschi, Roma, Italy.
Greenberg JH (1956).
“The measurement of linguistic diversity.”
Language, 32(1), 109.
Hennink S, Zeven AC (1990).
“The interpretation of Nei and Shannon-Weaver within population variation indices.”
Euphytica, 51(3), 235–240.
Hill MO (1973).
“Diversity and evenness: A unifying notation and its consequences.”
Ecology, 54(2), 427–432.
Hutcheson K (1970).
“A test for comparing diversities based on the Shannon formula.”
Journal of Theoretical Biology, 29(1), 151–154.
Lyons NI, Hutcheson K (1978).
“C20. Comparing diversities: Gini's index.”
Journal of Statistical Computation and Simulation, 8(1), 75–78.
McIntosh RP (1967).
“An index of diversity and the relation of certain concepts to diversity.”
Ecology, 48(3), 392–404.
Nei M (1973).
“Analysis of gene diversity in subdivided populations.”
Proceedings of the National Academy of Sciences, 70(12), 3321–3323.
Peet RK (1974).
“The measurement of species diversity.”
Annual Review of Ecology and Systematics, 5(1), 285–307.
Shannon CE, Weaver W (1949).
The Mathematical Theory of Communication, number v. 2 in The Mathematical Theory of Communication.
University of Illinois Press.
Simpson EH (1949).
“Measurement of diversity.”
Nature, 163(4148), 688–688.
Solow AR (1993).
“A simple test for change in community structure.”
The Journal of Animal Ecology, 62(1), 191.
Williams CB (1964).
Patterns in the Balance of Nature and Related Problems in Quantitative Ecology.
Academic Press.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
diversity.evaluate.core(data = ec, names = "genotypes",
                        qualitative = qual, selected = core)
Plot stacked frequency distribution histogram to graphically compare the probability distributions of traits between entire collection (EC) and core set (CS).
freqdist.evaluate.core( data, names, quantitative, qualitative, selected, highlight = NULL, include.highlight = TRUE, highlight.se = NULL, highlight.col = "red" )
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
highlight |
Individual names to be highlighted as a character vector. |
include.highlight |
If |
highlight.se |
Optional data frame of standard errors for the
individuals specified in |
highlight.col |
The colour(s) to be used for highlighting individuals in
the plot as a character vector of the same length as |
A list with the ggplot objects of stacked frequency distribution histogram plots for each trait specified as quantitative and qualitative.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
freqdist.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant, qualitative = qual,
                       selected = core)
checks <- c("TMe-1199", "TMe-1957", "TMe-3596", "TMe-3392")
freqdist.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant, qualitative = qual,
                       selected = core,
                       highlight = checks, highlight.col = "red")
quant.se <- data.frame(genotypes = checks,
                       NMSR = c(0.107, 0.099, 0.106, 0.062),
                       TTRN = c(0.081, 0.072, 0.057, 0.049),
                       TFWSR = c(0.089, 0.031, 0.092, 0.097),
                       TTRW = c(0.064, 0.031, 0.071, 0.071),
                       TFWSS = c(0.106, 0.071, 0.121, 0.066),
                       TTSW = c(0.084, 0.045, 0.066, 0.054),
                       TTPW = c(0.098, 0.052, 0.111, 0.082),
                       AVPW = c(0.074, 0.038, 0.054, 0.061),
                       ARSR = c(0.104, 0.019, 0.204, 0.044),
                       SRDM = c(0.078, 0.138, 0.076, 0.079))
freqdist.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant,
                       selected = core,
                       highlight = checks, highlight.col = "red",
                       highlight.se = quant.se)
Compute the Interquartile Range (IQR) (Upton and Cook 1996) to compare quantitative traits of the entire collection (EC) and core set (CS).
iqr.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
A data frame with the IQR values of the EC and CS for the traits specified as quantitative.
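As a rough illustration of the comparison, the IQR of each trait can be computed for the EC and for the subset of selected individuals with the base R function IQR. The toy data, object names and output column names below are hypothetical and do not reproduce the package's output format.

```r
# Sketch only: toy data, object names and column names are assumptions.
set.seed(42)
ec <- data.frame(genotypes = paste0("G", 1:100),
                 height = rnorm(100, mean = 150, sd = 20),
                 yield = rnorm(100, mean = 30, sd = 5),
                 stringsAsFactors = FALSE)
core <- sample(ec$genotypes, 20)
quant <- c("height", "yield")

# The core set is the subset of EC rows whose names are in 'core'
cs <- ec[ec$genotypes %in% core, ]

iqr_tab <- data.frame(Trait = quant,
                      EC_IQR = unname(sapply(ec[, quant], IQR)),
                      CS_IQR = unname(sapply(cs[, quant], IQR)))
iqr_tab
```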
Upton G, Cook I (1996). “General summary statistics.” In Understanding statistics. Oxford University Press.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
iqr.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)
Test for equality of variances of the entire collection (EC) and core set (CS) for quantitative traits by Levene's test (Levene 1960).
levene.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
A data frame with the following columns
Trait |
The quantitative trait. |
EC_V |
The variance of the EC. |
CS_V |
The variance of the CS. |
EC_CV |
The coefficient of variation of the EC. |
CS_CV |
The coefficient of variation of the CS. |
Levene_Fvalue |
The test statistic. |
Levene_pvalue |
The p value for the test statistic. |
Levene_significance |
The significance of the test statistic (*: p \(\leq\) 0.05; **: p \(\leq\) 0.01; ns: p \( > \) 0.05). |
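For orientation, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its group centre. The base R sketch below uses the group median as the centre (the Brown-Forsythe variant). Whether the package centres on the mean or the median is not stated here, so this choice, the function name levene_sketch and the toy data are all assumptions.

```r
# Sketch only: median-centred Levene's test in base R; toy data are assumptions.
levene_sketch <- function(x, group) {
  group <- factor(group)
  # Absolute deviations from each group's median
  z <- abs(x - ave(x, group, FUN = median))
  # One-way ANOVA on the absolute deviations
  fit <- anova(lm(z ~ group))
  c(F = fit[["F value"]][1], p = fit[["Pr(>F)"]][1])
}

set.seed(7)
ec_values <- rnorm(150, mean = 10, sd = 3)   # one trait in the EC
cs_values <- sample(ec_values, 30)           # same trait in the CS
x <- c(ec_values, cs_values)
group <- rep(c("EC", "CS"), times = c(length(ec_values), length(cs_values)))

res <- levene_sketch(x, group)
res
c(EC_V = var(ec_values), CS_V = var(cs_values))
```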
Levene H (1960). “Robust tests for equality of variances.” In Olkin I, Ghurye SG, Hoeffding W, Madow WG, Mann HB (eds.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, 278–292. Stanford University Press, Palo Alto, CA.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
levene.evaluate.core(data = ec, names = "genotypes",
                     quantitative = quant, selected = core)
Compute Principal Component Analysis Statistics (Mardia et al. 1979) to compare the probability distributions of quantitative traits between entire collection (EC) and core set (CS).
pca.evaluate.core( data, names, quantitative, selected, center = TRUE, scale = TRUE, npc.plot = 6 )
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
center |
either a logical value or numeric-alike vector of length
equal to the number of columns of |
scale |
either a logical value or a numeric-alike vector of length
equal to the number of columns of |
npc.plot |
The number of principal components for which eigen values are to be plotted. The default value is 6. |
A list with the following components.
EC PC Importance |
A data frame of importance of principal components for EC |
EC PC Loadings |
A data frame with eigen vectors of principal components for EC |
CS PC Importance |
A data frame of importance of principal components for CS |
CS PC Loadings |
A data frame with eigen vectors of principal components for CS |
Scree Plot |
The scree plot of principal components
for EC and CS as a |
PC Loadings Plot |
A plot of
the eigen vector values of principal components for EC and CS as specified
by |
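The kind of output listed above can be approximated with the base R function prcomp, run separately on the EC and on the selected subset. This is a sketch under the assumption that a prcomp-style analysis is what is intended. The toy data and object names are hypothetical.

```r
# Sketch only: toy data and object names are illustrative assumptions.
set.seed(3)
ec <- data.frame(genotypes = paste0("G", 1:80),
                 t1 = rnorm(80), t2 = rnorm(80), t3 = rnorm(80), t4 = rnorm(80),
                 stringsAsFactors = FALSE)
ec$t2 <- ec$t1 + 0.5 * ec$t2          # induce some correlation between traits
core <- sample(ec$genotypes, 20)
quant <- c("t1", "t2", "t3", "t4")

pca_ec <- prcomp(ec[, quant], center = TRUE, scale. = TRUE)
pca_cs <- prcomp(ec[ec$genotypes %in% core, quant],
                 center = TRUE, scale. = TRUE)

summary(pca_ec)$importance   # std. deviation, proportion, cumulative proportion
pca_ec$rotation              # eigenvectors (loadings) for EC
summary(pca_cs)$importance
pca_cs$rotation              # eigenvectors (loadings) for CS

# Eigenvalues of each component, as would be drawn in a scree plot
eig <- data.frame(PC = seq_along(pca_ec$sdev),
                  EC = pca_ec$sdev^2,
                  CS = pca_cs$sdev^2)
eig
```

Because the traits are scaled to unit variance, the eigenvalues in each column sum to the number of traits.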
Mardia KV, Kent JT, Bibby JM (1979). Multivariate analysis. Academic Press, London; New York. ISBN 0-12-471250-9 978-0-12-471250-8 0-12-471252-5 978-0-12-471252-2.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
pca.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core,
                  center = TRUE, scale = TRUE, npc.plot = 4)
Compute Kullback-Leibler (Kullback and Leibler 1951), Kolmogorov-Smirnov (Kolmogorov 1933; Smirnov 1948) and Anderson-Darling distances (Anderson and Darling 1952) between the probability distributions of entire collection (EC) and core set (CS) for quantitative traits.
pdfdist.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
A data frame with the following columns.
Trait |
The quantitative trait. |
KL_Distance |
The Kullback-Leibler distance (Kullback and Leibler 1951) between EC and CS. |
KS_Distance |
The Kolmogorov-Smirnov distance (Kolmogorov 1933; Smirnov 1948) between EC and CS. |
KS_pvalue |
The p value of the Kolmogorov-Smirnov distance. |
AD_Distance |
Anderson-Darling distance (Anderson and Darling 1952) between EC and CS. |
AD_pvalue |
The p value of the Anderson-Darling distance. |
KS_significance |
The significance of the Kolmogorov-Smirnov distance (*: p \(\leq\) 0.05; **: p \(\leq\) 0.01; ns: p \(>\) 0.05). |
AD_significance |
The significance of the Anderson-Darling distance (*: p \(\leq\) 0.05; **: p \(\leq\) 0.01; ns: p \(>\) 0.05). |
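Two of these distances can be sketched with base R alone. The Kolmogorov-Smirnov part uses ks.test. The Kullback-Leibler part is a simple plug-in estimate on a shared binning, where the choice of 20 equal-width bins and the pseudo-count of 0.5 (to avoid empty bins) are assumptions of this sketch and not the package's method. The Anderson-Darling distance is omitted because it needs an add-on package.

```r
# Sketch only: binning, pseudo-count and toy data are illustrative assumptions.
set.seed(11)
ec_values <- rnorm(300, mean = 50, sd = 10)   # one trait in the EC
cs_values <- sample(ec_values, 60)            # same trait in the CS

# Kolmogorov-Smirnov distance and p value (asymptotic, since CS shares
# values with EC and ties are therefore present)
ks <- ks.test(ec_values, cs_values, exact = FALSE)
ks_distance <- unname(ks$statistic)
ks_pvalue <- ks$p.value

# Plug-in Kullback-Leibler divergence of EC from CS on 20 shared bins
breaks <- seq(min(ec_values), max(ec_values), length.out = 21)
bin_counts <- function(x) {
  tabulate(findInterval(x, breaks, all.inside = TRUE), nbins = 20)
}
p <- (bin_counts(ec_values) + 0.5) / sum(bin_counts(ec_values) + 0.5)
q <- (bin_counts(cs_values) + 0.5) / sum(bin_counts(cs_values) + 0.5)
kl_distance <- sum(p * log(p / q))

c(KL_Distance = kl_distance, KS_Distance = ks_distance, KS_pvalue = ks_pvalue)
```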
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
pdfdist.evaluate.core(data = ec, names = "genotypes",
                      quantitative = quant, selected = core)
Compute the following differences between the entire collection (EC) and core set (CS).
Percentage of significant differences of mean (\(MD\%_{Hu}\)) (Hu et al. 2000)
Percentage of significant differences of variance (\(VD\%_{Hu}\)) (Hu et al. 2000)
Average of absolute differences between means (\(MD\%_{Kim}\)) (Kim et al. 2007)
Average of absolute differences between variances (\(VD\%_{Kim}\)) (Kim et al. 2007)
Percentage difference between the mean squared Euclidean distance among accessions (\(\overline{d}D\%\)) (Studnicki et al. 2013)
Percentage of range ratios smaller than 0.70 (\(RR\%_{0.7}\)) (Diwan et al. 1995)
percentdiff.evaluate.core( data, names, quantitative, selected, alpha = 0.05, rr.crit = 0.7 )
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
alpha |
Type I error probability (Significance level) of difference. |
rr.crit |
The critical value of range ratio considered to be acceptable for a representative CS. The default value is 0.7. |
The differences are computed as follows.
\[MD\%_{Hu} = \left ( \frac{S_{t}}{n} \right ) \times 100\]Where, \(S_{t}\) is the number of traits with a significant difference between the means of the EC and the CS and \(n\) is the total number of traits. A representative core should have \(MD\%_{Hu}\) < 20 % and a coincidence rate of range (\(CR\)) > 80 % (Hu et al. 2000).
\[VD\%_{Hu} = \left ( \frac{S_{F}}{n} \right ) \times 100\]Where, \(S_{F}\) is the number of traits with a significant difference between the variances of the EC and the CS and \(n\) is the total number of traits. A larger \(VD\%_{Hu}\) value indicates a more diverse core set.
\[MD\%_{Kim} = \left ( \frac{1}{n}\sum_{i=1}^{n} \frac{\left | M_{EC_{i}}-M_{CS_{i}} \right |}{M_{CS_{i}}} \right ) \times 100\]Where, \(M_{EC_{i}}\) is the mean of the EC for the \(i\)th trait, \(M_{CS_{i}}\) is the mean of the CS for the \(i\)th trait and \(n\) is the total number of traits.
\[VD\%_{Kim} = \left ( \frac{1}{n}\sum_{i=1}^{n} \frac{\left | V_{EC_{i}}-V_{CS_{i}} \right |}{V_{CS_{i}}} \right ) \times 100\]Where, \(V_{EC_{i}}\) is the variance of the EC for the \(i\)th trait, \(V_{CS_{i}}\) is the variance of the CS for the \(i\)th trait and \(n\) is the total number of traits.
\[\overline{d}D\% = \frac{\overline{d}_{CS}-\overline{d}_{EC}}{\overline{d}_{EC}} \times 100\]Where, \(\overline{d}_{CS}\) is the mean squared Euclidean distance among accessions in the CS and \(\overline{d}_{EC}\) is the mean squared Euclidean distance among accessions in the EC.
Percentage of range ratios smaller than 0.70 (Diwan et al. 1995) is computed as follows.
\[RR\%_{0.7} = \left ( \frac{S_{RR_{0.7}}}{n} \right ) \times 100\]Where, \(S_{RR_{0.7}}\) is the number of traits with a range ratio smaller than 0.7 (\(\frac{R_{CS_{i}}}{R_{EC_{i}}} < 0.7\)), \(R_{CS_{i}}\) is the range of the \(i\)th trait in the CS, \(R_{EC_{i}}\) is the range of the \(i\)th trait in the EC and \(n\) is the total number of traits.
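Several of the formulas above can be worked through on toy data in base R. The sketch below covers \(MD\%_{Kim}\), \(VD\%_{Kim}\), the range ratio percentage and \(\overline{d}D\%\). The data and object names are hypothetical, and standardising the traits by the EC mean and standard deviation before computing distances is an assumption of this sketch that may differ from the package.

```r
# Sketch only: toy data and object names are illustrative assumptions.
set.seed(5)
ec <- data.frame(genotypes = paste0("G", 1:120),
                 t1 = rnorm(120, 100, 15),
                 t2 = rnorm(120, 20, 4),
                 t3 = rnorm(120, 5, 1),
                 stringsAsFactors = FALSE)
core <- sample(ec$genotypes, 25)
quant <- c("t1", "t2", "t3")
cs <- ec[ec$genotypes %in% core, ]

m_ec <- sapply(ec[, quant], mean)
m_cs <- sapply(cs[, quant], mean)
v_ec <- sapply(ec[, quant], var)
v_cs <- sapply(cs[, quant], var)

# Average absolute differences of means and variances, relative to the CS
md_kim <- mean(abs(m_ec - m_cs) / m_cs) * 100
vd_kim <- mean(abs(v_ec - v_cs) / v_cs) * 100

# Range ratio per trait and percentage of traits below the critical value
rr <- sapply(cs[, quant], function(x) diff(range(x))) /
  sapply(ec[, quant], function(x) diff(range(x)))
rr_crit <- 0.7
rr_percent <- mean(rr < rr_crit) * 100

# Mean squared Euclidean distance among accessions
# (assumption: traits standardised with the EC mean and SD)
z_ec <- scale(ec[, quant])
z_cs <- scale(cs[, quant], center = attr(z_ec, "scaled:center"),
              scale = attr(z_ec, "scaled:scale"))
d_ec <- mean(as.vector(dist(z_ec))^2)
d_cs <- mean(as.vector(dist(z_cs))^2)
dd_percent <- (d_cs - d_ec) / d_ec * 100

c(MD_Kim = md_kim, VD_Kim = vd_kim, RR = rr_percent, dD = dd_percent)
```

Since the CS is a subset of the EC, each range ratio lies between 0 and 1.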
A data frame with the values of \(MD\%_{Hu}\), \(VD\%_{Hu}\), \(MD\%_{Kim}\), \(VD\%_{Kim}\) and \(\overline{d}D\%\).
Diwan N, McIntosh MS, Bauchan GR (1995).
“Methods of developing a core collection of annual Medicago species.”
Theoretical and Applied Genetics, 90(6), 755–761.
Hu J, Zhu J, Xu HM (2000).
“Methods of constructing core collections by stepwise clustering with three sampling strategies based on the genotypic values of crops.”
Theoretical and Applied Genetics, 101(1), 264–268.
Kim K, Chung H, Cho G, Ma K, Chandrabalan D, Gwag J, Kim T, Cho E, Park Y (2007).
“PowerCore: A program applying the advanced M strategy with a heuristic search for establishing core sets.”
Bioinformatics, 23(16), 2155–2162.
Studnicki M, Madry W, Schmidt J (2013).
“Comparing the efficiency of sampling strategies to establish a representative in the phenotypic-based genetic diversity core collection of orchardgrass (Dactylis glomerata L.).”
Czech Journal of Genetics and Plant Breeding, 49(1), 36–47.
See also: snk.evaluate.core
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
percentdiff.evaluate.core(data = ec, names = "genotypes",
                          quantitative = quant, selected = core)
Plot Quantile-Quantile (QQ) plots (Wilk and Gnanadesikan 1968) to graphically compare the probability distributions of quantitative traits between entire collection (EC) and core set (CS).
qq.evaluate.core( data, names, quantitative, selected, annotate = c("none", "kl", "ks", "ad") )
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
annotate |
Adds the divergence/distance value between probability
distributions of CS and EC as an annotation to the QQ plot. Either
|
A list with the ggplot objects of QQ plots of CS vs EC for each trait specified as quantitative.
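The quantile pairs behind such a plot can be inspected with the base R function qqplot, which returns the matched quantiles when plotting is switched off. The toy data below are hypothetical, and this sketch does not reproduce the ggplot output of the function documented here.

```r
# Sketch only: toy data are illustrative assumptions.
set.seed(21)
ec_values <- rnorm(200, mean = 10, sd = 2)   # one trait in the EC
cs_values <- sample(ec_values, 40)           # same trait in the CS

# Quantile pairs only; plot.it = FALSE suppresses the base graphics plot
qq <- qqplot(ec_values, cs_values, plot.it = FALSE)
head(data.frame(EC = qq$x, CS = qq$y))

# Points close to the line y = x suggest similar distributions
max(abs(qq$x - qq$y))
```

When the two samples differ in size, qqplot interpolates the quantiles of the larger sample to the length of the smaller one.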
Wilk MB, Gnanadesikan R (1968). “Probability plotting methods for the analysis of data.” Biometrika, 55(1), 1–17.
See also: qqplot, KL.plugin, ks.test, ad.test, pdfdist.evaluate.core
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core)
qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core,
                 annotate = "kl")
qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core,
                 annotate = "ks")
qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core,
                 annotate = "ad")
Compute the Ratio of Phenotype Retained (\(RPR\)) (Li et al. 2002) to compare qualitative traits between entire collection (EC) and core set (CS).
rpr.evaluate.core(data, names, qualitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
Ratio of Phenotype Retained (\(RPR\)) (Kim et al. 2007) is computed as follows.
\[RPR = \frac{\sum_{i=1}^{n} k_{CS_{i}}}{\sum_{i=1}^{n} k_{EC_{i}}}\]Where, \(k_{CS_{i}}\) is the number of phenotypic classes in CS for the \(i\)th trait, \(k_{EC_{i}}\) is the number of phenotypic classes in EC for the \(i\)th trait and \(n\) is the total number of traits.
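A small worked example of the ratio, in base R, is given below. The toy data and object names are hypothetical.

```r
# Sketch only: toy data and object names are illustrative assumptions.
ec <- data.frame(genotypes = paste0("G", 1:12),
                 colour = c("red", "red", "white", "pink", "pink", "red",
                            "white", "purple", "red", "pink", "white", "red"),
                 shape = c("round", "oval", "round", "round", "oval", "flat",
                           "round", "oval", "round", "round", "oval", "round"),
                 stringsAsFactors = FALSE)
core <- c("G1", "G3", "G4", "G5")
qual <- c("colour", "shape")
cs <- ec[ec$genotypes %in% core, ]

# Number of phenotypic classes per trait in EC and CS
k_ec <- sapply(ec[, qual], function(x) length(unique(x)))  # colour 4, shape 3
k_cs <- sapply(cs[, qual], function(x) length(unique(x)))  # colour 3, shape 2

rpr <- sum(k_cs) / sum(k_ec)   # 5 / 7
rpr
```

Here the core set misses the purple colour class and the flat shape class, so 5 of the 7 classes in the EC are retained.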
The Ratio of Phenotype Retained value.
Kim K, Chung H, Cho G, Ma K, Chandrabalan D, Gwag J, Kim T, Cho E, Park Y (2007).
“PowerCore: A program applying the advanced M strategy with a heuristic search for establishing core sets.”
Bioinformatics, 23(16), 2155–2162.
Li Z, Zhang H, Zeng Y, Yang Z, Shen S, Sun C, Wang X (2002).
“Studies on sampling schemes for the establishment of core collection of rice landraces in Yunnan, China.”
Genetic Resources and Crop Evolution, 49(1), 67–74.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
rpr.evaluate.core(data = ec, names = "genotypes",
                  qualitative = qual, selected = core)
Compute the Synthetic Variation Coefficient (\(CV\%\)) (Dong 1998; Dong et al. 2001) to compare quantitative traits of the entire collection (EC) and core set (CS).
scv.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
Synthetic Variation Coefficient (\(CV\%\)) (Dong 1998; Dong et al. 2001) is computed as follows for the core set (CS).
\[CV(\%) = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{SE_{i}}{\mu_{i}} \right ) \times 100\]Where, \(SE_{i}\) is the standard error of the \(i\)th trait, \(\mu_{i}\) is the mean of the \(i\)th trait and \(n\) is the total number of traits.
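The formula above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the package's implementation: the data frame, the trait names `height` and `weight`, and the helper functions `se()` and `scv()` are all hypothetical, and the handling of missing values is an assumption.

```r
# Toy data (hypothetical values, not taken from the package datasets)
ec <- data.frame(
  genotypes = paste0("G", 1:8),
  height = c(10, 12, 14, 16, 18, 20, 22, 24),
  weight = c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5),
  stringsAsFactors = FALSE
)
core <- c("G1", "G4", "G8")
quant <- c("height", "weight")

# Standard error of the mean for one trait (assumption: NAs are dropped)
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Synthetic variation coefficient: mean of SE_i / mu_i over traits, times 100
scv <- function(df, traits) {
  ratios <- vapply(
    traits,
    function(tr) se(df[[tr]]) / mean(df[[tr]], na.rm = TRUE),
    numeric(1)
  )
  mean(ratios) * 100
}

# Core set = rows of the entire collection whose names were selected
cs <- ec[ec$genotypes %in% core, ]

scv_ec <- scv(ec, quant)
scv_cs <- scv(cs, quant)
```

Applying the same helper to both data frames gives one value each for EC and CS, which can then be compared directly.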
The Synthetic Variation Coefficient values for EC and CS.
Dong YS (1998).
“Exploration on genetic diversity center for cultivated soybean in China.”
Chinese Crops Journal, 1, 18–19.
Dong YS, Zhuang BC, Zhao LM, Sun H, He MY (2001).
“The genetic diversity of annual wild soybeans grown in China.”
Theoretical and Applied Genetics, 103(1), 98–103.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
scv.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)
Test difference between means and variances of entire collection (EC) and core set (CS) for quantitative traits by Sign test (\(+\) versus \(-\)) (Basigalup et al. 1995; Tai and Miller 2001).
signtest.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
The test statistic for Sign test (\(\chi^{2}\)) is computed as follows.
\[\chi^{2} = \frac{(N_{1}-N_{2})^{2}}{N_{1}+N_{2}}\]Where, \(N_{1}\) is the number of variables for which the mean or variance of the CS is greater than the mean or variance of the EC (number of \(+\) signs); \(N_{2}\) is the number of variables for which the mean or variance of the CS is less than the mean or variance of the EC (number of \(-\) signs). The value of \(\chi^{2}\) is compared with a Chi-square distribution with 1 degree of freedom.
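As a rough illustration of the statistic above, the following base R sketch counts the signs for a set of trait means and refers the result to a Chi-square distribution with 1 degree of freedom. The per-trait means are invented for the example, and the variable names are hypothetical; the same steps would apply to variances.

```r
# Hypothetical per-trait means for EC and CS (illustrative values only)
ec_mean <- c(10.2, 5.1, 3.3, 8.0, 7.5, 2.2)
cs_mean <- c(10.9, 5.6, 3.1, 8.4, 7.9, 2.6)

# N1: traits where CS exceeds EC (+ signs); N2: traits where CS is lower (- signs)
n_plus  <- sum(cs_mean > ec_mean)
n_minus <- sum(cs_mean < ec_mean)

# Sign test statistic and its p value on 1 degree of freedom
chisq   <- (n_plus - n_minus)^2 / (n_plus + n_minus)
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
```

Traits with identical EC and CS values contribute to neither count in this sketch.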
A data frame with the following components.
Comparison |
The comparison measure. |
ChiSq |
The test statistic (\(\chi^{2}\)). |
p.value |
The p value for the test statistic. |
significance |
The significance of the test statistic (*: p \(\leq\) 0.05; **: p \(\leq\) 0.01; ns: p \( > \) 0.05). |
Basigalup DH, Barnes DK, Stucker RE (1995).
“Development of a core collection for perennial Medicago plant introductions.”
Crop Science, 35(4), 1163–1168.
Tai PYP, Miller JD (2001).
“A Core Collection for Saccharum spontaneum L. from the World Collection of Sugarcane.”
Crop Science, 41(3), 879–885.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
signtest.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant, selected = core)
Test difference between means of entire collection (EC) and core set (CS) for quantitative traits by Newman-Keuls or Student-Newman-Keuls test (Newman 1939; Keuls 1952).
snk.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
A data frame with the following components.
Trait |
The quantitative trait. |
EC_Min |
The minimum value of the trait in EC. |
EC_Max |
The maximum value of the trait in EC. |
EC_Mean |
The mean value of the trait in EC. |
EC_SE |
The standard error of the trait in EC. |
CS_Min |
The minimum value of the trait in CS. |
CS_Max |
The maximum value of the trait in CS. |
CS_Mean |
The mean value of the trait in CS. |
CS_SE |
The standard error of the trait in CS. |
SNK_pvalue |
The p value of the Student-Newman-Keuls test for equality of means of EC and CS. |
SNK_significance |
The significance of the Student-Newman-Keuls test for equality of means of EC and CS. |
Keuls M (1952).
“The use of the ‘studentized range’ in connection with an analysis of variance.”
Euphytica, 1(2), 112–122.
Newman D (1939).
“The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation.”
Biometrika, 31(1-2), 20–30.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
snk.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)
Test difference between means of entire collection (EC) and core set (CS) for quantitative traits by Student's t test (Student 1908).
ttest.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
A data frame with the following components.
Trait |
The quantitative trait. |
EC_Min |
The minimum value of the trait in EC. |
EC_Max |
The maximum value of the trait in EC. |
EC_Mean |
The mean value of the trait in EC. |
EC_SE |
The standard error of the trait in EC. |
CS_Min |
The minimum value of the trait in CS. |
CS_Max |
The maximum value of the trait in CS. |
CS_Mean |
The mean value of the trait in CS. |
CS_SE |
The standard error of the trait in CS. |
ttest_pvalue |
The p value of the Student's t test for equality of means of EC and CS. |
ttest_significance |
The significance of the Student's t test for equality of means of EC and CS. |
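A per-trait comparison of this kind can be sketched with `t.test()` from base R. The data below are simulated, the trait names are hypothetical, and only a few of the columns listed above are reproduced. Note the assumption in the call: `t.test()` defaults to the Welch variant, and whether the package uses that or the pooled-variance form (`var.equal = TRUE`) is not stated here.

```r
set.seed(1)

# Simulated entire collection (hypothetical traits and values)
ec <- data.frame(
  genotypes = paste0("G", 1:30),
  height = rnorm(30, mean = 100, sd = 10),
  weight = rnorm(30, mean = 50, sd = 5),
  stringsAsFactors = FALSE
)
core <- ec$genotypes[seq(1, 30, by = 3)]
quant <- c("height", "weight")

cs <- ec[ec$genotypes %in% core, ]

# One t test per trait, EC values against CS values
res <- do.call(rbind, lapply(quant, function(tr) {
  tt <- t.test(ec[[tr]], cs[[tr]])  # Welch test by default (assumption)
  data.frame(
    Trait = tr,
    EC_Mean = mean(ec[[tr]]),
    CS_Mean = mean(cs[[tr]]),
    ttest_pvalue = tt$p.value
  )
}))
```

A large p value for a trait indicates no evidence that the CS mean differs from the EC mean for that trait.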
Student (1908). “The probable error of a mean.” Biometrika, 6(1), 1–25.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
ttest.evaluate.core(data = ec, names = "genotypes",
                    quantitative = quant, selected = core)
Compute the Variance of Phenotypic Frequency (\(VPF\)) (Li et al. 2002) to compare qualitative traits between entire collection (EC) and core set (CS).
vpf.evaluate.core(data, names, qualitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
qualitative |
Name of columns with the qualitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
Variance of Phenotypic Frequency (\(VPF\)) (Li et al. 2002) is computed as follows.
\[VPF = \frac{1}{n} \sum_{i=1}^{n}\left ( \frac{\sum_{j=1}^{k} (p_{ij} - \overline{p_{i}})^{2}}{k - 1} \right )\]Where, \(p_{ij}\) denotes the proportion/fraction/frequency of accessions in the \(j\)th phenotypic class for the \(i\)th trait, \(\overline{p_{i}}\) is the mean frequency of phenotypic classes for the \(i\)th trait, \(k\) is the number of phenotypic classes for the \(i\)th trait and \(n\) is the total number of traits.
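The formula above can be sketched in base R as shown below. The data, the trait names `colour` and `shape`, and the helper `vpf()` are hypothetical. One assumption is worth flagging: subsetting a factor keeps all of its levels, so classes absent from the CS are counted with frequency zero here; whether the package treats empty classes this way is not confirmed.

```r
# Toy data with two qualitative traits (hypothetical values)
ec <- data.frame(
  genotypes = paste0("G", 1:8),
  colour = factor(c("red", "red", "white", "white",
                    "white", "pink", "pink", "red")),
  shape = factor(c("round", "round", "round", "long",
                   "long", "round", "long", "round")),
  stringsAsFactors = FALSE
)
core <- c("G1", "G3", "G6", "G7")
qual <- c("colour", "shape")

# Variance of phenotypic frequency: per-trait sample variance of the
# class frequencies p_ij, averaged over the n traits
vpf <- function(df, traits) {
  per_trait <- vapply(traits, function(tr) {
    p <- as.numeric(prop.table(table(df[[tr]])))  # p_ij for each class j
    var(p)  # sum((p - mean(p))^2) / (k - 1); NA if a trait has one class
  }, numeric(1))
  mean(per_trait)
}

cs <- ec[ec$genotypes %in% core, ]

vpf_ec <- vpf(ec, qual)
vpf_cs <- vpf(cs, qual)
```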
The Variance of Phenotypic Frequency values for EC and CS.
Li Z, Zhang H, Zeng Y, Yang Z, Shen S, Sun C, Wang X (2002). “Studies on sampling schemes for the establishment of core collection of rice landraces in Yunnan, China.” Genetic Resources and Crop Evolution, 49(1), 67–74.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
vpf.evaluate.core(data = ec, names = "genotypes",
                  qualitative = qual, selected = core)
Compute the Variable Rate of Coefficient of Variation (\(VR\)) (Hu et al. 2000) to compare quantitative traits of the entire collection (EC) and core set (CS).
vr.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
The Variable Rate of Coefficient of Variation (\(VR\)) is computed as follows.
\[VR = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{CV_{CS_{i}}}{CV_{EC_{i}}} \right ) \times 100\]Where, \(CV_{CS_{i}}\) is the coefficient of variation for the \(i\)th trait in the CS, \(CV_{EC_{i}}\) is the coefficient of variation for the \(i\)th trait in the EC and \(n\) is the total number of traits.
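A minimal base R sketch of the formula above follows. The data and the helpers `cv()` and `vr()` are hypothetical and only illustrate the arithmetic; they are not the package's code.

```r
# Toy data (hypothetical values, not taken from the package datasets)
ec <- data.frame(
  genotypes = paste0("G", 1:8),
  height = c(10, 12, 14, 16, 18, 20, 22, 24),
  weight = c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5),
  stringsAsFactors = FALSE
)
core <- c("G1", "G4", "G8")
quant <- c("height", "weight")

# Coefficient of variation of one trait, in percent
cv <- function(x) sd(x) / mean(x) * 100

# Variable rate: mean over traits of CV in CS divided by CV in EC, times 100
vr <- function(ec_df, cs_df, traits) {
  ratios <- vapply(
    traits,
    function(tr) cv(cs_df[[tr]]) / cv(ec_df[[tr]]),
    numeric(1)
  )
  mean(ratios) * 100
}

cs <- ec[ec$genotypes %in% core, ]
vr_value <- vr(ec, cs, quant)
```

A value above 100 means that, on average across traits, the coefficient of variation is larger in the CS than in the EC, as happens here because the toy core set takes the extremes of each trait.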
The \(VR\) value.
Hu J, Zhu J, Xu HM (2000). “Methods of constructing core collections by stepwise clustering with three sampling strategies based on the genotypic values of crops.” Theoretical and Applied Genetics, 101(1), 264–268.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
vr.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core)
Compare the medians of quantitative traits between entire collection (EC) and core set (CS) by Wilcoxon rank sum test or Mann-Whitney-Wilcoxon test or Mann-Whitney U test (Wilcoxon 1945; Mann and Whitney 1947).
wilcox.evaluate.core(data, names, quantitative, selected)
data |
The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data. |
names |
Name of column with the individual names as a character string. |
quantitative |
Name of columns with the quantitative traits as a character vector. |
selected |
Character vector with the names of individuals selected in
core collection and present in the |
A data frame with the following components.
Trait |
The quantitative trait. |
EC_Med |
The median value of the trait in EC. |
CS_Med |
The median value of the trait in CS. |
Wilcox_pvalue |
The p value of the Wilcoxon test for equality of medians of EC and CS. |
Wilcox_significance |
The significance of the Wilcoxon test for equality of medians of EC and CS. |
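The same kind of per-trait table can be sketched with `wilcox.test()` from base R. The data are simulated and the trait names are hypothetical. Because every CS value also occurs in the EC, the two samples share tied values, so the sketch requests the normal approximation with `exact = FALSE`; this choice is an assumption and may differ from what the package does.

```r
set.seed(1)

# Simulated entire collection (hypothetical traits and values)
ec <- data.frame(
  genotypes = paste0("G", 1:30),
  height = rnorm(30, mean = 100, sd = 10),
  weight = rnorm(30, mean = 50, sd = 5),
  stringsAsFactors = FALSE
)
core <- ec$genotypes[seq(1, 30, by = 3)]
quant <- c("height", "weight")

cs <- ec[ec$genotypes %in% core, ]

# One Wilcoxon rank sum test per trait, EC values against CS values
res <- do.call(rbind, lapply(quant, function(tr) {
  wt <- wilcox.test(ec[[tr]], cs[[tr]], exact = FALSE)
  data.frame(
    Trait = tr,
    EC_Med = median(ec[[tr]]),
    CS_Med = median(cs[[tr]]),
    Wilcox_pvalue = wt$p.value
  )
}))
```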
Mann HB, Whitney DR (1947).
“On a test of whether one of two random variables is stochastically larger than the other.”
The Annals of Mathematical Statistics, 18(1), 50–60.
Wilcoxon F (1945).
“Individual comparisons by ranking methods.”
Biometrics Bulletin, 1(6), 80.
data("cassava_CC")
data("cassava_EC")
ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL
core <- rownames(cassava_CC)
quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC", "PSTR")
ec[, qual] <- lapply(ec[, qual], function(x) factor(as.factor(x)))
wilcox.evaluate.core(data = ec, names = "genotypes",
                     quantitative = quant, selected = core)