Package 'EvaluateCore' reference manual

Title:	Quality Evaluation of Core Collections
Description:	Implements various quality evaluation statistics to assess the value of plant germplasm core collections using qualitative and quantitative phenotypic trait data according to Odong et al. (2015) <doi:10.1007/s00122-012-1971-y>.
Authors:	J. Aravind [aut, cre] , Vikender Kaur [aut] , Dhammaprakash Pandhari Wankhede [aut] , Joghee Nanjundan [aut] , ICAR-NBGPR [cph] (www.nbpgr.ernet.in)
Maintainer:	J. Aravind <[email protected]>
License:	GPL-2 \| GPL-3
Version:	0.1.3.9000
Built:	2025-02-14 04:12:30 UTC
Source:	https://github.com/aravind-j/evaluatecore

Bar Plots

Description

Plot Bar plots to graphically compare the frequency distributions of qualitative traits between entire collection (EC) and core set (CS).

Usage

bar.evaluate.core(data, names, qualitative, selected)
bar.evaluate.core(data, names, qualitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`qualitative`	Name of columns with the qualitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

A list with the ggplot objects of relative frequency bar plots of CS and EC for each trait specified as qualitative.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

bar.evaluate.core(data = ec, names = "genotypes",
                  qualitative = qual, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

bar.evaluate.core(data = ec, names = "genotypes",
                  qualitative = qual, selected = core)

Box Plots

Description

Plot Box-and-Whisker plots (Tukey 1970; McGill et al. 1978) to graphically compare the probability distributions of quantitative traits between entire collection (EC) and core set (CS).

Usage

box.evaluate.core(data, names, quantitative, selected)
box.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

A list with the ggplot objects of box plots of CS and EC for each trait specified as quantitative.

References

McGill R, Tukey JW, Larsen WA (1978). “Variations of box plots.” The American Statistician, 32(1), 12.

Tukey JW (1970). Exploratory Data Analysis. Preliminary edition. Addison-Wesley.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

box.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

box.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)

IITA Cassava Germplasm Data - Core Collection

Description

An example germplasm characterisation data of a core collection generated from 1591 accessions of IITA Cassava collection (International Institute of Tropical Agriculture et al. 2019) using 10 quantitative and 48 qualitative trait data with CoreHunter3 (corehunter). The core set was generated using distance based measures giving equal weightage to Average entry-to-nearest-entry distance (EN) and Average accession-to-nearest-entry distance (AN). Includes data on 26 descriptors for 168 (10 % of cassava_EC) accessions. It is used to demonstrate the various functions of EvaluateCore package.

Usage

cassava_CC
cassava_CC

Format

A data frame with 58 columns:

CUAL: Colour of unexpanded apical leaves
LNGS: Length of stipules
PTLC: Petiole colour
DSTA: Distribution of anthocyanin
LFRT: Leaf retention
LBTEF: Level of branching at the end of flowering
CBTR: Colour of boiled tuberous root
NMLB: Number of levels of branching
ANGB: Angle of branching
CUAL9M: Colours of unexpanded apical leaves at 9 months
LVC9M: Leaf vein colour at 9 months
TNPR9M: Total number of plants remaining per accession at 9 months
PL9M: Petiole length at 9 months
STRP: Storage root peduncle
STRC: Storage root constrictions
PSTR: Position of root
NMSR: Number of storage root per plant
TTRN: Total root number per plant
TFWSR: Total fresh weight of storage root per plant
TTRW: Total root weight per plant
TFWSS: Total fresh weight of storage shoot per plant
TTSW: Total shoot weight per plant
TTPW: Total plant weight
AVPW: Average plant weight
ARSR: Amount of rotted storage root per plant
SRDM: Storage root dry matter

Details

Further details on how the example dataset was built from the original data is available online.

References

International Institute of Tropical Agriculture, Benjamin F, Marimagne T (2019). “Cassava morphological characterization. Version 2018.1.” www.genesys-pgr.org.

Examples


data(cassava_CC)
summary(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

lapply(seq_along(cassava_CC[, qual]),
       function(i) barplot(table(cassava_CC[, qual][, i]),
                           xlab = names(cassava_CC[, qual])[i]))

lapply(seq_along(cassava_CC[, quant]),
       function(i) hist(table(cassava_CC[, quant][, i]),
                        xlab = names(cassava_CC[, quant])[i],
                        main = ""))

data(cassava_CC)
summary(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

lapply(seq_along(cassava_CC[, qual]),
       function(i) barplot(table(cassava_CC[, qual][, i]),
                           xlab = names(cassava_CC[, qual])[i]))

lapply(seq_along(cassava_CC[, quant]),
       function(i) hist(table(cassava_CC[, quant][, i]),
                        xlab = names(cassava_CC[, quant])[i],
                        main = ""))

IITA Cassava Germplasm Data - Entire Collection

Description

An example germplasm characterisation data of a subset of IITA Cassava collection (International Institute of Tropical Agriculture et al. 2019). Includes data on 26 (out of 62) descriptors for 1684 (out of 2170) accessions. It is used to demonstrate the various functions of EvaluateCore package.

Usage

cassava_EC
cassava_EC

Format

A data frame with 58 columns:

CUAL: Colour of unexpanded apical leaves
LNGS: Length of stipules
PTLC: Petiole colour
DSTA: Distribution of anthocyanin
LFRT: Leaf retention
LBTEF: Level of branching at the end of flowering
CBTR: Colour of boiled tuberous root
NMLB: Number of levels of branching
ANGB: Angle of branching
CUAL9M: Colours of unexpanded apical leaves at 9 months
LVC9M: Leaf vein colour at 9 months
TNPR9M: Total number of plants remaining per accession at 9 months
PL9M: Petiole length at 9 months
STRP: Storage root peduncle
STRC: Storage root constrictions
PSTR: Position of root
NMSR: Number of storage root per plant
TTRN: Total root number per plant
TFWSR: Total fresh weight of storage root per plant
TTRW: Total root weight per plant
TFWSS: Total fresh weight of storage shoot per plant
TTSW: Total shoot weight per plant
TTPW: Total plant weight
AVPW: Average plant weight
ARSR: Amount of rotted storage root per plant
SRDM: Storage root dry matter

Details

Further details on how the example dataset was built from the original data is available online.

References

International Institute of Tropical Agriculture, Benjamin F, Marimagne T (2019). “Cassava morphological characterization. Version 2018.1.” www.genesys-pgr.org.

Examples


data(cassava_EC)
summary(cassava_EC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

lapply(seq_along(cassava_EC[, qual]),
       function(i) barplot(table(cassava_EC[, qual][, i]),
                           xlab = names(cassava_EC[, qual])[i]))

lapply(seq_along(cassava_EC[, quant]),
       function(i) hist(table(cassava_EC[, quant][, i]),
                        xlab = names(cassava_EC[, quant])[i],
                        main = ""))

data(cassava_EC)
summary(cassava_EC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

lapply(seq_along(cassava_EC[, qual]),
       function(i) barplot(table(cassava_EC[, qual][, i]),
                           xlab = names(cassava_EC[, qual])[i]))

lapply(seq_along(cassava_EC[, quant]),
       function(i) hist(table(cassava_EC[, quant][, i]),
                        xlab = names(cassava_EC[, quant])[i],
                        main = ""))

Chi-squared Test for Homogeneity

Description

Compare the distribution frequencies of qualitative traits between entire collection (EC) and core set (CS) by Chi-squared test for homogeneity (Pearson 1900; Snedecor and Irwin 1933).

Usage

chisquare.evaluate.core(data, names, qualitative, selected)
chisquare.evaluate.core(data, names, qualitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`qualitative`	Name of columns with the qualitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

A a data frame with the following columns.

`Trait`	The qualitative trait.
`EC_No.Classes`	The number of classes in the trait for EC.
`EC_Classes`	The frequency of the classes in the trait for EC.
`CS_No.Classes`	The number of classes in the trait for CS.
`CS_Classes`	The frequency of the classes in the trait for CS.
`chisq_statistic`	The \(\chi^{2}\) test statistic.
`chisq_pvalue`	The p value for the test statistic.
`chisq_significance`	The significance of the test statistic (: p \(\leq\) 0.01; *: p \(\leq\) 0.05; ns: p \( > \) 0.05).

References

Pearson K (1900). “X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling.” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50(302), 157–175.

Snedecor G, Irwin MR (1933). “On the chi-square test for homogeneity.” Iowa State College Journal of Science, 8, 75–81.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

chisquare.evaluate.core(data = ec, names = "genotypes",
                        qualitative = qual, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

chisquare.evaluate.core(data = ec, names = "genotypes",
                        qualitative = qual, selected = core)

Phenotypic Correlations

Description

Compute phenotypic correlations (Pearson 1895) between traits, plot correlation matrices as correlograms (Friendly 2002) and calculate mantel correlation (Legendre and Legendre 2012) between them to compare entire collection (EC) and core set (CS).

Usage

corr.evaluate.core(data, names, quantitative, qualitative, selected)
corr.evaluate.core(data, names, quantitative, qualitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`qualitative`	Name of columns with the qualitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

A list with the following components.

`Correlation Matrix`	The matrix with phenotypic correlations between traits in EC (below diagonal) and CS (above diagonal).
`Correologram`	A correlogram of phenotypic correlations between traits in EC (below diagonal) and CS (above diagonal) as a `ggplot` object.
`Mantel Correlation`	A data frame with Mantel correlation coefficient (\(r\)) between EC and CS phenotypic correlation matrices, it's p value and significance (: p \(\leq\) 0.01; *: p \(\leq\) 0.05; ns: p \( > \) 0.05).

References

Friendly M (2002). “Corrgrams.” The American Statistician, 56(4), 316–324.

Legendre P, Legendre L (2012). “Interpretation of ecological structures.” In Developments in Environmental Modelling, volume 24, 521–624. Elsevier.

Pearson K (1895). “Note on regression and inheritance in the case of two parents.” Proceedings of the Royal Society of London, 58, 240–242.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

corr.evaluate.core(data = ec, names = "genotypes", quantitative = quant,
                   qualitative = qual, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

corr.evaluate.core(data = ec, names = "genotypes", quantitative = quant,
                   qualitative = qual, selected = core)

Class Coverage

Description

Compute the Class Coverage (Kim et al. 2007) to compare the distribution frequencies of qualitative traits between entire collection (EC) and core set (CS).

Usage

coverage.evaluate.core(data, names, qualitative, selected)
coverage.evaluate.core(data, names, qualitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`qualitative`	Name of columns with the qualitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Details

Class Coverage (Kim et al. 2007) is computed as follows.

\[Class\, Coverage = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{k_{CS_{i}}}{k_{EC_{i}}} \right ) \times 100\]

Where, \(k_{CS_{i}}\) is the number of phenotypic classes in CS for the \(i\)th trait, \(k_{EC_{i}}\) is the number of phenotypic classes in EC for the \(i\)th trait and \(n\) is the total number of traits.

Value

The Class Coverage value.

References

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

coverage.evaluate.core(data = ec, names = "genotypes",
                       qualitative = qual, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

coverage.evaluate.core(data = ec, names = "genotypes",
                       qualitative = qual, selected = core)

Coincidence Rate of Range

Description

Compute the following metrics to compare quantitative traits of the entire collection (EC) and core set (CS).

Coincidence Rate of Range (\(CR\)) (Hu et al. 2000) (originally described by (Diwan et al. 1995) as Mean range ratio)
Changeable Rate of Maximum (\(CR_{\max}\)) (Wang et al. 2007)
Changeable Rate of Minimum (\(CR_{\min}\)) (Wang et al. 2007)
Changeable Rate of Mean (\(CR_{\mu}\)) (Wang et al. 2007)

Usage

cr.evaluate.core(data, names, quantitative, selected)
cr.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Details

The Coincidence Rate of Range (\(CR\)) is computed as follows.

\[CR = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{R_{CS_{i}}}{R_{EC_{i}}} \right ) \times 100\]

Where, \(R_{CS_{i}}\) is the range of the \(i\)th trait in the CS, \(R_{EC_{i}}\) is the range of the \(i\)th trait in the EC and \(n\) is the total number of traits.

A representative CS should have a \(CR\) value no less than 70% (Diwan et al. 1995) or 80% (Hu et al. 2000).

The Changeable Rate of Maximum (\(CR_{\max}\)) is computed as follows.

\[CR_{\max} = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{\max_{CS_{i}}}{\max_{EC_{i}}} \right ) \times 100\]

Where, \(\max_{CS_{i}}\) is the maximum value of the \(i\)th trait in the CS, \(\max_{EC_{i}}\) is the maximum value of the \(i\)th trait in the EC and \(n\) is the total number of traits.

The Changeable Rate of Minimum (\(CR_{\min}\)) is computed as follows.

\[CR_{\min} = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{\min_{CS_{i}}}{\min_{EC_{i}}} \right ) \times 100\]

Where, \(\min_{CS_{i}}\) is the minimum value of the \(i\)th trait in the CS, \(\min_{EC_{i}}\) is the minimum value of the \(i\)th trait in the EC and \(n\) is the total number of traits.

The Changeable Rate of Mean (\(CR_{\mu}\)) is computed as follows.

\[CR_{\mu} = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{\mu_{CS_{i}}}{\mu_{EC_{i}}} \right ) \times 100\]

Where, \(\mu_{CS_{i}}\) is the mean value of the \(i\)th trait in the CS, \(\mu_{EC_{i}}\) is the mean value of the \(i\)th trait in the EC and \(n\) is the total number of traits.

Value

The \(CR\) value.

Note

NaN or Inf values for \[CR_{\min}\] occurs when the minimum values for some of the traits are zero.

References

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

cr.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

cr.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core)

Distance Measures

Description

Compute average Entry-to-nearest-entry distance (\(E\text{-}EN\)), Accession-to-nearest-entry distance (\(E\text{-}EN\)) and Entry-to-entry distance (\(E\text{-}EN\)) (Odong et al. 2013) to evaluate a core set (CS) selected from an entire collection (EC).

Usage

dist.evaluate.core(data, names, quantitative, qualitative, selected, d = NULL)
dist.evaluate.core(data, names, quantitative, qualitative, selected, d = NULL)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`qualitative`	Name of columns with the qualitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.
`d`	A distance matrix of class "`dist`" with individual names in the `names` column in `data` as labels. If `NULL` (default), then a distance matrix is computed using Gower's metric. (Gower 1971).

Value

A data frame with the average values of \(E\text{-}EN\), \(E\text{-}EN\) and \(E\text{-}EN\).

References

Gower JC (1971). “A general coefficient of similarity and some of its properties.” Biometrics, 27(4), 857–871.

Odong TL, Jansen J, van Eeuwijk FA, van Hintum TJL (2013). “Quality of core collections for effective utilisation of genetic resources review, discussion and interpretation.” Theoretical and Applied Genetics, 126(2), 289–305.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

dist.evaluate.core(data = ec, names = "genotypes", quantitative = quant,
                   qualitative = qual, selected = core)


####################################
# Compare with corehunter
####################################

library(corehunter)
# Prepare phenotype dataset
dtype <- c(rep("RD", length(quant)),
           rep("NS", length(qual)))
rownames(ec) <- ec[, "genotypes"]
ecdata <- corehunter::phenotypes(data = ec[, c(quant, qual)],
                                 types = dtype)

# Compute average distances
EN <- evaluateCore(core = rownames(cassava_CC), data = ecdata,
                   objective = objective("EN", "GD"))
AN <- evaluateCore(core = rownames(cassava_CC), data = ecdata,
                   objective = objective("AN", "GD"))
EE <- evaluateCore(core = rownames(cassava_CC), data = ecdata,
                   objective = objective("EE", "GD"))
EN
AN
EE


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

dist.evaluate.core(data = ec, names = "genotypes", quantitative = quant,
                   qualitative = qual, selected = core)


####################################
# Compare with corehunter
####################################

library(corehunter)
# Prepare phenotype dataset
dtype <- c(rep("RD", length(quant)),
           rep("NS", length(qual)))
rownames(ec) <- ec[, "genotypes"]
ecdata <- corehunter::phenotypes(data = ec[, c(quant, qual)],
                                 types = dtype)

# Compute average distances
EN <- evaluateCore(core = rownames(cassava_CC), data = ecdata,
                   objective = objective("EN", "GD"))
AN <- evaluateCore(core = rownames(cassava_CC), data = ecdata,
                   objective = objective("AN", "GD"))
EE <- evaluateCore(core = rownames(cassava_CC), data = ecdata,
                   objective = objective("EE", "GD"))
EN
AN
EE

Diversity Indices

Description

Compute the following diversity indices and perform corresponding statistical tests to compare the phenotypic diversity for qualitative traits between entire collection (EC) and core set (CS).

Simpson's and related indices
- Simpson's Index (\(d\)) (Simpson 1949; Peet 1974)
- Simpson's Index of Diversity or Gini's Diversity Index or Gini-Simpson Index or Nei's Diversity Index or Nei's Variation Index (\(D\)) (Gini 1912, 1912; Greenberg 1956; Berger and Parker 1970; Nei 1973; Peet 1974)
- Maximum Simpson's Index of Diversity or Maximum Nei's Diversity/Variation Index (\(D_{max}\)) (Hennink and Zeven 1990)
- Simpson's Reciprocal Index or Hill's \(N_{2}\) (\(D_{R}\)) (Williams 1964; Hill 1973)
- Relative Simpson's Index of Diversity or Relative Nei's Diversity/Variation Index (\(D'\)) (Hennink and Zeven 1990)
Shannon-Weaver and related indices
- Shannon or Shannon-Weaver or Shannon-Weiner Diversity Index (\(H\)) (Shannon and Weaver 1949; Peet 1974)
- Maximum Shannon-Weaver Diversity Index (\(H_{max}\)) (Hennink and Zeven 1990)
- Relative Shannon-Weaver Diversity Index or Shannon Equitability Index (\(H'\)) (Hennink and Zeven 1990)
McIntosh Diversity Index
- McIntosh Diversity Index (\(D_{Mc}\)) (McIntosh 1967; Peet 1974)

Usage

diversity.evaluate.core(data, names, qualitative, selected, base = 2, R = 1000)
diversity.evaluate.core(data, names, qualitative, selected, base = 2, R = 1000)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`qualitative`	Name of columns with the qualitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.
`base`	The logarithm base to be used for computation of Shannon-Weaver Diversity Index (\(I\)). Default is 2.
`R`	The number of bootstrap replicates. Default is 1000.

Value

A list with three data frames as follows.

`simpson`	Trait The qualitative trait. EC_No.Classes The number of classes in the trait for EC. CS_No.Classes The number of classes in the trait for CS. EC_d The Simpson's Index (\(d\)) for EC. EC_D The Simpson's Index of Diversity (\(D\)) for EC. EC_D.max The Maximum Simpson's Index of Diversity (\(D_{max}\)) for EC. EC_D.inv The Simpson's Reciprocal Index (\(D_{R}\)) for EC. EC_D.rel The Relative Reciprocal Index (\(D'\)) for EC. EC_d.V The variance of \(d\) for EC according to (Simpson 1949). EC_d.boot.V The bootstrap variance of \(d\) for EC. CS_d The Simpson's Index (\(d\)) for CS. CS_D The Simpson's Index of Diversity (\(D\)) for CS. CS_D.max The Maximum Simpson's Index of Diversity (\(D_{max}\)) for CS. CS_D.inv The Simpson's Reciprocal Index (\(D_{R}\)) for CS. CS_D.rel The Relative Reciprocal Index (\(D'\)) for CS. CS_d.V The variance of \(d\) for CS according to (Simpson 1949). CS_d.boot.V The bootstrap variance of \(d\) for CS. d.t.df The degrees of freedom for t test. d.t.stat The t statistic. d.t.pvalue The p value for t test. d.t.significance The significance of t test for t-test d.boot.z.df The degrees of freedom for bootstrap z score. d.boot.z.stat The bootstrap z score. d.boot.z.pvalue The p value of z score. d.boot.z.significance The significance of z score.
`shannon`	Trait The qualitative trait. EC_No.Classes The number of classes in the trait for EC. CS_No.Classes The number of classes in the trait for CS. EC_I The Shannon-Weaver Diversity Index (\(I\)) for EC. EC_I.max The Maximum Shannon-Weaver Diversity Index (\(I_{max}\)) for EC. EC_I.rel The Relative Shannon-Weaver Diversity Index (\(I'\)) for EC. EC_I.V The variance of \(I\) for EC according to (Hutcheson 1970). EC_I.boot.V The bootstrap variance of \(I\) for EC. CS_I The Shannon-Weaver Diversity Index (\(I\)) for CS. CS_I.max The Maximum Shannon-Weaver Diversity Index (\(I_{max}\)) for CS. CS_I.rel The Relative Shannon-Weaver Diversity Index (\(I'\)) for CS. CS_I.V The variance of \(I\) for CS according to (Hutcheson 1970). CS_I.boot.V The bootstrap variance of \(I\) for CS. I.t.stat The t statistic. I.t.df The degrees of freedom for t test. I.t.pvalue The p value for t test. I.t.significance The significance of t test for t-test I.boot.z.df The degrees of freedom for bootstrap z score. I.boot.z.stat The bootstrap z score. I.boot.z.pvalue The p value of z score. I.boot.z.significance The significance of z score.
`mcintosh`	EC_No.Classes The number of classes in the trait for EC. CS_No.Classes The number of classes in the trait for CS. EC_D.Mc The McIntosh Index (\(D_{Mc}\)) for EC. CS_D.Mc The McIntosh Index (\(D_{Mc}\)) for CS. M.boot.z.stat The bootstrap z score. M.boot.z.df The degrees of freedom for bootstrap z score. M.boot.z.pvalue The p value of z score. M.boot.z.significance The significance of z score.

Details

The diversity indices and the corresponding statistical tests implemented in diversity.evaluate.core are as follows.

Simpson's and related indices

Simpson's index (\(d\)) which estimates the probability that two accessions randomly selected will belong to the same phenotypic class of a trait, is computed as follows (Simpson 1949; Peet 1974).

\[d = \sum_{i = 1}^{k}p_{i}^{2}\]

Where, \(p_{i}\) denotes the proportion/fraction/frequency of accessions in the \(i\)th phenotypic class for a trait and \(k\) is the number of phenotypic classes for the trait.

The value of \(d\) can range from 0 to 1 with 0 representing maximum diversity and 1, no diversity.

\(d\) is subtracted from 1 to give Simpson's index of diversity (\(D\)) (Greenberg 1956; Berger and Parker 1970; Peet 1974; Hennink and Zeven 1990) originally suggested by Gini (1912, 1912) and described in literature as Gini's diversity index or Gini-Simpson index. It is the same as Nei's diversity index or Nei's variation index (Nei 1973; Hennink and Zeven 1990). Greater the value of \(D\), greater the diversity with a range from 0 to 1.

\[D = 1 - d\]

The maximum value of \(D\), \(D_{max}\) occurs when accessions are uniformly distributed across the phenotypic classes and is computed as follows (Hennink and Zeven 1990).

\[D_{max} = 1 - \frac{1}{k}\]

Reciprocal of \(d\) gives the Simpson's reciprocal index (\(D_{R}\)) (Williams 1964; Hennink and Zeven 1990) and can range from 1 to \(k\). This was also described in Hill (1973) as (\(N_{2}\)).

\[D_{R} = \frac{1}{d}\]

Relative Simpson's index of diversity or Relative Nei's diversity/variation index (\(H'\)) (Hennink and Zeven 1990) is defined as follows (Peet 1974).

\[D' = \frac{D}{D_{max}}\]

Differences in Simpson's diversity index for qualitative traits of EC and CS can be tested by a t-test using the associated variance estimate described in Simpson (1949) (Lyons and Hutcheson 1978).

The t statistic is computed as follows.

\[t = \frac{d_{EC} - d_{CS}}{\sqrt{V_{d_{EC}} + V_{d_{CS}}}}\]

Where, the variance of \(d\) (\(V_{d}\)) is,

\[V_{d} = \frac{4N(N-1)(N-2)\sum_{i=1}^{k}(p_{i})^{3} + 2N(N-1)\sum_{i=1}^{k}(p_{i})^{2} - 2N(N-1)(2N-3) \left( \sum_{i=1}^{k}(p_{i})^{2} \right)^{2}}{[N(N-1)]^{2}}\]

The associated degrees of freedom is computed as follows.

\[df = (k_{EC} - 1) + (k_{CS} - 1)\]

Where, \(k_{EC}\) and \(k_{CS}\) are the number of phenotypic classes in the trait for EC and CS respectively.

Shannon-Weaver and related indices

An index of information \(H\), was described by Shannon and Weaver (1949) as follows.

\[H = -\sum_{i=1}^{k}p_{i} \log_{2}(p_{i})\]

\(H\) is described as Shannon or Shannon-Weaver or Shannon-Weiner diversity index in literature.

Alternatively, \(H\) is also computed using natural logarithm instead of logarithm to base 2.

\[H = -\sum_{i=1}^{k}p_{i} \ln(p_{i})\]

The maximum value of \(H\) (\(H_{max}\)) is \(\ln(k)\). This value occurs when each phenotypic class for a trait has the same proportion of accessions.

\[H_{max} = \log_{2}(k)\;\; \textrm{OR} \;\; H_{max} = \ln(k)\]

The relative Shannon-Weaver diversity index or Shannon equitability index (\(H'\)) is the Shannon diversity index (\(I\)) divided by the maximum diversity (\(H_{max}\)).

\[H' = \frac{H}{H_{max}}\]

Differences in Shannon-Weaver diversity index for qualitative traits of EC and CS can be tested by Hutcheson t-test (Hutcheson 1970).

The Hutcheson t statistic is computed as follows.

\[t = \frac{H_{EC} - H_{CS}}{\sqrt{V_{H_{EC}} + V_{H_{CS}}}}\]

Where, the variance of \(H\) (\(V_{H}\)) is,

\[V_{H} = \frac{\sum_{i=1}^{k}n_{i}(\log_{2}{n_{i}})^{2} \frac{(\sum_{i=1}^{k}\log_{2}{n_{i}})^2}{N}}{N^{2}}\] \[\textrm{OR}\] \[V_{H} = \frac{\sum_{i=1}^{k}n_{i}(\ln{n_{i}})^{2} \frac{(\sum_{i=1}^{k}\ln{n_{i}})^2}{N}}{N^{2}}\]

The associated degrees of freedom is approximated as follows.

\[df = \frac{(V_{H_{EC}} + V_{H_{CS}})^{2}}{\frac{V_{H_{EC}}^{2}}{N_{EC}} + \frac{V_{H_{CS}}^{2}}{N_{CS}}}\]

McIntosh Diversity Index

A similar index of diversity was described by McIntosh (1967) as follows (\(D_{Mc}\)) (Peet 1974).

\[D_{Mc} = \frac{N - \sqrt{\sum_{i=1}^{k}n_{i}^2}}{N - \sqrt{N}}\]

Where, \(n_{i}\) denotes the number of accessions in the \(i\)th phenotypic class for a trait and \(N\) is the total number of accessions so that \(p_{i} = {n_{i}}/{N}\).

Testing for difference with bootstrapping

Bootstrap statistics are employed to test the difference between the Simpson, Shannon-Weaver and McIntosh indices for qualitative traits of EC and CS (Solow 1993).

If \(I_{EC}\) and \(I_{CS}\) are the diversity indices with the original number of accessions, then random samples of the same size as the original are repeatedly generated (with replacement) \(R\) times and the corresponding diversity index is computed for each sample.

\[I_{EC}^{*} = \lbrace H_{EC_{1}}, H_{EC_{}}, \cdots, H_{EC_{R}} \rbrace\] \[I_{CS}^{*} = \lbrace H_{CS_{1}}, H_{CS_{}}, \cdots, H_{CS_{R}} \rbrace\]

Then the bootstrap null sample \(I_{0}\) is computed as follows.

\[\Delta^{*} = I_{EC}^{*} - I_{CS}^{*}\] \[I_{0} = \Delta^{*} - \overline{\Delta^{*}}\]

Where, \(\overline{\Delta^{*}}\) is the mean of \(\Delta^{*}\).

Now the original difference in diversity indices (\(\Delta_{0} = I_{EC} - I_{CS}\)) is tested against mean of bootstrap null sample (\(I_{0}\)) by a z test. The z score test statistic is computed as follows.

\[z = \frac{\Delta_{0} - \overline{H_{0}}}{\sqrt{V_{H_{0}}}}\]

Where, \(\overline{H_{0}}\) and \(V_{H_{0}}\) are the mean and variance of the bootstrap null sample \(H_{0}\).

The corresponding degrees of freedom is estimated as follows.

\[df = (k_{EC} - 1) + (k_{CS} - 1)\]

References

Berger WH, Parker FL (1970). “Diversity of planktonic foraminifera in deep-sea sediments.” Science, 168(3937), 1345–1347.

Gini C (1912). Variabilita e Mutabilita. Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche. [Fasc. I.]. Tipogr. di P. Cuppini, Bologna.

Gini C (1912). “Variabilita e mutabilita.” In Pizetti E, Salvemini T (eds.), Memorie di Metodologica Statistica. Liberia Eredi Virgilio Veschi, Roma, Italy.

Greenberg JH (1956). “The measurement of linguistic diversity.” Language, 32(1), 109.

Hennink S, Zeven AC (1990). “The interpretation of Nei and Shannon-Weaver within population variation indices.” Euphytica, 51(3), 235–240.

Hill MO (1973). “Diversity and evenness: A unifying notation and its consequences.” Ecology, 54(2), 427–432.

Hutcheson K (1970). “A test for comparing diversities based on the Shannon formula.” Journal of Theoretical Biology, 29(1), 151–154.

Lyons NI, Hutcheson K (1978). “C20. Comparing diversities: Gini's index.” Journal of Statistical Computation and Simulation, 8(1), 75–78.

McIntosh RP (1967). “An index of diversity and the relation of certain concepts to diversity.” Ecology, 48(3), 392–404.

Nei M (1973). “Analysis of gene diversity in subdivided populations.” Proceedings of the National Academy of Sciences, 70(12), 3321–3323.

Peet RK (1974). “The measurement of species diversity.” Annual Review of Ecology and Systematics, 5(1), 285–307.

Shannon CE, Weaver W (1949). The Mathematical Theory of Communication, number v. 2 in The Mathematical Theory of Communication. University of Illinois Press.

Simpson EH (1949). “Measurement of diversity.” Nature, 163(4148), 688–688.

Solow AR (1993). “A simple test for change in community structure.” The Journal of Animal Ecology, 62(1), 191.

Williams CB (1964). Patterns in the Balance of Nature and Related Problems in Quantitative Ecology. Academic Press.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))


diversity.evaluate.core(data = ec, names = "genotypes",
                        qualitative = qual, selected = core)


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))


diversity.evaluate.core(data = ec, names = "genotypes",
                        qualitative = qual, selected = core)

Frequency Distribution Histogram

Description

Plot stacked frequency distribution histogram to graphically compare the probability distributions of traits between entire collection (EC) and core set (CS).

Usage

freqdist.evaluate.core(
  data,
  names,
  quantitative,
  qualitative,
  selected,
  highlight = NULL,
  include.highlight = TRUE,
  highlight.se = NULL,
  highlight.col = "red"
)
freqdist.evaluate.core(
  data,
  names,
  quantitative,
  qualitative,
  selected,
  highlight = NULL,
  include.highlight = TRUE,
  highlight.se = NULL,
  highlight.col = "red"
)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`qualitative`	Name of columns with the qualitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.
`highlight`	Individual names to be highlighted as a character vector.
`include.highlight`	If `TRUE`, the highlighted individuals are included in the frequency distribution histogram. Default is `TRUE`.
`highlight.se`	Optional data frame of standard errors for the individuals specified in `highlight`. It should have the same column names as in `data`.
`highlight.col`	The colour(s) to be used to highlighting individuals in the plot as a character vector of the same length as `highlight`. Must be valid colour values in R (named colours, hexadecimal representation, index of colours [`1:8`] in default R `palette()` etc.).

Value

A list with the ggplot objects of stacked frequency distribution histograms plots for each trait specified as quantitative and qualitative.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))


freqdist.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant, qualitative = qual,
                       selected = core)

checks <- c("TMe-1199", "TMe-1957", "TMe-3596", "TMe-3392")

freqdist.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant, qualitative = qual,
                       selected = core,
                       highlight = checks, highlight.col = "red")

quant.se <- data.frame(genotypes = checks,
                       NMSR = c(0.107, 0.099, 0.106, 0.062),
                       TTRN = c(0.081, 0.072, 0.057, 0.049),
                       TFWSR = c(0.089, 0.031, 0.092, 0.097),
                       TTRW = c(0.064, 0.031, 0.071, 0.071),
                       TFWSS = c(0.106, 0.071, 0.121, 0.066),
                       TTSW = c(0.084, 0.045, 0.066, 0.054),
                       TTPW = c(0.098, 0.052, 0.111, 0.082),
                       AVPW = c(0.074, 0.038, 0.054, 0.061),
                       ARSR = c(0.104, 0.019, 0.204, 0.044),
                       SRDM = c(0.078, 0.138, 0.076, 0.079))

freqdist.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant,
                       selected = core,
                       highlight = checks, highlight.col = "red",
                       highlight.se = quant.se)


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))


freqdist.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant, qualitative = qual,
                       selected = core)

checks <- c("TMe-1199", "TMe-1957", "TMe-3596", "TMe-3392")

freqdist.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant, qualitative = qual,
                       selected = core,
                       highlight = checks, highlight.col = "red")

quant.se <- data.frame(genotypes = checks,
                       NMSR = c(0.107, 0.099, 0.106, 0.062),
                       TTRN = c(0.081, 0.072, 0.057, 0.049),
                       TFWSR = c(0.089, 0.031, 0.092, 0.097),
                       TTRW = c(0.064, 0.031, 0.071, 0.071),
                       TFWSS = c(0.106, 0.071, 0.121, 0.066),
                       TTSW = c(0.084, 0.045, 0.066, 0.054),
                       TTPW = c(0.098, 0.052, 0.111, 0.082),
                       AVPW = c(0.074, 0.038, 0.054, 0.061),
                       ARSR = c(0.104, 0.019, 0.204, 0.044),
                       SRDM = c(0.078, 0.138, 0.076, 0.079))

freqdist.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant,
                       selected = core,
                       highlight = checks, highlight.col = "red",
                       highlight.se = quant.se)

Interquartile Range

Description

Compute the Interquartile Range (IQR) (Upton and Cook 1996) to compare quantitative traits of the entire collection (EC) and core set (CS).

Usage

iqr.evaluate.core(data, names, quantitative, selected)
iqr.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

A data frame with the IQR values of the EC and CS for the traits specified as quantitative.

References

Upton G, Cook I (1996). “General summary statistics.” In Understanding statistics. Oxford University Press.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

iqr.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

iqr.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)

Levene's Test

Description

Test for of variances of the entire collection (EC) and core set (CS) for quantitative traits by Levene's test (Levene 1960).

Usage

levene.evaluate.core(data, names, quantitative, selected)
levene.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

A data frame with the following columns

`Trait`	The quantitative trait.
`EC_V`	The variance of the EC.
`CS_V`	The variance of the CS.
`EC_CV`	The coefficient of variance of the EC.
`CS_CV`	The coefficient of variance of the CS.
`Levene_Fvalue`	The test statistic.
`Levene_pvalue`	The p value for the test statistic.
`Levene_significance`	The significance of the test statistic (: p \(\leq\) 0.01; *: p \(\leq\) 0.05; ns: p \( > \) 0.05).

References

Levene H (1960). “Robust tests for equality of variances.” In Olkin I, Ghurye SG, Hoeffding W, Madow WG, Mann HB (eds.), Contribution to Probability and Statistics: Essays in Honor of Harold Hotelling, 278–292. Stanford University Press, Palo Alto, CA.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

levene.evaluate.core(data = ec, names = "genotypes",
                     quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

levene.evaluate.core(data = ec, names = "genotypes",
                     quantitative = quant, selected = core)

Principal Component Analysis

Description

Compute Principal Component Analysis Statistics (Mardia et al. 1979) to compare the probability distributions of quantitative traits between entire collection (EC) and core set (CS).

Usage

pca.evaluate.core(
  data,
  names,
  quantitative,
  selected,
  center = TRUE,
  scale = TRUE,
  npc.plot = 6
)
pca.evaluate.core(
  data,
  names,
  quantitative,
  selected,
  center = TRUE,
  scale = TRUE,
  npc.plot = 6
)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.
`center`	either a logical value or numeric-alike vector of length equal to the number of columns of `x`, where ‘numeric-alike’ means that `as.numeric(.)` will be applied successfully if `is.numeric(.)` is not true.
`scale`	either a logical value or a numeric-alike vector of length equal to the number of columns of `x`.
`npc.plot`	The number of principal components for which eigen values are to be plotted. The default value is 6.

Value

A list with the following components.

`EC PC Importance`	A data frame of importance of principal components for EC
`EC PC Loadings`	A data frame with eigen vectors of principal components for EC
`CS PC Importance`	A data frame of importance of principal components for CS
`CS PC Loadings`	A data frame with eigen vectors of principal components for CS
`Scree Plot`	The scree plot of principal components for EC and CS as a `ggplot` object.
`PC Loadings Plot`	A plot of the eigen vector values of principal components for EC and CS as specified by `npc.plot` as a `ggplot2` object.

References

Mardia KV, Kent JT, Bibby JM (1979). Multivariate analysis. Academic Press, London; New York. ISBN 0-12-471250-9 978-0-12-471250-8 0-12-471252-5 978-0-12-471252-2.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

pca.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core,
                  center = TRUE, scale = TRUE, npc.plot = 4)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

pca.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core,
                  center = TRUE, scale = TRUE, npc.plot = 4)

Distance Between Probability Distributions

Description

Compute Kullback-Leibler (Kullback and Leibler 1951), Kolmogorov-Smirnov (Kolmogorov 1933; Smirnov 1948) and Anderson-Darling distances (Anderson and Darling 1952) between the probability distributions of collection (EC) and core set (CS) for quantitative traits.

Usage

pdfdist.evaluate.core(data, names, quantitative, selected)
pdfdist.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

A data frame with the following columns.

`Trait`	The quantitative trait.
`KL_Distance`	The Kullback-Leibler distance (Kullback and Leibler 1951) between EC and CS.
`KS_Distance`	The Kolmogorov-Smirnov distance (Kolmogorov 1933; Smirnov 1948) between EC and CS.
`KS_pvalue`	The p value of the Kolmogorov-Smirnov distance.
`AD_Distance`	Anderson-Darling distance (Anderson and Darling 1952) between EC and CS.
`AD_pvalue`	The p value of the Anderson-Darling distance.
`KS_significance`	The significance of the Kolmogorov-Smirnov distance (: p \(\leq\) 0.01; *: p \(\leq\) 0.05; ns: p \(>\) 0.05).
`AD_pvalue`	The significance of the Anderson-Darling distance (: p \(\leq\) 0.01; *: p \(\leq\) 0.05; ns: p \(>\) 0.05).

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

pdfdist.evaluate.core(data = ec, names = "genotypes",
                      quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

pdfdist.evaluate.core(data = ec, names = "genotypes",
                      quantitative = quant, selected = core)

Percentage Difference of Means and Variances

Description

Compute the following differences between the entire collection (EC) and core set (CS).

Percentage of significant differences of mean (\(MD\%_{Hu}\)) (Hu et al. 2000)
Percentage of significant differences of variance (\(VD\%_{Hu}\)) (Hu et al. 2000)
Average of absolute differences between means (\(MD\%_{Kim}\)) (Kim et al. 2007)
Average of absolute differences between variances (\(VD\%_{Kim}\)) (Kim et al. 2007)
Percentage difference between the mean squared Euclidean distance among accessions (\(\overline{d}D\%\)) (Studnicki et al. 2013)
Percentage of range ratios smaller than 0.70 (\(S_{RR_{0.7}}\)) (Diwan et al. 1995)

Usage

percentdiff.evaluate.core(
  data,
  names,
  quantitative,
  selected,
  alpha = 0.05,
  rr.crit = 0.7
)
percentdiff.evaluate.core(
  data,
  names,
  quantitative,
  selected,
  alpha = 0.05,
  rr.crit = 0.7
)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.
`alpha`	Type I error probability (Significance level) of difference.
`rr.crit`	The critical value of range ratio considered to be acceptable for a representative CS. The default value is 0.7.

Details

The differences are computed as follows.

\[MD\%_{Hu} = \left ( \frac{S_{t}}{n} \right ) \times 100\]

Where, \(S_{t}\) is the number of traits with a significant difference between the means of the EC and the CS and \(n\) is the total number of traits. A representative core should have \(MD\%_{Hu}\) < 20 % and \(CR\) > 80 % (Hu et al. 2000).

\[VD\%_{Hu} = \left ( \frac{S_{F}}{n} \right ) \times 100\]

Where, \(S_{F}\) is the number of traits with a significant difference between the variances of the EC and the CS and \(n\) is the total number of traits. Larger \(VD\%_{Hu}\) value indicates a more diverse core set.

\[MD\%_{Kim} = \left ( \frac{1}{n}\sum_{i=1}^{n} \frac{\left | M_{EC_{i}}-M_{CS_{i}} \right |}{M_{CS_{i}}} \right ) \times 100\]

Where, \(M_{EC_{i}}\) is the mean of the EC for the \(i\)th trait, \(M_{CS_{i}}\) is the mean of the CS for the \(i\)th trait and \(n\) is the total number of traits.

\[VD\%_{Kim} = \left ( \frac{1}{n}\sum_{i=1}^{n} \frac{\left | V_{EC_{i}}-V_{CS_{i}} \right |}{V_{CS_{i}}} \right ) \times 100\]

Where, \(V_{EC_{i}}\) is the variance of the EC for the \(i\)th trait, \(V_{CS_{i}}\) is the variance of the CS for the \(i\)th trait and \(n\) is the total number of traits.

\[\overline{d}D\% = \frac{\overline{d}_{CS}-\overline{d}_{EC}}{\overline{d}_{EC}} \times 100\]

Where, \(\overline{d}_{CS}\) is the mean squared Euclidean distance among accessions in the CS and \(\overline{d}_{EC}\) is the mean squared Euclidean distance among accessions in the EC.

Percentage of range ratios smaller than 0.70 (Diwan et al. 1995) is computed as follows.

\[RR\%_{0.7} = \left ( \frac{S_{RR_{0.7}}}{n} \right ) \times 100\]

Where, \(S_{RR_{0.7}}\) is the number of traits with a range ratio smaller than 0.7 (\(\frac{R_{CS_{i}}}{R_{EC_{i}}} < 0.7\)) \(R_{CS_{i}}\) is the range of the \(i\)th trait in the CS, \(R_{EC_{i}}\) is the range of the \(i\)th trait in the EC and \(n\) is the total number of traits.

Value

A data frame with the values of \(MD\%_{Hu}\), \(VD\%_{Hu}\), \(MD\%_{Kim}\), \(VD\%_{Kim}\) and \(\overline{d}D\%\).

References

Diwan N, McIntosh MS, Bauchan GR (1995). “Methods of developing a core collection of annual Medicago species.” Theoretical and Applied Genetics, 90(6), 755–761.

Hu J, Zhu J, Xu HM (2000). “Methods of constructing core collections by stepwise clustering with three sampling strategies based on the genotypic values of crops.” Theoretical and Applied Genetics, 101(1), 264–268.

Kim K, Chung H, Cho G, Ma K, Chandrabalan D, Gwag J, Kim T, Cho E, Park Y (2007). “PowerCore: A program applying the advanced M strategy with a heuristic search for establishing core sets.” Bioinformatics, 23(16), 2155–2162.

Studnicki M, Madry W, Schmidt J (2013). “Comparing the efficiency of sampling strategies to establish a representative in the phenotypic-based genetic diversity core collection of orchardgrass (Dactylis glomerata L.).” Czech Journal of Genetics and Plant Breeding, 49(1), 36–47.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

percentdiff.evaluate.core(data = ec, names = "genotypes",
                          quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

percentdiff.evaluate.core(data = ec, names = "genotypes",
                          quantitative = quant, selected = core)

Quantile-Quantile Plots

Description

Plot Quantile-Quantile (QQ) plots (Wilk and Gnanadesikan 1968) to graphically compare the probability distributions of quantitative traits between entire collection (EC) and core set (CS).

Usage

qq.evaluate.core(
  data,
  names,
  quantitative,
  selected,
  annotate = c("none", "kl", "ks", "ad")
)
qq.evaluate.core(
  data,
  names,
  quantitative,
  selected,
  annotate = c("none", "kl", "ks", "ad")
)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.
`annotate`	Adds the divergence/distance value between probability distributions of CS and EC as an annotation to the QQ plot. Either `"none"` (no annotation (Default)) or `"kl"` (Kullback-Leibler divergence) or `"ks"` (Kolmogorov-Smirnov distance) or `"ad"` (Anderson-Darling distance).

Value

A list with the ggplot objects of QQ plots of CS vs EC for each trait specified as quantitative.

References

Wilk MB, Gnanadesikan R (1968). “Probability plotting methods for the analysis for the analysis of data.” Biometrika, 55(1), 1–17.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core)

qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core, annotate = "kl")

qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core, annotate = "ks")

qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core, annotate = "ad")

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core)

qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core, annotate = "kl")

qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core, annotate = "ks")

qq.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core, annotate = "ad")

Ratio of Phenotype Retained

Description

Compute the Ratio of Phenotype Retained (\(RPR\)) (Li et al. 2002) to compare qualitative traits between entire collection (EC) and core set (CS).

Usage

rpr.evaluate.core(data, names, qualitative, selected)
rpr.evaluate.core(data, names, qualitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`qualitative`	Name of columns with the qualitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Details

Ratio of Phenotype Retained (\(RPR\)) (Kim et al. 2007) is computed as follows.

\[RPR = \frac{\sum_{i=1}^{n} k_{CS_{i}}}{\sum_{i=1}^{n} k_{EC_{i}}}\]

Value

The Ratio of Phenotype Retained value.

References

Kim K, Chung H, Cho G, Ma K, Chandrabalan D, Gwag J, Kim T, Cho E, Park Y (2007). “PowerCore: A program applying the advanced M strategy with a heuristic search for establishing core sets.” Bioinformatics, 23(16), 2155–2162.

Li Z, Zhang H, Zeng Y, Yang Z, Shen S, Sun C, Wang X (2002). “Studies on sampling schemes for the establishment of corecollection of rice landraces in Yunnan, China.” Genetic Resources and Crop Evolution, 49(1), 67–74.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

rpr.evaluate.core(data = ec, names = "genotypes",
                  qualitative = qual, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

rpr.evaluate.core(data = ec, names = "genotypes",
                  qualitative = qual, selected = core)

Synthetic Variation Coefficient

Description

Compute the Synthetic Variation Coefficient (\(CV\%\)) (Dong 1998; Dong et al. 2001) to compare quantitative traits of the entire collection (EC) and core set (CS).

Usage

scv.evaluate.core(data, names, quantitative, selected)
scv.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Details

Synthetic Variation Coefficient (\(CV\%\)) (Dong 1998; Dong et al. 2001) is computed as follows for the core set (CS).

\[CV(\%) = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{SE_{j}}{\mu_{i}} \right ) \times 100\]

Where, \(SE_{i}\) is the standard error of the \(i\)th trait, \(\mu_{i}\) is the mean of the \(i\)th trait and \(n\) is the total number of traits.

Value

The Synthetic Variation Coefficient values for EC and CS

References

Dong YS (1998). “Exploration on genetic diversity center for cultivated soybean in China.” Chinese Crops Journal, 1, 18–19.

Dong YS, Zhuang BC, Zhao LM, Sun H, He MY (2001). “The genetic diversity of annual wild soybeans grown in China.” Theoretical and Applied Genetics, 103(1), 98–103.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                      function(x) factor(as.factor(x)))

scv.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                      function(x) factor(as.factor(x)))

scv.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)

Sign Test

Description

Test difference between means and variances of entire collection (EC) and core set (CS) for quantitative traits by Sign test (\(+\) versus \(-\)) (Basigalup et al. 1995; Tai and Miller 2001).

Usage

signtest.evaluate.core(data, names, quantitative, selected)
signtest.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Details

The test statistic for Sign test (\(\chi^{2}\)) is computed as follows.

\[\chi^{2} = \frac{(N_{1}-N_{2})^{2}}{N_{1}+N_{2}}\]

Where, where \(N_{1}\) is the number of variables for which the mean or variance of the CS is greater than the mean or variance of the EC (number of \(+\) signs); \(N_{2}\) is the number of variables for which the mean or variance of the CS is less than the mean or variance of the EC (number of \(-\) signs). The value of \(\chi^{2}\) is compared with a Chi-square distribution with 1 degree of freedom.

Value

A data frame with the following components.

`Comparison`	The comparison measure.
`ChiSq`	The test statistic (\(\chi^{2}\)).
`p.value`	The p value for the test statistic.
`significance`	The significance of the test statistic (: p \(\leq\) 0.01; *: p \(\leq\) 0.05; ns: p \( > \) 0.05).

References

Basigalup DH, Barnes DK, Stucker RE (1995). “Development of a core collection for perennial Medicago plant introductions.” Crop Science, 35(4), 1163–1168.

Tai PYP, Miller JD (2001). “A Core Collection for Saccharum spontaneum L. from the World Collection of Sugarcane.” Crop Science, 41(3), 879–885.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

signtest.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

signtest.evaluate.core(data = ec, names = "genotypes",
                       quantitative = quant, selected = core)

Student-Newman-Keuls Test

Description

Test difference between means of entire collection (EC) and core set (CS) for quantitative traits by Newman-Keuls or Student-Newman-Keuls test (Newman 1939; Keuls 1952).

Usage

snk.evaluate.core(data, names, quantitative, selected)
snk.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

A data frame with the following components.

`Trait`	The quantitative trait.
`EC_Min`	The minimum value of the trait in EC.
`EC_Max`	The maximum value of the trait in EC.
`EC_Mean`	The mean value of the trait in EC.
`EC_SE`	The standard error of the trait in EC.
`CS_Min`	The minimum value of the trait in CS.
`CS_Max`	The maximum value of the trait in CS.
`CS_Mean`	The mean value of the trait in CS.
`CS_SE`	The standard error of the trait in CS.
`SNK_pvalue`	The p value of the Student-Newman-Keuls test for equality of means of EC and CS.
`SNK_significance`	The significance of the Student-Newman-Keuls test for equality of means of EC and CS.

References

Keuls M (1952). “The use of the ,,studentized range" in connection with an analysis of variance.” Euphytica, 1(2), 112–122.

Newman D (1939). “The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation.” Biometrika, 31(1-2), 20–30.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

snk.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

snk.evaluate.core(data = ec, names = "genotypes",
                  quantitative = quant, selected = core)

Student's t Test

Description

Test difference between means of entire collection (EC) and core set (CS) for quantitative traits by Student's t test (Student 1908).

Usage

ttest.evaluate.core(data, names, quantitative, selected)
ttest.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

`Trait`	The quantitative trait.
`EC_Min`	The minimum value of the trait in EC.
`EC_Max`	The maximum value of the trait in EC.
`EC_Mean`	The mean value of the trait in EC.
`EC_SE`	The standard error of the trait in EC.
`CS_Min`	The minimum value of the trait in CS.
`CS_Max`	The maximum value of the trait in CS.
`CS_Mean`	The mean value of the trait in CS.
`CS_SE`	The standard error of the trait in CS.
`ttest_pvalue`	The p value of the Student's t test for equality of means of EC and CS.
`ttest_significance`	The significance of the Student's t test for equality of means of EC and CS.

References

Student (1908). “The probable error of a mean.” Biometrika, 6(1), 1–25.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

ttest.evaluate.core(data = ec, names = "genotypes",
                    quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

ttest.evaluate.core(data = ec, names = "genotypes",
                    quantitative = quant, selected = core)

Variance of Phenotypic Frequency

Description

Compute the Variance of Phenotypic Frequency (\(VPF\)) (Li et al. 2002) to compare qualitative traits between entire collection (EC) and core set (CS).

Usage

vpf.evaluate.core(data, names, qualitative, selected)
vpf.evaluate.core(data, names, qualitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`qualitative`	Name of columns with the qualitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Details

Variance of Phenotypic Frequency (\(VPF\)) (Li et al. 2002) is computed as follows.

\[VPF = \frac{1}{n} \sum_{i=1}^{n}\left ( \frac{\sum_{j=1}^{k} (p_{ij} - \overline{p_{i}})^{2}}{k - 1} \right )\]

Where, \(p_{ij}\) denotes the proportion/fraction/frequency of accessions in the \(i\)th phenotypic class for the \(i\)th trait, \(\overline{p_{i}}\) is the mean frequency of phenotypic classes for the \(i\)th trait, \(k\) is the number of phenotypic classes for the \(i\)th trait and \(n\) is the total number of traits.

Value

The Variance of Phenotypic Frequency values for EC and CS.

References

Li Z, Zhang H, Zeng Y, Yang Z, Shen S, Sun C, Wang X (2002). “Studies on sampling schemes for the establishment of corecollection of rice landraces in Yunnan, China.” Genetic Resources and Crop Evolution, 49(1), 67–74.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

vpf.evaluate.core(data = ec, names = "genotypes",
                  qualitative = qual, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

vpf.evaluate.core(data = ec, names = "genotypes",
                  qualitative = qual, selected = core)

Variable Rate of Coefficient of Variation

Description

Compute the Variable Rate of Coefficient of Variation (\(VR\)) (Hu et al. 2000) to compare quantitative traits of the entire collection (EC) and core set (CS).

Usage

vr.evaluate.core(data, names, quantitative, selected)
vr.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Details

The Variable Rate of Coefficient of Variation (\(VR\)) is computed as follows.

\[VR = \left ( \frac{1}{n} \sum_{i=1}^{n} \frac{CV_{CS_{i}}}{CV_{EC_{i}}} \right ) \times 100\]

Where, \(CV_{CS_{i}}\) is the coefficients of variation for the \(i\)th trait in the CS, \(CV_{EC_{i}}\) is the coefficients of variation for the \(i\)th trait in the EC and \(n\) is the total number of traits

Value

The \(VR\) value.

References

Hu J, Zhu J, Xu HM (2000). “Methods of constructing core collections by stepwise clustering with three sampling strategies based on the genotypic values of crops.” Theoretical and Applied Genetics, 101(1), 264–268.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

vr.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

vr.evaluate.core(data = ec, names = "genotypes",
                 quantitative = quant, selected = core)

Wilcoxon Rank Sum Test

Description

Compare the medians of quantitative traits between entire collection (EC) and core set (CS) by Wilcoxon rank sum test or Mann-Whitney-Wilcoxon test or Mann-Whitney U test (Wilcoxon 1945; Mann and Whitney 1947).

Usage

wilcox.evaluate.core(data, names, quantitative, selected)
wilcox.evaluate.core(data, names, quantitative, selected)

Arguments

`data`	The data as a data frame object. The data frame should possess one row per individual and columns with the individual names and multiple trait/character data.
`names`	Name of column with the individual names as a character string.
`quantitative`	Name of columns with the quantitative traits as a character vector.
`selected`	Character vector with the names of individuals selected in core collection and present in the `names` column.

Value

`Trait`	The quantitative trait.
`EC_Med`	The median value of the trait in EC.
`CS_Med`	The median value of the trait in CS.
`Wilcox_pvalue`	The p value of the Wilcoxon test for equality of medians of EC and CS.
`Wilcox_significance`	The significance of the Wilcoxon test for equality of medians of EC and CS.

References

Mann HB, Whitney DR (1947). “On a test of whether one of two random variables is stochastically larger than the other.” The Annals of Mathematical Statistics, 18(1), 50–60.

Wilcoxon F (1945). “Individual comparisons by ranking methods.” Biometrics Bulletin, 1(6), 80.

Examples


data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

wilcox.evaluate.core(data = ec, names = "genotypes",
                     quantitative = quant, selected = core)

data("cassava_CC")
data("cassava_EC")

ec <- cbind(genotypes = rownames(cassava_EC), cassava_EC)
ec$genotypes <- as.character(ec$genotypes)
rownames(ec) <- NULL

core <- rownames(cassava_CC)

quant <- c("NMSR", "TTRN", "TFWSR", "TTRW", "TFWSS", "TTSW", "TTPW", "AVPW",
           "ARSR", "SRDM")
qual <- c("CUAL", "LNGS", "PTLC", "DSTA", "LFRT", "LBTEF", "CBTR", "NMLB",
          "ANGB", "CUAL9M", "LVC9M", "TNPR9M", "PL9M", "STRP", "STRC",
          "PSTR")

ec[, qual] <- lapply(ec[, qual],
                     function(x) factor(as.factor(x)))

wilcox.evaluate.core(data = ec, names = "genotypes",
                     quantitative = quant, selected = core)

Package 'EvaluateCore'

Help Index

Bar Plots

Description

Usage

Arguments

Value

See Also

Examples

Box Plots

Description

Usage

Arguments

Value

References

See Also

Examples

IITA Cassava Germplasm Data - Core Collection

Description

Usage

Format

Details

References

Examples

IITA Cassava Germplasm Data - Entire Collection

Description

Usage

Format

Details

References

Examples

Chi-squared Test for Homogeneity

Description

Usage

Arguments

Value

References

See Also

Examples

Phenotypic Correlations

Description

Usage

Arguments

Value

References

See Also

Examples

Class Coverage

Description

Usage

Arguments

Details

Value

References

Examples

Coincidence Rate of Range

Description

Usage

Arguments

Details

Value

Note

References

See Also

Examples

Distance Measures

Description

Usage

Arguments

Value

References

See Also

Examples

Diversity Indices

Description

Usage

Arguments

Value

Details

Simpson's and related indices