ToxPanel is a tool to
calculate gene-set scores based on fold-change (FC) values relative
to randomly selected sets of genes. We provide two collections of
gene sets: our own gene sets indicative of liver and kidney injuries
[1-4] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways
[5]. Users can also define their own gene sets or, for example, use
gene sets from the Molecular Signatures Database (MSigDB) [6].

Detailed descriptions and performance characteristics of the AFC
method can be found in the original literature [7, 8]. In this
method, the score of a gene set or KEGG pathway is defined as the sum
of the log-transformed FC values of all genes in the set or pathway.
We then use these scores to perform null hypothesis tests and
estimate the significance of each pathway by its p-value, defined as
the probability that the pathway score for a random data set is
greater than the score from the actual data. The z-score is the
number of standard deviations by which the actual gene-set score
differs from the mean of the scores obtained from randomly selected
FC values (10,000 random samples). The sign of the gene-set score
represents the direction of regulation: the pathway is considered
up-regulated (overexpressed genes) if the net sum of the
log-transformed FC values is positive, i.e., expression after
treatment is increased relative to control, and down-regulated
(suppressed genes) if the sum is negative.
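
The following Python sketch illustrates the AFC scoring and
permutation test described above. It is not ToxPanel's
implementation: the function name `afc_test`, the input format (a
dictionary of log-transformed FC values keyed by gene), and the choice
to draw each random set without replacement from all input genes,
with the same size as the tested gene set, are assumptions made for
illustration.

```python
import random
import statistics


def afc_test(gene_set, log_fc_by_gene, n_random=10_000, seed=0):
    """Sketch of an AFC-style gene-set test (illustrative, not ToxPanel's code).

    gene_set: iterable of gene identifiers.
    log_fc_by_gene: dict mapping gene identifier -> log-transformed FC value.
    Assumption: each random set is drawn without replacement from all input
    genes and has the same size as the tested gene set.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    # Keep input order, drop duplicates, and skip genes absent from the input.
    members = [g for g in dict.fromkeys(gene_set) if g in log_fc_by_gene]
    # Gene-set score: signed sum of the log-transformed FC values.
    observed = sum(log_fc_by_gene[g] for g in members)
    pool = list(log_fc_by_gene.values())
    # Null distribution: scores of n_random randomly selected sets of FC values.
    null = [sum(rng.sample(pool, len(members))) for _ in range(n_random)]
    # Empirical p-value: fraction of random scores greater than the observed score.
    p_value = sum(s > observed for s in null) / n_random
    # z-score: distance from the null mean in units of the null standard deviation.
    sd = statistics.stdev(null)
    z_score = (observed - statistics.fmean(null)) / sd if sd > 0 else float("nan")
    # The sign of the score gives the direction of regulation.
    direction = "up" if observed > 0 else "down" if observed < 0 else "unchanged"
    return observed, p_value, z_score, direction
```

Under these assumptions, `afc_test(["a", "b", "c"], log_fc)` returns
the signed gene-set score, its empirical p-value, its z-score, and the
direction of regulation. Fixing the random seed keeps the null
distribution reproducible between runs.
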
The AAFC method calculates the activation score of a gene set [2].
It identifies gene sets (e.g., modules) that are significantly
changed or disrupted, regardless of the direction of change. The
method, which takes the absolute values of the log-transformed FC
values, performs well in identifying significantly altered pathways
[7]. Its drawback is that it provides no information about the
direction of change in a pathway, i.e., whether the pathway as a
whole is up- or down-regulated relative to control.

The AAFC method first reads a list of gene FC values uploaded by the
user and calculates the absolute value of the log-transformed FC
value for each gene. For each gene set (i.e., module or pathway), it
then sums these absolute values to obtain the total absolute FC
value. We then use the gene-set scores to perform null hypothesis
tests and estimate the significance of each gene set by its p-value,
defined as the probability that the score for randomly selected FC
values (10,000 random samples) is greater than the score from the
actual gene set. A small p-value indicates that the gene-set score is
significant. The z-score is the number of standard deviations by
which the actual gene-set score differs from the mean of the scores
for the randomly selected FC values. In the AAFC method, we are only
interested in positive z-scores, because a negative z-score means
that the total absolute FC value of the gene set is smaller than the
average obtained from randomly selected FC values.
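
A minimal Python sketch of the AAFC steps is shown below. It is an
illustration rather than ToxPanel's code: the function name
`aafc_test`, the dictionary input of log-transformed FC values, and
the drawing of random sets without replacement from all input genes,
matching the size of the tested gene set, are assumptions. The final
return value flags whether the z-score is positive, the only case of
interest in this method.

```python
import random
import statistics


def aafc_test(gene_set, log_fc_by_gene, n_random=10_000, seed=0):
    """Sketch of an AAFC-style gene-set test (illustrative, not ToxPanel's code).

    gene_set: iterable of gene identifiers (e.g., the genes of one module).
    log_fc_by_gene: dict mapping gene identifier -> log-transformed FC value.
    Assumption: each random set is drawn without replacement from all input
    genes and has the same size as the tested gene set.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    # Step 1: absolute value of the log-transformed FC value of every gene.
    abs_fc = {gene: abs(value) for gene, value in log_fc_by_gene.items()}
    # Step 2: total absolute FC value of the gene set (unmeasured genes skipped).
    members = [g for g in dict.fromkeys(gene_set) if g in abs_fc]
    observed = sum(abs_fc[g] for g in members)
    # Step 3: null distribution from n_random randomly selected sets of FC values.
    pool = list(abs_fc.values())
    null = [sum(rng.sample(pool, len(members))) for _ in range(n_random)]
    # Empirical p-value: fraction of random scores greater than the observed score.
    p_value = sum(s > observed for s in null) / n_random
    sd = statistics.stdev(null)
    z_score = (observed - statistics.fmean(null)) / sd if sd > 0 else float("nan")
    # Only positive z-scores are of interest: the set is more perturbed than random.
    return observed, p_value, z_score, z_score > 0
```

Because absolute values are summed, a gene set containing both
strongly up- and strongly down-regulated genes receives a high score
here, even though its signed sum could be close to zero.
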
We also provide an option to include a significance value (a
p-value, q-value, or false discovery rate) associated with each
log-transformed FC value in the input file. These values can be used
to calculate a combined p-value for each gene set (module) with
Fisher’s combined probability test, as an indicator of the
reliability of the gene-set result [9].

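Fisher’s method combines the p-values $p_1, \dots, p_k$ from $k$
independent tests into the test statistic

$$T = -2\sum_{i=1}^{k} \ln p_i,$$

which, under the null hypothesis, has a $\chi^2$ distribution with
$2k$ degrees of freedom. Fisher’s combined p-value is then
$1 - \phi(T)$, where $\phi$ is the cumulative distribution function
of the $\chi^2$ distribution with $2k$ degrees of freedom.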
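
Fisher’s combined p-value can be computed without special libraries:
because the number of degrees of freedom (2k) is even, the survival
function of the $\chi^2$ distribution has the closed form
$1 - \phi(T) = e^{-T/2}\sum_{j=0}^{k-1} (T/2)^j / j!$. The Python
sketch below uses this identity; the function name
`fisher_combined_p` is hypothetical, and it assumes the supplied
significance values are valid p-values in (0, 1].

```python
import math


def fisher_combined_p(p_values):
    """Fisher's combined probability test for k independent p-values.

    T = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null hypothesis; the combined p-value is 1 - CDF(T).
    Returns the pair (T, combined p-value).
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    t = -2.0 * sum(math.log(p) for p in p_values)
    half_t = t / 2.0
    if half_t <= 0.0:
        return t, 1.0  # every p-value equals 1
    log_half_t = math.log(half_t)
    # Closed-form chi-square survival function for even degrees of freedom,
    # summed term by term in log space so large k or tiny p-values do not overflow.
    combined = sum(
        math.exp(j * log_half_t - math.lgamma(j + 1) - half_t) for j in range(k)
    )
    return t, min(combined, 1.0)
```

For a single p-value the function returns that p-value unchanged. If
SciPy is available, `scipy.stats.combine_pvalues(p_values,
method="fisher")` computes the same statistic and combined p-value and
can serve as a cross-check.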