Creating the Synthetic Data
We used compcodeR to compare the different tools, first creating the synthetic data
with its built-in function, `generateSyntheticData`. For each of the 11 distinct simulated
datasets, we specified the parameters `n.vars` = 12,500, `samples.per.cond` = 5, and
`dispersions` = the value in the # |{g; 𝜙g = 0}| column of Figure 1. To set the fraction of
differentially expressed genes that is upregulated in S2 compared to S1, `fraction.upregulated`
was set to the ratio shown in Figure 1 (e.g. 0.5 for B625625). For the single outlier fraction
datasets, `single.outlier.high.prob` = 0.05 (fraction of single outliers with unusually high
counts) and `single.outlier.low.prob` = 0.05 (fraction of single outliers with unusually low
counts). For the random outlier fraction datasets, `random.outlier.high.prob` = 0.025 (fraction
of random outliers with unusually high counts) and `random.outlier.low.prob` = 0.025
(fraction of random outliers with unusually low counts).
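As a minimal sketch of how one such dataset might be generated, the snippet below collects the parameters described above for a B625625-style dataset with single outliers. The dataset label, the `n.diffexp` value of 1,250 (inferred from the dataset name as 625 + 625), and the output file name are illustrative assumptions rather than values taken from our scripts, and `dispersions` is left at its default here. The call itself only runs if compcodeR is installed.

```r
# Parameters named in the text; entries marked "assumed" are illustrative only.
sim_args <- list(
  dataset = "B625625_single_outlier",          # assumed label
  n.vars = 12500,                              # number of simulated genes
  samples.per.cond = 5,                        # samples in each of S1 and S2
  n.diffexp = 1250,                            # assumed: 625 up + 625 down
  fraction.upregulated = 0.5,                  # ratio for B625625
  single.outlier.high.prob = 0.05,
  single.outlier.low.prob = 0.05,
  # For a random outlier dataset, set the two single.outlier.* values to 0
  # and these two values to 0.025 instead.
  random.outlier.high.prob = 0,
  random.outlier.low.prob = 0,
  output.file = "B625625_single_outlier.rds"   # assumed file name
)

# Only attempt the simulation when compcodeR is available.
if (requireNamespace("compcodeR", quietly = TRUE)) {
  sim_data <- do.call(compcodeR::generateSyntheticData, sim_args)
}
```

Keeping the arguments in a list makes it straightforward to vary a single parameter across the simulated datasets while holding the others fixed.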
Performing Differential Expression
To run the tools supported by compcodeR, we used the built-in function
`runDiffExp`, setting the `result.extent` parameter to one of
DESeq2, edgeR.exact, NOISeq, voom.limma, or ttest. ABSSeq and PoissonSeq, which are
less commonly used, are not supported by compcodeR and had to be run separately.
For both tools, the count matrix was extracted from the compcodeR object for each
dataset, and the samples were labeled by condition according to the number of samples
per condition. Both tools were then run with default parameters, and each gene was
labeled 1 or 0 based on an adjusted p-value cutoff of 0.05.
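The two steps above can be sketched as follows. The first part shows what a single `runDiffExp` call might look like for voom.limma; the data file name and the `norm.method = "TMM"` choice are assumptions, and the call is skipped unless compcodeR and the file are present. The second part uses a toy count matrix and made-up p-values to illustrate the condition labels and the 1/0 calls. The Benjamini-Hochberg adjustment, the strict `<` comparison, and the reading of 1 as "differentially expressed" are also assumptions, since ABSSeq and PoissonSeq report their own adjusted values.

```r
# --- Part 1: a compcodeR-supported method (skipped if inputs are missing) ---
data_file <- "B625625.rds"   # hypothetical path to a saved compcodeR dataset
if (requireNamespace("compcodeR", quietly = TRUE) && file.exists(data_file)) {
  compcodeR::runDiffExp(data.file = data_file,
                        result.extent = "voom.limma",
                        Rmdfunction = "voom.limma.createRmd",
                        output.directory = ".",
                        norm.method = "TMM")   # assumed normalization choice
}

# --- Part 2: labels and 1/0 calls for tools run outside compcodeR ---
samples_per_cond <- 5
set.seed(1)
# Toy stand-in for the count matrix extracted from the compcodeR object
# (for example with compcodeR's count.matrix() accessor).
counts <- matrix(rpois(6 * 2 * samples_per_cond, lambda = 50), nrow = 6,
                 dimnames = list(paste0("g", 1:6), paste0("sample", 1:10)))

# The first 5 columns belong to condition 1, the last 5 to condition 2.
condition <- rep(c(1, 2), each = samples_per_cond)

# Made-up raw p-values, one per gene, standing in for a tool's output.
raw_p <- c(0.0004, 0.03, 0.2, 0.001, 0.6, 0.04)
adj_p <- p.adjust(raw_p, method = "BH")

# 1 = adjusted p-value below the cutoff, 0 = otherwise.
de_call <- as.integer(adj_p < 0.05)
```

With these toy values only the first and fourth genes fall below the cutoff after adjustment, even though four of the six raw p-values are below 0.05.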
ABSSeq
This tool performs differential expression analysis of RNA-seq data based on the absolute difference in counts between two groups, using the negative binomial distribution and moderating fold changes according to the heterogeneity of dispersion across expression levels.
voom.limma
This tool performs differential expression analysis of RNA-seq data (comparing two conditions) by applying the voom transformation (from the limma package) followed by differential expression analysis with limma's moderated t-test. The voom transformation estimates precision weights that make linear-model analysis tools applicable to RNA-seq read counts. limma then fits a linear model to the expression data for each gene.
PoissonSeq
This tool estimates the sequencing depths of the experiments using a method based on the Poisson goodness-of-fit statistic, calculates a score statistic on the basis of a Poisson log-linear model, and then estimates the false discovery rate using a modified version of the permutation plug-in method.
DESeq2
This tool performs differential gene expression analysis by estimating the variance-mean dependence in count data using the negative binomial distribution. More specifically, DESeq2 estimates size factors from the gene counts of the samples, then estimates dispersions, and finally fits a negative binomial GLM and performs Wald significance tests. For our analysis, we used a parametric fit of the dispersions to the mean intensity and the Wald test to identify differentially expressed genes.
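A minimal sketch of this configuration, assuming DESeq2 is installed: the hypothetical helper `run_deseq2_wald` requests a parametric dispersion fit and the Wald test. The helper name, the toy count matrix, and the 0.05 cutoff on `padj` are illustrative, and this is not the code that compcodeR generates.

```r
# Hypothetical helper: DESeq2 with a parametric dispersion fit and the Wald test.
run_deseq2_wald <- function(counts, condition) {
  col_data <- data.frame(condition = factor(condition),
                         row.names = colnames(counts))
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = col_data,
                                        design = ~ condition)
  # DESeq() estimates size factors, then dispersions, then fits the
  # negative binomial GLM and computes Wald statistics.
  dds <- DESeq2::DESeq(dds, fitType = "parametric", test = "Wald", quiet = TRUE)
  as.data.frame(DESeq2::results(dds))
}

# Toy negative binomial counts: 500 genes, 5 samples per condition.
set.seed(42)
n_genes <- 500
condition <- rep(c("S1", "S2"), each = 5)
gene_means <- 2^runif(n_genes, min = 4, max = 10)
counts <- matrix(rnbinom(n_genes * 10, mu = rep(gene_means, times = 10), size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("g", seq_len(n_genes)),
                                 paste0("sample", 1:10)))

# Only run the analysis when DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  res <- run_deseq2_wald(counts, condition)
  # padj can be NA (e.g. after independent filtering); treat those as not called.
  de_call <- as.integer(!is.na(res$padj) & res$padj < 0.05)
}
```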
NOISeq
This tool performs differential gene expression analysis of RNA-seq expression data between two conditions without parametric assumptions. NOISeq models the noise distribution of count changes by contrasting fold-change differences and absolute expression differences for all the features in samples within the same condition. NOISeq has three main functions: quality control of count data, normalization and low-count filtering, and differential expression analysis.
ttest
This tool performs differential expression analysis of RNA-seq data (comparing two conditions) using a t-test applied to the normalized counts, with the edgeR package used for the normalization.
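The idea can be illustrated in base R as below. This is a sketch under stated assumptions, not the compcodeR implementation: simple library-size scaling stands in for the edgeR normalization, the Welch t-test from `t.test` is used, and the toy counts, the Benjamini-Hochberg adjustment, and the 0.05 cutoff are illustrative.

```r
set.seed(7)
n_genes <- 50
n_per_cond <- 5
condition <- rep(c(1, 2), each = n_per_cond)

# Toy negative binomial counts; the first 5 genes have a higher mean in condition 2.
mu <- matrix(100, nrow = n_genes, ncol = 2 * n_per_cond)
mu[1:5, condition == 2] <- 400
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = n_genes)

# Stand-in normalization: scale every sample to the mean library size.
lib_size <- colSums(counts)
norm_counts <- sweep(counts, 2, lib_size / mean(lib_size), "/")

# One two-sample t-test per gene on the normalized counts.
p_values <- apply(norm_counts, 1, function(x) {
  t.test(x[condition == 1], x[condition == 2])$p.value
})

# Adjust for multiple testing and convert to 1/0 calls.
adj_p <- p.adjust(p_values, method = "BH")
de_call <- as.integer(adj_p < 0.05)
```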
edgeR.exact
This tool performs differential gene expression analysis of RNA-seq expression profiles with biological replication. The edgeR package implements several statistical methods based on the negative binomial distribution, including empirical Bayes estimation, exact tests, generalized linear models, and quasi-likelihood tests. For our investigation, we used genewise exact tests for differences in the means between the two conditions of negative-binomially distributed counts.
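A minimal sketch of a genewise exact test with edgeR, assuming the package is installed. The helper name `run_edger_exact`, the toy counts, and the choice of TMM normalization followed by common and tagwise dispersion estimation are assumptions made for illustration and are not settings confirmed by our pipeline.

```r
# Hypothetical helper: classic edgeR pipeline ending in the genewise exact test.
run_edger_exact <- function(counts, condition) {
  dge <- edgeR::DGEList(counts = counts, group = factor(condition))
  dge <- edgeR::calcNormFactors(dge)        # TMM normalization by default
  dge <- edgeR::estimateCommonDisp(dge)     # one dispersion shared by all genes
  dge <- edgeR::estimateTagwiseDisp(dge)    # genewise (tagwise) dispersions
  et <- edgeR::exactTest(dge)               # exact test between the two groups
  # Return all genes in their original order; the FDR column holds
  # Benjamini-Hochberg adjusted p-values.
  edgeR::topTags(et, n = Inf, sort.by = "none")$table
}

# Toy negative binomial counts: 500 genes, 5 samples per condition.
set.seed(11)
n_genes <- 500
condition <- rep(c("S1", "S2"), each = 5)
gene_means <- 2^runif(n_genes, min = 4, max = 10)
counts <- matrix(rnbinom(n_genes * 10, mu = rep(gene_means, times = 10), size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("g", seq_len(n_genes)),
                                 paste0("sample", 1:10)))

# Only run the analysis when edgeR is available.
if (requireNamespace("edgeR", quietly = TRUE)) {
  res <- run_edger_exact(counts, condition)
  de_call <- as.integer(res$FDR < 0.05)
}
```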