Empirical analysis of DGE - implementation parameters

The 'Empirical analysis of DGE' algorithm in the CLC Genomics Workbench is a re-implementation of the "Exact Test", available as part of the edgeR Bioconductor package.

The parameter values used in the CLC Genomics Workbench implementation are the edgeR defaults for the equivalent parameters in all but one case. The exception is the tol parameter of estimateCommonDisp, for which the CLC Genomics Workbench uses a more stringent value (1e-14) than the edgeR default (1e-6). A more stringent tolerance gives more accurate results at the cost of a slightly longer run time. In our performance tests, this change had only a marginal impact on the run time of the tool.
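The trade-off between tolerance and run time can be illustrated with a generic one-dimensional search. The sketch below is not the routine used by edgeR or the CLC Genomics Workbench. The function names (bisect_root, toy) and the toy equation are assumptions chosen only to show how tightening a tolerance from 1e-6 to 1e-14 costs extra iterations and returns a more accurate answer.

```python
def bisect_root(f, lo, hi, tol):
    """Bisection on [lo, hi]; stops when the bracket is no wider than tol.

    Assumes f(lo) and f(hi) have opposite signs.
    Returns (root_estimate, number_of_iterations).
    """
    f_lo = f(lo)
    iterations = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
        iterations += 1
    return 0.5 * (lo + hi), iterations


def toy(x):
    """Hypothetical stand-in objective: root at the cube root of 2."""
    return x ** 3 - 2.0


exact = 2.0 ** (1.0 / 3.0)

# Compare a loose tolerance with a stringent one.
for tol in (1e-6, 1e-14):
    root, n = bisect_root(toy, 1.0, 2.0, tol)
    print(f"tol={tol:g}: {n} iterations, absolute error {abs(root - exact):.3g}")
```

Each halving of the bracket costs one function evaluation, so the number of iterations grows only with the logarithm of the tolerance. This is in line with a stringent tolerance having a modest effect on run time.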

The parameter values used in the CLC Genomics Workbench implementation, with reference to the edgeR function names for clarity, are provided in the table below.

Function in BioC package  Parameter name    Value used and comments
calcNormFactors           method            "TMM"
                          refColumn         NULL (automatically selected)
                          logratioTrim      0.3
                          sumTrim           0.05
                          doWeighting       TRUE
                          Acutoff           -1e10
estimateCommonDisp        tol               1e-14 (default in edgeR: 1e-6)
                          rowsum.filter     Set by user in wizard ("Total count filter cutoff", default 5)
estimateTagwiseDisp       prior.df          10
                          trend             "movingave"
                          span              NULL
                          method            "grid"
                          grid.length       11
                          grid.range        c(-6, 6)
mglmOneGroup              maxit             50
                          tol               1e-10
aveLogCPM                 prior.count       2
                          dispersion        0.05
exactTest                 pair              Set by user in wizard ("Exact test comparisons")
                          dispersion        "auto" (tagwise if available, otherwise common)
                          rejection.region  "doubletail"
                          big.count         900
                          prior.count       0.125
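
For scripted comparisons, the table can be transcribed into a small data structure. The sketch below is only a transcription aid. The names CLC_PARAMETERS, USER_SET, EDGER_DEFAULT_OVERRIDES and deviations_from_edger are hypothetical and are not part of edgeR or the CLC Genomics Workbench. R values are mapped to their nearest Python equivalents (NULL to None, TRUE to True, c(-6, 6) to a tuple).

```python
# Placeholder for values chosen by the user in the wizard (not an edgeR value).
USER_SET = "set by user in wizard"

# Values transcribed from the parameter table.
CLC_PARAMETERS = {
    "calcNormFactors": {
        "method": "TMM",
        "refColumn": None,  # NULL: selected automatically
        "logratioTrim": 0.3,
        "sumTrim": 0.05,
        "doWeighting": True,
        "Acutoff": -1e10,
    },
    "estimateCommonDisp": {
        "tol": 1e-14,
        "rowsum.filter": USER_SET,  # "Total count filter cutoff", default 5
    },
    "estimateTagwiseDisp": {
        "prior.df": 10,
        "trend": "movingave",
        "span": None,
        "method": "grid",
        "grid.length": 11,
        "grid.range": (-6, 6),
    },
    "mglmOneGroup": {"maxit": 50, "tol": 1e-10},
    "aveLogCPM": {"prior.count": 2, "dispersion": 0.05},
    "exactTest": {
        "pair": USER_SET,  # "Exact test comparisons"
        "dispersion": "auto",
        "rejection.region": "doubletail",
        "big.count": 900,
        "prior.count": 0.125,
    },
}

# The single documented deviation from the edgeR defaults.
EDGER_DEFAULT_OVERRIDES = {("estimateCommonDisp", "tol"): 1e-6}


def deviations_from_edger(params, overrides):
    """List (function, parameter, value used, edgeR default) where they differ."""
    return [
        (fn, name, params[fn][name], default)
        for (fn, name), default in overrides.items()
        if params[fn][name] != default
    ]


for fn, name, used, default in deviations_from_edger(
    CLC_PARAMETERS, EDGER_DEFAULT_OVERRIDES
):
    print(f"{fn} {name}: {used} (edgeR default: {default})")
```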