The report output from Single Cell ATAC-Seq Analysis

The report contains the following sections:

Reads

For each sample, the following information is shown:

Comparing these values across samples may reveal biases. For example, if control samples have more reads than case samples, then one might expect to see a higher proportion of cells for each peak for the control samples.

Fragments

A single fragment size distribution plot is shown for all the data. This plot has a characteristic shape for scATAC-Seq data, as seen in figure 12.1. The absence of this shape may indicate failed library preparation.

Image atac_fragment_size
Figure 12.1: A characteristic ATAC-seq fragment size distribution. The distribution should have few fragments <30 nt, as these are too small for the Tn5 transposase to bind. Short fragments are usually most abundant. A peak should be seen at about 180 nt. Subsequent peaks may be present at nucleosome spacing, i.e. a new peak approximately 147 nt after each previous peak. A high-frequency periodicity may be observed for small fragment sizes; this is related to the DNA helix pitch. Data is for two samples from [Taavitsainen et al., 2021].
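
As an illustration of what the fragment size distribution summarizes, the following minimal Python sketch counts fragments by length and reports the fraction shorter than 30 nt. It is not the tool's implementation. The function names, the (start, end) input format with an exclusive end coordinate, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def fragment_size_histogram(fragments):
    """Count fragments by length.

    Assumption: each fragment is a (start, end) pair with an exclusive end,
    so its length is end - start.
    """
    return Counter(end - start for start, end in fragments)


def fraction_in_range(histogram, low, high):
    """Return the fraction of fragments whose length lies in [low, high)."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    in_range = sum(n for size, n in histogram.items() if low <= size < high)
    return in_range / total


# Hypothetical toy data: five fragments with lengths 50, 60, 180, 185 and 20.
toy_fragments = [(100, 150), (200, 260), (300, 480), (500, 685), (700, 720)]
hist = fragment_size_histogram(toy_fragments)

# Fraction of fragments shorter than 30 nt; expected to be small for real data.
short_fraction = fraction_in_range(hist, 0, 30)
```

Plotting the histogram counts against fragment length gives a curve of the kind shown in the figure, where the peak positions can be compared with the expected spacing.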

Two additional metrics are shown per sample:

Tn5 bias correction

The Tn5 enzyme has a bias towards certain sequences. This should be seen in the "before" lines of the nucleotide frequency plots (figure 12.2). An absence of a detectable bias indicates problems with library preparation. A different bias may reflect use of a different enzyme.

The "after" lines should show markedly less bias. Bias correction is used to improve the assignment of transcription factors to peaks via footprinting. Failure to correct for bias may lead to more transcription factors being associated with each peak.

Image atac_bias
Figure 12.2: A characteristic Tn5 insertion bias is seen in the "before" lines. This is reduced after bias correction as part of footprinting. Data is for two samples from [Taavitsainen et al., 2021].
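
To make the nucleotide frequency plot concrete, the sketch below computes per-position base frequencies from sequence windows centered on insertion sites. It only illustrates the kind of "before" profile shown in the figure and does not reproduce the tool's bias correction. The function name, the input format (equal-length strings), and the toy windows are assumptions for this example.

```python
from collections import Counter


def position_frequencies(windows):
    """Return one {base: frequency} dict per position across all windows.

    Assumption: every window is a string of the same length, centered on an
    insertion site. Characters other than A, C, G, T (for example N) are ignored.
    """
    if not windows:
        return []
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")

    frequencies = []
    for i in range(width):
        counts = Counter(w[i].upper() for w in windows)
        total = sum(counts[base] for base in "ACGT")
        if total == 0:
            frequencies.append({base: 0.0 for base in "ACGT"})
        else:
            frequencies.append({base: counts[base] / total for base in "ACGT"})
    return frequencies


# Hypothetical toy data: four windows of width 4.
toy_windows = ["ACGT", "ACGA", "TCGT", "ACTT"]
freqs = position_frequencies(toy_windows)
```

With no sequence preference, each base frequency at a position would sit close to its genome-wide background frequency. Positions that deviate strongly from that background are where the bias is visible.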

Cells

A barcode rank plot is shown for all the samples. An example is shown in figure 12.3. The red horizontal line shows the cutoff specified by the Minimum peak count option. All barcodes above the red line are retained as cells, and all barcodes below the line are discarded. The lines for each sample should be nearly vertical at the point where they cross the threshold line, indicating an abrupt fall in the number of peaks at the threshold. If this is not the case, consider re-running the tool with a different Minimum peak count.

Image atac_barcode_ranks
Figure 12.3: A barcode rank plot is a log-log plot of the total number of peaks for each barcode vs the rank of the barcode, in decreasing order of the number of peaks. Barcodes above the red threshold line are retained as cells. Data is for two samples from [Taavitsainen et al., 2021].
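
The ranking and thresholding behind a barcode rank plot can be sketched as follows. This is a minimal illustration, not the tool's code. The function names and the toy counts are hypothetical, and treating a barcode exactly at the threshold as retained is an assumption, since the text only describes barcodes above and below the line.

```python
def barcode_ranks(peak_counts):
    """Return (rank, barcode, count) tuples in decreasing order of count.

    Ranks start at 1 so that they can be drawn on a logarithmic axis.
    """
    ordered = sorted(peak_counts.items(), key=lambda item: item[1], reverse=True)
    return [(rank, barcode, count)
            for rank, (barcode, count) in enumerate(ordered, start=1)]


def retained_barcodes(peak_counts, minimum_peak_count):
    """Return barcodes kept as cells.

    Assumption: a barcode whose count equals the threshold is retained.
    """
    return {barcode for barcode, count in peak_counts.items()
            if count >= minimum_peak_count}


# Hypothetical toy data: number of peaks per barcode.
toy_counts = {"AAAC": 5200, "AAAG": 4800, "AACT": 35, "AAGT": 12, "ACGT": 4100}

ranks = barcode_ranks(toy_counts)
cells = retained_barcodes(toy_counts, minimum_peak_count=1000)
```

In this toy data the counts fall from 4100 to 35 between ranks 3 and 4, so any threshold in that gap separates the same three barcodes. A sharp drop of this kind is what produces a nearly vertical line at the threshold in the plot.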

Two additional metrics are shown per sample:

Peaks

A summary table is shown for all peaks:

Details are provided for peaks with nearby genes: