Single Cell ATAC-Seq Analysis
Single Cell ATAC-Seq Analysis can be found in the Toolbox here:
Chromatin Accessibility | Single Cell ATAC-Seq Analysis
The tool takes as input a single read mapping of reads that have been annotated using Annotate Reads with Cell and UMI. The tool outputs:
- A Peak Count Matrix with annotated nearby genes and transcription factors.
- The Read Mapping that was used for peak calling.
- An Annotation Track of transcription factor motifs found within the peaks.
- A Graph Track showing the footprint score at each position.
- A Report providing a summary of the data and diagnostic plots for quality control.
It is important that the input read mapping contains all the samples that will be used in downstream analysis. This is because Peak Count Matrices cannot be combined: when peaks are called separately for each sample, shared peaks will typically have different coordinates. There are two ways to generate a single read mapping from multiple samples:
- Provide multiple read lists to Map Reads to Reference (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Map_Reads_Reference.html).
- Merge existing read mappings using Merge Read Mappings (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Merge_Read_Mappings.html).
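To illustrate why separately called matrices do not line up, the following minimal Python sketch compares two hypothetical sets of peaks. The coordinates, counts and helper names are invented for illustration and are not tool output. An exact join on coordinates recovers only one of the two peaks the samples share, and even an overlap-based match leaves open which boundaries the merged counts should refer to, since each sample's reads were counted over its own intervals.

```python
# Hypothetical peaks called separately on two samples: (chromosome, start, end) -> read count.
# Coordinates are half-open; all values are invented for illustration.
sample_a = {("chr1", 1000, 1500): 42, ("chr1", 5000, 5400): 17}
sample_b = {("chr1", 1020, 1480): 35, ("chr1", 5000, 5400): 9}


def exact_shared(a, b):
    """Peaks whose coordinates are identical in both samples."""
    return set(a) & set(b)


def overlapping_pairs(a, b):
    """Pairs of peaks on the same chromosome whose intervals overlap."""
    pairs = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a == chrom_b and start_a < end_b and start_b < end_a:
                pairs.append(((chrom_a, start_a, end_a), (chrom_b, start_b, end_b)))
    return pairs


print(exact_shared(sample_a, sample_b))            # {('chr1', 5000, 5400)}
print(len(overlapping_pairs(sample_a, sample_b)))  # 2
```

Calling peaks once on a single combined read mapping avoids this problem, because every sample is then counted over the same set of intervals.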
The tool requires a Peak Shape Filter for calling scATAC-Seq peaks, and both a Gene track and a corresponding mRNA track for assigning nearby genes to peaks. These data can be downloaded directly using the Reference Data Manager (see The Reference Data Manager).
It is also possible to supply a custom Peak Shape Filter, Gene track and mRNA track, as follows:
- Peak Shape Filters can be generated by Learn Peak Shape Filter (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Learn_Peak_Shape_Filter.html).
- Gene and mRNA tracks can be imported from gff/gff3/gtf files (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Import_tracks.html).
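This section does not specify how nearby genes are chosen from the Gene and mRNA tracks (see The Single Cell ATAC-Seq Analysis algorithm). Purely as a generic illustration, and not as the tool's method, the following Python sketch assigns a peak to the gene whose transcription start site (TSS), taken from mRNA records, is closest to the peak. The `Transcript` record, the gene names and the `max_distance` cutoff are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal mRNA record; coordinates are 0-based, half-open."""
    gene: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end: the first base on the plus strand,
        # the last base on the minus strand.
        return self.start if self.strand == "+" else self.end - 1


def distance_to_peak(pos: int, peak_start: int, peak_end: int) -> int:
    """Distance from a position to a half-open peak interval (0 if inside)."""
    if peak_start <= pos < peak_end:
        return 0
    if pos < peak_start:
        return peak_start - pos
    return pos - (peak_end - 1)


def nearest_gene(chrom, peak_start, peak_end, transcripts, max_distance=50_000) -> Optional[str]:
    """Gene whose TSS is closest to the peak, or None if none lies within max_distance (an assumed cutoff)."""
    best_gene, best_dist = None, None
    for t in transcripts:
        if t.chrom != chrom:
            continue
        d = distance_to_peak(t.tss, peak_start, peak_end)
        if d <= max_distance and (best_dist is None or d < best_dist):
            best_gene, best_dist = t.gene, d
    return best_gene


transcripts = [
    Transcript("GENE_A", "chr1", 2000, 8000, "+"),
    Transcript("GENE_B", "chr1", 500, 1200, "-"),
    Transcript("GENE_C", "chr2", 1000, 3000, "+"),
]
print(nearest_gene("chr1", 1000, 1500, transcripts))  # GENE_B (its TSS lies inside the peak)
print(nearest_gene("chr1", 9000, 9500, transcripts))  # GENE_A
```

The strand-aware TSS is the reason both a gene-level and a transcript-level annotation are useful in such a scheme: mRNA records give the 5' ends, while the gene name is what gets reported.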
The following additional options are available:
- Maximum P-value for peak calling. The threshold for reporting peaks. Higher values will increase the number of called peaks.
- Minimum peak count. The minimum number of peaks a barcode must have to be called as a cell. Barcodes with fewer peaks are not included in the Peak Count Matrix. This option is the scATAC-Seq equivalent of QC for Single Cell. It is effective despite its simplicity because:
  - Peaks must be shared by other cells to have been detected by the peak caller, so this metric is not affected by the presence of large numbers of randomly mapping reads.
  - The number of peaks reflects the amount of open chromatin in a cell, which is presumed to have a high lower bound for any active cell.
  - Sequencing is expected to sample peaks uniformly, so identifying non-cells is easier than for gene expression, where a cell might express one gene so highly that it is hard to detect others even though they are present.
- Chromosomes to ignore. Because it lacks chromatin, the mitochondrial genome is highly accessible, and many reads map to the mitochondrial chromosome. Ignoring the mitochondrial chromosome can therefore speed up the analysis and improve results by preventing peaks from being called there. If viral genomes have been added to the reference as decoys, these should also be ignored. When configuring this option in a workflow, multiple chromosome names can be provided as a comma-separated list.
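The three options above can be pictured as filters on a peak-by-barcode table. The following minimal Python sketch is a hypothetical illustration, not the tool's implementation: the options above describe ignored chromosomes and the p-value threshold as part of peak calling, whereas here both are applied to an already-called set of peaks for simplicity. All peak IDs, barcodes, p-values and the function names are invented, and the chromosome names `chrM` and `chrEBV` are assumptions, since mitochondrial and decoy names depend on the reference in use.

```python
from typing import Set


def parse_ignored(chromosomes: str) -> Set[str]:
    """Split a comma-separated list of chromosome names, trimming whitespace."""
    return {name.strip() for name in chromosomes.split(",") if name.strip()}


def call_cells(peaks, barcode_counts, max_p_value, min_peak_count, ignored):
    """
    peaks:          dict peak_id -> (chromosome, p_value)
    barcode_counts: dict barcode -> {peak_id: read count}
    Returns {barcode: {peak_id: count}} for barcodes called as cells.
    """
    # Keep peaks that pass the p-value threshold and are not on an ignored chromosome.
    kept = {
        pid for pid, (chrom, p) in peaks.items()
        if chrom not in ignored and p <= max_p_value
    }
    cells = {}
    for barcode, counts in barcode_counts.items():
        # Count distinct kept peaks with at least one read, not total reads.
        row = {pid: c for pid, c in counts.items() if pid in kept and c > 0}
        if len(row) >= min_peak_count:
            cells[barcode] = row
    return cells


peaks = {
    "p1": ("chr1", 1e-6),
    "p2": ("chr1", 1e-3),   # fails the p-value threshold below
    "p3": ("chrM", 1e-9),   # on an ignored chromosome
    "p4": ("chr2", 1e-5),
}
barcode_counts = {
    "AAAC": {"p1": 3, "p3": 50, "p4": 1},
    "TTTG": {"p1": 2, "p2": 4, "p3": 30},
    "GGGA": {"p3": 100},    # many reads, but only in the mitochondrial peak
}
ignored = parse_ignored("chrM, chrEBV")
cells = call_cells(peaks, barcode_counts, max_p_value=1e-4, min_peak_count=2, ignored=ignored)
print(cells)  # {'AAAC': {'p1': 3, 'p4': 1}}
```

Barcode GGGA has the most reads but is not called as a cell, because all of its reads fall in a single peak on an ignored chromosome; the criterion counts peaks, not reads.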
Subsections
- Interpreting the output of Single Cell ATAC-Seq Analysis
- The report output from Single Cell ATAC-Seq Analysis
- The Single Cell ATAC-Seq Analysis algorithm