The QIAGEN GeneRead Panel Analysis is a template workflow that can identify and annotate variants in Targeted Amplicon Sequencing data generated with GeneRead DNAseq Gene Panels. The GeneRead DNAseq Gene Panels can either be standard panels focused on a specific set of genes or can be customized to include genes tailored to specific research interests.
The first step in the template workflow is mapping of the sequencing reads to the human reference sequence. Mapping is followed by a local realignment step, which is included to improve variant detection, and then by a primer trimming step, directly after which variants are detected. The detected variants are annotated with gene names, exon numbers, amino acid changes, conservation scores, information on relevant variants present in the ClinVar database, and information on common variants present in the common dbSNP, HapMap, and 1000 Genomes databases. Furthermore, a detailed target regions mapping report is created that allows inspection of the coverage and mapping specificity in the target regions.
The QIAGEN GeneRead Panel Analysis template workflow assumes that the sequences used as input do not contain adapters, as adapter removal is often done directly on the sequencing machine. If adapters have not been trimmed off, remove them before proceeding with your analysis by using the Trim Reads tool (from the "Prepare Raw Data" folder) with the "Automatic read-through adapter trimming" option enabled.
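Adapter read-through happens when a DNA fragment is shorter than the read length, so the sequencer continues past the insert into the adapter at the 3' end. The Python sketch below only illustrates that idea by searching for a known adapter sequence with exact matching; it is not how the Trim Reads tool detects read-through. The adapter string (the start of a commonly used Illumina adapter) and the `min_overlap` setting are assumptions for illustration.

```python
# Illustrative 3' adapter trimming; NOT the Trim Reads algorithm.
# ADAPTER is an assumption: the first bases of a commonly used Illumina adapter.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove adapter read-through from the 3' end of a read.

    Exact matching only; real trimmers also tolerate sequencing errors.
    """
    # Case 1: the full adapter occurs in the read, so cut from its first base.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway into the adapter; try the longest
    # adapter prefix first, down to min_overlap bases.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter evidence: leave the read unchanged.
    return read


insert = "ACGTACGTAC"
print(trim_adapter(insert + "AGATCGGAAGAGCACAC"))  # ACGTACGTAC
print(trim_adapter(insert + "AGATC"))              # ACGTACGTAC
```

Requiring a minimum overlap avoids trimming bases that match the first one or two adapter bases purely by chance.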
The QIAGEN GeneRead Panel Analysis workflow can be found in the toolbox under "Targeted Amplicon Sequencing":
Toolbox | Template Workflows | Biomedical Workflows | Targeted Amplicon Sequencing | Somatic Cancer (TAS) | QIAGEN GeneRead Panel Analysis
Double-click on the QIAGEN GeneRead Panel Analysis workflow to run the analysis.
If you are connected to a CLC Server via the CLC Workbench, you will be asked where you would like to run the analysis. We recommend that you run the analysis on a CLC Server when possible. Click Next.
Select the sequencing reads that should be analyzed (figure 21.58).
If you would like to analyze more than one sample, you can run the analysis in batch mode. To do this, tick "Batch" in the lower left corner of the wizard and select the folder(s) that hold the data you wish to analyze. If your sequencing data are in separate folders, you should run the analysis in batch mode. You can learn more about batch analysis in the CLC Workbench user manual (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Batch_processing.html).
In the next window, specify the relevant QIAGEN reference data set to be used with the workflow (figure 21.59).
In the next wizard window (figure 21.60), you must select the target regions that match your sample from the drop-down menu.
In the next dialog, HapMap, you can specify the populations that fit your data set. Detected variants are annotated with a range of different data in this template workflow, and for databases that provide data from more than one population, such as HapMap, you can specify the populations relevant to your data set (figure 21.61).
From the list that opens when you click the plus symbol, choose the population that matches the population your samples are derived from. Note that the populations available in the drop-down list can be specified with the Reference Data Manager found in the top right corner of the CLC Workbench.
In the Map Reads to Reference wizard step (figure 21.62), you can configure the read mapper by setting the "Cost of insertions and deletions" to either "Affine gap cost" (default) or "Linear gap cost".
- Linear gap cost: The cost of a gap is computed directly from the length of the gap and the insertion or deletion cost. This model often favors small, fragmented gaps over long contiguous gaps.
- Affine gap cost: An extra cost is associated with opening a gap, so that long contiguous gaps are favored over short, fragmented gaps.
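The difference between the two models can be seen by scoring the same three missing bases once as a single 3-base deletion and once as three separate 1-base deletions. The Python sketch below does this; the cost values (3 per gap base for the linear model, an opening cost of 6 and an extension cost of 1 for the affine model) are illustrative assumptions, not necessarily the settings used by the read mapper.

```python
# Illustrative gap scoring; the cost values are assumptions, not mapper settings.


def linear_gap_cost(length: int, per_base: int = 3) -> int:
    """Linear model: cost is directly proportional to gap length."""
    return per_base * length


def affine_gap_cost(length: int, open_cost: int = 6, extend_cost: int = 1) -> int:
    """Affine model: a one-off opening cost plus a per-base extension cost.

    Conventions differ; here every base in the gap pays the extension cost.
    """
    return open_cost + extend_cost * length if length > 0 else 0


def total_cost(gap_lengths, cost_fn) -> int:
    """Sum the cost of every gap in an alignment."""
    return sum(cost_fn(n) for n in gap_lengths)


one_long_gap = [3]            # a single 3-base deletion
three_short_gaps = [1, 1, 1]  # the same 3 bases split into three 1-base deletions

for name, fn in [("linear", linear_gap_cost), ("affine", affine_gap_cost)]:
    print(name, total_cost(one_long_gap, fn), total_cost(three_short_gaps, fn))
# linear 9 9
# affine 9 21
```

Under the linear model both alignments cost the same, so nothing discourages splitting a gap into pieces; under the affine model every additional gap opening is penalized, which is why one contiguous gap wins.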
Specify the target primers for primer trimming in the Trim Primers and their Dimers of Mapped Reads window (figure 21.63). If you would like to add more GeneRead DNAseq Gene Panel target primers, this can be done using the Reference Data Manager as described in http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Custom_Sets.html. You can also enable or disable the parameter "Only keep reads that have hit a primer", which is enabled by default.
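Primer bases at the ends of amplicon reads come from the synthetic primers rather than from the sample, so trimming them keeps them from masking real variants at those positions. The Python sketch below shows the general idea on mapped read coordinates, assuming half-open reference intervals and ignoring strand and primer dimers. The function and its `only_keep_primer_hits` flag are hypothetical and only loosely mirror the "Only keep reads that have hit a primer" option; they do not reproduce the tool's actual logic.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) reference coordinates


def trim_primers(read: Interval, primers: List[Interval],
                 only_keep_primer_hits: bool = True) -> Optional[Interval]:
    """Clip primer-covered bases from the ends of a mapped read (hypothetical).

    Returns the trimmed interval, or None if the read is discarded.
    """
    start, end = read
    hit = False
    for p_start, p_end in primers:
        # Primer covers the left end of the read: move the start past it.
        if p_start <= start < p_end:
            start = p_end
            hit = True
        # Primer covers the right end of the read: move the end before it.
        if p_start < end <= p_end:
            end = p_start
            hit = True
    if only_keep_primer_hits and not hit:
        return None  # read does not start or end in any primer
    if start >= end:
        return None  # nothing left after trimming
    return (start, end)


primers = [(100, 120), (250, 270)]
print(trim_primers((100, 270), primers))         # (120, 250)
print(trim_primers((130, 200), primers))         # None
print(trim_primers((130, 200), primers, False))  # (130, 200)
```

Keeping only primer-hitting reads discards reads that do not originate from an expected amplicon, at the cost of losing some reads when primer positions are incomplete.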
In the Low Frequency Variant Detection wizard step (figure 21.64), you can specify the parameters for variant detection.
Please see the CLC Workbench user manual for a description of the different parameters that can be adjusted in the variant detection step. A description of the "Low Frequency Variant Detection" tool can be found here: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html.
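To make the role of coverage and frequency thresholds concrete, the Python sketch below reports non-reference alleles at a single position once a minimum coverage, minimum read count, and minimum allele frequency are met. This is a deliberately simplified threshold filter, not the statistical model used by the Low Frequency Variant Detection tool, and the parameter names and default values are assumptions.

```python
from typing import Dict, List, Tuple


def call_low_frequency(ref_base: str, counts: Dict[str, int],
                       min_coverage: int = 10, min_count: int = 2,
                       min_frequency: float = 1.0) -> List[Tuple[str, float]]:
    """Return (allele, frequency in %) for non-reference alleles passing thresholds.

    A plain threshold filter; it does not model sequencing error rates.
    """
    coverage = sum(counts.values())
    if coverage < min_coverage:
        return []  # too few reads to say anything at this position
    calls = []
    for allele, count in sorted(counts.items()):
        if allele == ref_base:
            continue
        freq = 100.0 * count / coverage
        if count >= min_count and freq >= min_frequency:
            calls.append((allele, round(freq, 2)))
    return calls


# Hypothetical pileup at one position: 1000 reads, reference base "C".
pileup = {"C": 970, "T": 25, "G": 5}
print(call_low_frequency("C", pileup))  # [('T', 2.5)]
```

Here the "G" allele (0.5%) falls below the 1% frequency threshold, while "T" (2.5%) is reported. Lowering the frequency threshold finds rarer variants but also admits more sequencing errors, which is why the real tool relies on an error model rather than fixed cut-offs alone.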
In the QC for Targeted Sequencing wizard step (figure 21.65), you can specify:
- Minimum coverage: the minimum coverage needed on all positions in a target for that target to be considered covered.
- Ignore non-specific matches and/or broken pairs: when these options are enabled, reads that are non-specifically mapped or that belong to broken pairs are ignored.
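The two settings interact: reads excluded by the ignore options do not contribute to coverage, so enabling them can cause a target to fall below the minimum coverage. The Python sketch below illustrates this rule on a toy set of reads. The `MappedRead` record and both functions are hypothetical, and the flags assume the usual meanings: a non-specific match maps equally well to more than one location, and a broken pair is a read whose mate did not map as an intact pair.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class MappedRead(NamedTuple):
    start: int           # 0-based, half-open [start, end)
    end: int
    non_specific: bool   # maps equally well to more than one location
    broken_pair: bool    # mate did not map as an intact pair


def depth(reads: Iterable[MappedRead], length: int,
          ignore_non_specific: bool = True,
          ignore_broken_pairs: bool = True) -> List[int]:
    """Per-position read depth, skipping reads excluded by the ignore options."""
    cov = [0] * length
    for r in reads:
        if ignore_non_specific and r.non_specific:
            continue
        if ignore_broken_pairs and r.broken_pair:
            continue
        for i in range(max(r.start, 0), min(r.end, length)):
            cov[i] += 1
    return cov


def covered_targets(targets: Dict[str, Tuple[int, int]], cov: List[int],
                    min_coverage: int) -> Dict[str, bool]:
    """A target is covered only if EVERY position in it reaches min_coverage."""
    return {name: all(cov[i] >= min_coverage for i in range(s, e))
            for name, (s, e) in targets.items()}


reads = [MappedRead(0, 10, False, False), MappedRead(0, 10, False, False),
         MappedRead(5, 10, True, False), MappedRead(5, 10, False, True)]
targets = {"a": (0, 5), "b": (5, 10)}
print(covered_targets(targets, depth(reads, 10), 3))                # {'a': False, 'b': False}
print(covered_targets(targets, depth(reads, 10, False, False), 3))  # {'a': False, 'b': True}
```

Because the rule requires every position to reach the threshold, a single low-coverage position is enough to mark a whole target as not covered.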
Finally, in the last wizard step, clicking the Preview All Parameters button lets you review all parameters. To make any changes, use the Previous and Next buttons to return to the relevant wizard window. If no changes are necessary, choose to save the results and click Finish.