Demultiplex Reads
Multiplexing techniques are often used when sequencing different samples in one sequencing run. One method used is to tag the sequences with a unique identifier during the preparation of the sample for sequencing [Meyer et al., 2007].
With this technique, each sequence read carries a sample-specific tag, which is a specific sequence of nucleotides before and after the sequence of interest. This principle is shown in Figure 25.16.
Figure 25.16: Tagging the target sequence, which in this case is single reads from one sample.
The sample-specific tag, also called the barcode or the index, can then be used to distinguish between the different samples when analyzing the sequencing data.
Post-processing of the sequencing data is required to separate the reads into their corresponding samples. The Demultiplex Reads tool does this, based on the sequence barcodes. Using this tool, sequences are associated with a particular sample when they contain an exact match to a particular barcode. (An option to allow one mismatch is available.) Sequences that do not match any barcode sequence are classified as not grouped and are put into a sequence list with the name "Not grouped".
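The matching rule described above can be illustrated with a minimal Python sketch. This is not the implementation used by the Demultiplex Reads tool. The function names (`hamming`, `demultiplex`) and the `max_mismatches` parameter are hypothetical. The sketch also assumes that the barcode sits at the very start of each read, that it is trimmed from the read once matched, and that a read matching more than one barcode is left ungrouped.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=0):
    """Group reads by the barcode found at the start of each read.

    reads          -- iterable of read sequences (strings)
    barcodes       -- dict mapping sample name -> barcode sequence
    max_mismatches -- 0 for exact matching, 1 to allow one mismatch
    """
    groups = defaultdict(list)
    for read in reads:
        hits = []
        for sample, barcode in barcodes.items():
            prefix = read[:len(barcode)]
            # Skip reads shorter than the barcode, then compare position by position.
            if len(prefix) == len(barcode) and hamming(prefix, barcode) <= max_mismatches:
                hits.append((sample, len(barcode)))
        if len(hits) == 1:
            sample, barcode_length = hits[0]
            # Assumption: the barcode is removed from the grouped read.
            groups[sample].append(read[barcode_length:])
        else:
            # No match, or an ambiguous match (assumption): leave ungrouped.
            groups["Not grouped"].append(read)
    return dict(groups)


barcodes = {"sample1": "ACGT", "sample2": "TGCA"}
reads = ["ACGTAAAA", "TGCACCCC", "ACGAGGGG", "GGGGGGGG"]

exact = demultiplex(reads, barcodes)
relaxed = demultiplex(reads, barcodes, max_mismatches=1)
```

With exact matching, the read `ACGAGGGG` ends up in "Not grouped" because its first four bases differ from the `ACGT` barcode at one position. With `max_mismatches=1` it is assigned to `sample1`.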
When Demultiplex Reads is used within a workflow, the sets of reads to be analyzed together as a unit can be organized based on the barcode table. See Running part of a workflow multiple times for further details.
An example of demultiplexing reads using Illumina-barcoded sequences is provided later in this section.
Demultiplexing is often carried out on the sequencing machine, so that the sequencing reads are already separated by sample before they are imported into the CLC Genomics Workbench. This is often the best option, if it is available to you.
Subsections
- Demultiplexing single reads
- Demultiplexing paired reads
- Entering barcodes
- Demultiplexing output options
- An example using Illumina barcoded sequences