Quality scores in the Illumina platform
When using the Illumina importer, you can select the quality score scheme applicable for your data at the bottom of the dialog (figure 7.11).
Figure 7.11: Selecting the quality score scheme.
There are four options:
- NCBI/Sanger or Illumina 1.8 and later. Phred scale encoded as ASCII 33 to 93. This is the standard encoding for FASTQ files, except for the early Illumina data formats (this changed with version 1.8 of the Illumina Pipeline).
- Illumina Pipeline 1.2 and earlier. Solexa/Illumina scale (-5 to 40) encoded as ASCII 59 to 104. The Workbench automatically converts these quality scores to the Phred scale on import, so that analyses use a common scale across data sets from different platforms (see the conversion details next to the sample below).
- Illumina Pipeline 1.3 and 1.4. Phred scale encoded as ASCII 64 to 104.
- Illumina Pipeline 1.5 to 1.7. Phred scale encoded as ASCII 64 to 104. Values 0 (@) and 1 (A) are no longer used. Value 2 (B) has a special meaning and is used as a trim clipping indicator. If this option is selected and the Trim reads option is checked, reads are trimmed when a B is encountered at either end of a read in the input file.
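As a rough illustration of the four options, the Python sketch below decodes a quality line by subtracting each scheme's ASCII offset (33 for the NCBI/Sanger option, 64 for the three older Illumina options) and strips runs of B from the ends of an Illumina 1.5 to 1.7 read. The scheme keys and function names are hypothetical, and stripping every leading and trailing B is an assumption about what the Trim reads option does, not a description of the Workbench's own implementation.

```python
# Hypothetical keys for the four importer options. The value is the ASCII
# offset subtracted from each quality character to get its numeric score.
OFFSETS = {
    "sanger_illumina_1_8": 33,  # Phred scale, ASCII 33-93
    "illumina_1_2": 64,         # Solexa scale -5..40, ASCII 59-104 (not Phred)
    "illumina_1_3_1_4": 64,     # Phred scale, ASCII 64-104
    "illumina_1_5_1_7": 64,     # Phred scale, ASCII 64-104; 'B' (2) marks trimming
}


def decode(qual: str, scheme: str) -> list[int]:
    """Return the raw score for each character in a quality line.

    For "illumina_1_2" the result is on the Solexa scale and still has to
    be converted to the Phred scale.
    """
    offset = OFFSETS[scheme]
    return [ord(ch) - offset for ch in qual]


def trim_b_ends(seq: str, qual: str) -> tuple[str, str]:
    """Strip runs of 'B' quality characters from both ends of a read.

    Assumption: this approximates the Trim reads option for Illumina
    Pipeline 1.5-1.7 data.
    """
    start, end = 0, len(qual)
    while start < end and qual[start] == "B":
        start += 1
    while end > start and qual[end - 1] == "B":
        end -= 1
    return seq[start:end], qual[start:end]


print(decode("hhB", "illumina_1_5_1_7"))  # [40, 40, 2]
print(trim_b_ends("ACGTA", "BhhhB"))      # ('CGT', 'hhh')
```

The same offset (64) serves three different options, which is why the offset alone cannot tell the Illumina Pipeline 1.2 scale apart from the later Phred+64 formats.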
Further information about the FASTQ format, including quality score encoding, is available at http://en.wikipedia.org/wiki/FASTQ_format.
Small samples of three kinds of files are shown below. The read names have no influence on the quality score format.
NCBI/Sanger Phred scores:
@SRR001926.1 FC00002:7:1:111:750 length=36
TTTTTGTAAGGAGGGGGGTCATCAAAATTTGCAAAA
+SRR001926.1 FC00002:7:1:111:750 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIFIIII'IB<IH
@SRR001926.7 FC00002:7:1:110:453 length=36
TTATATGGAGGCTTTAAGAGTCATAGGTTGTTCCCC
+SRR001926.7 FC00002:7:1:110:453 length=36
IIIIIIIIIII:'III?=IIIIII+&III/3I8F/&
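Each read in these samples occupies four lines: a header starting with @, the sequence, a separator starting with +, and the quality string with one character per base. A minimal Python sketch, assuming unwrapped four-line records, that splits the NCBI/Sanger sample into records and decodes the Phred+33 scores; `parse_fastq` and `phred33` are illustrative names, not part of any library.

```python
# The NCBI/Sanger sample, one FASTQ line per text line.
SAMPLE = """\
@SRR001926.1 FC00002:7:1:111:750 length=36
TTTTTGTAAGGAGGGGGGTCATCAAAATTTGCAAAA
+SRR001926.1 FC00002:7:1:111:750 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIFIIII'IB<IH
@SRR001926.7 FC00002:7:1:110:453 length=36
TTATATGGAGGCTTTAAGAGTCATAGGTTGTTCCCC
+SRR001926.7 FC00002:7:1:110:453 length=36
IIIIIIIIIII:'III?=IIIIII+&III/3I8F/&
"""


def parse_fastq(text: str) -> list[tuple[str, str, str]]:
    """Split unwrapped FASTQ text into (read name, sequence, quality) tuples."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ: {header}")
        records.append((header[1:], seq, qual))
    return records


def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality line (NCBI/Sanger, Illumina 1.8 and later)."""
    return [ord(ch) - 33 for ch in qual]


# Print read ID, length, and lowest/highest Phred score for each record.
for name, seq, qual in parse_fastq(SAMPLE):
    scores = phred33(qual)
    print(name.split()[0], len(seq), min(scores), max(scores))
```

In this sample, "I" (ASCII 73) decodes to Phred 40 and "&" (ASCII 38) to Phred 5.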
Illumina Pipeline 1.2 and earlier (note the question mark at the end of line 4: ASCII 63, a Solexa score of -1, is one of the values unique to the old Illumina Pipeline format):
@SLXA-EAS1_89:1:1:672:654/1
GCTACGGAATAAAACCAGGAACAACAGACCCAGCA
+SLXA-EAS1_89:1:1:672:654/1
cccccccccccccccccccc]c``cVcZccbSYb?
@SLXA-EAS1_89:1:1:657:649/1
GCAGAAAATGGGAGTGAAAATCTCCGATGAGCAGC
+SLXA-EAS1_89:1:1:657:649/1
ccccccccccbccbccb``cccbcccZcc`^bR^`

The formulas used for converting the special Solexa-scale quality scores to the Phred scale, where p is the estimated probability that a base call is wrong:

$$Q_{\text{Phred}} = -10 \log_{10} p$$

$$Q_{\text{Solexa}} = -10 \log_{10} \frac{p}{1-p}$$
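Eliminating p between the Solexa and Phred definitions gives a direct conversion, $Q_{\text{Phred}} = 10 \log_{10}\left(10^{Q_{\text{Solexa}}/10} + 1\right)$. The Python sketch below applies it to the Illumina Pipeline 1.2 sample, decoding each character as its ASCII code minus 64 before converting. The function names are hypothetical, and rounding to the nearest integer is an assumption of this sketch; the text does not say how the Workbench rounds.

```python
import math


def solexa_to_phred(q_solexa: float) -> float:
    """Q_Phred = 10 * log10(10 ** (Q_Solexa / 10) + 1)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def illumina_1_2_to_phred(qual: str) -> list[int]:
    """Decode Solexa+64 characters (ASCII 59-104) and convert them to Phred.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return [round(solexa_to_phred(ord(ch) - 64)) for ch in qual]


# Quality line 4 of the Illumina Pipeline 1.2 sample. Its final '?' is
# ASCII 63, i.e. Solexa -1, which becomes Phred 3 after rounding.
print(illumina_1_2_to_phred("cccccccccccccccccccc]c``cVcZccbSYb?"))
```

The two scales nearly coincide at high scores (Solexa 40 maps to about 40.0004) but diverge at the low end (Solexa -5 maps to about 1.19), so the conversion mainly affects low-quality bases.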
A sample of the quality scores of the Illumina Pipeline 1.3 and 1.4:
@HWI-E4_9_30WAF:1:1:8:178
GCCAGCGGCGCAAAATGNCGGCGGCGATGACCTTC
+HWI-E4_9_30WAF:1:1:8:178
babaaaa\ababaaaaREXabaaaaaaaaaaaaaa
@HWI-E4_9_30WAF:1:1:8:1689
GATGGAGATCTCGACCTNATAGGTGCCCTCATCGG
+HWI-E4_9_30WAF:1:1:8:1689
aab`]_aaaaaaaaaa[ER`abaaa\aaaaaaaa[

Note that the data itself does not reveal that it is not Illumina Pipeline 1.2 and earlier: every character in this sample is ASCII 64 or higher, which is also valid in the older Solexa/Illumina encoding (ASCII 59 to 104).
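The ambiguity can be made concrete with a simple heuristic that looks only at the lowest ASCII code in a file's quality lines. This is a hypothetical Python sketch, not how the Workbench decides: it can rule encodings out, but it cannot separate the Illumina Pipeline 1.2 scale from the later Phred+64 formats when no character below ASCII 64 occurs. A Phred+33 file in which every score is 31 or higher would also land in the ambiguous case.

```python
def guess_encoding(quality_lines: list[str]) -> str:
    """Guess the quality encoding from the lowest ASCII code present.

    Raises ValueError if quality_lines contains no characters.
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    if lowest < 59:
        # Below the Solexa range: only Phred+33 (NCBI/Sanger, Illumina 1.8+) fits.
        return "phred+33"
    if lowest < 64:
        # ASCII 59-63 is used only by the Solexa scale of Illumina 1.2 and earlier.
        return "solexa+64"
    # ASCII 64 and up is valid for Illumina 1.2 through 1.7 alike, and also for
    # Phred+33 data whose scores are all 31 or higher.
    return "ambiguous"


print(guess_encoding([r"babaaaa\ababaaaaREXabaaaaaaaaaaaaaa"]))  # ambiguous
```

The raw string prefix matters here: without it, the "\a" in the Illumina 1.3 sample would be read by Python as a control character.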
To learn more about ASCII values, please see http://en.wikipedia.org/wiki/Ascii#ASCII_printable_characters.