SOLiD from Life Technologies

Choosing the SOLiD import will open the dialog shown in figure 6.7.

Figure 6.7: Importing data from SOLiD from Applied Biosystems.

The accepted file format is csfasta, which is the color space version of the fasta format. If you want to import quality scores, a qual file should also be provided. The reads in a csfasta file look like this:

>2_14_26_F3
T011213122200221123032111221021210131332222101
>2_14_192_F3
T110021221100310030120022032222111321022112223
>2_14_233_F3
T011001332311121212312022310203312201132111223
>2_14_294_F3
T213012132300000021323212232.03300033102330332

All reads start with a T, which specifies the right phasing of the color sequence.
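
To illustrate why the leading base matters, the following minimal Python sketch translates a color space read into bases. It is not part of the Workbench, and the function name decode_colors is hypothetical. The sketch assumes the standard SOLiD two-base encoding, in which each color is the XOR of the two-bit codes (A=0, C=1, G=2, T=3) of two neighboring bases, and it assumes that the read contains only the colors 0 to 3.

```python
# Two-bit codes for the bases (assumed standard SOLiD two-base encoding).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_colors(read):
    """Translate a color space read (leading base + colors) into bases.

    Hypothetical helper for illustration; assumes only colors 0-3 occur.
    """
    previous = BASE_TO_CODE[read[0]]  # the leading base sets the phase
    bases = []
    for color in read[1:]:
        # Each color encodes the transition from the previous base.
        previous ^= int(color)
        bases.append(CODE_TO_BASE[previous])
    return "".join(bases)


print(decode_colors("T0112"))  # prints TGTC
```

Each base is derived from the one before it, so without a known starting base none of the colors can be translated.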

If a read contains a dot (.), as in the last read in the example above, it means that the color call was ambiguous (this would have been an N in base space). In this case, the Workbench simply cuts off the rest of the read, since there is no way to know the right phase of the remaining colors in the read. If the read starts with a dot, it is not imported. If all reads start with a dot, a warning dialog will be displayed. In the quality file, the equivalent value is -1, and this will also cause the read to be clipped.
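
The clipping rule described above can be sketched in Python as follows. This is an illustration only, not the Workbench's implementation. The function names read_csfasta and clip_at_dot are hypothetical, and the sketch assumes that each read occupies a single line, that lines starting with # are comments, and that "starts with a dot" means the first color after the leading base is a dot. The -1 case in the qual file is not covered.

```python
def read_csfasta(lines):
    """Yield (name, read) pairs from csfasta lines.

    Assumes one line per read and '#' comment lines (assumptions).
    """
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def clip_at_dot(read):
    """Cut the read before the first dot; return None if no color is left."""
    dot = read.find(".")
    if dot == -1:
        return read  # no ambiguous color call, keep the whole read
    clipped = read[:dot]
    # Only the leading base remains when the first color is a dot,
    # so the read would be skipped.
    return clipped if len(clipped) > 1 else None


example = [
    ">2_14_294_F3",
    "T213012132300000021323212232.03300033102330332",
]
for name, read in read_csfasta(example):
    print(name, clip_at_dot(read))
```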

When the example above is imported into the Workbench, it looks as shown in figure 6.8.

Figure 6.8: Importing data from SOLiD from Applied Biosystems. Note that the fourth read is cut off so that the colors following the dot are not included.

For more information about color space, please see Color space.

In addition to the native csfasta format used by SOLiD, you can also import data in fastq format. This is particularly useful for data downloaded from the Sequence Read Archive at NCBI (http://www.ncbi.nlm.nih.gov/Traces/sra/). An example of a SOLiD fastq file, containing both quality scores and the color space encoding, is shown here:

@SRR016056.1.1 AMELIA_20071210_2_YorubanCGB_Frag_16bit_2_51_130.1 length=50
T31000313121310211022312223311212113022121201332213
+SRR016056.1.1 AMELIA_20071210_2_YorubanCGB_Frag_16bit_2_51_130.1 length=50
!*%;2'%%050%'0'3%%5*.%%%),%%%%&%%%%%%'%%%%%'%%3+%%%
@SRR016056.2.1 AMELIA_20071210_2_YorubanCGB_Frag_16bit_2_51_223.1 length=50
T20002201120021211012010332211122133212331221302222
+SRR016056.2.1 AMELIA_20071210_2_YorubanCGB_Frag_16bit_2_51_223.1 length=50
!%%)%'))'&'%(((&%/&)%+(%%%&%%%%%%%%%%%%%%%+%%%%%%+'
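
If you want to inspect such a file outside the Workbench, a minimal Python sketch for reading the records could look like this. The function names read_fastq and phred33 are hypothetical. The sketch assumes that every record spans exactly four lines and that the quality characters use an ASCII offset of 33; the offset is an assumption and is not stated in this manual. Note that in the example above the quality line is as long as the sequence line including the leading T.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality_string) from four-line fastq records.

    Assumes complete, unwrapped records (assumption).
    """
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between or after records
        sequence = next(it, "").strip()
        next(it, "")  # the '+' separator line is ignored
        quality = next(it, "").strip()
        # The read name is the first word after the leading '@'.
        yield header[1:].split()[0], sequence, quality


def phred33(quality_string):
    """Convert quality characters to integer scores (offset 33 assumed)."""
    return [ord(ch) - 33 for ch in quality_string]


record = [
    "@SRR016056.1.1 AMELIA_20071210_2_YorubanCGB_Frag_16bit_2_51_130.1 length=50",
    "T31000313121310211022312223311212113022121201332213",
    "+SRR016056.1.1 AMELIA_20071210_2_YorubanCGB_Frag_16bit_2_51_130.1 length=50",
    "!*%;2'%%050%'0'3%%5*.%%%),%%%%&%%%%%%'%%%%%'%%3+%%%",
]
for name, sequence, quality in read_fastq(record):
    print(name, len(sequence), phred33(quality)[:3])
```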

For all formats, compressed data in gzip format is also supported (.gz).
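
When inspecting read files with your own scripts before import, plain and gzip-compressed files can be handled the same way. The following is a minimal sketch using Python's standard gzip module. The function name open_reads is hypothetical, and compression is detected by the .gz file extension only, which is a simplifying assumption.

```python
import gzip


def open_reads(path):
    """Open a read file as text, decompressing it if the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")  # "rt" gives text mode, like open()
    return open(path, "r")
```

The returned object can be iterated line by line in both cases, so the same parsing code works for compressed and uncompressed files.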

The General options to the left are:

Click Next to adjust how to handle the results. We recommend choosing Save in order to save the results directly to a folder, since you will probably want to save them anyway before proceeding with your analysis. There is an option to put the imported data into a separate folder. This can be handy for organizing subsequent analysis results and for batch processing.