Complete Genomics

With CLC Genomics Workbench 7.5 you can import evidence and variation files from Complete Genomics.

The variation files can be imported as tracks (see Import tracks).

The evidence files can be imported using the SAM and BAM mapping files import.

Before the evidence data file can be imported, it needs to be converted. This is done using the CGA tools, which can be downloaded from http://www.completegenomics.com/sequence-data/cgatools/.

The procedure for converting the data is as follows:

  1. Download the human genome in fasta format and make sure the chromosome files are named chr<number>.fa, e.g. chr9.fa.
  2. Run the fasta2crr tool with a command like this:
    cgatools fasta2crr --input chr9.fa --output chr9.crr
  3. Run the evidence2sam tool with a command like this:
    cgatools evidence2sam --beta -e evidenceDnbs-chr9-.tsv -o chr9.sam -s chr9.crr
     Here, the .tsv file is the evidence file provided by Complete Genomics (sample data sets are available on their FTP server: ftp://ftp2.completegenomics.com/).
  4. Import the fasta file from step 1 into the Workbench.
  5. Use the SAM and BAM mapping files importer to import the file created by the evidence2sam tool.
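
When several chromosomes need converting, the two conversion commands from steps 2 and 3 can be generated with a small shell helper. The following is only a minimal sketch: the function name cg_convert_cmds and its two arguments are illustrative assumptions and not part of the CGA tools, and the cgatools options are exactly those shown in the steps above. The function only prints the commands, so they can be checked before anything is run.

```shell
#!/bin/sh
# Sketch only: cg_convert_cmds is a hypothetical helper, not part of cgatools.
# It prints the fasta2crr and evidence2sam commands for one chromosome,
# using the same options as in the procedure above.
cg_convert_cmds() {
    chrom="$1"       # chromosome name without extension, e.g. chr9
    evidence="$2"    # evidence .tsv file provided by Complete Genomics

    # Step 2: convert the fasta reference to a .crr file
    echo "cgatools fasta2crr --input ${chrom}.fa --output ${chrom}.crr"

    # Step 3: convert the evidence file to SAM using the .crr reference
    echo "cgatools evidence2sam --beta -e ${evidence} -o ${chrom}.sam -s ${chrom}.crr"
}
```

For example, calling cg_convert_cmds chr9 followed by the name of the chromosome 9 evidence file prints the two commands from steps 2 and 3. Once reviewed, the printed commands can be pasted into a terminal where cgatools is installed.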

Please refer to the CGA documentation for a description of these tools. Note that the CGA tools are not supported by CLC bio.