Introduction to the Ab Initio Transcript Discovery Plug-in

The Ab Initio Transcript Discovery Plug-in discovers transcripts in two stages: RNA-Seq reads are first mapped to a genomic reference, allowing large gaps (for introns), and transcripts are then inferred from the resulting read mappings. Because the inference relies heavily on reads mapped with a gap as evidence for transcripts, the plug-in is developed primarily for eukaryotic genomes.
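
To illustrate why reads mapped with a gap are useful evidence, the following is a minimal toy sketch in Python. It is not the plug-in's algorithm. The input layout (each read as a list of aligned blocks with half-open coordinates), the function names, and the min_reads threshold are all assumptions made for this example. The sketch simply counts how many reads span each gap and keeps the gaps seen in at least a given number of reads as candidate introns.

```python
from collections import Counter

# Hypothetical toy input: each read is a list of (start, end) aligned blocks
# on the reference, using half-open coordinates. A read with more than one
# block was mapped with a gap.
reads = [
    [(100, 150)],                # ungapped read
    [(130, 200), (500, 530)],    # gap from 200 to 500
    [(160, 200), (500, 560)],    # same gap
    [(540, 600), (900, 940)],    # gap from 600 to 900
    [(180, 200), (500, 580)],    # gap from 200 to 500 again
]


def junction_support(reads):
    """Count how many reads span each gap (a candidate intron)."""
    support = Counter()
    for blocks in reads:
        ordered = sorted(blocks)
        # Each pair of neighbouring blocks in a read defines one gap.
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
            support[(left_end, right_start)] += 1
    return support


def supported_junctions(reads, min_reads=2):
    """Return the gaps spanned by at least min_reads reads (assumed cutoff)."""
    counts = junction_support(reads)
    return sorted(gap for gap, n in counts.items() if n >= min_reads)


support = junction_support(reads)
kept = supported_junctions(reads, min_reads=2)
print(support)
print(kept)
```

In this toy data, the gap from 200 to 500 is spanned by three reads and is kept, while the gap from 600 to 900 is seen in only one read and is dropped. Ungapped reads contribute no junction evidence here, which is why an approach of this kind suits genomes with introns.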

The proposed workflow for using the Ab Initio Transcript Discovery Plug-in in combination with the existing RNA-Seq tool in the CLC Genomics Workbench is as follows:

  1. Run the large gap mapper using all your RNA-Seq reads and a genomic reference sequence.
  2. Run the transcript discovery algorithm on the resulting read mapping to predict transcripts and genes.
  3. Inspect the results and, if necessary, adjust the settings and re-run the transcript discovery until it produces the desired result.
  4. Part of the result from the transcript discovery is a copy of the reference genome including the new transcript and gene annotations. This can now be used as a common reference for measuring gene expression using the existing RNA-Seq tool in the Workbench.

If you have sequenced several samples that need to be compared, we suggest using the reads from all samples for the large gap mapping and subsequent transcript discovery. In this way, you establish a common set of reference transcripts and genes, which makes it possible to compare gene expression levels across samples (using the RNA-Seq tool in the CLC Genomics Workbench). Once transcript discovery is complete, the initial read mapping created by the large gap mapper is no longer needed and can be deleted, unless you want to be able to go back and double-check the basis of the predictions.

The current release is a beta version with full functionality for single reads. If you have paired reads, they are treated as single reads.