Considerations and limitations
The cas file format is designed with high-volume assembly data in mind. However, there are some limitations to be aware of:
- Each read can have at most one alignment position. In other words, a read that matches in multiple locations can be assigned to only one of those locations within the cas file. This limitation is in place because, when assembling short reads to a large genome, some reads may match hundreds of thousands of locations, and keeping track of all such alignments would be impractical.
- If you are planning to send your assembly to someone else for viewing or further processing, you need to include your read and reference files in addition to the cas assembly file. This is because the cas file contains only information about the assembly and does not contain any sequence information.
- If you are planning to send your assembly to someone else, they must place the read and reference files in the same location relative to the cas file as they were when you ran the assembly. This is because the cas file stores relative file names, and these must match the location of the read and reference files when further processing is undertaken. Note, however, that the change_assembly_files program can be used to change the file names and locations.
- If you plan to convert your cas file to SAM or BAM format, both of which include read information, the read data used for your mapping must be available, along with the cas file, when you run the clc_cas_to_sam program.
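
The points above about sharing an assembly can be illustrated with a small Python sketch. It is not part of the cas tools: the function names missing_companions and bundle_assembly are hypothetical, and the sketch does not parse the cas file. Instead it assumes that you supply the relative names of the read and reference files yourself, and that those files sit in or below the directory containing the cas file. It checks that every file is present and then packs everything into one archive with the relative layout preserved.

```python
import tarfile
from pathlib import Path


def missing_companions(cas_path, relative_names):
    """Return the relative names that do not resolve to a file
    when interpreted relative to the directory of the cas file."""
    base = Path(cas_path).resolve().parent
    return [name for name in relative_names if not (base / name).is_file()]


def bundle_assembly(cas_path, relative_names, archive_path):
    """Pack the cas file plus its read and reference files into a
    tar.gz archive, keeping the same relative layout (hypothetical helper)."""
    cas = Path(cas_path).resolve()
    base = cas.parent

    # Assumption of this sketch: companions live in or below the cas directory.
    for name in relative_names:
        parts = Path(name).parts
        if Path(name).is_absolute() or ".." in parts:
            raise ValueError(f"not a plain relative name: {name}")

    missing = missing_companions(cas, relative_names)
    if missing:
        raise FileNotFoundError(f"missing read/reference files: {missing}")

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(cas, arcname=cas.name)
        for name in relative_names:
            # arcname keeps the path relative to the cas file.
            tar.add(base / name, arcname=Path(name).as_posix())
    return archive_path
```

Unpacking such an archive into an empty directory recreates the cas file with its read and reference files in the same relative positions. If the recipient needs a different layout, the change_assembly_files program mentioned above is the tool for updating the names stored in the cas file.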