Paired Read File Input
Paired data may be contained in a single file in which the pairs are ordered so that the first two sequences form one pair, the next two sequences form the next pair, and so on. Paired data may also be split across two files. One file contains the first member of every pair and the other contains the second member, with each member appearing at the same position in its file. For example, the 51st sequence in file A is the mate of the 51st sequence in file B.

The CLC Assembly Cell programs assume the single-file form for paired data by default. For paired data with the first and second members in separate files, both files must be given as input, listed directly after the `-i' option (for interleave). The order of the files on the command line matters: the first file must contain the first member of each pair, and the second file must contain the second member.
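Both layouts describe the same pairing, arranged differently. The following minimal Python sketch is not part of CLC Assembly Cell, and its function names are hypothetical. It shows how pairs are recovered from each layout. In the single-file form, consecutive records are paired. In the two-file form, records at the same position are paired. It assumes plain FASTA input and pairs records by position only, without checking read names.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples, in file order."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def pairs_from_single_file(records):
    """Single-file form: records 1 and 2 are a pair, 3 and 4 the next, and so on."""
    if len(records) % 2 != 0:
        raise ValueError("odd number of records: the last read has no mate")
    return [(records[i], records[i + 1]) for i in range(0, len(records), 2)]


def pairs_from_two_files(first, second):
    """Two-file form: the n-th record of `first` is the mate of the n-th of `second`."""
    if len(first) != len(second):
        raise ValueError("the two files contain different numbers of records")
    return list(zip(first, second))
```

Applied to the same reads in either layout, the two functions return identical lists of pairs.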
To illustrate this further, consider two FASTA files like this (first.fasta):
>pair_1/1
ACTGTCTAGCTACTGCATTGACTGCGAC
>pair_2/1
TAGCGACGATGCTACTACTCTACTCGAC
>pair_3/1
GATCTCTAGGACTACGCTACGAGCCTCA
and this (second.fasta):
>pair_1/2
GGATCATCTACGTCATCGACTAGTACAC
>pair_2/2
AAGCGACACCTACTCATCGATCATCAGA
>pair_3/2
TATCGACTCAGACACTCTATACTACCAT
where pair_1/1 and pair_1/2 belong together, pair_2/1 and pair_2/2 belong together, and so on. The programs expect to see these sequences as a single FASTA file in which the mates alternate, like this (joint.fasta):
>pair_1/1
ACTGTCTAGCTACTGCATTGACTGCGAC
>pair_1/2
GGATCATCTACGTCATCGACTAGTACAC
>pair_2/1
TAGCGACGATGCTACTACTCTACTCGAC
>pair_2/2
AAGCGACACCTACTCATCGATCATCAGA
>pair_3/1
GATCTCTAGGACTACGCTACGAGCCTCA
>pair_3/2
TATCGACTCAGACACTCTATACTACCAT
This is accomplished using the `-i' option like this:
clc_mapper -o assembly.cas -d human.gb -q -p fb ss 180 250 -i first.fasta second.fasta
This is equivalent to:
clc_mapper -o assembly.cas -d human.gb -q -p fb ss 180 250 joint.fasta
Note that the `-i' option must immediately precede the input files.
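You may need an interleaved file such as joint.fasta itself, for example for another tool that has no equivalent of `-i'. In that case, the conversion can be sketched in Python as below. This is a hypothetical helper, not part of CLC Assembly Cell. It assumes both inputs are plain FASTA with their records in matching order. It pairs records purely by position and does not inspect the /1 and /2 name suffixes.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples, in file order."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def interleave_fasta(first_text, second_text):
    """Return single-file paired FASTA: first mate, then second mate, pair by pair."""
    first = parse_fasta(first_text)
    second = parse_fasta(second_text)
    if len(first) != len(second):
        raise ValueError("the two files contain different numbers of records")
    lines = []
    for (name1, seq1), (name2, seq2) in zip(first, second):
        # The n-th record of the first file is followed by its mate from the second file.
        lines += [">" + name1, seq1, ">" + name2, seq2]
    return "\n".join(lines) + "\n"
```

To produce joint.fasta from the example files, read first.fasta and second.fasta into strings, pass them in that order, and write the returned text to joint.fasta. Swapping the arguments would put the second mates first, mirroring why the file order after `-i' matters.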