IMGT

For the IMGT format, the header contains 15 elements, separated by "|". Only the following are read and used during import:

The IMGT database contains chains, segment types and labels that are not listed above and are not supported. These are silently ignored.
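As a minimal sketch of how a header with 15 "|"-separated elements might be split, the Python snippet below numbers the elements from 1, as the text does. The function names, the handling of a trailing "|", and the example header are all assumptions made for illustration. They are not taken from the importer itself.

```python
def split_imgt_header(header_line):
    """Split a '|'-separated FASTA header into its 15 elements."""
    fields = header_line.lstrip(">").rstrip("\n").split("|")
    # Assumption: a trailing "|" leaves one empty string at the end; drop it.
    if len(fields) == 16 and fields[-1] == "":
        fields.pop()
    if len(fields) != 15:
        raise ValueError(f"expected 15 elements, got {len(fields)}")
    # Surrounding whitespace is removed so empty elements compare equal to "".
    return [f.strip() for f in fields]


def element(fields, number):
    """Return header element `number`, counting from 1 as in the text."""
    return fields[number - 1]


# Made-up header for illustration only; not a real database entry.
example = ">ACC0001|GENE1*01|Some species|F|V-REGION|1..296|296 nt|1| | | | |296+0=296| | |"
fields = split_imgt_header(example)
print(element(fields, 8))
```

Elements that are blank in the header come back as empty strings, which makes a check such as "element (9) is not empty" a simple string comparison.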

The IMSEQ format provides the position of the conserved amino acid directly, but for the IMGT format this position must be calculated. To make this possible, the V region must be provided with gaps such that the conserved amino acid is found at approximately position 104 in the translated amino acid sequence. When downloading sequences from the IMGT database in FASTA format, the "F+ORF+in-frame P nucleotide sequences with IMGT gaps" option should be used. Alternatively, the corresponding "nt-WithGaps-F+ORF+inframeP" flat file can be downloaded from IMGT/GENE-DB.

If using custom reference data that is not downloaded from the IMGT database, it is recommended to use the IMSEQ format and specify the position of the conserved amino acid.

When importing files in the IMGT format, the following options are available (see figure 7.3):

If element (9) in the header is not empty, the corresponding number of nucleotides is removed from the 5' end of the sequence.

Identification of the conserved amino acid

The nucleotide sequence (with IMGT gaps for the V segments), starting from the position given in element (8) in the header, is first translated to amino acids using the standard genetic code. The position of the conserved amino acid is then calculated and, if identified, converted to the position of the first nucleotide in the corresponding codon. Segments where the amino acid cannot be identified are silently ignored.
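The translation and the conversion back to a nucleotide position can be sketched in Python as follows. This is an illustration under stated assumptions, not the importer's implementation: positions are taken to be 1-based, a codon containing a gap character "." is kept as a "." placeholder so that later amino acid positions do not shift, and unrecognized codons become "X". The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_gapped(nt_seq, start=1):
    """Translate a (possibly gapped) sequence from 1-based position `start`."""
    seq = nt_seq.upper()[start - 1:]
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if "." in codon:
            # Assumption: gapped codons keep their slot in the translation.
            protein.append(".")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


def codon_first_nt(aa_pos, start=1):
    """1-based position of the first nucleotide of amino acid `aa_pos`."""
    return start + 3 * (aa_pos - 1)


print(translate_gapped("ATGTGT...GGG"))
print(codon_first_nt(104))
```

With a start position of 1, amino acid position 104 corresponds to nucleotide position 310 in the gapped sequence.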

For the V segments, the amino acid position is calculated as follows:

For the J segments, all 3 open reading frames (starting from nucleotide position 1, 2 or 3) are used. Note that "." below denotes any amino acid. The amino acid position is calculated as follows:

V and J segments for which the amino acid position cannot be successfully identified are silently ignored.
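The three-frame search described for the J segments can be sketched in Python as shown below. The motif is passed in as a regular expression, where "." matches any amino acid, as in the text. The pattern "FG.G" in the example is only a placeholder chosen for illustration and is not the rule used by the importer. The function names are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt_seq):
    """Translate complete codons; unknown codons become 'X'."""
    seq = nt_seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_in_frames(nt_seq, pattern):
    """Search all three reading frames for `pattern`.

    Returns (frame, nt_position) for the first match, both 1-based,
    or None so that the caller can skip the segment.
    """
    for frame in (1, 2, 3):
        protein = translate(nt_seq[frame - 1:])
        match = re.search(pattern, protein)
        if match:
            # First nucleotide of the codon where the match begins.
            return frame, frame + 3 * match.start()
    return None


# Placeholder motif and made-up sequence, for illustration only.
result = find_in_frames("ATTTGGTGCTGGT", "FG.G")
print(result)
```

Returning None when no frame matches mirrors the behavior described above, where segments without an identifiable position are skipped rather than reported as errors.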