miRBase data file format
The miRBase database is available for download and installation via the CLC Genomics Workbench, as described in Quantify miRNA.
miRBase .dat files can also be imported using the Standard Import functionality, by selecting miRBase dat in the Force import as type menu of the Standard Import dialog.
A *.dat file has the following format:
ID cel-let-7
XX
DE Caenorhabditis elegans let-7 stem-loop
XX
FH Key Location/Qualifiers
FH
FT miRNA 17..38
FT /product="cel-let-7-5p"
FT miRNA 60..81
FT /product="cel-let-7-3p"
XX
SQ Sequence 99 BP; 26 A; 19 C; 24 G; 0 T; 30 other;
uacacugugg auccggugag guaguagguu guauaguuug gaauauuacc accggugaac 60
uaugcaauuu ucuaccuuac cggagacaga acucuucga 99
//
ID cel-lin-4
XX
DE Caenorhabditis elegans lin-4 stem-loop
XX
FH Key Location/Qualifiers
FH
FT miRNA 16..36
FT /product="cel-lin-4-5p"
FT miRNA 55..76
FT /product="cel-lin-4-3p"
XX
SQ Sequence 94 BP; 17 A; 25 C; 26 G; 0 T; 26 other;
augcuuccgg ccuguucccu gagaccucaa gugugagugu acuauugaug cuucacaccu 60
gggcucuccg gguaccagga cgguuugagc agau 94
//
If the formatting above is followed, the .dat file can be imported as a miRBase file for annotation purposes. In particular, the following must be in place:
- The precursor sequences need "miRNA" annotations. In the CLC Genomics Workbench, you can add a miRNA annotation by selecting a region and choosing Add Annotation from the right-click menu. Each precursor sequence should have at most 2 miRNA annotations. Matches to the first miRNA annotation are counted in the 5' column. Matches to the second miRNA annotation are counted as 3' matches.
- If you have a sequence list containing sequences from multiple species, the Latin name of each sequence should be set. This is used in the annotation dialog, where you can select the species. If the Latin name is not set, the dialog will show "N/A".
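
Before importing a hand-edited .dat file, it can help to check these requirements outside the Workbench. The following is a minimal Python sketch, not part of the CLC Genomics Workbench. The names `Precursor`, `parse_dat` and `check_record` are hypothetical, and the parser assumes only the line types shown in the example above (ID, DE, FT, SQ and the `//` terminator).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Precursor:
    name: str
    description: str = ""
    mirnas: list = field(default_factory=list)  # [start, end, product] entries
    sequence: str = ""


def parse_dat(text):
    """Parse miRBase-style .dat text into a list of Precursor records."""
    records, current, in_sequence = [], None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":  # record terminator
            if current is not None:
                records.append(current)
            current, in_sequence = None, False
        elif in_sequence and current is not None:
            # Sequence lines: keep letters, drop spaces and the position count.
            current.sequence += re.sub(r"[^A-Za-z]", "", line)
        elif line.startswith("ID "):
            current = Precursor(name=line.split()[1])
        elif current is None:
            continue  # ignore anything outside a record
        elif line.startswith("DE "):
            current.description = line[3:].strip()
        elif line.startswith("FT "):
            location = re.match(r"FT\s+miRNA\s+(\d+)\.\.(\d+)", line)
            product = re.match(r'FT\s+/product="([^"]*)"', line)
            if location:
                start, end = int(location.group(1)), int(location.group(2))
                current.mirnas.append([start, end, None])
            elif product and current.mirnas:
                current.mirnas[-1][2] = product.group(1)
        elif line.startswith("SQ "):
            in_sequence = True
    return records


def check_record(rec):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not rec.mirnas:
        problems.append(f"{rec.name}: no miRNA annotation")
    if len(rec.mirnas) > 2:
        problems.append(f"{rec.name}: more than 2 miRNA annotations")
    for start, end, product in rec.mirnas:
        # Assumption: locations are 1-based and inclusive, as in the example.
        if not (1 <= start <= end <= len(rec.sequence)):
            problems.append(f"{rec.name}: location {start}..{end} is outside the sequence")
    return problems
```

Calling `check_record` on each record returned by `parse_dat` gives an empty list when a record has one or two miRNA annotations that lie within the precursor sequence. In this sketch the first entry in `mirnas` corresponds to the annotation counted in the 5' column and the second to the 3' matches. It does not check the Latin name, which is set on the sequences in the Workbench.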