How to make use of reference data in external applications

The external application gets a CLC URL pointing to:

CLC_References/<organism>/<data type>/<version>/<filename>

and must convert it, using string manipulation, to the following file path:

<import/export location>/CLC_References/<organism>/<data type>/<version>/<filename>

As an example of this translation, consider a CLC object that is persisted in the file system as the file:

/opt/gatk/CLC_References/homo_sapiens/sequence/hg19_chr_5/Homo_sapiens_sequence_hg19.clc

That CLC object is pointed to by this CLC URL:

clc://server/CLC_References/homo_sapiens/sequence/hg19_chr_5/Homo_sapiens_sequence_hg19

The external application making use of the exported version of this CLC reference would then have to convert the CLC URL to this file path:

/impexp/CLC_References/homo_sapiens/sequence/hg19_chr_5/Homo_sapiens_sequence_hg19.fasta

by replacing the "clc://server" part of the URL with "/impexp" and appending the file extension of the exported file, here ".fasta". In this example, "/impexp" is assumed to be the file path of the import/export location to which the reference data has been exported.
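
The translation in this example can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and its parameters are hypothetical, "/impexp" stands in for the actual import/export location, and the ".fasta" extension is taken from the example path above (the URL itself carries no extension), so it depends on the format the reference data was exported in.

```python
def clc_url_to_path(clc_url, url_prefix="clc://server",
                    export_root="/impexp", extension=".fasta"):
    """Translate a name-based CLC reference URL into an exported file path.

    All parameter defaults are assumptions taken from the example in the text.
    """
    if not clc_url.startswith(url_prefix + "/"):
        raise ValueError("unexpected URL prefix: %r" % clc_url)

    # Everything after the prefix, e.g. "/CLC_References/homo_sapiens/...".
    relative = clc_url[len(url_prefix):]

    # Only URLs that point into CLC_References can be mapped this way.
    if not relative.startswith("/CLC_References/"):
        raise ValueError("not a CLC_References URL: %r" % clc_url)

    # Swap the prefix for the import/export location and add the extension.
    return export_root.rstrip("/") + relative + extension


url = ("clc://server/CLC_References/homo_sapiens/sequence/"
       "hg19_chr_5/Homo_sapiens_sequence_hg19")
print(clc_url_to_path(url))
```

For the URL shown above, this prints the "/impexp/..." file path given in the example. Any URL that does not point into CLC_References is rejected rather than guessed at.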

Note that in addition to the human-readable, name-based form of CLC URL shown above, a CLC URL can also take an ID-based form, such as:

clc://server//[...]1073273041-BAAAAAAAAAAAAAP90a0346df401e2e8-448c7e40-152545ded0d-8000

A URL in this form cannot be translated into anything meaningful in an external context. To avoid this problem, the external application framework always generates a name-based URL (as opposed to an ID-based URL) when the external application argument has the type CLC Object url. ID-based URLs are therefore never sent to the external application by the framework; only the name-based form is.