GEO (Gene Expression Omnibus)
The GEO (Gene Expression Omnibus) sample and series formats are supported. Figure 32.10 shows how to download the data from GEO in the right format. GEO is located at http://www.ncbi.nlm.nih.gov/geo/.
Figure 43.1: Selecting Samples, SOFT and Data before clicking go will give you the format supported by the CLC Genomics Workbench.
The GEO sample files are tab-delimited .txt files. They have three required lines:
^SAMPLE = GSM21610
!sample_table_begin
...
!sample_table_end

The first line should start with ^SAMPLE = followed by the sample name. The other two required lines are !sample_table_begin and !sample_table_end. The lines between !sample_table_begin and !sample_table_end hold the column contents of the sample.
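As an illustration of the three required lines, the following is a minimal Python sketch of how such a file could be read. It is not the importer used by the CLC Genomics Workbench. The function name parse_geo_sample, the column names ID_REF and VALUE, and the probe identifiers are assumptions made for this example only.

```python
def parse_geo_sample(text):
    """Return (sample_name, rows) for the text of one GEO sample file.

    Hypothetical helper for illustration; rows are lists of tab-separated
    fields found between !sample_table_begin and !sample_table_end.
    """
    name = None
    rows = []
    in_table = False
    saw_begin = False
    saw_end = False
    for line in text.splitlines():
        stripped = line.strip()
        if line.startswith("^SAMPLE ="):
            # Everything after the first "=" is the sample name.
            name = line.split("=", 1)[1].strip()
        elif stripped == "!sample_table_begin":
            in_table = True
            saw_begin = True
        elif stripped == "!sample_table_end":
            in_table = False
            saw_end = True
        elif in_table and stripped:
            # Table lines are tab-delimited column contents.
            rows.append(line.split("\t"))
    if name is None or not saw_begin or not saw_end:
        raise ValueError("one of the three required lines is missing")
    return name, rows


# Illustrative input; the column names and values are made up.
example = (
    "^SAMPLE = GSM21610\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t12.5\n"
    "probe_2\t7.0\n"
    "!sample_table_end\n"
)
sample_name, table_rows = parse_geo_sample(example)
```

In this sketch the first table row is returned as-is, so any column header line is kept as the first entry of the rows.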
Note that the GEO sample importer also works for concatenated GEO sample files, allowing multiple samples to be imported in one go. A file containing concatenated sample files can be downloaded here:
http://www.clcbio.com/madata/GEOSampleFilesConcatenated.txt
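To make the idea of a concatenated file concrete, here is a minimal Python sketch that splits such a file back into one block of text per sample. It assumes that each sample begins at a line starting with ^SAMPLE = and that anything before the first such line can be ignored. The function name split_concatenated_samples and the sample data are hypothetical, and this is not how the Workbench itself is implemented.

```python
def split_concatenated_samples(text):
    """Split concatenated GEO sample text into one string per sample.

    Hypothetical helper for illustration; a new sample is assumed to
    start at every line beginning with "^SAMPLE =".
    """
    chunks = []
    current = None
    for line in text.splitlines():
        if line.startswith("^SAMPLE ="):
            # Start collecting lines for a new sample.
            current = [line]
            chunks.append(current)
        elif current is not None:
            current.append(line)
    return ["\n".join(chunk) for chunk in chunks]


# Illustrative input with two made-up samples.
concatenated = (
    "^SAMPLE = GSM_A\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t1.0\n"
    "!sample_table_end\n"
    "^SAMPLE = GSM_B\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t2.0\n"
    "!sample_table_end\n"
)
samples = split_concatenated_samples(concatenated)
```

Each returned string then has the layout of a single GEO sample file.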
Examples of each of the supported GEO formats are given below.
Subsections
- GEO sample file, simple
- GEO sample file, including present/absent calls
- GEO sample file, including present/absent calls and p-values
- GEO sample file: using absent/present call and p-value columns for sequence information
- GEO series file, simple