Genomic data keeps growing in scale, and providing server access to data is becoming more common as a way to minimize the duplicated effort of obtaining and storing data of interest to the wider scientific community. In addition to the specialized data portals provided under GenomeSpace Tools, GenomeSpace enables use of other public data resources. If a public data resource provides URL addresses for its data files, you can easily access the files from your GenomeSpace account. This page lists the example datasets used in the Analysis Recipes and provides links to alternative datasets that showcase GenomeSpace functionality.

GenomeSpace Public Folder Datasets

The Public folder in the directories panel on the left contains user-specific public data and a folder titled SharedData. Example datasets used in GenomeSpace Recipes are stored in the Demos subfolder of the SharedData folder. Here we briefly describe each shared dataset. More information about each dataset is available on the corresponding Recipe page, linked in the table below.

  • Link to open the Demos folder: /Home/Public/SharedData/Demos
  • Some Recipes use data from other sources and do not have a corresponding shared dataset.
  • Some datasets were used in past workshops and have no corresponding recipe or presentation. These will be removed from the SharedData folder in the upcoming months as we organize the datasets.

Table matching /Home/Public/SharedData/Demos subfolders to Analysis Recipes as of February 3, 2015

Folder (with contents) and Corresponding Analysis Recipe

Analysis Recipe 1
  • Normals_Leu
Recipe: Find subnetworks of differentially expressed genes and identify associated biological functions

Analysis Recipe 2
  • SFP1KO and WT
Recipe: Find differentially expressed genes in RNA-Seq data

Analysis Recipe 3
  • Gistic_regions.bed
Recipe: Obsolete. The files relevant to this recipe are now in folder Analysis Recipe 7.

Analysis Recipe 4
  • RNA_seq
Recipe: Identify and visualize expressed transcripts in RNA-Seq data

Analysis Recipe 5
  • GISTIC_regions.bed
  • USCS_hg18_entrex.bed
  • mrna_orig.gct
Recipe: Build and visualize a module network using putative aberrant regions and expression data

Analysis Recipe 6
  • test_reads.r1.fastq
  • test_reads.r2.fastq
Recipe: Identify and annotate coding variants from exome sequencing data

Analysis Recipe 7
Recipe: Identify biological functions for genes in copy number variation regions

  • K562 and others
Recipe: None. See presentations in /Home/Public/SharedData/Presentations/Old/CEGS_Sept 23_2014 for demonstrations using these datasets.

  • K562
Recipe: None and obsolete. The demonstration relevant to this dataset is described for the CEGS folder.

  • all_aml
  • glucocorticoid_receptor
  • mTor_network

  • TF

  • Recipe2
Recipe: Preprocess and quality check RNA-Seq data

  • Recipe1

  • Exercise1

  • Expression data from InSilicoDB
Recipe: Identify an up- or down-regulated pathway from expression data


Amazon S3 Public Datasets

See Amazon's listing of public datasets for the comprehensive list of public S3 datasets. Two projects are notable for their scale: the 1000 Genomes Project and the Human Microbiome Project. The User Guide details how to Connect an S3 Bucket to your GenomeSpace account.
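Because GenomeSpace works with any public resource that exposes URL addresses for its files, it can help to see what those URLs look like for a public S3 bucket. The following is a minimal Python sketch, not part of GenomeSpace, that uses the standard S3 REST listing request (list-type=2) to turn a bucket listing into file URLs. The function names are hypothetical, and the bucket name in the usage note is an assumption about where the 1000 Genomes data is hosted; it only works for buckets that allow anonymous listing.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket listing responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket, prefix="", max_keys=10):
    """Build the anonymous listing URL for a public bucket (hypothetical helper)."""
    query = urlencode({"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text, bucket):
    """Turn a listing response into (file URL, size in bytes) pairs."""
    root = ET.fromstring(xml_text)
    files = []
    for item in root.iter(f"{S3_NS}Contents"):
        key = item.findtext(f"{S3_NS}Key")
        size = int(item.findtext(f"{S3_NS}Size"))
        # Percent-encode the key so names with spaces still form a valid URL.
        files.append((f"https://{bucket}.s3.amazonaws.com/{quote(key)}", size))
    return files


def list_public_files(bucket, prefix="", max_keys=10):
    """Fetch and parse one page of a public bucket listing (requires network access)."""
    with urlopen(build_list_url(bucket, prefix, max_keys)) as response:
        return parse_listing(response.read().decode("utf-8"), bucket)
```

For example, list_public_files("1000genomes", max_keys=5) would return up to five file URLs, assuming that bucket name is correct and the bucket permits anonymous listing. Only the first page of results is read; larger listings would need the continuation token that S3 returns.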


Other Public Datasets

Please let us know of other public datasets GenomeSpace should link to. We refer you to curations on the following websites listing other public datasets.

