GenomeSpace Tools and Data Sources

GenomeSpace hosts a variety of tools and data sources that provide a wide spectrum of genomic analysis and bioinformatics capabilities. If you would like to add your tool to the GenomeSpace community, see our developer information or contact us.

Tool Description


ArrayExpress

The ArrayExpress Archive is a database of functional genomics data; it stores microarray and high-throughput sequencing (HTS) data that are described and archived according to the community guidelines for microarray (MIAME) and HTS (MINSEQE) data.

Cancer Cell Line Encyclopedia

Datasets from the Cancer Cell Line Encyclopedia (CCLE) project are now linked to GenomeSpace. The CCLE project aims to characterize the genetics and pharmacology of a large panel of human cancer models and to develop integrated computational analyses that link pharmacologic sensitivities to genomic patterns. In addition to the pharmacologic analysis and annotations for about 1000 cell lines, data include DNA copy number, mRNA expression, Oncomap 3.0 mutation, and hybrid capture mutation analyses. The project is ongoing and datasets may be released before publication.


cBioPortal

The cBioPortal for Cancer Genomics provides visualization, analysis, and download of large-scale cancer genomics data sets.


Cistrome

A cistrome is defined as the set of cis-acting targets of a trans-acting factor on a genome scale. The Cistrome project integrates ChIP-chip/seq data for a cistrome with standard analysis pipelines, and gives users a flexible web interface for accessing both the data and the pipelines.

In addition to the standard Galaxy functions, Cistrome has 29 ChIP-chip- and ChIP-seq-specific tools in three major categories, from preliminary peak calling and correlation analyses, to downstream genome feature association, gene expression analyses, and motif discovery.


Cytoscape

Cytoscape is an open-source bioinformatics software platform for visualizing molecular interaction networks and biological pathways, and for integrating these networks with annotations, gene expression profiles, and other state data. The Cytoscape core distribution provides a basic set of features for data integration and visualization. Additional features are available as plugins for network and molecular profiling analyses, new layouts, additional file format support, scripting, and connection with databases. Plugins may be developed by anyone using the Cytoscape open API based on Java™ technology, and plugin community development is encouraged. Most of the plugins are freely available.


FireBrowse

FireBrowse is a companion portal to the Broad Institute GDAC Firehose analysis pipeline, and was developed to cull and analyze data generated by The Cancer Genome Atlas (TCGA), which characterizes and identifies genomic patterns in human cancer models. Backed by a powerful compute infrastructure, programming interface, online reports, and modern graphical tools, FireBrowse provides a simple yet capable means of visually and programmatically exploring one of the most comprehensive and deeply characterized open cancer datasets in the world. FireBrowse provides access to a variety of cancer genomics data, such as clinical annotations, DNA copy number, miR, miRseq, mRNA, and mRNAseq, as well as a comprehensive suite of more than 100 interdependent analyses of those data, including correlations, clustering, GISTIC, and MutSigCV.


Galaxy

The Galaxy project is an open, web-based platform for performing accessible, reproducible, and transparent biomedical research. Complete computational analyses can be built, saved, rerun, modified, and shared. Galaxy is available as a free public web server; as open-source software that can be installed and customized to address specific needs; as a virtual machine that can be installed on the cloud; and on many specialized public web servers at user sites.


GenePattern

GenePattern is a powerful genomic analysis platform that provides access to hundreds of tools for gene expression analysis, proteomics, SNP analysis, flow cytometry, RNA-seq analysis, and common data processing tasks. A web-based interface provides easy access to these tools and allows the creation of multi-step analysis pipelines that enable reproducible in silico research.


Genomica

Genomica is an analysis and visualization tool for genomic data that can integrate gene expression data, DNA sequence data, and gene and experiment annotation information.


geWorkbench

geWorkbench (genomics Workbench) is an open source Java desktop application that provides access to an integrated suite of tools for the analysis and visualization of data from a wide range of genomics domains (gene expression, sequence, protein structure, and systems biology). More than 70 distinct plug-in modules are currently available, implementing both classical analyses (several variants of clustering, classification, homology detection, etc.) and state-of-the-art algorithms for the reverse engineering of regulatory networks and for protein structure prediction, among many others. geWorkbench leverages standards-based middleware technologies to provide seamless access to remote data, annotation, and computational servers, thus enabling researchers with limited local resources to benefit from available public infrastructure.


Gitools

Gitools is a framework for analyzing and visualizing genomic data with interactive heatmaps. It can import data from various sources, including BioMart, Ensembl, IntOGen, and GenomeSpace.


Integrative Genomics Viewer

The Integrative Genomics Viewer (IGV) is a lightweight, high-performance visualization tool that enables intuitive real-time exploration of diverse, large-scale genomic data sets on standard desktop computers. It supports flexible integration of a wide variety of data types including aligned sequence reads, mutations, copy number, RNA interference screens, gene expression, methylation, and genomic annotations. IGV makes use of efficient, multi-resolution file formats to enable real-time exploration of arbitrarily large data sets over all resolution scales, while consuming minimal resources on the client computer. Navigation through a data set is similar to that of Google Maps, allowing the user to zoom and pan seamlessly across the genome at any level of detail from whole genome to base pair.


ISA tools

The open source ISA metadata tracking tools help to manage an increasingly diverse set of life science, environmental, and biomedical experiments that employ one or a combination of technologies.

MSigDB Online Tools

The Molecular Signatures Database (MSigDB) is a large curated collection of annotated gene sets. Its accompanying online tools include one that computes the overlap between your own gene set and the MSigDB gene sets. When gene sets share genes, examining how they overlap can highlight common processes, pathways, and underlying biological themes. The tool reports the overlap between a user-provided gene set and as many MSigDB collections as you choose, along with an estimate of the statistical significance of each overlap.
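As a rough illustration of what an overlap computation with a significance estimate can look like, the sketch below scores the overlap between two gene sets with a hypergeometric upper-tail test. The choice of the hypergeometric test, the function name `overlap_pvalue`, and the `universe_size` parameter are assumptions made for this example; the text above does not specify how the MSigDB tool computes its estimate.

```python
from math import comb


def overlap_pvalue(user_set, ref_set, universe_size):
    """Return (overlap_count, p_value) for two gene sets.

    Hypothetical helper: the p-value is the hypergeometric probability of
    seeing at least the observed overlap when len(user_set) genes are drawn
    at random from a universe of `universe_size` genes. This is an assumed
    statistic, not necessarily the one the MSigDB tool uses.
    """
    user, ref = set(user_set), set(ref_set)
    k = len(user & ref)      # observed overlap
    n = len(user)            # size of the user's gene set
    K = len(ref)             # size of the reference gene set
    N = universe_size        # total number of genes considered

    if N < max(n, K):
        raise ValueError("universe_size must be at least as large as each set")

    # Sum the hypergeometric probabilities for overlaps of size k or larger.
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / total


if __name__ == "__main__":
    mine = {"TP53", "BRCA1", "MYC", "EGFR"}
    reference = {"TP53", "MYC", "KRAS"}
    print(overlap_pvalue(mine, reference, universe_size=20))
```

A small p-value suggests the two sets share more genes than chance alone would explain. The result depends strongly on the assumed universe size, so that value should reflect the genes that could actually have appeared in either set.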

Multiple Myeloma Genomics Portal

Datasets on the Multiple Myeloma Genomics Portal are now linked to GenomeSpace. The portal itself provides access to and limited analysis of multiple myeloma datasets. These include the Multiple Myeloma Research Consortium funded reference aCGH and gene expression data and additional public multiple myeloma datasets. The portal is updated with new data as they become available.

Project Achilles

Datasets from Project Achilles are now linked to GenomeSpace. Project Achilles is creating a genome-wide catalog of tumor dependencies, to identify vulnerabilities associated with genetic and epigenetic alterations. Individual genes in hundreds of genomically characterized cancer cell lines are silenced using a genome-wide, pooled shRNA library. Genes that affect cell survival are identified using advanced computational methods, including ATARiS (Analytic Technique for Assessment of RNAi by Similarity). The publicly available 2011 v2.0 analysis is on 102 cell lines that include 25 ovarian, 18 colon, 13 pancreatic, 9 esophageal, 8 non-small-cell lung cancer (NSCLC), and 6 glioblastoma cancer cell lines. Release of v2.4 analyzing 216 cell lines is scheduled.


Reactome

Reactome is a free, open-source, curated, and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation, and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology, and education.


Synapse

Synapse, from Sage Bionetworks, is an open source, web-based platform designed to facilitate collaboration within and among scientific teams, access to large-scale analysis-ready genomics data sets, and integration with analysis tools and programming environments. The portal allows users to define their own online project spaces to which they can post content (data, code, analysis history, and project summaries) and document their work online immediately upon production. Project owners are able to control access to their own projects: users can share projects initially with only defined collaborators or make them publicly available. The platform supports a public API and integration with R, Python, Java, and command-line clients, allowing analysts to work directly with Synapse data and code using their existing tools.


Trinity CTAT

The Trinity Cancer Transcriptome Analysis Toolkit (CTAT) aims to provide tools for leveraging RNA-Seq to gain insights into the biology of cancer transcriptomes. Bioinformatics tool support is provided for mutation detection, fusion transcript identification, and de novo transcriptome assembly.

UCSC Table Browser

The UCSC Genome Browser website contains the reference sequence and working draft assemblies for a large collection of genomes. The Table Browser provides convenient access to the underlying database, allowing you to retrieve the data associated with an annotation track (a means of organizing a particular set of information, such as known genes, predicted genes, ESTs, mRNAs, CpG islands, assembly gaps and coverage, or chromosomal bands) in text format, to calculate intersections between tracks, and to retrieve DNA sequence covered by a track.
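To make the idea of intersecting two annotation tracks concrete, here is a minimal sketch that finds the regions shared by two lists of genomic intervals on a single chromosome. It is not the Table Browser's implementation. The function name `intersect_tracks`, the half-open `(start, end)` coordinate convention, and the assumption that intervals within each track do not overlap one another are all choices made for this example.

```python
def intersect_tracks(track_a, track_b):
    """Return the regions covered by both tracks.

    Hypothetical helper. Each track is a list of (start, end) tuples on the
    same chromosome, using half-open coordinates. Intervals within a single
    track are assumed not to overlap each other.
    """
    a = sorted(track_a)
    b = sorted(track_b)
    i = j = 0
    shared = []

    # Sweep both sorted lists once, always advancing the interval that ends first.
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


if __name__ == "__main__":
    genes = [(100, 200), (300, 400)]
    cpg_islands = [(150, 350)]
    print(intersect_tracks(genes, cpg_islands))
```

Because both lists are sorted first and then scanned once, the sweep itself takes time proportional to the total number of intervals. Tracks that span several chromosomes would need to be grouped by chromosome before calling the function.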