Convert file formats

GenomeSpace contains built-in file converters for moving your files from one tool to another. These are described in the sections below. The lists given here do not cover every file format accepted by the tools.

Please let us know of additional file conversions that would be helpful to your work, or if you have a converter you would like to contribute to GenomeSpace, at gs-help@broadinstitute.org.


Convert by renaming the file extension

A number of formats are based on the plain text format, and you can convert these to a TXT file simply by renaming the file extension.

To rename a file, either:

  • Right-click on the file and select Rename from the pop-up menu.
  • Select the file's checkbox, then choose File menu>Rename.

Edit the original file name and extension in the display and click Rename to save.

The converse requires additional consideration. You can convert a plain text file to a tool-specific format by renaming the extension only if the file is of a supported data type and has the required attributes of the format.

  • Supported data types and required attributes are described in detail for file formats at their respective websites. These required attributes may include specific usage for select cells such as specific headings, unique sample and probe identifiers, etc.
  • One example attribute is the delimiter. For example, GenePattern's GCT format data are tab-delimited and GenePattern's CLS format labels are space-delimited.
  • A file must conform to the required attributes for a true conversion. A file with the extension but without the required attributes results in a tool error.
  • Conversion by renaming may be useful to take advantage of certain GenomeSpace features, such as Heatmap previews, which are available only for GCT, GXP and Genomica TAB files.
  • Note that rich text format TXT is not equivalent to plain text TXT. Mac users who need help creating plain text files can see this external site.
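
As a rough illustration of checking required attributes before renaming, the sketch below tests whether a plain text file has the layout of a GenePattern GCT file. It assumes the GCT version 1.2 layout described in the GenePattern documentation (a #1.2 version line, a line with row and column counts, a header row, then tab-delimited data rows). The function name check_gct is hypothetical and is not part of GenomeSpace.

```python
def check_gct(text: str) -> list[str]:
    """Return a list of problems; an empty list means the text looks like GCT v1.2.

    Assumed layout (from the GenePattern GCT documentation, not from GenomeSpace):
      line 1: #1.2
      line 2: <number of data rows> TAB <number of samples>
      line 3: Name TAB Description TAB <one label per sample>
      rest:   one tab-delimited row per gene or probe
    """
    lines = text.splitlines()
    if len(lines) < 3:
        return ["fewer than three header lines"]

    problems = []
    if lines[0].strip() != "#1.2":
        problems.append("first line is not the version string #1.2")

    dims = lines[1].strip().split("\t")
    if len(dims) < 2 or not all(d.isdigit() for d in dims[:2]):
        problems.append("second line does not give tab-delimited row and column counts")
        return problems
    n_rows, n_cols = int(dims[0]), int(dims[1])

    # Two leading columns (name and description) plus one column per sample.
    expected_fields = n_cols + 2
    if len(lines[2].split("\t")) != expected_fields:
        problems.append("header row does not have two label columns plus one per sample")

    data = [ln for ln in lines[3:] if ln.strip()]
    if len(data) != n_rows:
        problems.append(f"expected {n_rows} data rows, found {len(data)}")
    for i, row in enumerate(data, start=1):
        if len(row.split("\t")) != expected_fields:
            problems.append(f"data row {i} does not have {expected_fields} tab-delimited fields")
    return problems
```

A file that passes a check like this is a reasonable candidate for renaming to the GCT extension. A space-delimited file with the same content would be reported as non-conforming, which matches the point above about delimiters.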

 

Convert a file format

Sending a file to a different tool may automatically invoke a file conversion. Each tool handles these conversions differently: it may offer you options within its user interface, or it may convert the file behind the scenes. The option to manually convert the file will not be available if there is no converter for that format. The next section on this page lists available converters.

Manually convert file formats within GenomeSpace by:

  • Right-clicking the file and selecting Convert.
  • Checking a single file and selecting File menu>Convert.

This opens the Convert File Format dialog box.

  • Multiple converters for a file type are presented in a drop-down menu. Select the destination format from this menu.
  • To convert and download the file, click Download.
  • To convert and save in the same folder, click Convert on Server.

 

Available Converters

For the most current list of file converters, select Help>Format Converters on the menu bar. Hovering over a format in the dialog that opens displays a brief description of that format. Similarly, links in the table below take you to the next section, where each format is described.

The nowhitespace converter for TXT files is for developer testing only. To convert TXT files, see Convert by renaming the file extension.

From format        To format
ADJ                XGMML
ATTR (Cytoscape)   ATTR (Cytoscape GeneMania)
CDT                GCT (GenePattern)
CDT                TAB (Genomica)
GCT                ATTR (Cytoscape)
GCT                GXP
GCT                TAB (Genomica)
GCT                geneset.TAB. The input is a dataset with expression values for genes or probes, but the output lists just the probes.
GCT                EXP (geWorkbench Affymetrix EXPeriment file)
GMT                TAB (Genomica)
GXP                GCT
LST                geneset.TAB (Genomica). A single column indicating a geneset is added and includes all genes from the LST file.
ODF                ATTR (Cytoscape)
REG2TARGET         geneset.TAB. The input is a file that maps regulators to target genes, but the output lists just the probes.
REG2TARGET         GMT
RES                GXP
RES                TAB (Genomica)
RES                geneset.TAB. The input is a dataset with expression values for genes or probes, but the output lists just the probes.
TAB (Genomica)     GCT
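
To see at a glance whether a converter exists for a given pair of formats, the table above can be treated as a lookup. The sketch below is only an illustration transcribed from that table: the names CONVERTERS and conversion_path are hypothetical and are not part of any GenomeSpace API, and the Help>Format Converters dialog remains the authoritative list. It also looks for a chain of conversions when no direct converter is listed. Whether such a chain is meaningful for your data is an assumption you would need to check, because some converters drop information (for example, the geneset.TAB outputs keep only the probes).

```python
from collections import deque

# Direct converters, transcribed from the table above (tool names omitted
# except where they distinguish two formats with the same extension).
CONVERTERS = {
    "ADJ": ["XGMML"],
    "ATTR (Cytoscape)": ["ATTR (Cytoscape GeneMania)"],
    "CDT": ["GCT", "TAB"],
    "GCT": ["ATTR (Cytoscape)", "GXP", "TAB", "geneset.TAB", "EXP"],
    "GMT": ["TAB"],
    "GXP": ["GCT"],
    "LST": ["geneset.TAB"],
    "ODF": ["ATTR (Cytoscape)"],
    "REG2TARGET": ["geneset.TAB", "GMT"],
    "RES": ["GXP", "TAB", "geneset.TAB"],
    "TAB": ["GCT"],
}


def conversion_path(source, target):
    """Return the shortest list of formats leading from source to target, or None.

    Uses a breadth-first search over CONVERTERS, so a direct converter
    (a two-element path) is always preferred over a chain.
    """
    if source == target:
        return [source]
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        for nxt in CONVERTERS.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt == target:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None
```

For example, conversion_path("CDT", "GXP") finds the two-step route through GCT, while conversion_path("XGMML", "GCT") returns None because the table lists no converter that starts from XGMML.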

 

Descriptions of file formats for which converters are available

Format extension Primary tool Description
ADJ GenePattern ARACNE module Adjacency file, tab-delimited. Used by the ARACNE module in GenePattern. The ARACNE module is an algorithm that reverse engineers a gene regulatory network from microarray gene expression data. Further information on this file format can be found here (PDF).
ATTR (Cytoscape) Cytoscape Cytoscape format that describes node and edge attributes. Further information on this file format can be found here.
ATTR (Cytoscape GeneMania) Cytoscape GeneMania plugin Cytoscape attribute format for GeneMania networks. See this page for more information about the GeneMania plugin for Cytoscape.
CDT GenePattern's HierarchicalClustering module CDT (clustered data table) files are created by GenePattern's HierarchicalClustering module. The CDT file reorders the original input data based on the clustering. An additional column or row is added, depending on whether clustering was performed on genes or arrays. It contains a unique identifier for each row or column, linked to the description of the tree structure in the GTR or ATR file that the module also outputs. For details on the format, see here.
EXP geWorkbench The geWorkbench native tab-delimited format for saving microarray data, providing a way to include both the data matrix for a group of arrays and various set labels grouping these arrays in the same file. Further information on this file format can be found here.
GCT GenePattern A tab-delimited file format that describes an expression dataset. Further information on the file format can be found here.
GMT GenePattern GSEA module Tab-delimited file format that describes gene sets. Each row represents a gene set. Further information on this file format can be found here.
GXP Genomica Genomica proprietary expression file format. This file format can be used to store the results of complex analyses, and a single GXP file can store multiple annotation files and analyses. It presents the same data as TAB format but is not tab-delimited. More information is detailed here.
LST Genomica Genomica outputs lists of genes as an LST format file. It contains the list of genes and descriptions given in the first two columns of GXP or TAB files but not expression data. Gene symbols are in square brackets, followed by the identifier and description as shown in the lower-right quadrant of the figure here.
ODF GenePattern The Output Description Format (ODF) is similar to the RES or GCT file formats for gene expression datasets. The main difference is in the header; the body of data still contains the expression level values for each gene in each sample. Further information on the file format can be found here.
REG2TARGET   A two-column format that contains the mapping between regulators (column 1) and target genes (column 2).
RES GenePattern A tab-delimited file format that describes an expression dataset. Unlike the GCT file format, the RES file format contains labels for each gene's absent (A) versus present (P) calls, as generated by Affymetrix's GeneChip software, and does not allow missing expression values. Further information on the file format can be found here.
TAB Genomica Tab-delimited text file that contains gene expression data. It presents the same data as the GXP format. The first row is a header row, where the names of the arrays or experiments are specified from the third column onwards. Column one lists gene symbols and column two gives the name and description of the probe separated by space-hyphen-space " - ". Remaining columns give expression data for each gene across experiments. The geneset.TAB format contains just the list of probes from a TAB file. This format is different from the LST format in that it omits the column of descriptions and adds columns that define gene sets, marked by 1 if a gene is included and 0 if it is not.
XGMML Cytoscape eXtensible Graph Markup and Modeling Language file. These files contain network data and node/edge/network attributes. Further information on the file format can be found here.
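
The Genomica TAB description above is specific enough to sketch a small reader. The example below is an illustration only, assuming exactly the layout stated in the table: a header row with array names from the third column onwards, gene symbols in column one, and "name - description" in column two. The function names, the sample header labels, and the treatment of empty cells as missing values are assumptions. The second helper shows the 1/0 membership marking described for geneset.TAB, without claiming to reproduce the exact file layout that the GenomeSpace converter writes.

```python
def parse_genomica_tab(text):
    """Parse Genomica TAB text into (array_names, rows).

    Each row is a dict with the gene symbol, the probe name and description
    (split on the first " - "), and the expression values as floats.
    Empty cells are returned as None (an assumption, not a documented rule).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    arrays = header[2:]  # array or experiment names start at column three

    rows = []
    for line in lines[1:]:
        fields = line.split("\t")
        name, _, description = fields[1].partition(" - ")
        values = [float(v) if v.strip() else None for v in fields[2:]]
        rows.append({
            "symbol": fields[0],
            "name": name,
            "description": description,
            "values": values,
        })
    return arrays, rows


def geneset_column(rows, members):
    """Mark each gene with 1 if its symbol is in the gene set, else 0."""
    return [(row["symbol"], 1 if row["symbol"] in members else 0) for row in rows]
```

A short usage example with made-up data:

```python
sample = (
    "Gene\tDescription\tExp1\tExp2\n"
    "TP53\tp53_at - tumor protein p53\t1.5\t2.0\n"
    "BRCA1\tbrca1_at - breast cancer 1\t0.5\t\n"
)
arrays, rows = parse_genomica_tab(sample)
membership = geneset_column(rows, {"TP53"})
```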

 
