Driving Biological Project: Gene Networks in Cancer

Gene networks in cancer: an example of a GenomeSpace analysis. This page describes in detail the datasets used and the analysis steps taken to examine gene regulatory networks (GRNs) in human cancer stem cells through GenomeSpace. Videos illustrating the analysis can be found on the GenomeSpace YouTube channel, in the "Driving Biological Project: Identifying Gene Regulatory Networks in Human Cancer Stem Cells" playlist.

Legend:

  • provided datasets: located in the GenomeSpace SharedData account, under the folder: Driving Biological Projects → Identifying Gene Regulatory Networks in Human Cancer Stem Cells
  • derived datasets: files that are generated during the analysis
  • analysis steps
  • GenomeSpace tool used in analysis step
  • Module, subtool or plugin used in analysis step
  • Menu → indicates menu items or navigation through a tool's menu
  • parameter used in analysis step

Definitions:

Stemness signature: a set of genes which are upregulated in induced cancer stem cells (iCSCs), and enriched in embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs). These genes represent a stem cell state.

Non-stemness signature: a set of genes which are upregulated in iCSCs, but are not associated with the stemness signature.

Stemness "ON": a set of breast cancer tumor samples which have significant enrichment in the stemness signature.

Stemness "OFF": a set of breast cancer tumor samples which do not have enrichment of the stemness signature.

 

Dataset Descriptions:

Dataset 1: Microarray gene expression data of induced cancer stem cells (iCSCs), generated from primary human keratinocytes which were transformed into squamous cell carcinomas using Ras, IκBα, and either GFP, E2F3, or c-Myc [1].
Provided files: MYC_E2F3GFP.gct, MYC_E2F3GFP.cls

Dataset 2: Microarray gene expression data of embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs), created from subsets of several published datasets (Gene Expression Omnibus accession numbers: GSE22499 [2], GSE22246 [3], GSE12390 [4], and GSE15148 [5]).
Provided files: diff_ESCiPSC.zip, diff_ESCiPSC.array.tab

Dataset 3: Data of primary human breast cancer tumor samples, including gene expression data (GCT format, calculated from Affymetrix .CEL files), CNV data (aCGH format), and reference annotation data [6].
Provided files: breasttumor.geneexp.gct, breasttumor.acgh.txt

Dataset 4: Array comparative genomic hybridization (aCGH) data of primary human breast tumor samples [6].
Provided files: breasttumor.acgh.order.cn

Dataset 5: ChIP-seq data identifying targets of the human c-MYC transcription factor [7].
Provided files: SRR502406.fastq.gz

Gene sets (Published or Custom): These can be manually curated custom gene sets of the investigator's choice, or previously published gene sets (e.g. from MSigDB or Reactome). Example gene sets used in this analysis include stem cell expression signatures, human cancer modules, ChIP-seq targets, genes altered by RNAi, Gene Ontology terms, and KEGG pathways.

 

Analysis Steps:

1. Examine gene expression differences between induced cancer stem cells (iCSCs) and parental cells or non-tumorigenic transformants.

Input: MYC_E2F3GFP.gct and MYC_E2F3GFP.cls
Click here for detailed, step-by-step instructions for this step.
Output: a list of 3903 gene symbols for genes significantly up-regulated in iCSCs (up_regulated_in_MYC_over_E2F3GFP.slice.gct).
Click here to watch a video of this step.
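As a rough illustration of what this step computes, the sketch below reads a GCT expression table and a CLS class file and lists genes whose mean expression is higher in one class than in the other. It is not the GenePattern module used in the recipe: the function names, the toy values, and the simple mean-difference cutoff (min_diff) are assumptions made for illustration, and the values are assumed to be log-scale.

```python
from statistics import mean

def parse_gct(text):
    """Parse GCT v1.2 text: version line, dimensions line, header, then data rows."""
    lines = text.strip().splitlines()
    samples = lines[2].split("\t")[2:]          # skip Name and Description columns
    data = {}
    for line in lines[3:]:
        name, _description, *values = line.split("\t")
        data[name] = [float(v) for v in values]
    return samples, data

def parse_cls(text):
    """Parse a categorical CLS file; line 3 is assumed to hold 0-based class indices."""
    lines = text.strip().splitlines()
    class_names = lines[1].lstrip("#").split()
    return [class_names[int(token)] for token in lines[2].split()]

def up_regulated(data, labels, case, control, min_diff=1.0):
    """Genes whose mean expression in `case` exceeds `control` by at least min_diff."""
    case_idx = [i for i, label in enumerate(labels) if label == case]
    ctrl_idx = [i for i, label in enumerate(labels) if label == control]
    hits = []
    for gene, values in data.items():
        diff = mean(values[i] for i in case_idx) - mean(values[i] for i in ctrl_idx)
        if diff >= min_diff:
            hits.append(gene)
    return sorted(hits)

# Toy input (made-up values, not taken from MYC_E2F3GFP.gct).
gct_text = "\n".join([
    "#1.2",
    "3\t4",
    "\t".join(["Name", "Description", "S1", "S2", "S3", "S4"]),
    "\t".join(["MYC", "na", "8.0", "8.2", "5.0", "5.2"]),
    "\t".join(["GAPDH", "na", "10.0", "10.1", "10.0", "9.9"]),
    "\t".join(["KRT5", "na", "4.0", "4.1", "6.0", "6.3"]),
])
cls_text = "4 2 1\n# MYC E2F3GFP\n0 0 1 1"

samples, data = parse_gct(gct_text)
labels = parse_cls(cls_text)
up = up_regulated(data, labels, case="MYC", control="E2F3GFP")
print(up)
```

A real analysis would use a statistical test with multiple-testing correction rather than a fixed difference cutoff.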

 

2. Process and normalize gene expression datasets of embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs).

Input: diff_ESCiPSC.zip.
Click here for detailed, step-by-step instructions for this step.
Output: a normalized gene expression file that is row centered and collapsed (e.g., diff_ESCiPSC.preprocessed.collapsed.tab).
Click here to watch a video of this step.
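The two operations named in the output, collapsing and row centering, can be sketched as follows. This is a minimal illustration and not the GenePattern or Genomica implementation: the function names are hypothetical, probes are collapsed by per-sample maximum (an assumed choice; other modes such as the median are also common), and the order of the two operations is an assumption.

```python
from collections import defaultdict
from statistics import mean

def collapse_to_genes(rows, probe_to_gene):
    """Collapse probe-level rows to one row per gene symbol.

    rows: probe ID -> list of expression values, one per sample.
    probe_to_gene: probe ID -> gene symbol.
    Probes mapping to the same gene are combined by per-sample maximum.
    """
    grouped = defaultdict(list)
    for probe, values in rows.items():
        gene = probe_to_gene.get(probe)
        if gene:  # skip probes without a gene symbol
            grouped[gene].append(values)
    return {gene: [max(column) for column in zip(*value_lists)]
            for gene, value_lists in grouped.items()}

def row_center(rows):
    """Subtract each row's mean so that every gene has mean expression 0."""
    centered = {}
    for gene, values in rows.items():
        row_mean = mean(values)
        centered[gene] = [v - row_mean for v in values]
    return centered

# Toy probe-level data (made-up values).
probes = {
    "probe_1": [2.0, 4.0, 6.0],
    "probe_2": [3.0, 3.0, 9.0],
    "probe_3": [1.0, 1.0, 1.0],
}
annotation = {"probe_1": "MYC", "probe_2": "MYC", "probe_3": "SOX2"}

collapsed = collapse_to_genes(probes, annotation)
centered = row_center(collapsed)
print(collapsed)
print(centered)
```

Row centering makes expression values comparable across genes with different baseline levels, which is what later enrichment steps rely on.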

 

3. Determine whether the genes up-regulated in iCSCs (from #1) share any similarity with genes expressed in ESCs or iPSCs (from #2).

Input: list of iCSC-associated genes (#1; up_regulated_in_MYC_over_E2F3GFP.slice.gct); other gene expression datasets (#2; diff_ESCiPSC.preprocessed.collapsed.tab); an accompanying array description file (diff_ESCiPSC.array.tab).
Click here for detailed, step-by-step instructions for this step.
Output: a list of 550 stemness genes (stemness.geneset.tab) and a list of 469 non-stemness genes (nonstemness.geneset.tab).
NOTE: Do not close Genomica.
Click here to watch a video of this step.

 

4. Determine if the stemness genes (#3A) are systematically regulated at a transcriptional level during differentiation.

Input: list of stemness genes (#3A; stemness.geneset.tab); gene sets of interest. In this example, we evaluate the MSigDB gene set for computationally derived human cancer gene expression (Collection 4), and the MSigDB Gene Ontology gene set (Collection 5).
Click here for detailed, step-by-step instructions for this step.
Output: a list of 50 cancer modules and 101 Gene Ontology terms that are enriched in stemness genes (P-value < 0.01, hypergeometric test, FDR < 0.05).
Click here to watch a video of this step.
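The enrichment criterion quoted in the output (hypergeometric test with P-value and FDR cutoffs) can be sketched in plain Python. This is an illustrative reimplementation, not Genomica's code: the function names and toy gene sets are assumptions, and the FDR is computed with the Benjamini-Hochberg procedure, which may differ from the correction Genomica applies.

```python
from math import comb

def hypergeom_pvalue(overlap, set_size, query_size, universe_size):
    """P(X >= overlap) when query_size genes are drawn from universe_size genes,
    of which set_size belong to the gene set."""
    upper = min(set_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe_size, query_size)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def enriched_sets(query, gene_sets, universe, p_cut=0.01, fdr_cut=0.05):
    """Gene sets over-represented in `query`, as (name, p-value, q-value) tuples."""
    universe = set(universe)
    query = set(query) & universe
    names = list(gene_sets)
    pvals = []
    for name in names:
        members = set(gene_sets[name]) & universe
        pvals.append(hypergeom_pvalue(len(query & members), len(members),
                                      len(query), len(universe)))
    qvals = benjamini_hochberg(pvals)
    return [(n, p, q) for n, p, q in zip(names, pvals, qvals)
            if p < p_cut and q < fdr_cut]

# Toy example: 200 genes, a 20-gene query, and two hypothetical modules.
universe = [f"g{i}" for i in range(200)]
query = universe[:20]
gene_sets = {"module_A": universe[:10], "module_B": universe[100:110]}
hits = enriched_sets(query, gene_sets, universe)
print([name for name, _p, _q in hits])
```

In the toy example all ten genes of module_A fall inside the query list, so it passes both cutoffs, while module_B has no overlap and does not.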

 

5. Determine if the non-stemness genes (#3B) are systematically regulated at the transcriptional level during differentiation, by repeating #4A-4C on the non-stemness genes.

Input: list of non-stemness genes (#3B; nonstemness.geneset.tab); gene sets of interest. In this example, we evaluate the MSigDB gene set for computationally derived human cancer gene expression (Collection 4), and the MSigDB Gene Ontology gene set (Collection 5).
Click here for detailed, step-by-step instructions for this step.
Output: a list of 37 cancer modules and 11 Gene Ontology terms that are enriched in non-stemness genes (P-value < 0.01, hypergeometric test, FDR < 0.05).
Click here to watch a video of this step.

 

6. Determine how the stemness and non-stemness gene signatures are represented in differentiated cancer cells. To determine this, first process and normalize breast cancer gene expression profiles in GenePattern, for later analysis in Genomica.

Input: A breast cancer gene expression dataset (breasttumor.geneexp.gct)
Click here for detailed, step-by-step instructions for this step.
Output: a normalized gene expression file of normal/tumor breast cancer samples in Genomica tab format (breasttumor.geneexp.preprocessed.collapsed.tab).
Click here to watch a video of this step.

 

7. Determine the relationship between stemness genes and differentiated breast cancer tumor samples. To do this, separate the tumor samples into two groups based on whether they are enriched or depleted for expressed stemness genes. Then, use this categorization to process breast cancer tumor copy number variation (CNV) data.

Input: a normalized gene expression file of breast cancer samples (#6; breasttumor.geneexp.preprocessed.collapsed.tab), the list of stemness genes (#3A, stemness.geneset.tab), and the raw breast cancer CNV profiles (breasttumor.acgh.txt).
Click here for detailed, step-by-step instructions for this step.
Output: an experiment set file that separates breast cancer tumor samples based on presence (up-regulation) or absence (down-regulation) of stemness genes (INGENESET.array.tab), a normalized CNV profile (acgh.avergene.tab), arrays in which the stemness signature is present (INGENESET.present.txt; "Stemness ON" samples) and absent (INGENESET.absent.txt; "Stemness OFF" samples).
Click here to watch a video of this step.
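One way to picture the "Stemness ON" / "Stemness OFF" split is a per-sample enrichment test: in each tumor sample, check whether the stemness genes are over-represented among the up-regulated genes (ON) or among the down-regulated genes (OFF). The sketch below assumes this logic. The function names, the expression cutoff (fold) and the P-value cutoff (p_cut) are hypothetical, and the actual Genomica procedure may differ.

```python
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k) for n draws from N genes, K of which are in the signature."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

def classify_samples(expr, signature, fold=1.0, p_cut=0.05):
    """Label each sample "ON", "OFF" or None.

    expr: gene -> list of row-centered log expression values, one per sample.
    signature: iterable of gene symbols (e.g. the stemness gene list).
    """
    genes = list(expr)
    sig = set(signature) & set(genes)
    n_samples = len(next(iter(expr.values())))
    labels = []
    for s in range(n_samples):
        up = {g for g in genes if expr[g][s] >= fold}
        down = {g for g in genes if expr[g][s] <= -fold}
        p_up = hypergeom_tail(len(up & sig), len(sig), len(up), len(genes))
        p_down = hypergeom_tail(len(down & sig), len(sig), len(down), len(genes))
        if p_up < p_cut and p_up <= p_down:
            labels.append("ON")       # signature enriched among up-regulated genes
        elif p_down < p_cut:
            labels.append("OFF")      # signature enriched among down-regulated genes
        else:
            labels.append(None)       # no significant enrichment either way
    return labels

# Toy data: 40 genes, 3 samples; the first 8 genes form a hypothetical signature.
signature = [f"g{i}" for i in range(8)]
expr = {f"g{i}": [0.0, 0.0, 0.0] for i in range(40)}
for g in signature:
    expr[g] = [2.0, -2.0, 0.0]

labels = classify_samples(expr, signature)
print(labels)
```

Samples labeled None are those in which the signature is neither clearly present nor clearly absent.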

 

8. Upload the C1 collection of chromosome cytobands gene sets from MSigDB to GenomeSpace, then convert the file to geneset tab format for Genomica.

Input: None.
Click here for detailed, step-by-step instructions for this step.
Output: a Genomica gene set of the Collection C1 from MSigDB (e.g., c1.all.v4.0.symbols.geneset.tab).
Click here to watch a video of this step.
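For orientation, the conversion in this step can be pictured as reading MSigDB's GMT layout (one gene set per line: set name, description, then the member genes, tab-separated) and writing a tab-delimited table. The sketch below parses GMT text and emits a gene-by-set membership matrix. The output layout is an assumption made for illustration and may not match the exact geneset tab format Genomica expects; the function names and the toy gene lists are also hypothetical.

```python
def parse_gmt(text):
    """Parse GMT text into a dict: set name -> list of gene symbols."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets

def to_membership_table(gene_sets):
    """Build a tab-delimited 0/1 table with one row per gene, one column per set.

    NOTE: this layout is assumed for illustration only.
    """
    names = list(gene_sets)
    all_genes = sorted({g for genes in gene_sets.values() for g in genes})
    lines = ["\t".join(["Gene"] + names)]
    for gene in all_genes:
        flags = ["1" if gene in gene_sets[name] else "0" for name in names]
        lines.append("\t".join([gene] + flags))
    return "\n".join(lines)

# Toy GMT content (abbreviated, illustrative gene lists).
gmt_text = "\n".join([
    "\t".join(["CHR8Q24", "na", "MYC", "PVT1"]),
    "\t".join(["CHR8Q22", "na", "YWHAZ", "GRHL2"]),
])

gene_sets = parse_gmt(gmt_text)
table = to_membership_table(gene_sets)
print(table)
```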

 

9. To identify if gene expression signatures (such as "Stemness ON" and "Stemness OFF") are associated with copy number aberrations in genes regulating transcription, perform Stepwise Linkage Analysis of Microarray Signatures (SLAMS).
     Description of the SLAMS procedure [11]:

  1. Sort tumor samples into groups based on whether the stemness signature is present ("ON") or absent ("OFF") (Completed in Step #7).
  2. Compare the DNA copy number changes (from the array CGH data) between the groups of tumor samples. Calculate the association between stemness expression and CNV datasets to identify amplifications/deletions associated with the stemness signature.
  3. Select genes which are potential candidate regulators of the stemness signature, based on coordinate gene amplification/deletion and gene expression upregulation/downregulation.
  4. Validate the candidate regulators by assessing their predictive ability in independent sets of tumor samples.
Input: the normalized breast cancer CNV profiles (#7D; acgh.avergene.tab), the experiment array set with stemness present/absent annotation (#7B; INGENESET.array.tab), the processed MSigDB C1 collection (#8A; c1.all.v4.0.symbols.geneset.tab), and the normalized breast cancer gene expression dataset (#6; breasttumor.geneexp.preprocessed.collapsed.tab).
Click here for detailed, step-by-step instructions for this step.
Output: a visualization of the enrichment of cytobands CHR8Q24 and CHR8Q22 in the CNV data for tumors with the stemness signature, and a list of identified regulators (e.g. stemness_regulators.geneset.tab). This file lists the 48 candidate regulators (out of 145) that are significantly enriched in "Stemness ON" tumors; these are the "stemness regulators". One of them is MYC, suggesting that the iCSC model faithfully recapitulates genes reprogrammed in authentic human cancers by c-MYC amplification. Finally, visualize the expression of these regulators in the breast cancer gene expression dataset.
Click here to watch a video of this step.
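Steps 2 and 3 of the SLAMS description above can be sketched as a per-gene comparison between the "Stemness ON" and "Stemness OFF" groups. The sketch uses Welch's t statistic and keeps genes whose copy number and expression both shift in the same direction. The choice of statistic, the cutoff (t_cut), the function names, and the toy values are all assumptions for illustration; this is not the Genomica implementation.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent groups (each needs >= 2 values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Degenerate case (no variance in either group) is treated as no evidence.
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def candidate_regulators(cnv, expr, labels, t_cut=2.0):
    """Genes with coordinated copy-number and expression shifts, ON versus OFF.

    cnv, expr: gene -> list of per-sample values; labels: "ON"/"OFF" per sample.
    Returns (gene, t_copy_number, t_expression) tuples, strongest CNV shift first.
    """
    on = [i for i, label in enumerate(labels) if label == "ON"]
    off = [i for i, label in enumerate(labels) if label == "OFF"]
    hits = []
    for gene in cnv:
        if gene not in expr:
            continue
        t_cn = welch_t([cnv[gene][i] for i in on], [cnv[gene][i] for i in off])
        t_ex = welch_t([expr[gene][i] for i in on], [expr[gene][i] for i in off])
        same_direction = t_cn * t_ex > 0  # amplified and up, or deleted and down
        if abs(t_cn) >= t_cut and abs(t_ex) >= t_cut and same_direction:
            hits.append((gene, t_cn, t_ex))
    return sorted(hits, key=lambda hit: -abs(hit[1]))

# Toy data (made-up values): three ON and three OFF tumors.
labels = ["ON", "ON", "ON", "OFF", "OFF", "OFF"]
cnv = {
    "MYC":    [0.8, 0.9, 1.0, 0.0, 0.1, -0.1],
    "GENE_X": [0.1, 0.0, -0.1, 0.0, 0.1, -0.1],
}
expr = {
    "MYC":    [2.0, 2.2, 1.9, 0.1, -0.1, 0.0],
    "GENE_X": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
}

hits = candidate_regulators(cnv, expr, labels)
print([gene for gene, _t_cn, _t_ex in hits])
```

Step 4 of the procedure (validation in independent tumor samples) is not covered by this sketch.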

 

10. Visualize CNV profiles of enriched chromosome cytobands in IGV.

Input: breasttumor.acgh.order.cn.
Click here for detailed, step-by-step instructions for this step.
Output: visualization of chromosome cytobands.
Click here to watch a video of this step.

 

11. Having determined that some transcriptional regulators have copy number aberrations that affect gene expression in breast cancer tumor samples, check if specific transcription factors and miRNAs (such as Oct4, Nanog, Sox2, Myc and the Polycomb complex) are regulating stemness and non-stemness genes.

Input: ChIP-seq data for TFs and epigenetic regulators, such as Myc (SRR502406.fastq.gz).
Click here for detailed, step-by-step instructions for this step.
Output: a .bed file that can be visualized further.
Click here to watch a video of this step.

 

12. Visualize the ChIP-seq signals of TFs and epigenetic regulators of interest (#11H) using IGV.

Input: The .bed file from #11H.
Click here for detailed, step-by-step instructions for this step.
Output: visualization of the ChIP-seq signals.
Click here to watch a video of this step.

 

13. Create a gene regulatory network of genes which regulate the transcriptional profile of stemness genes using the ModuleNetwork tool in Genomica.

Input: the normalized breast cancer gene expression dataset (#6; breasttumor.geneexp.preprocessed.collapsed.tab), the list of stemness regulators (#9J; stemness_regulators.geneset.tab).
Click here for detailed, step-by-step instructions for this step.
Output: a module network that can be passed to Cytoscape (stemness_network.ndb).
Click here to watch a video of this step.

 

14. Visualize the gene regulatory network in Cytoscape. In this example, we use Cytoscape 2.8.

Input: a module network (#13A).
Click here for detailed, step-by-step instructions for this step.
Output: visualization of the gene regulatory network.
Click here to watch a video of this step.

 

Non-stemness genes. Repeat steps 7, 9, 13 and 14 using the list of non-stemness genes, to identify genes regulating the non-stemness signature and to predict and visualize the iCSC-unique regulatory network in Cytoscape.