GSCA

- Gene Set Control Analysis

Introduction

Transcription factors (TFs) have long been recognized as major regulators of haematopoietic cell type specification. To understand the mechanisms by which TFs specify cell types, it is essential to identify their transcriptional targets. As sequencing protocols mature, ChIP-sequencing (ChIP-Seq) is becoming a common technique for identifying the genome-wide binding pattern of a given TF in a given cell type. More than 100 individual studies have now been deposited in public databases for the murine haematopoietic system alone. This wealth of new data presents unprecedented opportunities to unravel the transcriptional control mechanisms that mediate expression of specific sets of genes within the various haematopoietic cell lineages.

Gene Ontology over-representation analysis identifies the functional categories enriched within a gene set of interest, and Gene Set Enrichment Analysis (GSEA) determines whether a gene set of interest shows statistically significant expression differences between two or more cell types. Complementary to these approaches, we developed a new computational framework for linking gene sets with transcriptional control, called Gene Set Control Analysis (GSCA). By exploiting multiple transcription factor binding patterns from genome-wide ChIP-Seq studies, GSCA can provide previously unattainable insights into possible transcriptional control mechanisms operating in both normal and malignant cells. Through integrated analysis of 142 blood-specific ChIP-Seq binding datasets, C-GSCA, which clusters the ChIP-Seq samples that GSCA finds over-represented, identifies likely combinatorial transcriptional control mechanisms by revealing TF co-occupancy patterns specifically associated with gene regulatory elements from a given gene set.
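To make the idea of TF co-occupancy concrete, the following is a minimal Python sketch that groups ChIP-Seq samples binding similar elements of a gene set. It is an illustration under stated assumptions, not the tool's implementation: the Jaccard distance, average linkage, the `max_distance` cut-off, and all function and sample names are hypothetical choices made for this example.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_samples(bound_by_sample, max_distance):
    """Agglomerative (hierarchical) clustering with average linkage.

    bound_by_sample maps a sample name to the set of gene-set elements
    it binds. The closest pair of clusters is merged repeatedly until
    the closest pair is farther apart than max_distance (an assumed,
    user-chosen cut-off).
    """
    clusters = [[name] for name in sorted(bound_by_sample)]

    def cluster_distance(c1, c2):
        # Average linkage: mean pairwise distance between members.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(
            jaccard_distance(bound_by_sample[a], bound_by_sample[b])
            for a, b in pairs
        )
        return total / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Toy data (hypothetical): which genes of the gene set each sample binds.
bound_by_sample = {
    "TF_A": {"g1", "g2", "g3"},
    "TF_B": {"g1", "g2", "g3", "g4"},
    "TF_C": {"g7", "g8"},
    "TF_D": {"g7", "g8", "g9"},
}
groups = cluster_samples(bound_by_sample, max_distance=0.5)
print(groups)
```

In this toy input, TF_A and TF_B share most of their targets, as do TF_C and TF_D, so the two pairs end up in separate groups: a small picture of how distinct subsets of candidate regulators can emerge for one gene set.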


Main features of this webtool

  1. We have collected genome-wide binding patterns for 89 unique transcription factors across 330 experiments in 26 major blood lineages, including three types of leukaemia, giving a total of 4,804,346 genomic regions bound by at least one transcription factor. ChIP-Seq samples of the same transcription factor in related cell types were merged.
  2. For a gene set supplied by the user, the GSCA web tool identifies potential transcriptional regulators by performing a hypergeometric test for over-represented overlap between the gene set and each ChIP-Seq sample.
  3. Unlike previous methods, GSCA takes into account multiple binding events in a gene locus. When applied to the 80 gene modules from Novershtern et al., 2011, the GSCA approach reported significant associations with ChIP-Seq peaks for 65 gene modules (81% of all gene sets), compared with only 46% using the previously reported ChEA (Lachmann et al., 2010) and Cscan (Zambelli et al., 2012) protocols.
  4. The C-GSCA tool performs hierarchical clustering of the over-represented samples found by GSCA. Unlike the GSCA approach alone, C-GSCA can therefore identify distinct subsets of candidate upstream regulators for a given gene set.
  5. GSCA case study
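The hypergeometric test in feature 2 can be sketched in a few lines of Python. This is a minimal illustration, not the web tool's code: it assumes each ChIP-Seq sample has already been reduced to a set of bound genes and that a common background gene set is available, and the function names, sample names, and gene identifiers are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_bound, n_gene_set, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    X is the number of bound genes seen when n_gene_set genes are drawn
    without replacement from n_background genes, of which n_bound are
    bound in the ChIP-Seq sample.
    """
    max_overlap = min(n_bound, n_gene_set)
    tail = sum(
        comb(n_bound, k) * comb(n_background - n_bound, n_gene_set - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / comb(n_background, n_gene_set)


def rank_samples(gene_set, bound_genes_by_sample, background):
    """Return (sample, overlap, p-value) tuples, most significant first."""
    genes = set(gene_set) & background
    results = []
    for sample, bound in bound_genes_by_sample.items():
        bound = set(bound) & background
        overlap = len(genes & bound)
        p = hypergeom_enrichment_pvalue(
            len(background), len(bound), len(genes), overlap
        )
        results.append((sample, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical): 10 background genes, a 3-gene set, two samples.
background = {f"g{i}" for i in range(10)}
gene_set = {"g0", "g1", "g2"}
samples = {
    "TF_A": {"g0", "g1", "g2", "g3"},  # binds every gene in the set
    "TF_B": {"g5", "g6", "g7", "g8"},  # binds none of them
}
ranked = rank_samples(gene_set, samples, background)
for sample, overlap, p in ranked:
    print(sample, overlap, p)
```

A small p-value means the sample binds more genes in the set than expected by chance given how many genes it binds overall. The sketch counts each gene once; how the tool accounts for multiple binding events in a gene locus (feature 3) is not modelled here.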


Citing this research

Joshi A. et al. Gene Set Control Analysis (GSCA) predicts hematopoietic control mechanisms from genome-wide transcription factor binding data. Exp Hematol.