Promoter state analysis module



The promoter state analysis module of AtREA is designed to study the role of individual CREs and CRE combinations in the regulation of gene expression. The module takes as input a set of genes and a set of CREs (derived from literature, experimental data and/or computational analysis) that may be involved in regulating expression under the given condition. It first segregates the upstream sequence set into CRE states based on the presence or frequency of occurrence of each user-defined CRE. It then compares the expected frequency of each state (estimated from its genomic occurrence) with its actual frequency in the selected gene set and identifies the states that are overrepresented in that set.


Input CRE  

Input cis-regulatory elements for the "CRE state analysis module" of AtREA can presently be given in consensus format only. The input can contain up to four different CREs separated by the hash (#) sign.


 

Example of input CRE for state analysis

 





In addition to consensus sequences based on the four basic nucleotides (A, C, G and T), the input CRE can contain the following nucleotide codes.


Code    Implication
M       A or C
R       A or G
W       A or T
S       C or G
Y       C or T
K       G or T
V       A or C or G
H       A or C or T
D       A or G or T
B       C or G or T
N       G or A or T or C
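
The code table above maps directly onto regular-expression character classes. The following is a minimal sketch of how a '#'-separated CRE input could be validated and turned into search patterns. The function names (parse_cre_input, consensus_to_regex) are hypothetical and are not part of AtREA, and the way AtREA itself parses its input is not documented here.

```python
import re

# Nucleotide codes from the table above, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]",
    "Y": "[CT]", "K": "[GT]", "V": "[ACG]", "H": "[ACT]",
    "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}


def parse_cre_input(text, max_cres=4):
    """Split a '#'-separated CRE string and check every consensus symbol."""
    cres = [cre.strip().upper() for cre in text.split("#")]
    if len(cres) > max_cres:
        raise ValueError(f"at most {max_cres} CREs are allowed, got {len(cres)}")
    for cre in cres:
        if not cre or any(symbol not in IUPAC for symbol in cre):
            raise ValueError(f"invalid consensus sequence: {cre!r}")
    return cres


def consensus_to_regex(cre):
    """Translate one consensus sequence into a compiled regular expression."""
    return re.compile("".join(IUPAC[symbol] for symbol in cre))


if __name__ == "__main__":
    for cre in parse_cre_input("RCCGAC#ACGTGKC"):
        print(cre, consensus_to_regex(cre).pattern)
```

With the input RCCGAC#ACGTGKC, the two consensus sequences expand to the patterns [AG]CCGAC and ACGTG[GT]C.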



Mode

The "CRE state analysis module" can be used in two different modes:

Binary: In the binary mode a CRE can have only two possible states (0, 1). For example, if two CREs A and B are the inputs in the binary mode, then based on the presence (1) or absence (0) of the two CREs the entire upstream sequence set is divided into 4 CRE states (A0B0, A1B0, A1B1, A0B1).
CRE instances: In the "CRE instances" mode the states are defined on the basis of the frequency of occurrence of each CRE in an upstream sequence, and a CRE can take any of the states 0, 1, 2, 3, 4 ... n, where n is the maximum observed frequency of the CRE (in any of the ~22000 genes). Therefore, if we consider two CREs A and B, where CRE A has three occurrence states (0, 1, 2) and CRE B also has three occurrence states, then in the "CRE instances" mode the upstream sequence set will be divided into nine CRE states (A0B0, A0B1, A0B2, A1B0, A1B1, A1B2, A2B0, A2B1, A2B2). The CRE instances mode is especially useful for shorter CREs, or for CREs whose number of occurrences in an upstream sequence is important in determining the expression of the corresponding gene.
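
The two modes differ only in whether the per-CRE count is kept or reduced to presence/absence. The sketch below illustrates this for one upstream sequence. It is an illustration under stated assumptions, not AtREA's implementation: the helper names are hypothetical, only the given strand is scanned, overlapping sites are counted separately, and the state label is written as counts joined by '#', as in the output table of this page.

```python
import re


def count_sites(pattern, sequence):
    """Count matches of a regex pattern, including overlapping ones."""
    # A zero-width lookahead matches at every start position of a site.
    return len(re.findall(f"(?=(?:{pattern}))", sequence))


def cre_state(sequence, patterns, mode="binary"):
    """Return the CRE state label of one upstream sequence.

    mode="binary"    -> each CRE contributes 0 (absent) or 1 (present)
    mode="instances" -> each CRE contributes its number of occurrences
    """
    counts = [count_sites(pattern, sequence) for pattern in patterns]
    if mode == "binary":
        counts = [min(count, 1) for count in counts]
    elif mode != "instances":
        raise ValueError(f"unknown mode: {mode!r}")
    return "#".join(str(count) for count in counts)


if __name__ == "__main__":
    # Made-up sequence for illustration: two RCCGAC sites and one ACGTGKC site.
    upstream = "TTACCGACAAGCCGACTTACGTGGC"
    patterns = ["[AG]CCGAC", "ACGTG[GT]C"]
    print(cre_state(upstream, patterns, mode="binary"))     # 1#1
    print(cre_state(upstream, patterns, mode="instances"))  # 2#1
```

Grouping all upstream sequences by this label gives the state classes whose sizes are then compared between the selected gene set and the reference set.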

*The reference set consists of 22229 1 kb upstream sequences (with respect to the TSS).
*In all these cases the enrichment of a CRE state is estimated from the cumulative hypergeometric probability of observing the same or a greater number of genes containing that CRE state in a random gene set of the same size.
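
The cumulative hypergeometric probability described above can be written out with exact integer arithmetic. The function below is a sketch using only the Python standard library. Its name and argument names are chosen here for illustration, and it has not been checked against AtREA's own code.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, class_size, ref_hits, ref_size):
    """P(X >= k) for a random gene set of class_size genes.

    The set is drawn without replacement from ref_size reference genes,
    of which ref_hits contain the CRE state; X is the number of drawn
    genes that contain the state.
    """
    upper = min(class_size, ref_hits)
    favourable = sum(
        comb(ref_hits, i) * comb(ref_size - ref_hits, class_size - i)
        for i in range(k, upper + 1)
    )
    # Exact ratio of big integers, converted to float only at the end.
    return float(Fraction(favourable, comb(ref_size, class_size)))


if __name__ == "__main__":
    # Small case that can be checked by hand: 10 genes, 3 with the state,
    # draw 2, P(at least one with the state) = 1 - C(7,2)/C(10,2) = 8/15.
    print(hypergeom_upper_tail(1, 2, 3, 10))
```

For a row of the output table, the call would be hypergeom_upper_tail(Fc, Tc, Fr, Tr), for example hypergeom_upper_tail(47, 294, 2617, 22229) for the row shown in Figure 1. The result can then be compared with the P-value reported by the module.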



Class

  • Any class from any of the five ontology categories can be analysed for enrichment of the CRE states. To begin, the user needs to select a category from the list of class categories. For each of these categories, the class of interest can be chosen from the drop-down menu of the corresponding ontology group.
Category  Ontology                            Classes included in AtREA
GOBP      Gene Ontology Biological Process    list of GOBP classes
GOMF      Gene Ontology Molecular Function    list of GOMF classes
GOCC      Gene Ontology Cellular Component    list of GOCC classes
MIPS      MIPS FUNCAT classes                 list of MIPS FUNCAT classes
ARACYC    ARACYC pathways                     list of ARACYC classes

For example, to analyse the features of a CRE in GOBP class GO:0000038, the user needs to select GOBP from the "category menu" and the class GO:0000038 from the "GOBP class" menu.
Select Class Category: 


GOBP class


  • To study the features of a CRE in induced or repressed genes from any of the expression slides, the corresponding option should be selected from the category menu.

Category   Description                                                       Expression slides
induced    Genes which show induction in different microarray experiments
repressed  Genes which show repression in different microarray experiments

The slide of interest can then be selected from the "expression slide" menu.

Expression slide

As the characterization of genes as induced or repressed depends on an expression score cutoff, the user can specify the expression cutoff from the "Fold Cutoff" menu. The options are score cutoffs in log2 format and range from 1.2 to 4 (i.e. ~1.4 to 16 folds).


   Fold Cutoff (only for expression classes) :

 


  • Users can also analyse different features of a CRE in their own set of genes. For this, "UserGeneset" should be selected as the category and a list of AGI codes (like At1g75820) for the genes should be entered in the UserGeneSet text input box.
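
Before pasting a gene list into the text box, it can help to check that every entry looks like an AGI locus code. The sketch below assumes the usual AGI form (At, a chromosome label 1-5, C or M, then g and five digits) and assumes that entries are separated by whitespace, commas or semicolons. The separator AtREA expects is not stated on this page, and the function name is hypothetical.

```python
import re

# AGI locus code, e.g. At1g75820: "At", chromosome (1-5, C or M), "g", five digits.
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$")


def parse_agi_codes(text):
    """Split a pasted gene list into (valid AGI codes, rejected tokens)."""
    tokens = [token for token in re.split(r"[\s,;]+", text.strip()) if token]
    valid, rejected, seen = [], [], set()
    for token in tokens:
        key = token.upper()
        if not AGI_PATTERN.match(key):
            rejected.append(token)
        elif key not in seen:  # drop duplicates, keep first occurrence
            seen.add(key)
            # Normalise the case to the At1g75820 style used on this page.
            valid.append("At" + key[2] + "g" + key[4:])
    return valid, rejected


if __name__ == "__main__":
    genes, bad = parse_agi_codes("At1g75820, at2g15960\nfoo At1g75820")
    print(genes)  # ['At1g75820', 'At2g15960']
    print(bad)    # ['foo']
```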

Select Class Category: 


User Geneset





Output


CRE State (RCCGAC#ACGTGKC) Number of genes with CRE state in class (Fc) Number of genes in class (Tc) Number of genes with CRE state in reference set (Fr) Number of genes in reference set (Tr) Class occurrence of CRE state (Co=Fc/Tc) Reference set occurrence of CRE state (Ro=Fr/Tr) Occurrence ratio (=Co/Ro) Hypergeometric P-value Genes which contain CRE state
0#1 47 294 2617 22229 0.15986 0.11773 1.36 0.0181219106940625 At1g10760,At1g11210,At1g12730,At1g22750,At1g22770,At1g28330,At1g53910,At1g67660,At1g75200,At1g78600,At1g79440,At1g80480,At2g15960,At2g21130,At2g28910,At2g33845,At2g34810,At2g38240,At2g39930,At3g10020,At3g15640,At3g29320,At3g43690,At3g47160,At3g53800,At4g03510,At4g11600,At4g14270,At4g15660,At4g18240,At4g19390,At4g26680,At4g30660,At4g33980,At4g34950,At4g39100,At5g14560,At5g42900,At5g47250,At5g47260,At5g50460,At5g54960,At5g59320,At5g60100,At5g60110,At5g61380,At5g67480



Figure 1

The output of the AtREA CRE features module (Figure 1) contains the following columns:

  • Column 1: The CRE state. Rows for states whose P-value of enrichment in the selected gene set is less than 0.05 are highlighted in blue.
  • Column 2: Number of genes from the class which show the particular CRE state.
  • Column 3: Number of genes in the class.
  • Column 4: Number of genes from the reference set which show the particular CRE state.
  • Column 5: Number of genes in the reference set.
  • Column 6: Class occurrence of the CRE state, i.e. the number of genes from the selected class which show the CRE state divided by the total number of genes in the selected class.
  • Column 7: Reference set occurrence of the CRE state, i.e. the number of genes from the reference set which show the CRE state divided by the total number of genes in the reference set.
  • Column 8: Ratio of class occurrence to reference set occurrence.
  • Column 9 (Hypergeometric P-value): The cumulative hypergeometric P-value of observing an equal or greater number of promoters containing the CRE state in a random set containing the same number of genes.
  • Column 10: List of genes from the class which contain the CRE state in their upstream sequence.
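
Columns 6 to 8 follow directly from columns 2 to 5. As a worked check, the short sketch below recomputes them for the row shown in Figure 1 (Fc=47, Tc=294, Fr=2617, Tr=22229). The function name is chosen here for illustration only.

```python
def occurrence_stats(fc, tc, fr, tr):
    """Return (Co, Ro, Co/Ro) from the four gene counts of one output row."""
    co = fc / tc   # class occurrence of the CRE state
    ro = fr / tr   # reference set occurrence of the CRE state
    return co, ro, co / ro


if __name__ == "__main__":
    co, ro, ratio = occurrence_stats(47, 294, 2617, 22229)
    print(f"{co:.5f} {ro:.5f} {ratio:.2f}")  # 0.15986 0.11773 1.36
```

An occurrence ratio above 1 means that the state is more frequent in the selected class than in the reference set. Whether that excess is significant is what the P-value in column 9 indicates.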


 

Example

The dehydration response element binding (DREB) motif is known to be involved in transcription regulation under drought stress. Evidence from the literature suggests that, along with DREB, the ABRE binding sites also play a significant role in transcription regulation under drought stress. We have used the state analysis module to study the distribution of states based on DREB and ABRE consensus sequences in genes which are induced by drought stress (slide: drought_6h_shoot(1007966668#1029), fold cutoff: 1.4).

Example of Strand analysis  option

Table 1 shows the distribution of CRE states based on DREB and ABRE consensus sequences (RCCGAC#ACGTGKC, in binary mode) among genes which are induced by drought stress (slide: drought_6h_shoot(1007966668#1029), fold cutoff: 1.4).

CRE State (RCCGAC#ACGTGKC) Number of genes with CRE state in class (Fc) Number of genes in class (Tc) Number of genes with CRE state in reference set (Fr) Number of genes in reference set (Tr) Class occurrence of CRE state (Co=Fc/Tc) Reference set occurrence of CRE state (Ro=Fr/Tr) Occurrence ratio (=Co/Ro) Hypergeometric P-value Genes which contain CRE state
0#1 47 294 2617 22229 0.15986 0.11773 1.36 0.0181219106940625 At1g10760,At1g11210,At1g12730,At1g22750,At1g22770,At1g28330,At1g53910,At1g67660,At1g75200,At1g78600,At1g79440,At1g80480,At2g15960,At2g21130,At2g28910,At2g33845,At2g34810,At2g38240,At2g39930,At3g10020,At3g15640,At3g29320,At3g43690,At3g47160,At3g53800,At4g03510,At4g11600,At4g14270,At4g15660,At4g18240,At4g19390,At4g26680,At4g30660,At4g33980,At4g34950,At4g39100,At5g14560,At5g42900,At5g47250,At5g47260,At5g50460,At5g54960,At5g59320,At5g60100,At5g60110,At5g61380,At5g67480
1#0 69 294 4179 22229 0.23469 0.18800 1.25 0.0257057006005127 At1g06460,At1g14000,At1g26665,At1g27630,At1g29395,At1g48330,At1g49720,At1g49730,At1g51090,At1g68070,At1g68080,At1g75190,At1g80500,At2g01520,At2g02990,At2g15860,At2g15880,At2g16660,At2g19450,At2g21660,At2g21690,At2g23040,At2g25930,At2g28900,At2g34930,At2g37140,At2g39030,At2g39340,At2g42530,At2g42540,At3g15630,At3g18080,At3g26580,At3g28290,At3g43680,At3g47836,At3g47840,At3g47850,At3g47860,At3g48370,At3g50970,At3g53990,At3g59350,At3g62550,At4g03530,At4g04340,At4g11310,At4g11610,At4g12280,At4g19120,At4g23600,At4g30650,At4g34960,At4g35770,At5g05630,At5g06690,At5g06700,At5g07010,At5g25210,At5g26340,At5g42910,At5g47240,At5g48250,At5g50450,At5g57630,At5g61400,At5g62360,At5g63810,At5g63820
1#1 29 294 1042 22229 0.09864 0.04688 2.10 0.000136907956885787 At1g17690,At1g22370,At1g27640,At1g69830,At1g75210,At1g76590,At2g15970,At2g17840,At2g22450,At2g29630,At2g38465,At2g47890,At3g05880,At3g10410,At3g22740,At3g46640,At3g47890,At3g48360,At3g63160,At4g01130,At4g02370,At4g03560,At4g09020,At4g27440,At4g33700,At5g03240,At5g15960,At5g52310,At5g57110

Table 1