CRE feature analysis module


The CRE feature analysis module of AtREA can be used to evaluate the role of strand, position, variations in consensus sequence and frequency of occurrence of a CRE in regulating the expression of a set of genes, as well as to identify overall trends in the distribution of a CRE. The gene set can be selected from any of the predefined categories or provided by the user.


Input CRE  

Input cis-regulatory elements (CREs) for the "CRE feature analysis module" of AtREA can presently be given in consensus format only. In addition to consensus sequences based on the four basic nucleotides (A, C, G and T), the feature analysis module also accepts the following nucleotide codes. For the analysis of "variants" the input CRE can contain only A, C, G and T.

Code    Implication
M       A or C
R       A or G
W       A or T
S       C or G
Y       C or T
K       G or T
V       A or C or G
H       A or C or T
D       A or G or T
B       C or G or T
N       G or A or T or C
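
The nucleotide codes above map naturally onto a regular expression. The following is a minimal sketch of that idea, not AtREA's own implementation; the helper name `consensus_to_regex` is hypothetical.

```python
import re

# Nucleotide codes as listed in the table above, plus the four basic bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus CRE into a regular expression (hypothetical helper)."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]  # raises KeyError for characters outside the table
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)

pattern = consensus_to_regex("RCCGAC")
print(pattern)                                   # [AG]CCGAC
print(bool(re.fullmatch(pattern, "ACCGAC")))     # True
print(bool(re.fullmatch(pattern, "TCCGAC")))     # False
```

A consensus such as RCCGAC therefore stands for the two plain sequences ACCGAC and GCCGAC.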



 
Example of input CRE

 

 

In addition to user-defined CREs, known CRE sequences from CRE databases such as PLACE and ATCISDB can also be analysed using AtREA.


Analysis

Four different features of a CRE in a user-selected class can be analysed by the "CRE feature analysis module".

Strand: The strand analysis option compares the occurrence of the input CRE on different strands (coding, reverse and both) in the selected class to its occurrence on the corresponding strand in the reference set.
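
As an illustration only, the per-gene strand check could be sketched as below. The function names are hypothetical. The sketch assumes that the upstream sequence is given as the coding strand and that "both" means the CRE is present on either strand; the counts in Table 1 (37 coding and 42 reverse genes against 69 for both) are consistent with that reading.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def strand_hits(upstream: str, cre_regex: str) -> dict:
    """Report on which strand(s) of one upstream sequence the CRE occurs.

    Assumption: `upstream` is the coding strand; the reverse strand is
    searched by scanning the reverse complement with the same pattern.
    """
    coding = re.search(cre_regex, upstream.upper()) is not None
    reverse = re.search(cre_regex, reverse_complement(upstream)) is not None
    return {"coding": coding, "reverse": reverse, "both": coding or reverse}

# RCCGAC written as a regular expression
print(strand_hits("TTACCGACTT", "[AG]CCGAC"))
print(strand_hits("TTGTCGGTTT", "[AG]CCGAC"))
```

Counting the genes for which each flag is true, in the class and in the reference set, gives the Gc and Gr values for the three strand subclasses.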

Position:
The position analysis compares the occurrence of the input CRE in five position windows (0-200, 200-400, 400-600, 600-800, 800-1000) in the selected class to its occurrence in the corresponding position windows in the reference set.
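
A rough sketch of assigning CRE occurrences to the five windows follows. It assumes a hit belongs to the window that contains its start coordinate, counted from the first base of the 1Kb sequence, and it scans one strand only; how AtREA treats window boundaries and strands here is not stated above, so these are illustrative choices. The function name is hypothetical.

```python
import re

WINDOWS = [(0, 200), (200, 400), (400, 600), (600, 800), (800, 1000)]

def windows_with_cre(upstream: str, cre_regex: str) -> set:
    """Return the set of position windows in which the CRE starts."""
    hits = set()
    # A lookahead is used so that overlapping occurrences are also found.
    for match in re.finditer(f"(?={cre_regex})", upstream.upper()):
        for start, end in WINDOWS:
            if start <= match.start() < end:
                hits.add((start, end))
    return hits

# Toy 1Kb sequence with a single ACCGAC starting at coordinate 250
toy = "T" * 250 + "ACCGAC" + "T" * 744
print(windows_with_cre(toy, "[AG]CCGAC"))  # {(200, 400)}
```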

Frequency: In frequency analysis, all possible frequency states of a CRE (the number of instances of the CRE found in any 1Kb upstream sequence of the reference set) are identified. The CRE RCCGAC, for example, can occur 0, 1, 2, 3, 4 or 5 times in a 1Kb upstream sequence. The program then compares the occurrence of each frequency state of the input CRE in the selected class to its occurrence in the reference set.
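
The idea of frequency states can be sketched as follows. The gene totals in Table 3 (51 + 14 + 3 + 1 = 69) match the "both" row of Table 1, which suggests that occurrences are counted on both strands; the sketch follows that reading as an assumption. Function names and the toy gene names are hypothetical, and a palindromic CRE would be counted twice by this simple approach.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def count_cre(upstream: str, cre_regex: str) -> int:
    """Count CRE occurrences (overlaps included) on both strands."""
    seq = upstream.upper()
    rev = seq.translate(_COMPLEMENT)[::-1]
    overlapping = f"(?={cre_regex})"
    return len(re.findall(overlapping, seq)) + len(re.findall(overlapping, rev))

def frequency_states(upstreams: dict, cre_regex: str) -> Counter:
    """Map each frequency state to the number of genes showing it."""
    return Counter(count_cre(seq, cre_regex) for seq in upstreams.values())

# Toy sequences keyed by made-up gene names
toy = {"geneA": "ACCGACTTGCCGAC", "geneB": "GTCGGT", "geneC": "TTTT"}
print(frequency_states(toy, "[AG]CCGAC"))
```

Running this on the class and on the reference set gives, for every state, the gene counts that are then compared.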

Variants:
In this analysis the program generates all possible variants (differing from the input CRE at the selected nucleotide position) and compares the occurrence of each variant in the selected class to its occurrence in the reference set. For CREs which contain six or fewer nucleotides, the variant analysis for all nucleotide positions can be performed in a single run by using the "all" option from the "Variant Position" menu. For longer CREs the user needs to select a nucleotide position of the CRE from the "Variant Position" menu.
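
Variant generation can be illustrated with a short sketch. The function names are hypothetical; the substituted base is written in lower case to mirror the notation used in Table 4.

```python
def single_position_variants(cre: str, position: int) -> list:
    """Variants differing from `cre` at one 1-based nucleotide position."""
    cre = cre.upper()
    if set(cre) - set("ACGT"):
        raise ValueError("variant analysis accepts only A, C, G and T")
    i = position - 1
    return [cre[:i] + base.lower() + cre[i + 1:]
            for base in "ACGT" if base != cre[i]]

def all_variants(cre: str) -> list:
    """Variants for every position, as with the "all" option."""
    variants = []
    for position in range(1, len(cre) + 1):
        variants.extend(single_position_variants(cre, position))
    return variants

print(single_position_variants("GCCGAC", 1))  # ['aCCGAC', 'cCCGAC', 'tCCGAC']
print(len(all_variants("GCCGAC")))            # 18
```

A CRE of length n has 3n single-nucleotide variants, so a six-nucleotide CRE yields the 18 variant rows listed after the input sequence in Table 4.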


*The reference set consists of 22229 1Kb upstream sequences (with respect to the TSS).
*The enrichment of a CRE in all these cases is estimated on the basis of the cumulative hypergeometric probability of observing the same or a greater number of CRE containing genes (in comparison to the class) in a random gene set of the same size.
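
The cumulative hypergeometric probability described above can be sketched with the standard library alone. This is an illustration under the assumption that the random gene set is drawn without replacement from the reference set; it is not AtREA's own code, and the function name is hypothetical. The argument names follow the Gc, Tc, Gr and Tr columns of the output tables.

```python
from math import comb

def hypergeom_upper_tail(gc: int, tc: int, gr: int, tr: int) -> float:
    """P(X >= gc) for a random set of tc genes drawn from tr reference
    genes, gr of which contain the CRE."""
    total = comb(tr, tc)
    upper = min(tc, gr)
    # comb() returns 0 when more CRE-free genes are requested than exist,
    # so impossible terms drop out of the sum automatically.
    favourable = sum(comb(gr, k) * comb(tr - gr, tc - k)
                     for k in range(gc, upper + 1))
    return favourable / total

# Counts taken from the example row in Figure 1
print(hypergeom_upper_tail(7, 16, 1901, 22229))
```

Exact integer arithmetic is used so that the very large binomial coefficients do not overflow before the final division. The result can be compared with the Hypergeometric P-value column of the output.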



Class

Category    Ontology                              Classes included in AtREA
GOBP        Gene Ontology Biological Process      list of GOBP classes
GOMF        Gene Ontology Molecular Function      list of GOMF classes
GOCC        Gene Ontology Cellular Component      list of GOCC classes
MIPS        MIPS FUNCAT classes                   list of MIPS FUNCAT classes
ARACYC      ARACYC pathways                       list of ARACYC classes

For example, to analyse the features of a CRE in GOBP class GO:0000038, the user needs to select GOBP from the "category" menu and the class GO:0000038 from the "GOBP class" menu.
Select Class Category: 


GOBP class


  • To study the features of a CRE in induced or repressed genes from any of the expression slides, the corresponding option should be selected from the category menu.

Category     Description                                                        Expression slides
induced      Genes which show induction in different microarray experiments
repressed    Genes which show repression in different microarray experiments

The slide of interest can then be selected from the "expression slide" menu.

Expression slide

As the characterization of genes as induced or repressed depends on an expression score cutoff, the user can specify the expression cutoff from the "Fold Cutoff" menu. The options include score cutoffs in log 2 format and range from 1.2 to 4 (i.e. ~1.4 to 16 folds).


   Fold Cutoff (only for expression classes) :

 


  • Users can also analyse different features of a CRE in their own set of genes. For this, "UserGeneset" should be selected as the category and a list of AGI codes (like At1g75820) for the genes should be entered in the UserGeneSet text input box.
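
Before pasting a gene list it can help to check that every entry looks like an AGI code. The sketch below is a hypothetical pre-check, not part of AtREA. It assumes the usual AGI locus pattern (At, a chromosome 1-5, C or M, then g and five digits) and accepts commas, semicolons or whitespace as separators; which separators the input box itself accepts is not stated here.

```python
import re

# Assumed AGI locus pattern, e.g. At1g75820
AGI_CODE = re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE)

def parse_gene_list(text: str):
    """Split a pasted gene list and separate valid AGI codes from the rest."""
    valid, rejected = [], []
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        (valid if AGI_CODE.match(token) else rejected).append(token)
    return valid, rejected

valid, rejected = parse_gene_list("At1g75820, At2g22190\nAt4g1243")
print(valid)     # ['At1g75820', 'At2g22190']
print(rejected)  # ['At4g1243']
```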

Select Class Category: 


User Geneset






Output


Class feature (strand) Number of CRE containing genes in class (Gc) Number of genes in class (Tc) Number of CRE containing genes in reference set (Gr) Number of genes in reference set (Tr) Percentage of CRE containing genes in class (Pc = (Gc/Tc) X 100) Percentage of CRE containing genes in reference set (Pr = (Gr/Tr) X 100) Ratio (Pc/Pr) Hypergeometric P-value Genes which contain input CRE
Coding 7 16 1901 22229 43.750 8.552 5.12 0.0001892 At1g68020,At2g22190,At4g12430,At4g22590,At4g39770,At5g51460,At5g65140

Figure 1


The output of the AtREA CRE features module (Figure 1) contains the following columns:

  • Column 1: The name of the feature subclass. (For position analysis the subclasses correspond to the different position windows. For strand analysis the feature subclasses are coding, reverse and both, i.e. the different strands. For variants each variant is treated as a feature subclass. For frequency analysis each frequency state is considered a subclass.)

  • Column 2: Number of genes from the class which show the particular CRE feature.

  • Column 3: Number of genes in the class.

  • Column 4: Number of genes from the reference set which show the particular CRE feature.

  • Column 5: Number of genes in the reference set.

  • Column 6: Percentage of promoters from the class which show the CRE feature.

  • Column 7: Percentage of promoters from the reference set which show the CRE feature.

  • Column 8 (Hypergeometric P-value): The cumulative hypergeometric P-value of observing an equal or greater number of CRE containing promoters by chance in a set containing the same number of genes. (In Figure 1 this column is preceded by the Ratio column, Pc/Pr.)

  • Column 9: List of genes from the class which contain the user-defined CRE in their upstream sequence.
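
The percentage and ratio columns follow directly from the four counts. As a small worked check, the sketch below recomputes them for the example row of Figure 1; the function name is hypothetical.

```python
def enrichment_summary(gc: int, tc: int, gr: int, tr: int):
    """Return (Pc, Pr, Pc/Pr) from the class and reference set counts."""
    pc = gc / tc * 100   # percentage of CRE containing genes in class
    pr = gr / tr * 100   # percentage of CRE containing genes in reference set
    return pc, pr, pc / pr

# Example row of Figure 1: Gc=7, Tc=16, Gr=1901, Tr=22229
pc, pr, ratio = enrichment_summary(7, 16, 1901, 22229)
print(f"{pc:.3f} {pr:.3f} {ratio:.2f}")  # 43.750 8.552 5.12
```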


 

Example

The dehydration response element binding (DREB) motif is known to be involved in transcriptional regulation under drought stress. We have used the different analysis options to study the distribution of the DREB consensus sequence (RCCGAC) in genes which are induced by drought stress (slide: drought_6h_shoot(1007966668#1029), fold cutoff: 1.4).

Example of Strand analysis  option

Table 1 shows the distribution of the DREB consensus sequence (RCCGAC) on the different strands among genes which are induced by drought stress (slide: drought_6h_shoot(1007966668#1029), fold cutoff: 1.4).

CRE Strand Number of CRE containing genes in class (Gc) Number of genes in class (Tc) Number of CRE containing genes in reference set (Gr) Number of genes in reference set (Tr) Percentage of CRE containing genes in class ((Gc/Tc) X 100) Percentage of CRE containing genes in reference set ((Gr/Tr) X 100) Hypergeometric P-value Genes which contain input CRE
coding 37 185 2895 22229 20.0000 13.0235 0.00488504236088069 At1g06460,At1g27630,At1g48330,At1g49720,At1g69830,At2g01520,At2g15880,At2g15970,At2g17840,At2g21660,At2g22450,At2g39030,At2g42530,At2g42540,At3g10410,At3g15630,At3g22740,At3g46640,At3g50970,At3g59350,At3g63160,At4g09020,At4g12280,At4g19120,At4g27440,At4g33700,At4g35770,At5g06690,At5g07010,At5g15960,At5g25210,At5g48250,At5g50450,At5g52310,At5g57630,At5g62360,At5g63810
reverse 42 185 2723 22229 22.7027 12.2498 4.96347784419354e-05 At1g22370,At1g26665,At1g29395,At1g51090,At1g75190,At1g76590,At2g02990,At2g16660,At2g19450,At2g25930,At2g28900,At2g29630,At2g34930,At2g38465,At2g39030,At2g42530,At2g42540,At2g47890,At3g05880,At3g18080,At3g22740,At3g26580,At3g28290,At3g46640,At3g47860,At3g48360,At3g50970,At3g53990,At3g62550,At4g01130,At4g02370,At4g09020,At4g11310,At4g12280,At4g19120,At4g23600,At4g30650,At5g03240,At5g06690,At5g26340,At5g47240,At5g57110
both 69 185 5221 22229 37.2973 23.4873 1.64100850699939e-05 At1g06460,At1g22370,At1g26665,At1g27630,At1g29395,At1g48330,At1g49720,At1g51090,At1g69830,At1g75190,At1g76590,At2g01520,At2g02990,At2g15880,At2g15970,At2g16660,At2g17840,At2g19450,At2g21660,At2g22450,At2g25930,At2g28900,At2g29630,At2g34930,At2g38465,At2g39030,At2g42530,At2g42540,At2g47890,At3g05880,At3g10410,At3g15630,At3g18080,At3g22740,At3g26580,At3g28290,At3g46640,At3g47860,At3g48360,At3g50970,At3g53990,At3g59350,At3g62550,At3g63160,At4g01130,At4g02370,At4g09020,At4g11310,At4g12280,At4g19120,At4g23600,At4g27440,At4g30650,At4g33700,At4g35770,At5g03240,At5g06690,At5g07010,At5g15960,At5g25210,At5g26340,At5g47240,At5g48250,At5g50450,At5g52310,At5g57110,At5g57630,At5g62360,At5g63810
Table 1


Example of Position analysis  option:

Table 2 shows the distribution of the DREB consensus sequence (RCCGAC) in the different position windows in genes which are induced by drought stress (slide: drought_6h_shoot(1007966668#1029), fold cutoff: 1.4). In the reference set the frequency of the DREB consensus sequence tends to increase towards the TSS, with the maximum in the 800-1000 window. In the selected gene set the frequency is highest in the 600-800 window (11.89%), which also shows the strongest enrichment (P = 0.00029).

CRE Position Number of CRE containing genes in class (Gc) Number of genes in class (Tc) Number of CRE containing genes in reference set (Gr) Number of genes in reference set (Tr) Percentage of CRE containing genes in class ((Gc/Tc) X 100) Percentage of CRE containing genes in reference set ((Gr/Tr) X 100) Hypergeometric P-value Genes which contain input CRE
0 - 200 17 185 1088 22229 9.1892 4.8945 0.00953949701390473 At1g27630,At1g49720,At2g15880,At2g19450,At2g29630,At2g39030,At3g18080,At3g22740,At3g46640,At3g48360,At3g53990,At3g63160,At4g01130,At4g09020,At4g27440,At5g57110,At5g63810
200 - 400 12 185 1038 22229 6.4865 4.6696 0.157457096695083 At1g06460,At1g22370,At2g02990,At2g39030,At2g42530,At3g26580,At3g62550,At4g02370,At4g09020,At4g19120,At5g06690,At5g26340
400 - 600 13 185 1088 22229 7.0270 4.8945 0.121828404294242 At1g75190,At1g76590,At2g01520,At2g15970,At2g25930,At3g05880,At3g15630,At3g47860,At4g09020,At4g12280,At4g33700,At5g48250,At5g57630
600 - 800 22 185 1165 22229 11.8919 5.2409 0.000287070980858587 At1g29395,At1g48330,At1g51090,At1g69830,At1g75190,At1g76590,At2g17840,At2g34930,At2g47890,At3g10410,At3g28290,At3g46640,At3g50970,At3g59350,At4g12280,At4g23600,At4g35770,At5g07010,At5g25210,At5g50450,At5g52310,At5g62360
800 - 1000 18 185 1419 22229 9.7297 6.3836 0.0492800406452404 At1g26665,At2g16660,At2g21660,At2g22450,At2g28900,At2g38465,At2g42530,At2g42540,At3g22740,At3g46640,At3g50970,At3g53990,At4g11310,At4g30650,At5g03240,At5g15960,At5g47240,At5g52310
Table 2

Example of Frequency analysis  option

Table 3 shows the distribution of the different frequency states of the DREB consensus sequence (RCCGAC) among genes which are induced by drought stress (slide: drought_6h_shoot(1007966668#1029), fold cutoff: 1.4).

CRE Frequency Frequency of occurrence of CRE in class (Fc) Number of CRE containing genes in class (Gc) Number of genes in class (Tc) Frequency of occurrence of CRE in reference set (Fr) Number of CRE containing genes in reference set (Gr) Number of genes in reference set (Tr) Class occurrence of CRE (Co = Fc/Tc) Reference set occurrence of CRE (Ro = Fr/Tr) Occurrence ratio (= Co/Ro) Percentage of CRE containing genes in class ((Gc/Tc) X 100) Percentage of CRE containing genes in reference set ((Gr/Tr) X 100) Hypergeometric P-value P-value estimated by bootstrap method Genes which contain input CRE
1 51 51 185 4355 4355 22229 0.27568 0.19592 1.41 27.5676 19.5915 0.00525809532467544 0.01 At1g06460,At1g22370,At1g26665,At1g27630,At1g29395,At1g48330,At1g49720,At1g69830,At2g01520,At2g02990,At2g15880,At2g16660,At2g17840,At2g19450,At2g21660,At2g22450,At2g25930,At2g29630,At2g34930,At2g38465,At2g47890,At3g05880,At3g10410,At3g15630,At3g18080,At3g26580,At3g28290,At3g47860,At3g48360,At3g62550,At3g63160,At4g01130,At4g02370,At4g11310,At4g23600,At4g27440,At4g30650,At4g33700,At4g35770,At5g03240,At5g07010,At5g15960,At5g25210,At5g26340,At5g47240,At5g48250,At5g50450,At5g57110,At5g57630,At5g62360,At5g63810
2 14 14 185 712 712 22229 0.07568 0.03203 2.36 7.5676 3.2030 0.00261942026992649 <0.01 At1g51090,At1g75190,At1g76590,At2g15970,At2g28900,At2g39030,At2g42530,At2g42540,At3g22740,At3g53990,At3g59350,At4g12280,At4g19120,At5g06690
3 3 3 185 133 133 22229 0.01622 0.00598 2.71 1.6216 0.5983 0.0996678212609147 0.08 At3g46640,At3g50970,At4g09020
4 1 1 185 16 16 22229 0.00541 0.00072 7.51 0.5405 0.0720 0.125201681897977 0.13 At5g52310
5 0 0 185 5 5 22229 0 0.00022 - 0 0.0225 - - -
Table 3

Example of Variant analysis  option

Table 4 shows the distribution of single-nucleotide variants of the DREB consensus related sequence GCCGAC (only CREs containing A, C, G and T can be used with the variant option) in genes which are induced by drought stress (slide: drought_6h_shoot(1007966668#1029), fold cutoff: 1.4). The results show that, apart from the input sequence GCCGAC itself, only the variant ACCGAC (which along with GCCGAC constitutes the DREB consensus) is significantly enriched in the selected gene set.

CRE Variant Number of CRE containing genes in class (Gc) Number of genes in class (Tc) Number of CRE containing genes in reference set (Gr) Number of genes in reference set (Tr) Percentage of CRE containing genes in class ((Gc/Tc) X 100) Percentage of CRE containing genes in reference set ((Gr/Tr) X 100) Hypergeometric P-value Genes which contain input CRE
GCCGAC 32 185 2078 22229 17.2973 9.3481 0.000479464125515584 At1g51090,At1g76590,At2g02990,At2g15970,At2g16660,At2g25930,At2g28900,At2g39030,At2g42530,At2g42540,At3g18080,At3g46640,At3g47860,At3g50970,At3g53990,At3g59350,At3g63160,At4g01130,At4g02370,At4g09020,At4g11310,At4g12280,At4g19120,At4g30650,At4g33700,At5g06690,At5g07010,At5g25210,At5g52310,At5g57630,At5g62360,At5g63810
aCCGAC 51 185 3545 22229 27.5676 15.9476 3.98593423864847e-05 At1g06460,At1g22370,At1g26665,At1g27630,At1g29395,At1g48330,At1g49720,At1g69830,At1g75190,At1g76590,At2g01520,At2g15880,At2g15970,At2g17840,At2g19450,At2g21660,At2g22450,At2g29630,At2g34930,At2g38465,At2g39030,At2g42530,At2g42540,At2g47890,At3g05880,At3g10410,At3g15630,At3g22740,At3g26580,At3g28290,At3g46640,At3g48360,At3g50970,At3g53990,At3g59350,At3g62550,At4g09020,At4g12280,At4g19120,At4g23600,At4g27440,At4g35770,At5g03240,At5g06690,At5g15960,At5g26340,At5g47240,At5g48250,At5g50450,At5g52310,At5g57110
tCCGAC 26 185 3434 22229 14.0541 15.4483 0.730643779073995 At1g22370,At1g33970,At1g49720,At1g51940,At1g75190,At1g76590,At1g79440,At2g15890,At2g15960,At2g19450,At2g43550,At3g01310,At3g10410,At3g22740,At3g50700,At4g01130,At4g02370,At4g11310,At4g11600,At4g15210,At4g30690,At4g33980,At4g39090,At5g15960,At5g35735,At5g54960
cCCGAC 11 185 1693 22229 5.9459 7.6162 0.841664135891827 At1g07040,At1g53885,At2g01520,At2g28840,At2g43550,At3g28290,At3g47160,At4g09020,At4g30650,At4g33700,At5g03240
GaCGAC 32 185 4545 22229 17.2973 20.4463 0.878007079432752 At1g14250,At1g17665,At1g20010,At1g22740,At1g29395,At1g51090,At1g79440,At2g19450,At2g21660,At2g25930,At2g28840,At2g28900,At2g33830,At2g34930,At2g38465,At2g39330,At2g39900,At3g01310,At3g10410,At3g26580,At3g50700,At3g59350,At3g63160,At4g14230,At4g14270,At4g17470,At4g27440,At4g34950,At4g35770,At4g39260,At5g23660,At5g63810
GtCGAC 17 185 1601 22229 9.1892 7.2023 0.179949411621242 At1g10760,At2g15970,At2g17840,At2g21130,At2g33830,At2g34810,At2g38240,At2g43550,At2g45560,At3g47860,At4g15210,At4g34950,At4g39260,At5g05600,At5g23660,At5g61380,At5g62360
GgCGAC 11 185 1928 22229 5.9459 8.6734 0.933826693358881 At1g07040,At1g28050,At1g75190,At2g15830,At3g59930,At4g15210,At4g18240,At5g23240,At5g54960,At5g62360,At5g62720
GCaGAC 20 185 2742 22229 10.8108 12.3352 0.768224205053058 At1g02300,At1g06460,At1g22740,At1g22770,At1g70420,At1g77210,At2g25730,At2g43510,At3g52180,At3g59930,At3g63160,At4g11310,At4g12280,At4g14270,At4g15660,At4g30690,At5g05600,At5g48250,At5g57550,At5g62360
GCtGAC 24 185 2940 22229 12.9730 13.2260 0.573577158795241 At1g26665,At1g29395,At1g49720,At1g69830,At1g70420,At2g15830,At2g22450,At2g25730,At2g29630,At2g34930,At3g01310,At3g10410,At3g15450,At3g46640,At3g47340,At3g49620,At3g52180,At4g01130,At4g19390,At4g30660,At4g33980,At4g35770,At4g39260,At5g06870
GCgGAC 7 185 1066 22229 3.7838 4.7955 0.789104033261681 At1g06460,At1g22570,At2g34930,At3g07650,At5g11150,At5g54960,At5g67480
GCCaAC 35 185 4094 22229 18.9189 18.4174 0.459740187574122 At1g17665,At1g22370,At1g49720,At1g69830,At1g77210,At2g01520,At2g16660,At2g21130,At2g29630,At2g34930,At2g38240,At2g39330,At2g39900,At2g42540,At2g47890,At3g26740,At3g28290,At3g50970,At3g63160,At4g04330,At4g11310,At4g12280,At4g19390,At4g27440,At4g30650,At4g30660,At4g39090,At5g06690,At5g07010,At5g14550,At5g23660,At5g26340,At5g27280,At5g62720,At5g63810
GCCtAC 18 185 2092 22229 9.7297 9.4111 0.477495223930389 At1g26665,At1g52410,At1g68050,At2g28840,At2g33830,At2g38465,At2g42540,At3g07650,At3g59350,At4g12280,At4g14270,At4g17470,At4g35770,At5g14920,At5g26570,At5g48250,At5g61380,At5g63810
GCCcAC 28 185 2770 22229 15.1351 12.4612 0.159762078229407 At1g06460,At1g07040,At1g10760,At1g11210,At1g14250,At1g22740,At1g22770,At1g28330,At1g68050,At1g80480,At2g25930,At2g39900,At2g43550,At3g07650,At3g18080,At3g46640,At3g49620,At3g50970,At3g63160,At4g39260,At5g07010,At5g11150,At5g14550,At5g25210,At5g43440,At5g54960,At5g57110,At5g62360
GCCGtC 23 185 2376 22229 12.4324 10.6887 0.251716257316645 At1g28050,At1g28330,At1g33970,At1g69830,At1g79440,At2g25730,At2g28900,At2g43550,At3g29320,At3g49620,At3g53800,At3g62550,At4g01130,At4g02370,At4g12280,At4g30690,At4g35770,At4g39260,At5g25210,At5g42900,At5g59320,At5g64860,At5g67480
GCCGgC 4 185 750 22229 2.1622 3.3740 0.874512206479939 At1g20010,At3g49620,At4g15210,At4g15660
GCCGcC 22 185 1895 22229 11.8919 8.5249 0.0698733913297847 At1g02300,At1g22370,At1g33970,At2g25930,At2g42540,At2g43550,At2g45560,At3g26580,At3g46640,At3g49620,At3g50500,At3g50970,At3g59350,At3g59930,At3g63160,At4g11360,At4g11600,At4g12280,At4g19390,At4g26670,At5g35735,At5g54960
GCCGAa 18 185 2766 22229 9.7297 12.4432 0.894776958560156 At1g07050,At1g20010,At1g53885,At2g15830,At2g21660,At2g34810,At2g38240,At2g42540,At3g10410,At3g46640,At3g47160,At4g16146,At4g35770,At5g15960,At5g24470,At5g57110,At5g61380,At5g67480
GCCGAt 19 185 2455 22229 10.2703 11.0441 0.666599710422168 At1g06460,At1g14250,At1g20010,At2g25730,At2g34810,At2g34930,At2g43550,At2g47890,At3g62550,At4g02370,At4g14230,At4g16146,At4g35770,At5g15960,At5g43440,At5g54960,At5g57630,At5g62720,At5g67480
GCCGAg 14 185 2023 22229 7.5676 9.1007 0.801715143071705 At1g07050,At1g22370,At1g26665,At1g29395,At1g75190,At2g21660,At2g43550,At3g26580,At3g26740,At3g46970,At3g53800,At3g63160,At4g12280,At4g14270
Table 4