Code | Implication |
---|---|
M | A or C |
R | A or G |
W | A or T |
S | C or G |
Y | C or T |
K | G or T |
V | A or C or G |
H | A or C or T |
D | A or G or T |
B | C or G or T |
N | G or A or T or C |
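
To illustrate how a consensus written with these ambiguity codes can be matched against a sequence, here is a minimal Python sketch. It is not AtREA's own implementation; the function names `consensus_to_regex` and `find_sites` and the example consensus are illustrative assumptions.

```python
import re

# IUPAC ambiguity codes from the table above, mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]  # raises KeyError for a character outside the table
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))

def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    pattern = consensus_to_regex(consensus)
    # A zero-width lookahead lets overlapping occurrences be reported.
    lookahead = re.compile("(?=" + pattern.pattern + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]

# Hypothetical consensus: K stands for G or T, so both sites below match.
print(find_sites("ACGTGKC", "TTACGTGGCAAACGTGTCTT"))  # [2, 11]
```

Expanding each code into a character class keeps the matching logic in the regular expression engine, so the same function handles plain and degenerate consensus strings.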
Consensus | Position frequency matrix (matrix score cutoff) | CRE pairs (first CRE consensus sequence, second CRE consensus sequence, minimum allowed distance, maximum allowed distance) | Multiple CREs |
Five different ontology categories can be analysed for enrichment of the user-provided CRE.
Category header | Ontology |
---|---|
GOBP | Gene Ontology Biological Process |
GOMF | Gene Ontology Molecular Function |
GOCC | Gene Ontology Cellular Component |
MIPS | MIPS FUNCAT classes |
ARACYC | ARACYC pathways |
In addition to ontology classes, genes induced or repressed under different conditions, identified from different microarray slides, can also be analysed using AtREA (please see the documentation for details).
The "induced" option can be used to study the over-representation of the user-provided CRE in the induced genes from each slide in the expression dataset. Selecting the "repressed" option from the class category menu similarly analyses the over-representation of the provided CRE in the repressed genes from each slide.
As the characterization of genes as induced or repressed depends on an expression score cutoff, the "induced" and "repressed" options require an expression cutoff to be selected from the "Fold Cutoff" menu. The options are score cutoffs in log2 format and range from 1.2 to 4 (i.e. ~1.4 to 16 fold). Selecting "induced" from the category menu and 2 from the fold cutoff menu therefore analyses the distribution of the CRE in genes which show 4-fold or greater induction in each slide, whereas selecting "repressed" from the category menu and 2 from the fold cutoff menu analyses the distribution of the CRE in genes which show 4-fold or greater repression in each slide from the expression dataset.
Fold Cutoff (only for expression classes) : |
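
The relationship between a log2 cutoff and a fold change can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that expression scores are log2 ratios in which positive values mean induction and negative values mean repression.

```python
def log2_cutoff_to_fold(cutoff):
    """Convert a log2 score cutoff into the corresponding fold change."""
    return 2.0 ** cutoff

def classify(log2_ratio, cutoff):
    """Label a gene on one slide (assumed sign convention: positive = induced)."""
    if log2_ratio >= cutoff:
        return "induced"
    if log2_ratio <= -cutoff:
        return "repressed"
    return "unchanged"

print(log2_cutoff_to_fold(2))  # 4.0, i.e. a cutoff of 2 means 4-fold or greater
print(classify(2.5, 2))        # induced
print(classify(-2.0, 2))       # repressed
```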
To incorporate strand, position or frequency preference information (if already known) into the CRE distribution analysis, these features can be specified from the sequence features menu.
Position Windows: In addition to the entire 1 kb upstream region (from the TSS), which is the default (0-1000), five different position windows can be selected from this menu. The position window 800-1000 is closest to the TSS, whereas the window 0-200 is the most distant from the TSS.
Strand: The options in the strand menu are coding, reverse and both. When the both-strands option is selected, the program searches both strands for the input CRE and reports presence when the CRE is detected on either strand.
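
The position window and strand options can be pictured with the following Python sketch. It is an assumption-laden illustration, not AtREA's code: `contains_cre` is a hypothetical name, the promoter is assumed to be the 1 kb coding-strand sequence written 5' to 3' so that index 1000 is adjacent to the TSS, and motifs that straddle a window boundary are ignored.

```python
import re

# IUPAC codes expanded to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def contains_cre(consensus, promoter, window=(0, 1000), strand="both"):
    """Report whether the CRE occurs in the window on the requested strand(s)."""
    if strand not in ("coding", "reverse", "both"):
        raise ValueError("strand must be 'coding', 'reverse' or 'both'")
    pattern = re.compile("".join("[" + IUPAC[c] + "]" for c in consensus.upper()))
    start, end = window
    region = promoter.upper()[start:end]
    on_coding = pattern.search(region) is not None
    # Searching the reverse complement of the region covers the other strand.
    on_reverse = pattern.search(reverse_complement(region)) is not None
    if strand == "coding":
        return on_coding
    if strand == "reverse":
        return on_reverse
    return on_coding or on_reverse

# Toy promoter: the motif TGACG sits 95-100 bases upstream of the TSS.
promoter = "A" * 900 + "TGACG" + "A" * 95
print(contains_cre("TGACG", promoter, window=(800, 1000), strand="coding"))  # True
print(contains_cre("TGACG", promoter, window=(0, 200), strand="both"))       # False
print(contains_cre("CGTCA", promoter, window=(800, 1000), strand="both"))    # True
```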
The enrichment of a CRE in a class is estimated using the hypergeometric test (for a brief introduction to the hypergeometric test, see this link). For each class (from the selected category), the program calculates the cumulative hypergeometric probability of occurrence of CRE-containing genes, i.e. the probability of observing the same or a greater number of CRE-containing genes in a randomly selected gene set of the same size. As multiple classes are analysed simultaneously, the P-values obtained from the hypergeometric test are adjusted using the Bonferroni correction method. The P-value cutoff option allows the user to select a P-value threshold. Classes from the selected category for which the P-value of enrichment of the input CRE is lower than the specified cutoff are displayed in the output.
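
The cumulative hypergeometric probability and the Bonferroni adjustment described above can be sketched as follows. This is a minimal illustration using only the Python standard library (Python 3.8 or later for `math.comb`), not the program's actual code; the function names are hypothetical, and the number of classes tested, which the Bonferroni correction multiplies by, has to be supplied by the caller.

```python
from math import comb

def hypergeom_upper_tail(gc, tc, gr, tr):
    """P(X >= gc) when tc genes are drawn without replacement from a
    reference set of tr genes, gr of which contain the CRE."""
    total = comb(tr, tc)
    upper = min(tc, gr)
    # Count the draws that contain gc or more CRE-containing genes.
    favourable = sum(
        comb(gr, i) * comb(tr - gr, tc - i) for i in range(gc, upper + 1)
    )
    return favourable / total

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Counts taken from the example output row (Gc=7, Tc=16, Gr=1901, Tr=22229).
p = hypergeom_upper_tail(7, 16, 1901, 22229)
print(p)
```

The printed value can be compared with the hypergeometric P-value reported in the example output row for the same counts. Exact integer arithmetic is used for the binomial coefficients, so the only rounding happens in the final division.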
Class | Number of CRE-containing genes in class (Gc) | Number of genes in class (Tc) | Percentage of CRE-containing genes in class (Pc = (Gc/Tc) × 100) | Number of CRE-containing genes in reference set (Gr) | Number of genes in reference set (Tr) | Percentage of CRE-containing genes in reference set (Pr = (Gr/Tr) × 100) | Ratio (Pc/Pr) | Hypergeometric P-value | Corrected P-value | Genes which contain input CRE |
---|---|---|---|---|---|---|---|---|---|---|
GO:0004805 (trehalose-phosphatase activity) | 7 | 16 | 43.750 | 1901 | 22229 | 8.552 | 5.12 | 0.0001892 | 0.04959604 | At1g68020, At2g22190, At4g12430, At4g22590, At4g39770, At5g51460, At5g65140 |
The output of the AtREA CRE distribution module (Figure 1) contains the following columns:
The first column contains the name of the functional class or expression slide, along with links to sites that contain more information about the functional class or slide.
The next three columns (2-4) contain the number of CRE-containing promoters in the class, the total number of genes in the class and the percentage of promoters from the class which contain the CRE.
The next three columns (5-7) contain the same set of data for the reference set (which contains 22229 1 kb upstream sequences).
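
As a worked check of the percentage and ratio columns, the short Python sketch below recomputes Pc, Pr and Pc/Pr from the counts in the example row. The function name `enrichment_ratio` is an illustrative assumption.

```python
def enrichment_ratio(gc, tc, gr, tr):
    """Return (Pc, Pr, Pc/Pr) from class and reference-set counts."""
    pc = gc / tc * 100  # percentage of CRE-containing genes in the class
    pr = gr / tr * 100  # percentage of CRE-containing genes in the reference set
    return pc, pr, pc / pr

pc, pr, ratio = enrichment_ratio(7, 16, 1901, 22229)
print(f"{pc:.3f} {pr:.3f} {ratio:.2f}")  # 43.750 8.552 5.12
```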