AtREA


Datasets used in AtREA



AtREA contains an integrated database of Arabidopsis upstream sequences, Gene Ontology (GO), AraCyc pathway and MIPS FunCat annotations, and microarray data from 1388 microarray slides. The sources of these data and the preprocessing steps applied before integrating them into AtREA are explained below.

1) The 1 kb nucleotide sequence upstream of the transcription start site (TSS) of every Arabidopsis gene (according to TAIR7) was downloaded from TAIR.

2) Normalized and log2-transformed expression data for 22263* Arabidopsis genes (corresponding to the Affymetrix 22k chip) in 1388 array slides were downloaded from the ATTED database (please refer to the ATTED database help section for normalization, slide categorization and other details). Slides labelled "control", "mock" and "no treatment" were removed from the dataset. As the slides in the ATTED dataset include many replicate sets, each such set was represented by a single slide holding the average expression value of the corresponding replicates.

3) Gene Ontology and AraCyc pathway annotations of all Arabidopsis genes were downloaded from TAIR. MIPS FunCat annotations of Arabidopsis genes were downloaded from the MIPS database.
The GO classes** were separated into three major categories: GOBP (Gene Ontology Biological Process), GOMF (Gene Ontology Molecular Function) and GOCC (Gene Ontology Cellular Component). The GOBP, GOMF, GOCC, AraCyc pathway and MIPS classes represented by at least 8 genes (in our dataset) were then selected and grouped into the corresponding categories.


* Upstream sequences of 22229 of these genes were found in the TAIR database (TAIR7 version). Information related to these 22229 genes was used to construct the AtREA dataset.
** GO classes that are very large but do not correspond to characterized functions (Biological Process Unknown, Molecular Function Unknown and Cellular Component Unknown) were excluded.
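
The preprocessing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build AtREA: the input layout (a list of slide records with `label`, `replicate_set` and `expr` fields), the function names and the plain arithmetic mean of the log2 values are all hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import mean

# Labels of slides that are dropped before integration (from the text above).
CONTROL_LABELS = {"control", "mock", "no treatment"}

# Very large, uncharacterized GO classes that are excluded (second footnote).
UNKNOWN_CLASSES = {
    "biological process unknown",
    "molecular function unknown",
    "cellular component unknown",
}


def preprocess_slides(slides):
    """Drop control-like slides and average each replicate set.

    slides: list of dicts with hypothetical keys
        'label'         - treatment label of the slide
        'replicate_set' - identifier shared by replicate slides
        'expr'          - dict mapping gene ID to log2 expression
    Returns a dict: replicate_set -> {gene ID: mean log2 expression}.
    Assumes all slides of a replicate set report the same genes.
    """
    grouped = defaultdict(list)
    for slide in slides:
        if slide["label"].strip().lower() in CONTROL_LABELS:
            continue
        grouped[slide["replicate_set"]].append(slide["expr"])
    return {
        set_id: {gene: mean(e[gene] for e in exprs) for gene in exprs[0]}
        for set_id, exprs in grouped.items()
    }


def select_classes(class_to_genes, dataset_genes, min_genes=8):
    """Keep classes represented by at least min_genes genes of the dataset."""
    selected = {}
    for name, genes in class_to_genes.items():
        if name.strip().lower() in UNKNOWN_CLASSES:
            continue
        present = set(genes) & set(dataset_genes)
        if len(present) >= min_genes:
            selected[name] = present
    return selected
```

Here `dataset_genes` would stand for the 22229 genes with an upstream sequence in TAIR7, so that class sizes are counted only over genes that are actually present in the dataset.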


Relationship between different modules

The modules for CRE feature and promoter state analysis have been designed to run in combination with the AtREA CRE distribution module. While the CRE distribution module can analyse multiple functional classes and microarray slides in a single run (batch mode), the CRE feature and state modules compare features and states in a single class only. The results obtained with these two modules for a single class can, however, be incorporated into the inputs of the CRE distribution module, which allows them to be analysed in batch mode as well. Table 1 shows how the single classes used for feature analysis should be selected from the CRE distribution analysis module, and how the results of feature analysis can be fed back into the CRE distribution analysis module (the last entry of the table).

The promoter state analysis module has likewise been designed to run in combination with the AtREA CRE distribution module. While the CRE distribution module can analyse multiple functional classes and microarray slides in a single run (batch mode), the promoter state analysis module compares promoter states in a single class only. The results obtained with this module for a single class can, however, be incorporated into the inputs of the CRE distribution module, which allows them to be analysed in batch mode as well. Table 2 shows how the single classes used for state analysis should be selected from the CRE distribution analysis module, and how the results of state analysis can be fed back into the CRE distribution analysis module (the last entry of the table).



Input CRE (single consensus format)

--- Distribution analysis --->

Outcome:
    Identify putative functional targets among GO, MIPS and AraCyc pathway classes.
    Select conditions or classes in which the CRE shows high enrichment.

--- CRE feature analysis --->

Outcome:
    If a position preference is detected, select the position window in which the CRE shows maximum enrichment.
    If a strand preference is detected, select the strand in which the CRE shows maximum enrichment.
    Identify conditions (from microarray slides) where the CRE may be significant in expression regulation.
    If CRE frequency affects gene expression, select a minimum frequency.
    If any of the variants shows significantly high enrichment, incorporate the variations into the original CRE sequence.

<-----------------------------<

Change parameters (input CRE, sequence features and minimum frequency) and perform CRE distribution analysis.

Table 1. Relationship between CRE distribution and CRE feature analysis modules
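
The sequence features listed in Table 1 (position window, strand and minimum frequency) can be made concrete with a small Python sketch. It is not AtREA's implementation, which is not described on this page; the function names, the coordinate convention (the last base of the upstream sequence is position -1 relative to the TSS) and the requirement that a hit lie entirely inside the window are assumptions made for illustration. The consensus is written with standard IUPAC nucleotide codes.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(consensus):
    """Reverse complement of a sequence or IUPAC consensus."""
    return consensus.upper().translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Compile a consensus into a regex; the lookahead finds overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def find_cre(upstream, consensus, strand="both", window=None):
    """Return sorted (position, strand) hits of a CRE in an upstream sequence.

    position is the leftmost base of the hit relative to the TSS (assumed
    convention: the last base of `upstream` is -1). `window` is an optional
    inclusive (start, end) pair such as (-500, -1). Palindromic motifs are
    reported once per strand when strand="both".
    """
    seq = upstream.upper()
    n, k = len(seq), len(consensus)
    patterns = []
    if strand in ("+", "both"):
        patterns.append(("+", consensus_to_regex(consensus)))
    if strand in ("-", "both"):
        # A hit on the reverse strand equals a hit of the reverse-complemented
        # consensus on the forward strand.
        patterns.append(("-", consensus_to_regex(reverse_complement(consensus))))
    hits = []
    for label, pattern in patterns:
        for match in pattern.finditer(seq):
            start = match.start() - n
            end = start + k - 1
            if window is None or (window[0] <= start and end <= window[1]):
                hits.append((start, label))
    return sorted(hits)


def passes_min_frequency(upstream, consensus, min_frequency=1, **options):
    """True if the promoter holds at least min_frequency copies of the CRE."""
    return len(find_cre(upstream, consensus, **options)) >= min_frequency
```

For example, `find_cre(seq, "TGTCTC", strand="+", window=(-500, -1))` would restrict the search to forward-strand copies in the 500 bases closest to the TSS, which mirrors the kind of parameter change that Table 1 feeds back into the distribution analysis.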





Input CRE (single consensus format)

--- Distribution analysis --->

Outcome:
    Identify putative functional targets among GO, MIPS and AraCyc pathway classes.
    Select conditions or classes in which the CRE shows high enrichment.

--- CRE state analysis --->

Outcome:
    Select one or more CREs, from the literature or from experimental or computational analysis, that are known to function in association with the input CRE.
    If the expression of genes that contain a second CRE along with the input CRE (CRE combination) differs from the expression of genes that contain only the input CRE, select the CRE combination.
    Identify conditions (from microarray slides) where the CRE may be significant in expression regulation.

<-----------------------------<

Perform CRE distribution analysis with the multiple CRE option, using the combination as input.



Table 2. Relationship between CRE distribution and promoter states analysis modules
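
The comparison at the centre of Table 2 (genes carrying a CRE combination versus genes carrying only the input CRE) can be sketched as follows. This is a simplified illustration, not the method used by AtREA: the function names are hypothetical, motifs are matched as plain forward-strand substrings for brevity, and the comparison is a bare difference of mean log2 expression with no significance test, since this page does not state how differences are assessed.

```python
from statistics import mean


def split_by_state(promoters, input_cre, second_cre):
    """Partition genes by promoter state.

    promoters: dict mapping gene ID to upstream sequence.
    Returns (both, only_input): genes whose promoter holds both CREs, and
    genes whose promoter holds the input CRE but not the second CRE.
    """
    both, only_input = set(), set()
    for gene, seq in promoters.items():
        seq = seq.upper()
        if input_cre.upper() in seq:
            if second_cre.upper() in seq:
                both.add(gene)
            else:
                only_input.add(gene)
    return both, only_input


def mean_expression_difference(expression, both, only_input):
    """Mean log2 expression of `both` minus that of `only_input` in one slide.

    expression: dict mapping gene ID to log2 expression value.
    Returns None when either group has no gene with an expression value.
    """
    values_both = [expression[g] for g in both if g in expression]
    values_only = [expression[g] for g in only_input if g in expression]
    if not values_both or not values_only:
        return None
    return mean(values_both) - mean(values_only)
```

A consistently non-zero difference across the slides of interest would be the kind of observation that suggests selecting the CRE combination and passing it to the distribution analysis through the multiple CRE option.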