[BiO BB] Starting point EST annotation/grouping

Bossers, Alex Alex.Bossers at wur.nl
Fri Aug 11 02:02:16 EDT 2006



No, I did not use the PHRED/PHRAP/CONSED package for this (I did try it,
but found the TGICL suite more appropriate for ESTs).


For base calling I used a Windows-based caller, since PHRED at that time
did not support the dyes we used for sequencing. :(

After that, TGICL basically uses megablast to cluster all sequences
into groups and then assembles each group using the CAP3 assembler.


Clustering and assembling are always difficult to explain. As far as I
understand, clustering first partitions the large set of sequences into
groups of similar sequences, which speeds up the second step, assembly,
because the assembler then only has to work within each group. You end up
with contigs (built from more than one sequence) or singletons.
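The distinction can be sketched with a toy example. This is only an illustration, not how TGICL works internally: shared 8-mers stand in for megablast hits, the sequences and names (`toy_ests`, `cluster_ests`, `split_clusters`) are made up, and reverse-complement matches are ignored. It shows the clustering step only; each multi-member group would then be handed to an assembler such as CAP3.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k=8):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(seqs, k=8, min_shared=1):
    """Single-linkage clustering: two ESTs are linked if they share
    at least `min_shared` k-mers (a crude stand-in for a megablast hit)."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kmer_sets = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for name in seqs:
        groups[find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())


def split_clusters(clusters):
    """Separate groups worth assembling from singletons."""
    to_assemble = [c for c in clusters if len(c) > 1]
    singletons = [c[0] for c in clusters if len(c) == 1]
    return to_assemble, singletons


# Hypothetical reads: est1/est2 and est2/est3 overlap, est4 is unrelated.
toy_ests = {
    "est1": "ACGTACGTTTGACCAGT",
    "est2": "TTGACCAGTGGCATCAA",
    "est3": "GGCATCAAGCTTAGGC",
    "est4": "CCCCTTTTAAAAGGGGCCCC",
}

clusters = cluster_ests(toy_ests)
to_assemble, singletons = split_clusters(clusters)
```

Here est1 and est3 share nothing directly but end up in the same group through est2. The assembler then only has to consider the three reads in that group rather than all pairs in the full data set.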


By "unigene list" I mean a list of all the distinct genes present in my
sample of 13k ESTs, like the Gene Indices lists that TIGR maintains for
various species.



Now the next steps.







From: ahmed.moustafa at gmail.com [mailto:ahmed.moustafa at gmail.com] On behalf of
Ahmed Moustafa
Sent: Thursday, 10 August 2006 23:53
To: Bossers, Alex
Subject: Re: [BiO BB] Starting point EST annotation/grouping


Hi Alex,


Regarding the steps that you mentioned to process ESTs, I would like to
ask you:


(1) What is the difference between clustering and assembling the ESTs?

(2) What is a unigene?


BTW, are you using the Phred/Phrap/Consed suite?

Thanks in advance!



On 8/10/06, Bossers, Alex <Alex.Bossers at wur.nl> wrote: 

Dear all,


I am looking for a starting point for "annotating" our groups of



    I have a DB containing my 13k ESTs.

    I clustered/assembled them after cleaning, using the TIGR TGICL tools.

    Now I have a "unigene list" of contigs and singletons (about 7k
    features).

    BLAST made it possible to "annotate" about half of the sequences
    further with known (useful) information.


Now I want to do the following, ideally using (semi-)automated tools
(web-based or Linux-based):

1.       I would like to know how many genes of which groups/classes are
present and which genes of my set belong to each (e.g. apoptosis,
development, immunology, ...). This is to get an overview of the genes
present and how complete the repertoire is that we finally put onto our

2.       Are some of these genes normally tissue specific?
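Once each unigene carries one or more functional labels, the overview asked for in point 1 is a simple tally. The sketch below assumes the labels are already available as a mapping from unigene ID to a list of categories; the IDs, categories and the function name `tally_categories` are all hypothetical, and how the labels are obtained (for example from the BLAST hits) is left open.

```python
from collections import Counter, defaultdict


def tally_categories(annotations):
    """Count unigenes per functional category and list the members of each.

    `annotations` maps a unigene ID to a list of category labels;
    an empty list means no annotation was found for that unigene.
    """
    counts = Counter()
    members = defaultdict(list)
    for unigene, categories in annotations.items():
        # Unigenes without any label are grouped under "unannotated".
        labels = sorted(set(categories)) or ["unannotated"]
        for label in labels:
            counts[label] += 1
            members[label].append(unigene)
    return counts, dict(members)


# Hypothetical annotations, for illustration only.
example_annotations = {
    "contig0001": ["apoptosis"],
    "contig0002": ["development", "immunology"],
    "contig0003": ["apoptosis"],
    "singleton0001": [],
}

counts, members = tally_categories(example_annotations)
for label, n in sorted(counts.items()):
    print(f"{label}\t{n}\t{', '.join(members[label])}")
```

A unigene with several labels is counted once in each of its categories, so the per-category counts can sum to more than the number of unigenes.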



Any help to get me started in the right direction is appreciated,





General Forum at Bioinformatics.Org -
BiO_Bulletin_Board at bioinformatics.org
