[BiO BB] base counting
Corné HW Klaassen
c.klaassen at cwz.nl
Thu Mar 16 04:36:26 EST 2006
Hi Peter,
Thanks for the quick reply. On paper this is exactly what I'm looking
for, but I gave compseq a try and it doesn't seem to work on
features longer than 20 nt, whereas I'm particularly interested in
features of 40-140 nt (I realize that this can be a very computationally
intensive job). Any other suggestions? Is there perhaps something
similar for protein sequences, or for some other arbitrary units?
Corné
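A minimal Python sketch of why long words need not be intractable. There are 4^40 possible 40-mers, so enumerating every possible combination is out of the question. However, a sequence of length n contains at most n - k + 1 overlapping words of length k, so counting only the words that actually occur stays feasible. The function names count_kmers and count_kmer_range are hypothetical and not part of EMBOSS or any other package. The sketch assumes the whole sequence fits in memory as one string, and memory still grows roughly with n times k characters, so a whole genome at k = 140 may need a more compact approach. Because no particular alphabet is assumed, the same code counts amino-acid words in protein sequences.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq.

    Only words that actually occur are stored, so the table grows with
    len(seq) rather than with alphabet_size ** k.
    """
    seq = seq.upper()
    if k <= 0 or k > len(seq):
        return Counter()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmer_range(seq: str, k_min: int, k_max: int) -> dict[int, Counter]:
    """Word counts for every word length from k_min to k_max inclusive."""
    return {k: count_kmers(seq, k) for k in range(k_min, k_max + 1)}


if __name__ == "__main__":
    dna = "ATGCGCGATATCGCGCATGCGCGATAT"
    # Words of length 1 to 4, as in the original question.
    for k, counts in count_kmer_range(dna, 1, 4).items():
        print(k, counts.most_common(3))
    # The alphabet is not fixed, so protein sequences work unchanged.
    protein = "MKTAYIAKQRQISFVKSHFSRQMKTAYIAK"
    repeated = {w: c for w, c in count_kmers(protein, 8).items() if c > 1}
    print(repeated)
```

In most sequences the majority of words of 40 nt or more occur only once. The useful output at those lengths is therefore usually the short list of words with a count above 1, as in the `repeated` filter above.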
>> I remember having seen this once but I do not recollect exactly where,
>> so I'll just pop this question here:
>> Does anyone know of a free software package (Windows or online) that
>> analyzes the frequencies of, or counts, all possible combinations of bases
>> in a given sequence (single bases, dinucleotides, trinucleotides,
>> tetranucleotides, etc.)?
>
>
> compseq from EMBOSS will do this. For example, it will find in E.coli
> sequences the dramatic underrepresentation of CTAG (or CCTAG and
> CTAGG) due to mismatch repair mechanisms.
>
> To find such features on a range of scales, the chaos program in
> EMBOSS (Chaos Game Representation) can also be useful. The above
> feature shows as sets of white boxes. CpG features in mammalian
> genomes also appear in the plot. Shorter sequences take up larger
> areas of the plot. Once you know the scale of the feature you are
> looking for, a compseq run will report the under- or over-represented
> sequences.
>
> Hope that helps,
>
> Peter Rice
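The under-representation Peter describes compares observed word counts against expected ones. Below is a minimal sketch of that idea. It assumes a zero-order model in which bases occur independently with the sequence's own base frequencies. It is an illustration only and not necessarily how compseq computes its expected values. The function name obs_exp_ratios is hypothetical, and the sketch assumes the input contains only A, C, G and T.

```python
from collections import Counter
from itertools import product
from math import prod


def obs_exp_ratios(seq: str, k: int, alphabet: str = "ACGT") -> dict[str, float]:
    """Observed/expected ratio for every possible word of length k.

    Expected counts use a zero-order model: bases are treated as
    independent, with probabilities equal to this sequence's own base
    frequencies (an assumption; other background models are possible).
    Assumes seq contains only letters from alphabet.
    """
    seq = seq.upper()
    n_words = len(seq) - k + 1
    base_counts = Counter(seq)
    total = sum(base_counts[b] for b in alphabet)
    if k <= 0 or n_words <= 0 or total == 0:
        return {}
    freqs = {b: base_counts[b] / total for b in alphabet}
    observed = Counter(seq[i:i + k] for i in range(n_words))
    ratios = {}
    # Enumerate all len(alphabet) ** k words so absent words get ratio 0.
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        expected = n_words * prod(freqs[b] for b in word)
        if expected > 0:  # skip words containing a base that never occurs
            ratios[word] = observed[word] / expected
    return ratios


if __name__ == "__main__":
    seq = "ATGCTAGGCTTACCGATCGATTACGGCTAGCATCGGATCCA"
    ratios = obs_exp_ratios(seq, 4)
    # Most under-represented 4-mers first.
    for word in sorted(ratios, key=ratios.get)[:5]:
        print(word, round(ratios[word], 2))
```

A ratio well below 1 flags a word that is rarer than base composition alone predicts, which is the kind of signal described above for CTAG in E. coli. Enumerating every possible word is only practical for small k, since there are 4^k of them.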
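The Chaos Game Representation can also be sketched as a grid of word counts, often called a frequency CGR. Each base is assigned a corner of the unit square, and each new base moves the current point halfway towards its corner. After k steps, the point's cell in a 2^k by 2^k grid depends only on the last k bases. The sketch below runs that game in integer arithmetic so no floating-point rounding creeps in. The corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) is an assumed common convention and may not match the layout of the EMBOSS chaos plot. The names fcgr and render are hypothetical.

```python
# Corner of the unit square assigned to each base, as (x, y).
# ASSUMPTION: a common CGR convention; the EMBOSS chaos plot may differ.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> list[list[int]]:
    """Count k-mers (k >= 1) on a 2^k x 2^k chaos-game grid, grid[row][col]."""
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    col = row = 0
    run = 0  # consecutive unambiguous bases since the last restart
    for base in seq.upper():
        if base not in CORNERS:
            # Ambiguous base (e.g. N): restart the game after it.
            col = row = run = 0
            continue
        cx, cy = CORNERS[base]
        # One chaos-game step in fixed point: moving halfway towards the
        # corner halves the old coordinate (shift right) and puts the new
        # corner in the top bit; bits from bases older than k fall off.
        col = (col >> 1) | (cx << (k - 1))
        row = (row >> 1) | (cy << (k - 1))
        run += 1
        if run >= k:  # the cell now depends only on the last k bases
            grid[row][col] += 1
    return grid


def render(grid: list[list[int]]) -> str:
    """Text plot: '.' for an empty cell (word never seen), '#' otherwise.

    Rows are reversed so that y = 0 (the A/T edge) is printed at the bottom.
    """
    return "\n".join(
        "".join("#" if count else "." for count in row) for row in reversed(grid)
    )


if __name__ == "__main__":
    seq = "ATGCGCGATATCGCGCATGCGCGATATTTAAACCCGGGTTTAAA"
    print(render(fcgr(seq, 3)))
```

With this layout, every k-mer ending in the same m-letter word falls into one 2^(k-m) by 2^(k-m) block. This is consistent with Peter's remark that shorter sequences take up larger areas: a short word that is absent from the sequence leaves its whole block empty.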