[BiO BB] GC variation in Genome

Alex Milowski alex at milowski.com
Tue Oct 14 11:00:14 EDT 2003


On Tuesday, October 14, 2003, at 03:55  AM, Adarsh Ramakumar wrote:

> Could you guys suggest how I could do them or rather
> best way of doing it? I mean is there any software
> that you know of where I can graphically see the
> variations in the contigs with respect to GC?

You could also check out the software that produces Chaos Game
Fractals from genome sequences.  There are many variations
available out there.  They "visualize" the sequences
and subsequences quite compactly.

The nice thing is that if you count on a discrete lattice (a
2^k x 2^k grid for subsequences of length k), each lattice point
corresponds to a unique subsequence (or partial subsequence).
Thus, you can calculate the fractal once and count many different
GC-containing subsequences at the same time.
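
To make the lattice idea concrete, here is a minimal Python sketch. It is
not the software mentioned in this message. The corner assignment
(A, C, G, T on the corners of the unit square) is one common convention
and is an assumption here, and the names cgr_cell and cgr_counts are
made up for illustration.

```python
from collections import Counter

# Assumed corner convention: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
# Other assignments exist; any fixed choice gives a one-to-one mapping.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def cgr_cell(kmer):
    """Return the (x, y) cell of a k-mer on a 2^k x 2^k lattice.

    In the chaos game each new base halves the distance to its corner,
    so the most recent base decides the coarsest split of the square.
    Here the last base of the k-mer therefore sets the highest bit.
    """
    x = y = 0
    for i, base in enumerate(kmer):
        cx, cy = CORNERS[base]
        x |= cx << i
        y |= cy << i
    return x, y

def cgr_counts(seq, k):
    """Count how often each lattice cell (that is, each k-mer) is visited."""
    counts = Counter()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows with ambiguous bases such as N.
        if all(b in CORNERS for b in kmer):
            counts[cgr_cell(kmer)] += 1
    return counts
```

Calling cgr_counts(contig, 2) gives a 4 x 4 table with one cell per
dinucleotide; plotting the counts as an image gives the usual fractal
picture at that resolution.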

I wrote a paper on using this fractal to calculate CpG islands:

    http://www.milowski.com/display.jsp?doc=math/cpggame/

...and I have some software which I can make available on my website.
I just haven't gotten around to it... :)

There is also a really nice recent article by Almeida et al.
titled "Analysis of Genomic Sequences by Chaos Game Representation" that
was published in Bioinformatics.  They talk about methods for counting
fractional sequences (among other things).

Ultimately, the original source is an article by Jeffrey titled
"Chaos Game Representation of Gene Structure" published in
Nucleic Acids Research.

Depending on what you want, a lattice-based Chaos Game fractal could
quickly count GC-containing subsequences and give you a way to
visualize them.
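
For the original question (GC variation along a contig), a rough sketch
of a windowed profile might look like the following. It counts k-mers by
their string rather than by lattice cell, which is equivalent because the
mapping is one-to-one. Reading "GC-containing" as "contains the motif GC"
is my assumption for this example, and window_profile,
motif_kmer_fraction, and their parameters are hypothetical names.

```python
from collections import Counter

def motif_kmer_fraction(seq, k, motif="GC"):
    """Fraction of the k-mers in seq that contain motif (assumed meaning)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for kmer, n in counts.items() if motif in kmer)
    return hits / total

def window_profile(seq, window, step, k, motif="GC"):
    """Return (start, gc_fraction, motif_kmer_fraction) for each window."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        # Plain GC content: share of G and C bases in the window.
        gc = (chunk.count("G") + chunk.count("C")) / window
        profile.append((start, gc, motif_kmer_fraction(chunk, k, motif)))
    return profile
```

Plotting the second or third column against the start position shows how
GC varies along the contig. Window and step sizes are up to you.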


Alex Milowski                FAX: (707) 598-7649
alex at milowski.com

"The excellence of grammar as a guide is proportional to the paucity of
the inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics




