[Biococoa-dev] BCSequenceCluster

Charles Parnot charles.parnot at gmail.com
Mon Oct 10 02:04:30 EDT 2005


> I think this is a very good suggestion, I totally agree. One class  
> should fit them all. The only potential "problem" is that the  
> overlap with contigs is quite small, so the BCSequences will  
> contain plenty of gaps while with the other "formats", the  
> sequences will be much more similar. In fact that's not really a  
> problem at all... just a thought. For entire genomes, this might  
> become a problem, though, as every sequence might become very long.

I suppose we can at least use just one offset value, which is where  
each sequence is actually positioned, and then have gaps in between  
symbols as needed. So just one integer per sequence: instead of  
starting a sequence with 400 gaps, store its position as being '400'.  
The position is relative to the leftmost sequence (and with circular  
sequences, well, uh, we don't do that!!).
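
To make the offset idea concrete, here is a rough sketch in Python (BioCocoa itself is Objective-C, so this is only an illustration). The names PlacedSequence, offset, normalize, padded_rows and symbol_at are made up for this example and are not BioCocoa API. Each sequence carries one integer offset, and the leading gaps are only materialized when padded rows are actually needed.

```python
from dataclasses import dataclass

GAP = "-"


@dataclass
class PlacedSequence:
    """Hypothetical container: a sequence plus its single integer offset."""
    symbols: str  # may still contain internal gaps
    offset: int   # columns to the right of the leftmost sequence


def normalize(placed):
    """Shift offsets so that the leftmost sequence sits at position 0."""
    leftmost = min(p.offset for p in placed)
    return [PlacedSequence(p.symbols, p.offset - leftmost) for p in placed]


def padded_rows(placed):
    """Expand to equal-length rows, writing out the gaps only for display."""
    placed = normalize(placed)
    width = max(p.offset + len(p.symbols) for p in placed)
    rows = []
    for p in placed:
        trailing = width - p.offset - len(p.symbols)
        rows.append(GAP * p.offset + p.symbols + GAP * trailing)
    return rows


def symbol_at(p, column):
    """Symbol of one sequence at an alignment column, without any padding."""
    i = column - p.offset
    return p.symbols[i] if 0 <= i < len(p.symbols) else GAP


contig = [PlacedSequence("ACGTAC", 0), PlacedSequence("TAC-GG", 3)]
for row in padded_rows(contig):
    print(row)
```

With this layout a sequence sitting 400 columns in costs one integer rather than 400 stored gap symbols, and a column lookup is just a subtraction.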

But this is an implementation detail. Whatever Koen prefers is fine  
with me if he writes the code ;-)

charles


--
Xgrid-at-Stanford
Help science move fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford

Charles Parnot
charles.parnot at gmail.com






