[Biococoa-dev] BCSequenceCluster

Fri Oct 7 18:12:54 EDT 2005

On Oct 7, 2005, at 4:53 AM, Peter Schols wrote:

> Hi Charles and Koen,
>
> First of all: I'm not a sequence cluster expert and I have never  
> used it for my own research.

Neither am I :)

> Back to BC: I don't think we need this cluster functionality in  
> BioCocoa, at least for now. For the I/O methods we only need a good  
> container class for BCSequences as you are pointing out. With this  
> approach we would store the gaps directly inside the BCSequences  
> (using the gap symbol). This is definitely the easiest way to  
> implement it.
> The only thing we would need to do when going this route - to  
> compensate for the fact that sequences don't have gaps in reality -  
> is that we should add a method to BCSequence that returns the  
> sequence without gaps (the 'real' sequence).

So this assumes that the input file already has the gaps in place?
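
For illustration, the gap-free accessor described above could be sketched roughly like this. It is plain C (which also compiles inside an Objective-C file), not existing BioCocoa code: the function name bc_strip_gaps is hypothetical, and '-' is only assumed to be the gap symbol.

```c
#include <stddef.h>

/* Hypothetical helper, not part of BioCocoa.
 * Copies `aligned` into `out`, skipping every gap symbol.
 * Assumption: '-' is the gap symbol.
 * Writes at most out_size - 1 characters plus a terminating NUL,
 * and returns the number of non-gap symbols written. */
size_t bc_strip_gaps(const char *aligned, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0) {
        return 0;               /* no room even for the terminator */
    }
    for (const char *p = aligned; *p != '\0' && n + 1 < out_size; ++p) {
        if (*p != '-') {
            out[n++] = *p;      /* keep real symbols, drop gaps */
        }
    }
    out[n] = '\0';
    return n;
}
```

With this, the stored (gapped) sequence stays untouched and the 'real' sequence is derived on demand, so nothing has to be kept in sync.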

>
> To answer Charles' question: right now, the only purpose for the  
> BCSequenceGroup would be to make I/O easier. But in the future, we  
> could add extra methods to this class to enable alignment of the  
> BCSequenceGroup (using BCAlignment) or to return a list of shared  
> indels, for example. This BCSequenceGroup could also be the perfect  
> class to pass as an argument to classes that do phylogenetic analysis.
>
> So in the future, we could do things like:
>
> BCSequenceGroup *group = [BCSequenceGroup groupWithFile:@"myFastaFile.fst"];
> [group align];
> BCPhylogeneticTree *tree = [group analyzeUsingHeuristicSearchWithReplicates:1000];

That sounds like a good plan to me. The only thing I don't like so
far is the name BCSequenceGroup. It implies that there is always a
group of sequences to import, while often there is only one. But I
cannot think of a better name right now :(

cheers,

- Koen.