[Biococoa-dev] BCSequenceCluster

Fri Oct 7 04:53:00 EDT 2005

Hi Charles and Koen,

First of all: I'm not a sequence clustering expert and I have never
used it for my own research.

As far as I know, sequence clustering is a way to group and align
homologous sequences, detect regions of high sequence similarity and
describe the differences between sequences. This is especially useful
for creating non-redundant datasets: for example, if you align the
hemoglobin genes of 5 mammals, you end up with almost the same
information five times. So if you store just one of the five sequences
(the referenceSequence) plus the differences between it and the other
4 sequences, you save a lot of memory and disk space (maybe not for
this example, but on a genomic scale) without losing any information.
It's comparable to lossless delta compression (unlike JPEG, which
discards information).
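
To make the reference-plus-differences idea concrete, here is a
minimal sketch. It assumes the sequences are already aligned (equal
length) and are held in plain NSStrings; the function names are
hypothetical and not part of BioCocoa.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not BioCocoa API): records the positions at which
// an aligned sequence differs from the reference sequence.
// Both strings are assumed to be aligned, i.e. to have the same length.
static NSDictionary *DifferencesFromReference(NSString *reference, NSString *aligned)
{
    NSCParameterAssert([reference length] == [aligned length]);
    NSMutableDictionary *differences = [NSMutableDictionary dictionary];
    for (NSUInteger i = 0; i < [reference length]; i++) {
        unichar symbol = [aligned characterAtIndex:i];
        if (symbol != [reference characterAtIndex:i]) {
            // Key: position in the alignment; value: the differing symbol.
            [differences setObject:[NSString stringWithCharacters:&symbol length:1]
                            forKey:[NSNumber numberWithUnsignedInteger:i]];
        }
    }
    return differences;
}

// Hypothetical helper: rebuilds the full aligned sequence from the
// reference sequence and the stored differences, so nothing is lost.
static NSString *SequenceFromReference(NSString *reference, NSDictionary *differences)
{
    NSMutableString *result = [NSMutableString stringWithString:reference];
    for (NSNumber *position in differences) {
        [result replaceCharactersInRange:NSMakeRange([position unsignedIntegerValue], 1)
                              withString:[differences objectForKey:position]];
    }
    return result;
}
```

Only the reference and one small dictionary per additional sequence
need to be stored; each original sequence can be rebuilt exactly.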

Back to BC: I don't think we need this cluster functionality in
BioCocoa, at least for now. For the I/O methods we only need a good
container class for BCSequences, as you point out. With this approach
we would store the gaps directly inside the BCSequences (using the gap
symbol). This is definitely the easiest way to implement it.
The only thing we would need to do when going this route, to
compensate for the fact that real sequences don't have gaps, is add a
method to BCSequence that returns the sequence without gaps (the
'real' sequence).
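
A minimal sketch of such a gap-stripping method follows. It is written
as a plain function on NSString rather than as a BCSequence method,
and both the function name and the use of '-' as the gap symbol are
assumptions made for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical sketch (not BioCocoa API): returns the 'real' sequence,
// i.e. the aligned sequence with every gap symbol removed.
// The gap symbol is assumed to be '-'; BioCocoa may use another symbol.
static NSString *SequenceByRemovingGaps(NSString *alignedSequence)
{
    return [alignedSequence stringByReplacingOccurrencesOfString:@"-"
                                                      withString:@""];
}
```

The aligned form stays untouched in the container, and the ungapped
form is derived on demand, so no information is stored twice.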

To answer Charles' question: right now, the only purpose for the  
BCSequenceGroup would be to make I/O easier. But in the future, we  
could add extra methods to this class to enable alignment of the  
BCSequenceGroup (using BCAlignment) or to return a list of shared  
indels, for example. This BCSequenceGroup could also be the perfect  
class to pass as an argument to classes that do phylogenetic analysis.

So in the future, we could do things like:

BCSequenceGroup *group = [BCSequenceGroup groupWithFile:@"myFastaFile.fst"];
[group align];
BCPhylogeneticTree *tree = [group analyzeUsingHeuristicSearchWithReplicates:1000];

Cheers,

Peter

On 06 Oct 2005, at 23:45, Charles Parnot wrote:

> At this point, given that I don't know that much about the fine  
> details of sequence clusters and sequence groups, could you, Peter,  
> take some time to explain exactly what the concept is, and also  
> maybe come up with some examples of what it can be used for and how  
> a user of the framework would want to use it. This way, we can  
> define a header that does the job, and then worry about the  
> implementation. In fact, I should have asked that question in the  
> first place instead of pretending I understood what it was all about!
>
> Sorry, maybe this is quite a broad question. We don't have to go too
> deep at this point, as we merely want some I/O to work. However, in
> 'I/O', there is 'O' for output, so the question is: after loading a  
> sequence from disk, what information will the user want to  
> retrieve? Or will she just want to perform some operations on the  
> sequence group and then move on?
>
> cheers,
>
> charles
>
> On Oct 6, 2005, at 5:26 AM, Peter Schols wrote:
>
>
>> Hi Koen,
>>
>>
>>
>>> In the case where the input sequences are already aligned, we're  
>>> ready to go, I guess. Also note we already have a BCAlignment  
>>> class which we could use. However, I have not yet adapted that to  
>>> use the NSData structure. I didn't write the alignment code, and
>>> I don't want to mess up someone else's work ;-)
>>>
>>> I added the representative sequence after looking up some info
>>> about sequence clusters (see e.g.
>>> http://en.wikipedia.org/wiki/Sequence_clustering). But if you
>>> suggest it is not needed in this case, I'll be happy to remove it!
>>>
>>>
>>
>> I don't think we need the sequence cluster functionality for the
>> BCSequenceGroup class. We could reserve the name BCSequenceCluster
>> for a class we might create in the future (if there is any need for
>> it). The BCSequenceCluster class could then inherit from
>> BCSequenceGroup.
>>
>> Peter
>>
>> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>> _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>>
>>
>
> --
> Xgrid-at-Stanford
> Help science move fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford
>
> Charles Parnot
> charles.parnot at gmail.com
>
