[Biococoa-dev] BCSequenceCluster
Peter Schols
peter.schols at bio.kuleuven.be
Fri Oct 7 04:53:00 EDT 2005
Hi Charles and Koen,
First of all: I'm not a sequence clustering expert and I have never
used it in my own research.
As far as I know, sequence clustering is a way to group and align
homologous sequences, detect regions of high sequence similarity, and
describe the differences between sequences. This is especially useful
for creating non-redundant datasets: for example, if you align the
hemoglobin genes of 5 mammals, you end up with almost the same
information 5 times. So if you store just one of the five sequences
(the referenceSequence) plus the differences between it and the other
4 sequences, you save a lot of memory and disk space (maybe not in
this example, but on a genomic scale) without losing any information.
It's comparable to image compression, except that, unlike JPEG, it is
lossless.
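To make the reference-plus-differences idea concrete, here is a minimal sketch in Python (not BioCocoa code). The function names and the toy sequences are made up for illustration, and it assumes the sequences are already aligned to the same length:

```python
def encode_against_reference(reference, sequences):
    """Return, for each aligned sequence, a {position: residue} dict
    holding only the positions where it differs from the reference."""
    encoded = []
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the same length")
        encoded.append(
            {i: b for i, (a, b) in enumerate(zip(reference, seq)) if a != b}
        )
    return encoded


def decode_from_reference(reference, diffs):
    """Rebuild the full sequence from the reference and its differences."""
    residues = list(reference)
    for position, residue in diffs.items():
        residues[position] = residue
    return "".join(residues)


# Toy data, invented for this example (not real hemoglobin sequences).
reference = "ATGGTGCACCTGACT"
others = ["ATGGTGCATCTGACT", "ATGGTGCACCTGTCT"]

diffs = encode_against_reference(reference, others)
restored = [decode_from_reference(reference, d) for d in diffs]
```

Each of the two extra sequences is stored as a single difference, and decoding gives back the original strings exactly, which is the "without losing any information" part.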
Back to BC: I don't think we need this cluster functionality in
BioCocoa, at least for now. For the I/O methods we only need a good
container class for BCSequences, as you point out. With this
approach we would store the gaps directly inside the BCSequences
(using the gap symbol). This is definitely the easiest way to
implement it.
The only thing we would need to do when going this route, to
compensate for the fact that real sequences don't have gaps, is add
a method to BCSequence that returns the sequence without gaps (the
'real' sequence).
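As a rough illustration of such a gap-stripping method, here is a small Python sketch. The function name is hypothetical, and it assumes '-' is the gap symbol (BioCocoa's actual gap symbol may differ):

```python
def sequence_without_gaps(aligned_sequence, gap_symbols="-"):
    """Return the sequence with every gap symbol removed.

    gap_symbols is a string of characters treated as gaps; the default
    '-' is an assumption, not necessarily what BioCocoa uses.
    """
    return "".join(ch for ch in aligned_sequence if ch not in gap_symbols)


ungapped = sequence_without_gaps("AT--GC-A")
```

The gapped string stays untouched, so the aligned form and the 'real' form can both be obtained from the same stored sequence.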
To answer Charles' question: right now, the only purpose for the
BCSequenceGroup would be to make I/O easier. But in the future, we
could add extra methods to this class to enable alignment of the
BCSequenceGroup (using BCAlignment) or to return a list of shared
indels, for example. This BCSequenceGroup could also be the perfect
class to pass as an argument to classes that do phylogenetic analysis.
So in the future, we could do things like:
    BCSequenceGroup *group =
        [BCSequenceGroup groupWithFile:@"myFastaFile.fst"];
    [group align];
    BCPhylogeneticTree *tree =
        [group analyzeUsingHeuristicSearchWithReplicates:1000];
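To give an idea of what a "shared indels" method on such a group could look like, here is a sketch in Python rather than Objective-C. The class and method names are hypothetical, the toy sequences are invented, and "shared" is taken to mean that at least min_sequences members have a gap in the same alignment column, which is only one possible definition:

```python
class SequenceGroup:
    """Container for named, already-aligned (gapped) sequences."""

    def __init__(self, sequences):
        # sequences: dict mapping a name to a gapped, aligned string
        if len({len(s) for s in sequences.values()}) > 1:
            raise ValueError("aligned sequences must have equal length")
        self.sequences = dict(sequences)

    def shared_gap_columns(self, min_sequences=2, gap="-"):
        """Return the columns where at least min_sequences have a gap."""
        if not self.sequences:
            return []
        length = len(next(iter(self.sequences.values())))
        columns = []
        for column in range(length):
            gap_count = sum(
                1 for s in self.sequences.values() if s[column] == gap
            )
            if gap_count >= min_sequences:
                columns.append(column)
        return columns


# Toy alignment, invented for this example.
group = SequenceGroup({
    "human": "ATG--CAT",
    "mouse": "ATG--CGT",
    "rat":   "ATGAACG-",
})
shared = group.shared_gap_columns()
```

Here columns 3 and 4 are reported because two sequences share a gap there, while the single gap at the end of the third sequence is not.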
Cheers,
Peter
On 06 Oct 2005, at 23:45, Charles Parnot wrote:
> At this point, given that I don't know that much about the fine
> details of sequence clusters and sequence groups, could you, Peter,
> take some time to explain exactly what the concept is, and also
> maybe come up with some examples of what it can be used for and how
> a user of the framework would want to use it. This way, we can
> define a header that does the job, and then worry about the
> implementation. In fact, I should have asked that question in the
> first place instead of pretending I understood what it was all about!
>
> Sorry, maybe this is quite a broad question. We don't have to go too
> deep at this point, as we merely want some I/O to work. However, in
> 'I/O', there is 'O' for output, so the question is: after loading a
> sequence from disk, what information will the user want to
> retrieve? Or will she just want to perform some operations on the
> sequence group and then move on?
>
> cheers,
>
> charles
>
> On Oct 6, 2005, at 5:26 AM, Peter Schols wrote:
>
>> Hi Koen,
>>
>>> In the case where the input sequences are already aligned, we're
>>> ready to go, I guess. Also note we already have a BCAlignment
>>> class which we could use. However, I have not yet adapted that to
>>> use the NSData structure. I didn't write the alignment code, and
>>> I don't want to mess up someone else's work ;-)
>>>
>>> I added the representative sequence after looking up some info
>>> about sequence clusters (see e.g.
>>> http://en.wikipedia.org/wiki/Sequence_clustering). But if you
>>> suggest it is not needed in this case, I'll be happy to remove it!
>>
>> I don't think we need the sequence cluster functionality for the
>> BCSequenceGroup class. We could reserve the name BCSequenceCluster
>> for a class we might create in the future (if there is any need
>> for it). The BCSequenceCluster class could then inherit from
>> BCSequenceGroup.
>>
>> Peter
>>
>> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>> _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>>
>>
>
> --
> Xgrid-at-Stanford
> Help science move fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford
>
> Charles Parnot
> charles.parnot at gmail.com