[Biococoa-dev] (no subject)
Alexander Griekspoor
a.griekspoor at nki.nl
Mon Mar 14 03:46:29 EST 2005
> Somehow, you have to explain that in more detail ;-)
>
> The BCSymbolMapping class proposed by Phil is exactly what I had in
> mind. I would add the following methods:
> - (char *)charMappingForSequence:(BCAbstractSequence *)sequence;
> - (char **)charMappingForScoreMatrix:yadayada..;
> ... and the same backwards...
>
> The BCSymbolMapping can even take care of the malloc, as shown above
> (with automatic autorelease; I can give more details on how). It could
> implement some caching in the future if needed (@Phil: BTW, I would
> rather have BCSymbolMapping do the caching than BCScoreMatrix, ref: a
> previous email from you, see what I mean?).
Caching would be nice, but again, why not let the BCSequence do the job
itself (no hassle with helper objects)? It's also THE place to store
the cache, IMHO...
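For readers following along, here is a minimal plain-C sketch of the mapping step in the quoted proposal: a forward table from symbol characters to compact codes, and a reverse table back ("the same backwards"). SymbolMapping, mapping_init, mapping_encode and mapping_decode are hypothetical names for illustration only. They are not the BioCocoa API; the real BCSymbolMapping would be an Objective-C class built from a BCSymbolSet.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a BCSymbolMapping: forward table (char -> code)
   and reverse table (code -> char), built from a symbol-set string. */
#define NO_CODE 0xFF

typedef struct {
    unsigned char toCode[256];  /* symbol char -> compact code, NO_CODE if absent */
    char toChar[256];           /* compact code -> symbol char */
    int count;                  /* number of distinct symbols in the set */
} SymbolMapping;

/* Assigns codes 0, 1, 2, ... in the order the symbols appear in `symbols`. */
void mapping_init(SymbolMapping *m, const char *symbols)
{
    memset(m->toCode, NO_CODE, sizeof m->toCode);
    m->count = 0;
    for (const char *p = symbols; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (m->toCode[c] != NO_CODE)
            continue;                       /* ignore duplicate symbols */
        assert(m->count < NO_CODE);         /* NO_CODE is reserved */
        m->toCode[c] = (unsigned char)m->count;
        m->toChar[m->count] = (char)c;
        m->count++;
    }
}

/* Forward mapping: returns 0 on success, -1 if a char is not in the set. */
int mapping_encode(const SymbolMapping *m, const char *seq, unsigned char *out)
{
    for (size_t i = 0; seq[i] != '\0'; i++) {
        unsigned char code = m->toCode[(unsigned char)seq[i]];
        if (code == NO_CODE)
            return -1;
        out[i] = code;
    }
    return 0;
}

/* Reverse mapping: writes len chars plus a terminating NUL into out. */
void mapping_decode(const SymbolMapping *m, const unsigned char *codes,
                    size_t len, char *out)
{
    for (size_t i = 0; i < len; i++) {
        assert(codes[i] < m->count);
        out[i] = m->toChar[codes[i]];
    }
    out[len] = '\0';
}
```

Codes are assigned in the order the symbols appear, so with the set "ACGT" the string "GATTACA" encodes to 2 0 3 3 0 1 0 and decodes back unchanged.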
> The whole idea of this class, again, would be to have a separate class
> that takes care of the mapping, and only of the mapping:
>
> objects ------> C ------> algorithm -------> C -------> Objects
>
> The algorithm should not know anything about the biology. I would not
> want to see anything like -whatevermatrix['A']['G']- in the middle of
> the algorithm. Having the mapping done in a separate class allows us to
> write the algorithm like this:
Well, perhaps I'm more humanoid, but I prefer it to
whatevermatrix[0x00][0x03];
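The two indexing styles being argued about can be compared directly. The sketch below assumes illustrative +1/-1 nucleotide scores rather than a real substitution matrix, and stores the same scores once in a 128 x 128 table indexed by ASCII characters and once in a 4 x 4 table indexed by compact codes (A=0, C=1, G=2, T=3). The function and table names are hypothetical.

```c
#include <assert.h>

/* Same nucleotide scores in two layouts (illustrative values:
   +1 for a match, -1 for a mismatch; not a real substitution matrix). */
static const char BASES[4] = {'A', 'C', 'G', 'T'};

static int byChar[128][128];  /* indexed directly by ASCII: byChar['A']['G'] */
static int byCode[4][4];      /* indexed by compact code:   byCode[0][2]     */

void fill_matrices(void)
{
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int score = (i == j) ? 1 : -1;
            byCode[i][j] = score;
            byChar[(unsigned char)BASES[i]][(unsigned char)BASES[j]] = score;
        }
    }
}

/* Lookup when the sequences are plain char buffers (no mapping pass). */
int score_by_char(char a, char b)
{
    assert((unsigned char)a < 128 && (unsigned char)b < 128);
    return byChar[(unsigned char)a][(unsigned char)b];
}

/* Lookup when the sequences were first mapped to codes 0..3. */
int score_by_code(unsigned char a, unsigned char b)
{
    assert(a < 4 && b < 4);
    return byCode[a][b];
}
```

Both lookups return the same score (score_by_char('A', 'G') and score_by_code(0, 2) are both -1); the difference lies only in table size and in whether a mapping pass is needed before the algorithm runs.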
>
Also, it would change the code dramatically:

BCSymbolSet *set=....union of the symbol sets of seq 1 and 2...
  -> not necessary (unless we make the matrix creation dependent on the
     symbol set, see below)
BCSymbolMapping *mapping=[BCSymbolMapping mappingWithSymbolSet:set];
  -> not necessary
char *seq1=[mapping charMappingForSequence:sequenceObject1];
  -> same
char *seq2=[mapping charMappingForSequence:sequenceObject2];
  -> same
int **scores=[mapping charMappingForScoreMatrix:matrix];
  -> int **scores = [BCAlignment matrixForSymbolSet:set];
// .... run the algorithm...
BCSequenceAlignment *result=[BCAlignment
    alignmentForSequences:(int)count length:(int)length
    charBuffer:(char*)seqs];
  -> Why make BCSymbolMapping the mother of alignments?!
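To illustrate the "no mapping" position, here is a sketch of the "run the algorithm" step working straight on char buffers, with a score matrix indexed by ASCII characters. It computes only the optimal global alignment score (Needleman-Wunsch with a linear gap penalty, no traceback), which is an assumption about what the eventual alignment code would do; nw_score is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Global alignment score (Needleman-Wunsch, linear gap penalty) computed
   directly on char buffers. `matrix` is read-only and indexed by ASCII
   characters, so no symbol mapping step is needed. Uses two rows of
   O(len2) memory and returns only the optimal score (no traceback). */
int nw_score(const char *seq1, const char *seq2,
             int matrix[128][128], int gap)
{
    size_t len1 = strlen(seq1), len2 = strlen(seq2);
    int *prev = malloc((len2 + 1) * sizeof *prev);
    int *curr = malloc((len2 + 1) * sizeof *curr);
    assert(prev != NULL && curr != NULL);

    for (size_t j = 0; j <= len2; j++)
        prev[j] = (int)j * gap;            /* seq2 prefix against gaps */

    for (size_t i = 1; i <= len1; i++) {
        unsigned char a = (unsigned char)seq1[i - 1];
        assert(a < 128);
        curr[0] = (int)i * gap;            /* seq1 prefix against gaps */
        for (size_t j = 1; j <= len2; j++) {
            unsigned char b = (unsigned char)seq2[j - 1];
            assert(b < 128);
            int diag = prev[j - 1] + matrix[a][b];  /* align a with b */
            int up   = prev[j] + gap;               /* gap in seq2 */
            int left = curr[j - 1] + gap;           /* gap in seq1 */
            int best = diag > up ? diag : up;
            curr[j] = best > left ? best : left;
        }
        int *tmp = prev; prev = curr; curr = tmp;   /* reuse the rows */
    }

    int result = prev[len2];
    free(prev);
    free(curr);
    return result;
}
```

With +1 for a match, -1 for a mismatch and a gap penalty of -2, aligning "ACGT" with "AGT" scores 1 (three matches, one gap).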
>
> Again, I do think that mapping to the representing char of a symbol
> will make sense and might do the job (and will be VERY convenient for
> debugging), so I agree with you, Koen and Alex.
> But separating the mapping step allows for easier modifications in the
> future:
> * it is possible that a 16-byte score matrix will use the caches more
> efficiently than a 16-kilobyte one; it is not just a RAM issue; the L2
> cache is 512 KB on a dual G5, not sure about L1; it may even fit in
> registers (?)
Yes, it could be, but I really doubt this is the bottleneck in the
algorithm; this would be a typical example of doing lots of tuning
before we even know where the problem is! Let's first build the thing the
SIMPLE way and then optimize it. We can always implement the
remapping IF there's indeed a lot to win in this area.
> * if a score is an int or a float, the matrix is actually 128 x 128 x
> 4 bytes = 64 kilobytes
That's right, but come on, 64 KB is nothing.
> * could int be better than char because of the cast step? I know it
> is a big issue for float to int, but I don't know about char --> int;
> so maybe we will use int?
Same thing: let's build it, and Shark will tell us.
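The matrix sizes in the bullets above are easy to double-check. This sketch assumes 1-byte chars (guaranteed by C) and 4-byte ints and floats (true on the G5 and on common current platforms, but not guaranteed by the C standard); matrix_bytes is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a side x side score matrix whose entries are elemSize bytes. */
size_t matrix_bytes(size_t side, size_t elemSize)
{
    assert(side > 0 && elemSize > 0);
    return side * side * elemSize;
}

/* Compile-time checks against the actual array types (C11 _Static_assert). */
_Static_assert(sizeof(char[4][4]) == 16, "4 x 4 char matrix is 16 bytes");
_Static_assert(sizeof(char[128][128]) == 16384, "128 x 128 char matrix is 16 KB");
_Static_assert(sizeof(int[128][128]) == 128 * 128 * sizeof(int),
               "128 x 128 int matrix is 64 KB when int is 4 bytes");
```

So a 4 x 4 nucleotide matrix of chars is 16 bytes, a 128 x 128 char matrix is 16,384 bytes, and a 128 x 128 matrix of 4-byte scores is 65,536 bytes (64 KB).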
>
> The most important is: we don't know yet any of that and we will know
> only later, after running Shark on real cases.
Aha, too early again ;-)
> If we have everything in place to easily test and choose the best
> mapping, it will be easier.
No mapping at all ;-)
> Also, the mapping could be useful for other purposes (like saving as
> binary and compressing, but not the best example!). Finally, if we find
> that we need to improve the mapping step, at least there will be
> only one class that will have to be modified.
Or none; well, you get the point. Sorry, couldn't resist.
> The mapping class may evolve to take more parameters and implement
> different approaches depending on the symbol set (at which point it
> would become a class cluster, but let's not go there).
>
> Sorry this whole email comes a bit late in the discussion, but my main
> point is to make a case in favor of a separate class for mapping. I
> think it will help: rather than obfuscating things, it will separate
> them better and make them clearer!
That I have no problem with; I believe there might be a need for this
thing in the future, but I don't see why we would need it in
alignments before we start optimizing, and thus I don't see why
we would implement it now when there's no purpose for it yet. We'd better
focus on writing a damn fast BCSequence-to-char-array converter ;-)
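A rough sketch of the alternative argued here: the sequence itself converts its symbols to a char array in a single pass and caches the result, with no helper object. Symbol, Sequence, sequence_chars and sequence_invalidate are hypothetical C stand-ins; the real BCSequence and BCSymbol are Objective-C classes, and the cache-invalidation policy shown is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical C stand-ins for BCSymbol and BCSequence; the real
   BioCocoa classes are Objective-C objects with a different interface. */
typedef struct {
    char symbolChar;           /* one-letter representation, e.g. 'A' */
} Symbol;

typedef struct {
    const Symbol **symbols;    /* the sequence's symbol objects */
    size_t length;
    char *cachedChars;         /* NULL until first requested */
} Sequence;

/* Returns a NUL-terminated char buffer for the sequence, building it in a
   single pass on first use and returning the cached copy afterwards.
   The buffer is owned by the sequence. */
const char *sequence_chars(Sequence *seq)
{
    if (seq->cachedChars == NULL) {
        char *buf = malloc(seq->length + 1);
        assert(buf != NULL);
        for (size_t i = 0; i < seq->length; i++)
            buf[i] = seq->symbols[i]->symbolChar;
        buf[seq->length] = '\0';
        seq->cachedChars = buf;
    }
    return seq->cachedChars;
}

/* Drops the cache, e.g. after the symbols have been modified. */
void sequence_invalidate(Sequence *seq)
{
    free(seq->cachedChars);
    seq->cachedChars = NULL;
}
```

Repeated calls return the same buffer, so an alignment routine can ask the sequence for its chars as often as it likes and only pay for the conversion once.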
>
> Phil, hang in there. Let's not let these guys take us down ;-)
GRRRRR!!!! LOL,
Cheers mates!
Alex
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com
iRNAi, do you?
http://www.mekentosj.com/irnai
*********************************************************