[Biococoa-dev] (no subject)
Philipp Seibel
biococoa at bioworxx.com
Mon Mar 14 03:58:28 EST 2005
Wow, this seems to be turning into a very hot topic.
> Well, perhaps I'm more humanoid, but I like it better than
> whatevermatrix['0x00']['0x03'];
Fast algorithms may not be human readable ;-)
>>
> Also it would change the code dramatically as well:
> BCSymbolSet *set=....union of the symbol sets of seq 1 and 2... ->
> not necessary (unless we make the matrix creation dependent on the
> symbol set; see below)
> BCSymbolMapping *mapping=[BCSymbolMapping mappingWithSymbolSet:set];
> -> not necessary
> char *seq1=[mapping charMappingForSequence:sequenceObject1]; -> same
> char *seq2=[mapping charMappingForSequence:sequenceObject2]; -> same
> int **scores=[mapping charMappingForScoreMatrix:matrix]; -> int
> **scores = [BCAlignment matrixForSymbolSet: set];
I can't agree with this, because we need to make the scoring matrix
customizable, so caching and converting have to live outside the
BCAlignment class.
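
To make that concrete, here is a rough sketch in plain C of a score
table that is built and customized outside the aligner and then simply
handed to it. ScoreTable and the score_table_* functions are made-up
names for illustration, not BioCocoa API, and the sketch assumes every
symbol is represented by a 7-bit ASCII char.

```c
#define ALPHABET 128  /* assumption: symbols are 7-bit ASCII chars */

/* Hypothetical flat lookup table, indexed directly by symbol char. */
typedef struct {
    int cell[ALPHABET][ALPHABET];
} ScoreTable;

/* Fill every cell with a default value, e.g. a mismatch penalty. */
static void score_table_fill(ScoreTable *t, int value) {
    for (int i = 0; i < ALPHABET; i++)
        for (int j = 0; j < ALPHABET; j++)
            t->cell[i][j] = value;
}

/* Set one symmetric entry; this is where a caller customizes scores. */
static void score_table_set(ScoreTable *t, char a, char b, int score) {
    unsigned ia = (unsigned char)a & 0x7F;
    unsigned ib = (unsigned char)b & 0x7F;
    t->cell[ia][ib] = score;
    t->cell[ib][ia] = score;
}

/* The only call an aligner's inner loop would need. */
static int score_table_get(const ScoreTable *t, char a, char b) {
    return t->cell[(unsigned char)a & 0x7F][(unsigned char)b & 0x7F];
}
```

The aligner only ever reads the table, so whoever owns the scoring
matrix can cache or rebuild it without touching the alignment code.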
Phil
> // .... run the algorithm...
> BCSequenceAlignment *result=[BCAlignment
> alignementForSequences(int)count length:(int)length
> charBuffer:(char*)seqs]; -> Why make BCSymbolmapping the mother of
> alignments?!
>
>>
>> Again, I do think that mapping to the representing char of a symbol
>> will make sense and might do the job (and will be VERY convenient for
>> debugging), so I agree with you Koen and Alex.
>
>> But separating the mapping step allows for easier modifications in
>> the future:
>> * it is possible that a 16-byte score matrix will use the caches
>> more efficiently than a 16-kilobyte one; it is not just a RAM issue; L2
>> cache is 512 kb on dual G5, not sure about L1; it may even fit in
>> registers (?)
> Yes could be, but I really doubt if this is the bottleneck in the
> algorithm, this would be a typical example of doing lots of tuning
> before we even know where the problem is! Let's first make the thing
> in the SIMPLE way and then optimize it. We can always implement the
> remapping IF indeed there's lots to win in this area.
>
>> * if a score is an int or a float, the matrix is actually 128 x 128 x
>> 4 = 64 kilobytes
> That's right, but come on, 64kb that's nothing.
>
>> * it is possible that int will be better than char because of the
>> cast step? I know it is a big issue for float to int, but I don't
>> know about char --> int; so maybe we will use int?
> Same thing, let's make the thing and Shark will tell us.
>>
>> The most important is: we don't know yet any of that and we will know
>> only later, after running Shark on real cases.
> Aha, too early again ;-)
>> If we have everything in place to easily test and choose the best
>> mapping, it will be easier.
> No mapping at all ;-)
>> Also, the mapping could be useful for other purposes (like saving as
>> binary and compress, but not the best example!). Finally, if we find
>> that we need to improve the mapping step, at least there will be
>> mostly one class that will have to be modified.
> Or none, well, you get the point. Sorry for that, couldn't resist.
>> The mapping class may evolve to take more parameters and implement
>> different approaches depending on the symbol set (at which point it
>> would become a class cluster, but don't get me there).
>>
>> Sorry this whole email comes a bit after the discussion, but my main
>> point is to make a case in favor of a separate class for mapping. I
>> think it will help, and not obfuscate things, but actually separate
>> things better, and make them clearer!
> That I have no problem with, I believe there might be a need in the
> future for this thing, but I don't see why we would need it in
> alignments before we start to optimize things, and thus I don't see
> why we would implement it now if there's no purpose for it yet. We would
> do better to focus on writing a damn fast BCSequence to char array converter
> ;-)
>>
>> Phil, hang in there. Let's not let these guys take us down ;-)
> GRRRRR!!!! LOL,
> Cheers mates!
> Alex