[Biococoa-dev] Symbol mapping for optimization
biococoa at bioworxx.com
Sat Mar 12 04:00:07 EST 2005
> At this point, I hope you see why having a separate translator class
> could make sense. Now, next step: its implementation. In the
> implementation of the translator, I can see how BCSymbolSet would be
> very useful. I think each BCSymbolSet could define a different
> mapping. For instance, a symbol set with ATGC would result in a
> certain mapping, where A is mapped to a certain char XXXX. But if the
> symbol set is ATGCBVHD, then symbol A could well be mapped to a
> different char, e.g. not XXXX but YYYY. Thus, instead of having a fixed
> BCSymbol <--> char mapping, we could have a more dynamic mapping that
> depends only on the symbol set. This way, for instance, we could always
> use the smallest possible score matrix, e.g. 4x4 for a symbol set of
> 4 symbols.
> Symbol sets are easy to define before starting an alignment, and
> should be easy to define before any algorithm where a BCSymbol <--> char
> mapping makes sense. In the case of alignment, we would do the following:
> * Define a BCSymbolSet that covers the sequences to align, e.g. the union
> of the symbol sets of the sequences
> * Use that symbol set to instantiate a new translator, e.g.
> * Call the translator to translate the BCSequences --> char*
> * Call the translator to translate the BCScoreMatrix --> int** (the
> indexes will be chars cast to ints)
> * Run the algorithm using only the chars
> * Call the translator to translate the chars back into sequences, etc.
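The six steps above can be sketched end to end in C. This is an illustration under stated assumptions, not BioCocoa code: sequences are C strings, the score matrix is a hypothetical full_score() stand-in (+2 match, -1 mismatch), and the algorithm is a plain Needleman-Wunsch global score with a linear gap penalty, chosen only as an example. All function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BCScoreMatrix lookup: +2 match, -1 mismatch. */
int full_score(char a, char b) { return a == b ? 2 : -1; }

/* Steps 1-2: symbol set = union of both sequences' symbols, plus its mapping.
   Fills index_of (symbol -> index, -1 if absent) and symbols (index ->
   symbol); returns the number of symbols n. */
int build_symbol_set(const char *s1, const char *s2,
                     int index_of[UCHAR_MAX + 1], char symbols[UCHAR_MAX + 1])
{
    const char *seqs[2] = { s1, s2 };
    int n = 0;
    for (int c = 0; c <= UCHAR_MAX; c++) index_of[c] = -1;
    for (int k = 0; k < 2; k++)
        for (const char *p = seqs[k]; *p != '\0'; p++) {
            unsigned char c = (unsigned char)*p;
            if (index_of[c] == -1) { index_of[c] = n; symbols[n++] = *p; }
        }
    return n;
}

/* Step 3: sequence -> array of compact indices (caller frees). */
unsigned char *encode(const char *s, const int index_of[UCHAR_MAX + 1])
{
    size_t len = strlen(s);
    unsigned char *out = malloc(len + 1);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)index_of[(unsigned char)s[i]];
    return out;
}

/* Step 6: compact indices -> characters (out must hold len + 1 chars). */
void decode(const unsigned char *idx, size_t len, const char symbols[], char *out)
{
    for (size_t i = 0; i < len; i++) out[i] = symbols[idx[i]];
    out[len] = '\0';
}

/* Step 5: Needleman-Wunsch global score (linear gap) using indices only. */
int nw_score(const unsigned char *a, size_t la, const unsigned char *b, size_t lb,
             const int *m, int n, int gap)
{
    int *prev = malloc((lb + 1) * sizeof *prev);
    int *cur = malloc((lb + 1) * sizeof *cur);
    assert(prev != NULL && cur != NULL);
    for (size_t j = 0; j <= lb; j++) prev[j] = (int)j * gap;
    for (size_t i = 1; i <= la; i++) {
        cur[0] = (int)i * gap;
        for (size_t j = 1; j <= lb; j++) {
            int diag = prev[j - 1] + m[a[i - 1] * n + b[j - 1]];
            int up = prev[j] + gap, left = cur[j - 1] + gap;
            int best = diag > up ? diag : up;
            cur[j] = best > left ? best : left;
        }
        int *tmp = prev; prev = cur; cur = tmp; /* last row is now in prev */
    }
    int result = prev[lb];
    free(prev);
    free(cur);
    return result;
}

/* Steps 1-5 chained together: returns the global alignment score. */
int align_score(const char *s1, const char *s2, int gap)
{
    int index_of[UCHAR_MAX + 1];
    char symbols[UCHAR_MAX + 1];
    int n = build_symbol_set(s1, s2, index_of, symbols);
    /* Step 4: translate the score matrix into a compact n x n table. */
    int *m = malloc(((size_t)n * n + 1) * sizeof *m);
    unsigned char *a = encode(s1, index_of), *b = encode(s2, index_of);
    assert(m != NULL && a != NULL && b != NULL);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            m[i * n + j] = full_score(symbols[i], symbols[j]);
    int score = nw_score(a, strlen(s1), b, strlen(s2), m, n, gap);
    free(m); free(a); free(b);
    return score;
}
```

For example, align_score("AC", "AG", -2) aligns both positions without gaps and returns 2 - 1 = 1. The inner loop of nw_score() never touches a symbol object, only small integers and a compact table, which is where the speed-up is expected to come from.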
Great, I like your idea very much. Got it now ;-)
Perhaps we should not run the translator for the BCScoreMatrix inside
the Algorithm class, because when we want to do several alignments with
one scoring matrix, we would have to translate it several times. I think
it's better to run the translator when the matrix is initialized from
the .plist file, so we already have the matrix for that particular
symbol set in int* (or int** ;-)) format.
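A sketch of translating the matrix once, when it is loaded, and reusing it for several alignments. CompactMatrix, compact_matrix_init, ungapped_score and full_score are hypothetical names; full_score() stands in for the values read from the .plist file, and the ungapped score is only a placeholder for a real alignment algorithm. It assumes every symbol in the sequences belongs to the symbol set.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical compact score matrix, translated once for one symbol set
   (e.g. right after the full matrix is read from its .plist file). */
typedef struct {
    int index_of[UCHAR_MAX + 1]; /* symbol -> compact index, -1 if absent */
    int n;                       /* number of symbols in the set */
    int *scores;                 /* n*n scores, row-major */
} CompactMatrix;

/* Stand-in for the full matrix loaded from disk: +1 match, -1 mismatch. */
int full_score(char a, char b) { return a == b ? 1 : -1; }

/* Translates the full matrix for symbol_set; returns 0 on success. */
int compact_matrix_init(CompactMatrix *cm, const char *symbol_set)
{
    char symbols[UCHAR_MAX + 1];
    cm->n = 0;
    for (int c = 0; c <= UCHAR_MAX; c++) cm->index_of[c] = -1;
    for (const char *p = symbol_set; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (cm->index_of[c] == -1) { cm->index_of[c] = cm->n; symbols[cm->n++] = *p; }
    }
    cm->scores = malloc(((size_t)cm->n * cm->n + 1) * sizeof *cm->scores);
    if (cm->scores == NULL) return -1;
    for (int i = 0; i < cm->n; i++)
        for (int j = 0; j < cm->n; j++)
            cm->scores[i * cm->n + j] = full_score(symbols[i], symbols[j]);
    return 0;
}

/* Placeholder "alignment": position-by-position score up to the shorter
   length. It only reads cm, so cm can be shared by many calls. */
int ungapped_score(const CompactMatrix *cm, const char *a, const char *b)
{
    int total = 0;
    for (size_t k = 0; a[k] != '\0' && b[k] != '\0'; k++) {
        int i = cm->index_of[(unsigned char)a[k]];
        int j = cm->index_of[(unsigned char)b[k]];
        assert(i >= 0 && j >= 0); /* every symbol must be in the set */
        total += cm->scores[i * cm->n + j];
    }
    return total;
}

void compact_matrix_free(CompactMatrix *cm)
{
    free(cm->scores);
    cm->scores = NULL;
}
```

Each call to ungapped_score() reads cm->scores directly, so the translation cost is paid once per symbol set rather than once per alignment.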
Can't wait to have this structure ;-)
> (Note about int**: you are right, Phil, that int* arrays are faster to
> access, but you can have both int** and int* at the same time: if you
> allocate a matrix a as one contiguous block of memory, then a[0] is an
> int* that gives single-index access to the whole block when needed.)
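A small C sketch of the int** / int* point: the n*n scores live in one contiguous block, a separate array of row pointers gives m[i][j] access, and m[0] is an int* giving single-index access to the same memory. matrix_alloc and matrix_free are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates an n x n int matrix as ONE contiguous block of n*n ints, plus an
   array of n row pointers into that block. m[i][j] gives two-index access;
   m[0] (an int*) gives single-index access: m[0][i*n + j] == m[i][j]. */
int **matrix_alloc(size_t n)
{
    assert(n > 0);
    int **rows = malloc(n * sizeof *rows);
    int *block = malloc(n * n * sizeof *block);
    if (rows == NULL || block == NULL) {
        free(rows);
        free(block);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        rows[i] = block + i * n; /* row i starts i*n ints into the block */
    return rows;
}

/* Frees the data block (owned through m[0]) and the row-pointer array. */
void matrix_free(int **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

Writing m[2][3] and reading m[0][2*4 + 3] on a 4x4 matrix touch the same int, so an algorithm can pick whichever indexing style is faster for it.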
> Does this email make more sense?
> Thanks for reading it all :-)
> These were my 4 cents.
> NB: we may use the name 'mapper' instead of 'translator'...
> Help science go fast forward:
> Charles Parnot
> charles.parnot at stanford.edu