[Biophp-dev] Re: levenshtein distance

Serge Gregorio biophp-dev@bioinformatics.org
Fri, 09 May 2003 03:17:22 +0800


Hello Frankie!

>hm how about passing an argument which unlocks string computation >with more than 1024 chars and let them crash their computers >themselves?

Hahaha!  Great idea!

Btw, I took your code for a testdrive and it's good!  Works even
when comparing very short and very long strings. While you're at    
it, you *MIGHT* want to write some code for scoring matrices (it
is related to the levenshtein thingie).

Let's say you have two strings A and B of equal length as follows: 

   string A = "SERGE"
   string B = "FRANK"
      
Write a function that returns the SCORE based on the ff. matrix
for example:

   string A = "S  E  R  G  E"
   string B = "F  R  A  N  K"
              -2  0 -1  0  1 = -2 => FINAL SCORE

     A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 A   4
 R  -1  5
 N  -2  0  6
 D  -2 -2  1  6
 C   0 -3 -3 -3  9
 Q  -1  1  0  0 -3  5
 E  -1  0  0  2 -4  2  5
 G   0 -2  0 -1 -3 -2 -2  6
 H  -2  0  1 -1 -3  0  0 -2  8
 I  -1 -3 -3 -3 -1 -3 -3 -4 -3  4
 L  -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
 K  -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
 M  -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
 F  -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
 P  -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
 S   1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
 T   0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
 W  -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
 Y  -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
 V   0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4

You can store the matrix as an array of arrays, e.g.

   ( (A, A, 4), (A, R, -1), ... ) 

Or you could use associative keys, e.g.

   ( A => ( (A, 4), (R, -1), ... ) )

But the first looks simpler.  What do you think?  Are you up to it? =)

Of course, the next step is a function that handles strings of 
unequal length, but we'll get to that later.  See your code 
soon!

Regards,

Serge

P.S. You can check out this link for more info:

   http://molvis.chem.indiana.edu/C687_S99/lecture2.html



Need a new email address that people can remember
Check out the new EudoraMail at
http://www.eudoramail.com