[Biophp-dev] Re: levenshtein distance
Serge Gregorio
biophp-dev@bioinformatics.org
Fri, 09 May 2003 03:17:22 +0800
Hello Frankie!
>hm how about passing an argument which unlocks string computation >with more than 1024 chars and let them crash their computers >themselves?
Hahaha! Great idea!
Btw, I took your code for a testdrive and it's good! Works even
when comparing very short and very long strings. While you're at
it, you *MIGHT* want to write some code for scoring matrices (it
is related to the levenshtein thingie).
Let's say you have two strings A and B of equal length as follows:
string A = "SERGE"
string B = "FRANK"
Write a function that returns the SCORE based on the ff. matrix
for example:
string A = "S E R G E"
string B = "F R A N K"
-2 0 -1 0 1 = -2 => FINAL SCORE
A R N D C Q E G H I L K M F P S T W Y V
A 4
R -1 5
N -2 0 6
D -2 -2 1 6
C 0 -3 -3 -3 9
Q -1 1 0 0 -3 5
E -1 0 0 2 -4 2 5
G 0 -2 0 -1 -3 -2 -2 6
H -2 0 1 -1 -3 0 0 -2 8
I -1 -3 -3 -3 -1 -3 -3 -4 -3 4
L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
You can store the matrix as an array of arrays, e.g.
( (A, A, 4), (A, R, -1), ... )
Or you could use associative keys, e.g.
( A => ( (A, 4), (R, -1), ... ) )
But the first looks simpler. What do you think? Are you up to it? =)
Of course, the next step is a function that handles strings of
unequal length, but we'll get to that later. See your code
soon!
Regards,
Serge
P.S. You can check out this link for more info:
http://molvis.chem.indiana.edu/C687_S99/lecture2.html
Need a new email address that people can remember
Check out the new EudoraMail at
http://www.eudoramail.com