[Biophp-dev] Action plan proposal

Serge Gregorio biophp-dev@bioinformatics.org
Sun, 04 May 2003 23:44:31 +0800


It seems my web-based email client doesn't use fixed-width fonts.

I am re-sending my previous email as a plain *.txt file.

Cheers!

Serge


Hello all!

May I suggest the following courses of action for the next
few weeks?

                       May    June    July
                     1 2 3 4 1 2 3 4 1 2 3 4 
DESIGN

 Recruitment (ALL)   *-----* 
 Formation of a 
  Design Committee         *-*
 Design Work                 *-----*-------*
 1st Draft                         *
 Approve Final Design                      * 
 
DEVELOPMENT

 Interface with ext. *---------------------*
  web servers (Sean) 
  (e.g. BLAST@NCBI)  
 Writing parsers     *---------------------*
  (Nico)
 Miscellaneous code  *---------------------*
  c/o Serge
 Study other BioXXX  *---------------------* 
  to aid design (Greg) 
 Test/critique       *---------------------*
  existing code 
  (Andres, Frank, 
  other SF members)

The design phase may appear to be too long for some, but this is
a long-term project (which hopefully will outlive us all) and 
viewed in that context, two months isn't too big a deal.

Also, the DEVELOPMENT that runs in parallel with DESIGN is meant
to satisfy the appetite of people more inclined to coding.  We can
identify modules that could be written in a way that would be
easy to integrate into whatever final design is adopted by the
end of July.  I vote to have such development focus on code that
interfaces with external data or applications (local or web-based)
because that seems to interest most of our potential users.

What do you think?

Even at this early stage, I have something for the last group
(Andres, Frank, etc.) to look at: the xlevdist() method.

The xlevdist() method is my own version of PHP's levenshtein()
function.  They both return an integer that is a measure of the
difference between two strings.  I wrote xlevdist() because of
the 255-character limit of levenshtein().  You can view the
source code at the CVS repository (GenePHP folder, etc.inc.php
file, inside the SeqMatch() class, if I'm not mistaken.)
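
For anyone who wants a reference point when reviewing xlevdist(), here
is a minimal sketch of the textbook dynamic-programming algorithm that
keeps only two rows of the table, so it has no built-in limit on string
length.  It is written in Python purely for illustration and would need
porting to PHP; it is NOT the xlevdist() code from CVS, and the name
lev_distance is made up for this example.

```python
def lev_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (hypothetical helper)."""
    # Unit costs make the distance symmetric, so we may swap the
    # arguments to keep the shorter string in the inner loop.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty string and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match for free
            ))
        previous = current
    return previous[-1]
```

Memory use grows with the shorter string only; running time is still
proportional to the product of the two lengths.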

I have two problems with this.  One is that it *SEEMS* to give
wrong answers when given a very short string and a very long one
(as arguments).  (Well, I hope it's just my imagination!) 
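
One way to test the short-versus-long case without trusting any single
implementation is to check properties that every correct unit-cost
Levenshtein distance must satisfy.  The sketch below (Python, for
illustration only; the function names are hypothetical) takes any
distance function as an argument.  The most useful check here is the
last one: when the short string is a subsequence of the long one, the
distance must equal the difference in lengths exactly.

```python
from typing import Callable, List


def is_subsequence(short: str, long_: str) -> bool:
    """True if short can be obtained from long_ by deletions only."""
    it = iter(long_)
    # "ch in it" advances the iterator, so order is respected.
    return all(ch in it for ch in short)


def check_edit_distance(dist: Callable[[str, str], int],
                        a: str, b: str) -> List[str]:
    """Return the violated properties (an empty list if none)."""
    problems = []
    d = dist(a, b)
    length_gap = abs(len(a) - len(b))
    if d < length_gap:
        problems.append("distance is below the length difference")
    if d > max(len(a), len(b)):
        problems.append("distance is above the longer length")
    if d != dist(b, a):
        problems.append("distance is not symmetric")
    if (d == 0) != (a == b):
        problems.append("zero distance must mean equal strings")
    short, long_ = sorted((a, b), key=len)
    if is_subsequence(short, long_) and d != length_gap:
        problems.append("subsequence case must equal the length difference")
    return problems
```

Passing a wrapper around xlevdist() (or a port of it) as dist, with a
3-letter string and a string of several hundred letters, would show
whether the suspected problem is real.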

My second problem is this: how should function xlevdist() be
modified to allow for user-defined weights for the substitution,
deletion, and insertion operations (for example, substitution
weights taken from a BLOSUM matrix)?
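
As a starting point for discussion, one possible shape for this is to
replace the three constant costs in the recurrence with caller-supplied
functions.  The sketch below is in Python for illustration only; all
names and the toy cost values are hypothetical.  Two caveats: PHP's
built-in levenshtein() already accepts scalar insertion, replacement,
and deletion costs, but not per-character-pair weights; and BLOSUM
entries are similarity scores (higher means more similar), so they
would have to be converted to costs first, or the recurrence changed
to maximize a score instead of minimizing a distance.

```python
from typing import Callable


def unit(_ch: str) -> float:
    """Default insertion/deletion cost: 1 for every character."""
    return 1.0


def weighted_distance(a: str, b: str,
                      sub_cost: Callable[[str, str], float],
                      ins_cost: Callable[[str], float] = unit,
                      del_cost: Callable[[str], float] = unit) -> float:
    """Minimum total cost of edits turning a into b (sketch)."""
    # First row: build each prefix of b from nothing by insertions.
    previous = [0.0]
    for cb in b:
        previous.append(previous[-1] + ins_cost(cb))
    for ca in a:
        # First column: reduce a prefix of a to nothing by deletions.
        current = [previous[0] + del_cost(ca)]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + del_cost(ca),
                current[j - 1] + ins_cost(cb),
                previous[j - 1] + sub_cost(ca, cb),
            ))
        previous = current
    return previous[-1]


# Toy substitution costs, made up for this example (NOT BLOSUM values).
TOY_COSTS = {("A", "G"): 0.5, ("G", "A"): 0.5}


def toy_sub_cost(x: str, y: str) -> float:
    if x == y:
        return 0.0
    return TOY_COSTS.get((x, y), 1.0)
```

With toy_sub_cost, an A/G mismatch is cheaper than any other mismatch,
which is the kind of behavior a substitution matrix is meant to give.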

The Levenshtein distance algorithm is described here:

  http://www.merriampark.com/ld.htm#WHATIS

That page also includes Java code that implements the algorithm.

Many thanks!

Regards,

Serge