From mourad12345678 at yahoo.com  Sun Feb  3 20:04:37 2008
From: mourad12345678 at yahoo.com (Mourad Elloumi)
Date: Sun, 3 Feb 2008 17:04:37 -0800 (PST)
Subject: [BiO BB] Call for Paper : Algorithms in Molecular Biology -
	ALBIO'08 (Vienna, July 2008 )
Message-ID: <910936.72662.qm@web31514.mail.mud.yahoo.com>

                    CALL FOR PAPERS
 
  Higher School of Sciences and Technologies of Tunis 
                      (Tunisia)

                      Organizes

        Algorithms in Molecular Biology (ALBIO'08)
             Workshop held in parallel with

           2nd International Conference on
    Bioinformatics Research and Development (BIRD'08)

                    www.birdconf.org

         Technical University of Vienna, Austria
                     July 7-9, 2008

  Computational Molecular Biology has emerged from the
Human Genome Project as an important discipline for
academic research and industrial application. The
exponential growth of biological databases, the
complexity of biological problems and the need to deal
with errors in biological sequences place heavy demands
on running time and memory. The development of fast,
low-memory, high-performance algorithms is thus
increasingly important in Computational Molecular
Biology.

  We are interested in papers that deal with
algorithms that solve fundamental and/or applied
problems in Molecular Biology, that are
computationally efficient, that have been implemented
and tested on simulated and/or real biological
sequences, and that provide interesting new results.
The submitted papers should present recent research
results and identify and explore directions for future
research. Topics include, but are not limited to:
(i) string processing, (ii) biological sequence
comparison, (iii) structure prediction, (iv)
phylogeny reconstruction, (v) DNA sequence assembly,
clustering, and mapping, (vi) molecular evolution,
(vii) gene prediction/recognition, (viii) gene
expression, (ix) haplotyping, (x) genome rearrangement,
(xi) string barcoding.

  You are invited to submit a draft paper in PDF
format before March 1, 2008 to the Workshop Chair: Dr.
Mourad Elloumi, E.Mail: Mourad.Elloumi at fsegt.rnu.tn or
Mourad12345678 at yahoo.com

  Papers should not exceed 10 pages in Lecture Notes
in Bioinformatics (LNBI) format. All accepted papers
will be published in LNBI
www.springer.de/comp/lncs/authors.html by Springer
Verlag.

  Program Committee:
  . Mourad Elloumi, University of Tunis, Tunisia,
(Chair)
  . Sami Khuri, San José State University, USA
  . Alain Guénoche, Institute of Mathematics of
Luminy, Marseille, France.
  . Nadia Pisanti, University of Pisa, Italy
  . Gianluca Della Vedova, University of
Milano-Bicocca, Italy
  . Pierre Peterlongo, IRISA-INRIA, Rennes, France
  . Jan Holub, Czech Technical University in Prague,
Czech Republic

Important Dates: 
  Submission of Full Papers: March 1, 2008 
  Notification of Acceptance: April 1, 2008 
  Camera-ready Copies: April 15, 2008 



From marchywka at hotmail.com  Mon Feb  4 08:22:45 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 4 Feb 2008 08:22:45 -0500
Subject: [BiO BB] looking for reference on DSCAM exon locations.
In-Reply-To: <2c8757af0801300652x814edco13c2e5940148067e@mail.gmail.com>
References: <10f601c857c5$26607a90$0301a8c0@openhelia1076a>
	<2c8757af0801300652x814edco13c2e5940148067e@mail.gmail.com>
Message-ID: <BAY108-W3259EA4D1345CC1AF9035BBE330@phx.gbl>


Hi,
I'm using DSCAM, and mostly fly DSCAM, as a test case to develop more general
tools for exploring base sequences. Some early results don't appear
to be trivially wrong, but I have a few missing pieces of info I can't
quite locate to further explore initial output. If you could point me to a link
that may address these issues that would be most helpful. Essentially,
I just need to know the exact location of each exon variant, preferably in
as many species as possible but so far I have only located this for exon 4
and otherwise had to guess from ref [3] results. 

I'm trying to generalize results in [4] and [5] and search for DNA features that
may suggest splice rules or answer some questions posed in [6]. 
I'm searching for stem-loop structures as in [4] and [5],
as well as reverse-complement matches that may be well separated as in [8].

From [1], I gather that there are a certain number of exon variants for melanogaster.
Notably, 12 for exon 4, 48 for exon 6, and 33 for exon 9. 
I can get exact locations for exon 4 and 5 starts from [2], but am stuck using
ambiguous flybase exons. From [3], I end up with 98 exons which is short of the
100+ I get from adding up earlier variants or the 115 cited in [6]( or [7] ). 


I tried a sloppy version of the stem-loop in [4] that relates to the pseudoexon.
In my bastardized regular-expression format (I'm using '[]' for a group, not the normal
Perl convention, don't ask..., and implicitly match each group to its reverse complement;
and, yes, the quantifiers are redundant), this is otherwise just a Perl regex:
[\1]{6,6}.{2,10}[\2]{2,3}.{1,8}[\3]{1,5}.{0,4}[\3]{1,5}.{1,8}[\2]{2,3}.{2,10}[\1]{6,6}.{6,11}[\4]{4,7}.{0,4}[\4]{4,7}> RC5|5|CFTR
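
The core idea behind the rule (a short stem that pairs with its own reverse complement a few bases downstream) can be sketched in Python. This is a minimal illustration, not the matcher that produced the hits below: the function names, the 6-base stem (echoing `[\1]{6,6}`) and the 2 to 10 base loop are assumptions chosen for the example, and the nested inner stems of the full rule are not modelled.

```python
# Hypothetical helpers for illustration only; not the poster's rule engine.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_stem_loops(seq, stem=6, min_loop=2, max_loop=10):
    """Yield (start, stem_seq, loop_len) for every stem whose reverse
    complement begins min_loop..max_loop bases after the stem ends."""
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop          # start of the candidate right arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                yield i, left, loop

# Synthetic test sequence: GACCTG + AAAA + CAGGTC, flanked by TT.
hits = list(find_stem_loops("TTGACCTGAAAACAGGTCTT"))
print(hits)
```

Exact matching is used here; a real scan would probably also need to tolerate mismatches or G-U pairing in the stem.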

I was rather excited that these "hits" are in many locations BUT are excluded from the
range of the exon 4 variants. In particular, this mish-mash of hits shows where things seem to
occur. It appears that exon 6 may or may not obey a similar distribution of hits.
Each line is the location of some "rule hit" where the first number is the location in the genome,
"Dscam" indicates the flybase exon number, "RC5" is my rule hit, and the
other things are rule hits to known locations such as exon 4 starts
(I tried to make the numbers useful to an outside reader, but this confuses things
like the exon 4 rule hits that end where Dscam starts: my hits are leaders, the Dscam-labelled
hits are where the exon actually starts):

$ cat flybase_exon_starts ffx fg | sort -g | awk '{$1=3269374-$1; print $0;}' | more> mish_mash.txt

3269374 Drosophila melanogaster chromosome 2R
3269374 Drosophila melanogaster chromosome 2R
3269375 Dscam:98
3268566 875 TATTTCATGCTACTTTTTATTTATAAATCGAGTTTTAGAGGAAATAATTGCAGTCCCTGAATTTTCAG> RC5|5|CFTR
3267836 1597 TAATTTCTGTTTACATTGATACTCCGCTTAATGTAAATTATTATACTTATTTTACAATAA> RC5|5|CFTR
3265322 4114 TTTATAAGCACAAAAGGAGTAGCCCCTATAAAAAATGTATAAACAAAATAAATCATATAATAT> RC5|5|CFTR
3265220 Dscam:97
3264108 5333 ATTTATTCCTCCATTTTACTTTTCCCTATTATCGTAATAATTGATAAATTGCATATGCAAACTATTTG> RC5|5|CFTR
3263246 6195 AATGCGATGTTTATGTTGTTGTTCCTGTCTCCGCTACAGTCGGACGCATTTAATTCGCAATTTCATTG> RC5|5|CFTR
3259916 9510 TTGCTTAAATTAATTAAAGCATTGGCTTAAAGAAGCAAAGAATCTATAATTAT> RC5|5|CFTR
3257467 11974 TTAAACTATTACTTTATAGATAAAAGTATATCCTCACAATAATTTTGTTTAACAAATGCATTCAAATG> RC5|5|CFTR
3257239 12192 AATTGTTCATTGCATTCACATTATTTAATTAACAATTAATAAATAATTTTATTTTAAA> RC5|5|CFTR
3257148 12285 TAAGAACATAACTATACTTATTCTGTGCCTTTGAGCTTTCTTATATTAATGGATTTAAAT> RC5|5|CFTR
3256238 Dscam:96
3255817 13612 TTAAAAAAGGATAGATATGAGCTTTATATATTTTTAAAAAGTTTAAAAAAATATTT> RC5|5|CFTR
3254485 14902 CGGCCTTTTCCCAG>local|i|DNA Fly DCAM Exon 4.1
3254472 Dscam:95
3254146 15241 TCCTACCTGTTTAG>local|i|DNA Fly DCAM Exon 4.2
3254133 Dscam:94
3253623 15764 CATTGCTGTTTTAG>local|i|DNA Fly DCAM Exon 4.3
3253610 Dscam:93
3253001 16386 GAACTCACCTTCAG>local|i|DNA Fly DCAM Exon 4.4
3252988 Dscam:92
3252698 16689 CTCTTGCTTTACAG>local|i|DNA Fly DCAM Exon 4.5
3252685 Dscam:91
3252412 16975 ATTTTAAATCGCAG>local|i|DNA Fly DCAM Exon 4.6
3252399 Dscam:90
3252136 17251 GCACACCTTTGCAG>local|i|DNA Fly DCAM Exon 4.7
3252123 Dscam:89
3251867 17520 TATTCGATTCAAAG>local|i|DNA Fly DCAM Exon 4.8
3251854 Dscam:88
3251567 17820 TTCTATCGACTCAG>local|i|DNA Fly DCAM Exon 4.9
3251554 Dscam:87
3251284 18103 CTGATTTCCTTCAG>local|i|DNA Fly DCAM Exon 4.10
3251271 Dscam:86
3251009 18378 CTCCCGTCTTGCAG>local|i|DNA Fly DCAM Exon 4.11
3250996 Dscam:85
3250713 18674 CGTACACTTTGCAG>local|i|DNA Fly DCAM Exon 4.12
3250700 Dscam:84
3249574 19855 ATTTTTGCACAATTAAAAGTAACACAAAATGAAAAATGATTACCAGCCATGTGGCT> RC5|5|CFTR
3249386 20001 TATCAAAATATCAG>local|i|DNA Fly DCAM Exon 5
3249373 Dscam:83
3248960 20486 TTTGTATCTTTTGGAGTTTTCTCATCTACAGCTCAAATAGAATAGATACAAATCAAGTATTAAAATACATATT> RC5|5|CFTR
3248760 20675 AATTTAAAACTTATCATATTTCAAATATTTTTGAACACATAAATTTAATGTCAAATTGTTTG> RC5|5|CFTR
3248545 20904 TTTACAAATATAAATATATATATAATTCAATATAAATATTGAAATATCAAAAATGTAAATATTTAAAATGATATTT> RC5|5|CFTR
3248155 Dscam:82
3247920 Dscam:81
3247711 Dscam:80
3247513 Dscam:79
3247296 Dscam:78
3247071 Dscam:77
3246851 Dscam:76
3246645 Dscam:75
3246436 Dscam:74
3246233 Dscam:73
3245845 Dscam:72
3245421 Dscam:71
3245220 Dscam:70
3245029 Dscam:69
3244602 Dscam:68
3244374 Dscam:67
3244156 Dscam:66
3243946 Dscam:65
3243736 Dscam:64
3243530 Dscam:63
3243315 Dscam:62
3242920 Dscam:61
3242716 Dscam:60
3242511 Dscam:59
3242315 Dscam:58
3242055 27370 ATAGAATACGTACGGCTGGGTGAAATCGTTTCTATAATGTGTCCTGCGCAGG> RC5|5|CFTR
3241906 Dscam:57
3241442 Dscam:56
3241198 Dscam:55
3240871 Dscam:54
3240528 Dscam:53
3239545 Dscam:52
3239328 Dscam:51
3238953 30482 ATATTTATGATACGGGAATGTTAGATTTGATATTCAAATATACTCCACTTCTTTATGTTAAA> RC5|5|CFTR
3238803 Dscam:50
3238210 Dscam:49
3238003 Dscam:48
3237466 31973 CTACAACATCAATAAGTCCCATAAGAAGCATATTGTTATTACTTTTGTAGAGCCAGTTGGCGCCAA> RC5|5|CFTR
3237417 Dscam:47
3237019 Dscam:46
3236481 Dscam:45
3235516 Dscam:44
3235203 Dscam:43
3234956 34477 CGTGTGTGGCCAGGAATGCGGCCGGGGTCATCTACCACACGGCAGAGCTGCGCGTTAACG> RC5|5|CFTR
3234817 34627 CCTCGCCCTCCTCCGCAGTTCTGCCCCAGATCGTGCCCTTCGATTTTGGCGAGGAGACCGTCAACGAGTTG> RC5|5|CFTR
3234800 Dscam:42
3234435 Dscam:41
3234062 Dscam:40
3233672 Dscam:39
3233281 Dscam:38
3233199 36235 TCAAGGGGGACCTGCCCTTGAGAATCCACTGGACCTTGAATGGTGAGCCTGTGGCAACAGG> RC5|5|CFTR
3232857 Dscam:37
3232742 36707 CACTAAACTCGGCTCTCATTGTAAACGGTGAAATGGGATTCACGTTAGTGCGGCTGAATAAGCGAACCAGTTCGCT> RC5|5|CFTR
3232460 Dscam:36
3232075 Dscam:35
3231673 37750 ATATGATATTTGTGCTGAATGTCATATAAATCAGAAAAATTAGGTGTAAT> RC5|5|CFTR
3231128 Dscam:34
3230754 Dscam:33
3230387 Dscam:32
3229897 Dscam:31
3229501 Dscam:30
3229124 Dscam:29
3228738 Dscam:28
3228338 Dscam:27
3227948 Dscam:26
3227576 Dscam:25
3227196 Dscam:24
3226762 42662 AGTCTCTGTGACTTGTTTGATATCCAGTGGAGACTTACCCATCGATATCGA> RC5|5|CFTR
3226434 Dscam:23
3226043 Dscam:22
3225665 Dscam:21
3225287 Dscam:20
3225060 44371 TAGTTGCCGGGCAAAGAACTACGCAGCAGCCGTCAACTACAGCACTGAACTCATAGTT> RC5|5|CFTR
3224228 45215 CCCGTGGACATCACCTGGTTGTTCAATGACTATGCCATCAACGAGTATCACGGGGTCACCTCTTCCAAGA> RC5|5|CFTR
3223509 Dscam:19
3222724 Dscam:18
3222172 Dscam:17
3219886 Dscam:16
3219708 Dscam:15
3219235 50201 TCCGGAGATGCCATATGCTTTGAAGGTACTCGACAAATCCGGACGTTCCGTGCAGCTGAGCTG> RC5|5|CFTR
3218320 Dscam:14
3218106 Dscam:13
3217357 Dscam:12
3217195 Dscam:11
3217178 52257 GCTTCTGACATTTTGAACACCCGGACCAAGGGACAGAAGCCCAAGCTGCCCGAGAAACCTCG> RC5|5|CFTR
3216961 Dscam:10
3216459 52976 AACAAATTGCACAGTATATAAAATTATATTATTCCTATTTTTTGTTGTTCAAACCAAGCTTG> RC5|5|CFTR
3216293 53130 AAAATCATTAGTGTAAAATAATAATGATTTTTCTTACGTAAATGCAATTT> RC5|5|CFTR
3216028 53416 TTTTGTTCAGTTTTTCAGCTCACGTAAGGTTAAAAAAAAAAAAACAAAAGTAGAGCTTTCTTAAATTTTAA> RC5|5|CFTR
3214571 54860 CGAAAACGACTACATATCGACAAGTTAACCTTTGAATTTTTCGCCTGCCACAGTCTGT> RC5|5|CFTR
3214290 Dscam:9
3213870 Dscam:8
3213504 55919 TATTATCCTTTCATTTACAAAGATAATATTTTGCATCCAATTAACTAATT> RC5|5|CFTR
3212243 Dscam:7
3211474 Dscam:6
3211209 58231 GGCTTAATATGTCTGGATTAGCTAGTCTATAATCTATGTTAAGCCATACTGCCTCTACTCTTTGAGT> RC5|5|CFTR
3210838 Dscam:5
3210462 Dscam:4
3210224 Dscam:3
3209155 Dscam:2
3208270 Dscam:1
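
A listing in this format can be checked mechanically for the claim that RC5 hits avoid the exon 4 cluster. The following Python sketch is one hedged way to do that; the function names and the record layout are assumptions, and only a six-line excerpt copied from the listing above is embedded, so the full file would have to be read in for a real check.

```python
# Excerpt copied from the listing above; the full mish_mash.txt would be read from disk.
LISTING_EXCERPT = [
    "3255817 13612 TTAAAAAAGGATAGATATGAGCTTTATATATTTTTAAAAAGTTTAAAAAAATATTT> RC5|5|CFTR",
    "3254485 14902 CGGCCTTTTCCCAG>local|i|DNA Fly DCAM Exon 4.1",
    "3254472 Dscam:95",
    "3250713 18674 CGTACACTTTGCAG>local|i|DNA Fly DCAM Exon 4.12",
    "3250700 Dscam:84",
    "3249574 19855 ATTTTTGCACAATTAAAAGTAACACAAAATGAAAAATGATTACCAGCCATGTGGCT> RC5|5|CFTR",
]

def parse_listing(lines):
    """Turn each line into (genome_position, kind, label); skip other lines."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue
        pos = int(fields[0])
        if len(fields) > 1 and fields[1].startswith("Dscam:"):
            records.append((pos, "exon", fields[1]))        # flybase exon start
        elif "RC5|5|CFTR" in line:
            records.append((pos, "rc5", ""))                # stem-loop rule hit
        elif "local|i|DNA" in line:
            records.append((pos, "leader", " ".join(fields[-2:])))
    return records

def rc5_hits_between(records, first_exon, last_exon):
    """Return positions of RC5 hits lying between two labelled exon starts."""
    starts = {label: pos for pos, kind, label in records if kind == "exon"}
    lo, hi = sorted((starts[first_exon], starts[last_exon]))
    return [pos for pos, kind, _ in records if kind == "rc5" and lo <= pos <= hi]

records = parse_listing(LISTING_EXCERPT)
inside = rc5_hits_between(records, "Dscam:95", "Dscam:84")
print(len(inside), "RC5 hits between Dscam:95 and Dscam:84")
```

Dscam:95 through Dscam:84 are the flybase starts that line up with the twelve exon 4 leaders in the listing, which is why they are used as the window here.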


References
==========


[1] Graveley 2004 , http://www.rnajournal.org/cgi/reprint/10/10/1499
[2] Celotto and Graveley, 2001 http://www.genetics.org/cgi/reprint/159/2/599.pdf
[3] http://flybase.bio.indiana.edu/reports/FBgn0033159.html
[4] Buratti 2007, http://nar.oxfordjournals.org/cgi/reprint/35/13/4369
[5] Kreahling and Graveley 2005 http://mcb.asm.org/cgi/content/full/25/23/10251
[6] Olson... Graveley 2007 http://www.nature.com/nsmb/journal/v14/n12/full/nsmb1339.html
[7] ref 5 in [6], Schmucker et al. 2000 http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6WSN-4194S59-F&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=d159aee1d55f9b955b8a9dc96344a5f4
[8] Anastassiou 2006 http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1431710&blobtype=pdf


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.




From jeff at bioinformatics.org  Wed Feb  6 21:31:32 2008
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Wed, 06 Feb 2008 21:31:32 -0500
Subject: [BiO BB] Courses: Gene Expression Analysis and Biostatistics
Message-ID: <47AA6D84.6050101@bioinformatics.org>

Greetings,

The following courses are being offered at Bioinformatics.Org this month:

Gene Expression Analysis; Feb 18-22, 2008

This course helps to demystify Affymetrix analysis so that any researcher can take the basic steps to go from a chip image to a list of genes that are up- or down-regulated in an experiment. Various tools will be covered, e.g. GCOS, Excel, MATLAB, and free tools like R and Dchip. It is geared towards researchers who conduct microarray experiments to study genome-wide expression changes and understand the underlying mechanisms of gene regulation in samples of interest. Most scientists are not able to analyze the resulting data themselves. They are not able to get desired results using traditional tools like Microsoft Word and Excel, or with advanced software provided by commercial vendors. The freeware solutions come either with a steep learning curve or as black-box interfaces that provide limited functionality with little or no technical support. In the midst of all this is the fundamental lack of understanding among scientists on how the technology works and what the fundamental parts of the analysis are.

FOR MORE INFORMATION:
http://wiki.bioinformatics.org/BI201A_Gene_Expression_Analysis

Biostatistics: Distributions, Tests and Graphics; Feb 25-29, 2008

The various statistical distributions covered will help you know when assumptions can be made about a normal distribution and how to test whether or not these assumptions are true. Essential descriptive statistics are reviewed and then used in various situations to calculate background, noise, normalization and thresholding. Additionally, hypothesis testing is introduced so that you can assess groups of observations for a particular parameter and calculate whether or not the difference between groups is significant. Data visualization using various graphs will also be reviewed. Armed with these techniques, you will be able to better deal with the challenges of data analysis. Plus, you'll be able to understand and interpret data at a more fundamental level and draw the correct conclusions about them.

FOR MORE INFORMATION:
http://wiki.bioinformatics.org/MA101A_Distributions,_Tests_and_Graphics

Cheers,
Jeff
-- 
J.W. Bizzaro
Bioinformatics Organization, Inc. (Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone:  +1 508 890 8600
--


From aao at fe.up.pt  Thu Feb  7 04:49:26 2008
From: aao at fe.up.pt (alexandra)
Date: Thu, 7 Feb 2008 09:49:26 -0000
Subject: [BiO BB] First Announcement NN2008
Message-ID: <000b01c8696e$ba28bc50$a56aa8c0@ineb.fe.up.pt>

Apologies for multiple copies.

We appreciate if you can forward this Announcement to potential candidates. 
============================================================= 
SUMMER SCHOOL NN2008

NEURAL NETWORKS in CLASSIFICATION, REGRESSION and DATA MINING 

July 7-11, 2008, Porto, Portugal
============================================================= 

http://www.nn.isep.ipp.pt  email: nn-2008 at isep.ipp.pt

 
GENERAL INFORMATION 

The Summer School will be held at Porto, Portugal, jointly organized by the
Polytechnic School of Engineering of Porto (ISEP) and the Faculty of
Engineering, Porto University (FEUP). 

Following last year's experience, this year's edition also includes a
POSTER/WORKSHOP SESSION providing a discussion forum where the participants
can obtain peer guidance for their projects.

 
PROGRAMME COMMITTEE

* Alexander Zien (Research Scientist at the Friedrich Miescher Laboratory,
Germany) 

* Carlos Soares (Assistant Professor, Faculty of Economy, University of
Porto, Portugal) 

* Christopher Bishop (Deputy Managing Director at Microsoft Research
Laboratory in Cambridge and Chair of Computer Science at the University of
Edinburgh, UK)

* Joaquim Marques de Sá (Full Professor, Dept. Electr. and Comp.
Engineering, Fac. of Engineering, University of Porto, Portugal) 

* Jorge Santos (Assistant Professor, Engineering Polytechnic Institute,
Porto, Portugal) 

* Mark Embrecht (Associate Professor, Rensselaer Polytechnic Institute, RPI
Troy, New York, U.S.A.) 

* Noelia Sánchez Maroño (Assistant Professor, Coruna University, Spain) 

* Paulo Cortez (Assistant Professor, University of Minho, Portugal) 

* Petia Georgieva (Assistant Professor, University of Aveiro, Portugal) 

* Yann Guermeur (Scientific Director of the Laboratoire Lorrain de
Recherche en Informatique et ses Applications, France)

 
COURSE CONTENTS 

Neural networks (NN) have become a very important tool in classification and
regression tasks. Applications are now abundant, e.g. in engineering,
economics and biology. The Summer School on NN is dedicated
to explaining relevant NN paradigms, namely multilayer perceptrons (MLP),
radial basis function networks (RBF) and support vector machines (SVM) used
for classification and regression tasks, illustrated with applications to
real data. Specific topics are also presented, namely Multi-Valued and UB
Neurons, Functional Networks, MLPs with Entropic Criteria and Data Mining
using NN. 

Classes include practical sessions with appropriate software tools. The
trainee has, therefore, the opportunity to apply the taught concepts and
become conversant with a broad range of NN topics and applications. A
special workshop session will provide a discussion forum where the
participants can obtain peer guidance for their projects.

 
PRELIMINARY PROGRAMME 

A preliminary programme and further information about the classes are
available at the school webpage (http://www.nn.isep.ipp.pt) 

 
IMPORTANT DEADLINES 

Early Registration: 18 May 2008

Poster Submission: 15 June 2008

Hotel booking : 15 June 2008

Summer School: 7-11 July 2008 

All participants are required to register before the start of the School,
by June 15 at the latest, even if you choose to pay the late registration fee at
the registration desk. 
Please note that only a LIMITED number of participants can be accepted. 

 
REGISTRATION 

In order to attend the School you must fill in the registration form,
available at the School web page. Please note that if you have any guests
who would like to take part in the social programme, you must register them
as well, by filling in the corresponding field in the registration form. 

 
SCHOOL FEES 

The registration fee for participants amounts to: 

- Early registration fee (paid before the 18th of May) 

        * 350 Euro (students, ISEP and FEUP staff) 

        * 400 Euro (all other participants) 

- Late registration fee (paid after the 18th of May) 

        * 400 Euro (students, ISEP and FEUP staff) 

        * 450 Euro (all other participants) 

The registration fee includes: 

* school package (manuscripts, lecture notes, CD) 
* coffee breaks 
* daily lunch 
* welcome reception 
* school banquet 

NOTE: The registration fee for those who attended previous editions amounts
to 25/30 euro per lecture and includes the school package and coffee-breaks.
Please, contact the LOC  for further details. 

 
LOCAL ORGANIZING COMMITTEE (LOC) 

- Helena Brás Silva - Assistant Professor, Dept. Mathematics, ISEP, Portugal


- Jorge M. Santos - Assistant Professor, Dept. Mathematics, ISEP, Portugal 

- Rui Chibante - Assistant Professor, Dept. Mathematics, ISEP, Portugal 

 
CONTACT ADDRESS 

Local Organizing Committee (LOC) - Summer School NN2008

A/C Jorge M. Santos

Departamento de Matemática

Instituto Superior de Engenharia do Porto

Rua Dr. António Bernardino de Almeida 431

4200-072 PORTO / PORTUGAL
Email: nn-2008 at isep.ipp.pt 

 
NN2008 Secretariat

Ms. Gabriela Afonso 
Email: gafonso at fe.up.pt 

 
Programme Chair: 
Prof. Joaquim Marques de Sá 
Tel. +351 225081828 - Email: jmsa at fe.up.pt 

======================================== 


From isbra-l at engr.uconn.edu  Thu Feb  7 22:42:51 2008
From: isbra-l at engr.uconn.edu (ISBRA Symposium Announcements)
Date: Thu, 7 Feb 2008 22:42:51 -0500 (EST)
Subject: [BiO BB] [ISBRA-L] ISBRA 2008 Call for Posters in Bioinformatics
Message-ID: <Pine.LNX.4.60.0802072239110.32252@dna.engr.uconn.edu>


     CALL FOR POSTERS IN BIOINFORMATICS
   ================================================
     ISBRA 2008
     International Symposium on Bioinformatics Research and Applications
     May 6-8, 2008

     Georgia State University
     Atlanta, Georgia

     http://www.cs.gsu.edu/isbra08/

   ================================================

The International Symposium on Bioinformatics Research and Applications
(ISBRA) provides a forum for the exchange of ideas and results among
researchers, developers, and practitioners working on all aspects of
bioinformatics and computational biology and their applications.
Authors are invited to submit posters that demonstrate original
research in all areas of bioinformatics and computational biology,
including the development of experimental or commercial systems.
Topics of interest include but are not limited to:

* Biomedical databases and data integration
* Biomedical image processing
* Bio-ontologies
* Comparative genomics
* Computational genetic epidemiology
* Computational proteomics
* Data mining and visualization
* Gene expression analysis
* Genome analysis
* High-performance bio-computing
* Molecular evolution and phylogenetics
* Molecular modeling and simulation
* Pattern discovery and classification
* Population genetics
* RNA and protein structure prediction
* Sequence assembly
* Software tools and applications
* Systems biology


SUBMISSION REQUIREMENTS
Poster submission must be made electronically at:

http://www.easychair.org/conferences/?conf=ISBRA08

Submissions must be formatted using the Springer LNCS style
and must not exceed 4 pages. The accepted poster papers will
be published on CD-ROM and the symposium website. Submission
implies the willingness of at least one of the authors to
register and present the poster at the symposium. One best
poster award will be given at ISBRA08.


IMPORTANT DATES
   Submission deadline            March 14, 2008
   Notification of acceptance     March 21, 2008
   Final Version Submission       March 31, 2008


LOCATION
   ISBRA 2008 will be held at Georgia State University in Atlanta.
   Atlanta's major attractions--Centennial Olympic Park, Underground
   Atlanta, CNN Center, the World of Coca-Cola, and the Georgia Aquarium
   (the largest in the world)--can all be reached by a ten-minute walk
   from the GSU campus.


GENERAL CHAIRS
   Dan Gusfield, University of California, Davis
   Yi Pan, Georgia State University


PROGRAM CHAIRS
   Ion Mandoiu, University of Connecticut
   Raj Sunderraman, Georgia State University
   Alexander Zelikovsky, Georgia State University


POSTER CHAIRS
   Gulsah Altun, Georgia State University
   Stefan Gremalschi, Georgia State University

CONTACT INFORMATION
   Please direct questions to Ion Mandoiu (ion at engr.uconn.edu),
   Alexander Zelikovsky (alexz at cs.gsu.edu), or Raj Sunderraman
   (raj at cs.gsu.edu).

CONFERENCE WEB SITE: http://www.cs.gsu.edu/isbra08/




From marchywka at hotmail.com  Sun Feb 10 13:47:16 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Sun, 10 Feb 2008 13:47:16 -0500
Subject: [BiO BB] looking for reference on DSCAM exon locations.
In-Reply-To: <2c8757af0801300652x814edco13c2e5940148067e@mail.gmail.com>
References: <10f601c857c5$26607a90$0301a8c0@openhelia1076a>
	<2c8757af0801300652x814edco13c2e5940148067e@mail.gmail.com>
Message-ID: <BAY108-W8747410083D38890D9476BE290@phx.gbl>


As it turns out, to answer most of my earlier question,

http://www.mail-archive.com/bbb at bioinformatics.org/msg00026.html

the exon locations are reasonably well described at NCBI following the links contained here,

http://genomebiology.com/2006/7/1/R2

( or here, I can't get figures to render at above link, 
     http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1431710&blobtype=pdf    )

Variable window binding for mutually exclusive alternative splicing
Dimitris Anastassiou , Hairuo Liu  and Vinay Varadan 
Center for Computational Biology and Bioinformatics, and Department of Electrical Engineering, Columbia University, New York, NY 07670, USA
Genome Biology 2006, 7:R2 doi:10.1186/gb-2006-7-1-r2

"Because the Dscam gene of four out of the six Drosophila spp. had not previously been annotated [11], we first generated the missing annotations for all exons of cluster 6 using the existing annotations as benchmarks and ensuring that exons are located in open reading frames. The resulting annotated sequences for D. yakuba, D. ananassae, D. mojavensis and D. pseudoobscura have been deposited in GenBank under accession numbers DQ317106, DQ317107, DQ317108 and DQ317109, respectively. These can be accessed in addition to the previously available annotated sequences for D. melanogaster (accession number AF260530) and D. virilis (accession number AY686597)."

For melanogaster, this would be here

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=8072216

Not sure how I missed this earlier but, anyway DSCAM does seem like a good test case and example to follow for splicing literature.

And I was able to verify that I can use my reverse-complement and rule code to find the patterns
previously reported by the authors. I'm still trying to determine what, if any, significance there may be to
the pattern I mentioned earlier.  I did some surveys on random genome segments and it does come
up pretty often but it doesn't seem to be completely excluded from DSCAM exon clusters.


Mike Marchywka




From akunthavai at yahoo.co.in  Mon Feb 11 04:17:56 2008
From: akunthavai at yahoo.co.in (A KUNTHAVAI)
Date: Mon, 11 Feb 2008 09:17:56 +0000 (GMT)
Subject: [BiO BB] Homological DNA sequences
Message-ID: <152662.42187.qm@web8912.mail.in.yahoo.com>

Sir,
          I would like a list of homologous rice gene sequences to use as input to the blastn, blastp and bl2seq (BLAST 2 Sequences) programs. Please reply as early as possible. 
  A.Kunthavai
  Research Scholar
  Anna University



From rebekah.rogers at gmail.com  Fri Feb  8 20:56:41 2008
From: rebekah.rogers at gmail.com (Rebekah Rogers)
Date: Fri, 8 Feb 2008 20:56:41 -0500
Subject: [BiO BB] Inconsistent Blast Results
In-Reply-To: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com>
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com>
Message-ID: <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>

Hi:

I'm currently running blast 2.2.14 locally on my mac.  I've noticed
that the printout from a blastn run at an E cutoff of 10^-10 reads
differently than a blast run at an E cutoff of 10^-7 when hits worse
than 10^-10 are ignored.   Suddenly at 10^-7 new hits with evals of
10^-11 appear that weren't there before and even the relative strength
of different hits can change.

I'm not certain I understand why this is true and it has a huge impact
on my results.  I know that the Eval is dependent on certain constants
taken from the compared sequences, but I don't understand how this
could possibly change when I'm using the exact same input file and
database.
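
For reference, the standard Karlin-Altschul relation behind BLAST E-values is E = K * m * n * exp(-lambda * S), where S is the raw alignment score and m and n are the query and database lengths. The Python sketch below evaluates it; the numeric values of lambda and K and the lengths are placeholders for illustration (real BLAST output reports its own parameters and uses effective lengths), and the function names are hypothetical.

```python
import math

def expected_hits(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

def bit_score(raw_score, lam, k):
    """Normalised score S' = (lambda * S - ln K) / ln 2, so E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Placeholder parameters, chosen only to make the example concrete.
LAMBDA, K = 1.37, 0.711
m, n = 500, 1_000_000

e_value = expected_hits(40, m, n, LAMBDA, K)
print(e_value, bit_score(40, LAMBDA, K))
```

Nothing in this formula depends on the E cutoff supplied on the command line, so for a fixed query, database and scoring scheme a given alignment should always receive the same E-value; a changed E-value suggests that a different alignment was produced.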

Does anyone have an explanation?

-Rebekah


From marty.gollery at gmail.com  Mon Feb 11 12:28:40 2008
From: marty.gollery at gmail.com (Martin Gollery)
Date: Mon, 11 Feb 2008 09:28:40 -0800
Subject: [BiO BB] Inconsistent Blast Results
In-Reply-To: <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com>
	<79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
Message-ID: <bdd10c2a0802110928w6354d866x8366b412ff7bd939@mail.gmail.com>

Hi Rebekah,
I believe you are seeing differences because of scores getting thrown
out at an earlier step. What I think is happening is that the hits are
being cut off with the 10^-10 threshold that would have given better
results in the alignment regeneration phase. Then when you run the
search with the 10^-7 cutoff, those hits are allowed into the final
step and they are extended to yield better scores.
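
The effect described here can be mimicked with a toy two-stage filter in Python. This is only a model of the explanation, under the assumption that the cutoff is applied to a preliminary score before the final alignment is recomputed; it does not reproduce BLAST's actual internals, and the hit names and E-values are invented.

```python
def two_stage_search(candidates, cutoff):
    """Toy model: candidates are (name, preliminary_evalue, final_evalue).

    Stage 1 discards hits whose preliminary E-value fails the cutoff.
    Stage 2 reports the final (re-extended) E-value of the survivors.
    """
    survivors = [c for c in candidates if c[1] <= cutoff]
    return {name: final for name, _, final in survivors if final <= cutoff}

# Invented example: hitB only reaches 1e-11 after the final extension.
candidates = [
    ("hitA", 1e-12, 1e-12),
    ("hitB", 1e-8, 1e-11),
]

strict = two_stage_search(candidates, 1e-10)   # hitB is lost in stage 1
loose = two_stage_search(candidates, 1e-7)     # hitB survives and improves
print(strict)
print(loose)
```

In this model a hit with a final E-value of 10^-11 appears only in the 10^-7 run, which matches the behaviour reported in the original question.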

Best Regards,
Marty

On Feb 8, 2008 5:56 PM, Rebekah Rogers <rebekah.rogers at gmail.com> wrote:
> Hi:
>
> I'm currently running blast 2.2.14 locally on my mac.  I've noticed
> that the printout from a blastn run at an E cutoff of 10^-10 reads
> differently than a blast run at an E cutoff of 10^-7 when hits worse
> than 10^-10 are ignored.   Suddenly at 10^-7 new hits with evals of
> 10^-11 appear that weren't there before and even the relative strength
> of different hits can change.
>
> I'm not certain I understand why this is true and it has a huge impact
> on my results.  I know that the Eval is dependent on certain constants
> taken from the compared sequences, but I don't understand how this
> could possibly change when I'm using the exact same input file and
> database.
>
> Does anyone have an explanation?
>
> -Rebekah


-- 
-- 
Martin Gollery
Senior Bioinformatics Scientist
TimeLogic- a Division of Active Motif
775-833-9113
880 Northwood Blvd. Suite 7
Incline Village, NV 89451


From aey1531 at comcast.net  Mon Feb 11 10:50:32 2008
From: aey1531 at comcast.net (aey1531)
Date: Mon, 11 Feb 2008 10:50:32 -0500
Subject: [BiO BB] Inconsistent Blast Results
In-Reply-To: <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com>
	<79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
Message-ID: <00d501c86cc5$d6b45a00$841d0e00$@net>

Can you remove me from your email list

thanks



From rzimmer at MPLNet.com  Mon Feb 11 10:58:59 2008
From: rzimmer at MPLNet.com (Rob Zimmer)
Date: Mon, 11 Feb 2008 10:58:59 -0500
Subject: [BiO BB] Inconsistent Blast Results
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com>
	<79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
Message-ID: <73ACA48AF9871543A87B5CF26C311B3ED159EE@MPLNMail.mplnet.com>


Please note that Apocom Genomics provides the GrailEXP software, which
incorporates BLAST along with exon and CpG island identification
functions (as well as many other features).  Anyone interested in
learning more should e-mail me back.

Robin Zimmer



From mleczny at gmail.com  Mon Feb 11 11:00:10 2008
From: mleczny at gmail.com (Paco B C)
Date: Mon, 11 Feb 2008 17:00:10 +0100
Subject: [BiO BB] Ensembl and Gene Ontology terms
Message-ID: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>

Hi!
This is my first message to this list. My name is Paco and I'm doing my PhD
in Bioinformatics at the University of Leuven, Belgium.
I would like to build a Java module that, given a list of Ensembl gene
identifiers, returns their related Gene Ontology terms. I've accessed the
GO database, but I can't find ENSG terms there, and I've read on the
Ensembl website that they provide links to external databases for
translation and transcript objects but not for genes (maybe in the future,
but not now).
My question is: do you know which database I could query in order to get
this mapping between Ensembl genes and GO terms?
Thanks!
Paco


From delete at elfdata.com  Mon Feb 11 11:21:23 2008
From: delete at elfdata.com (Theodore H. Smith)
Date: Mon, 11 Feb 2008 16:21:23 +0000
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
Message-ID: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>


Hi everyone,

I've been working, on and off, on this algorithm for quite a while
now. It's an invention of mine: a "BLAST-like" algorithm, in that it
does "fuzzy lookup" operations across a database of letters. I am
designing it to be useful for bioinformatics, which is the main field
I am initially targeting.

The database will be filled with protein sequences, and the query
searched against the database will be another protein sequence. The
algorithm has a "scoring matrix", which can accept different protein
replacement scores. The cost of inserting letters (protein letters)
can also be configured.

In this sense, it's no different from Smith-Waterman. The same input,
the same output!

The real difference from Smith-Waterman is its speed. My algorithm
will be hugely faster, because I use many techniques to avoid
processing unnecessary parts of the Smith-Waterman matrix.

I also use many tricks to reuse computations across various proteins.
For example, the matrix for protein "ABCDE" is identical, at first
anyhow, to the matrix for "ABCDEFG". This means that if I have both
proteins "ABCDE" and "ABCDEFG" in my protein database, I can test
both of them against the search query in almost half the time. My
algorithm also runs in logarithmic time with respect to the size of
the database. Basically, bigger databases run disproportionately faster.
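
The shared-prefix observation can be checked with a minimal sketch. This is not the ElfDataFuzzy implementation: it assumes a plain Smith-Waterman recurrence with a linear gap penalty, simple match/mismatch scores standing in for a protein scoring matrix, and a hypothetical function name `sw_rows`. With one matrix row per database letter, each row depends only on the letters seen so far, so the rows computed for "ABCDE" are exactly the first rows for "ABCDEFG":

```python
def sw_rows(subject, query, match=2, mismatch=-1, gap=-2):
    """Return the Smith-Waterman score matrix, one row per subject letter.

    Row i depends only on subject[:i + 1] and the query, never on later
    subject letters, which is what makes prefix reuse possible.
    """
    previous = [0] * (len(query) + 1)
    rows = []
    for s in subject:
        current = [0]
        for j, q in enumerate(query, start=1):
            diag = previous[j - 1] + (match if s == q else mismatch)
            up = previous[j] + gap
            left = current[j - 1] + gap
            current.append(max(0, diag, up, left))
        rows.append(current)
        previous = current
    return rows


query = "BCDF"
short_rows = sw_rows("ABCDE", query)
long_rows = sw_rows("ABCDEFG", query)

# The first five rows of the longer sequence equal the shorter one's matrix.
print(long_rows[:len(short_rows)] == short_rows)
best_score = max(max(row) for row in long_rows)
print(best_score)
```

A database stored as a prefix tree could therefore compute those five rows once and continue from them for the longer sequence; whether that gives the claimed overall speed-up is a separate question that only measurement can answer.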

I want to turn this algorithm into something useful for people. My
first challenge is to answer the question "is this algorithm faster
or better than BLAST?" If it is not faster, my algorithm has little
use. But I have good hopes it will be faster! I am very good with
these sorts of things, you see :) Speed is my strong point.

Currently, I do not know about the speed, because I haven't  
implemented a C++ version of my algorithm, or a good speed testing  
framework.

I do, however, know that my algorithm is more accurate than BLAST: it
uses the Smith-Waterman algorithm, so it is just as accurate as
SSEARCH, whereas BLAST uses a heuristic, basically intelligent
guesswork. A fine heuristic, but still a heuristic. Mine is
methodical, not heuristic-based.

So here is what I am looking for!

I am hoping that someone in the field will be able to offer me
guidance, interest, enthusiasm, suggestions, and maybe even do some
testing for me.

Perhaps a student doing a bioinformatics-related degree, who would
like to write a paper on an alternative way of processing protein
databases. My invention could be an interesting subject for a paper.

Or perhaps a researcher who just has an interest in these sorts of
things! Perhaps a researcher who feels there must be a better way of
doing these things. Or anyone in this field with the time and
interest, who feels that helping me could help him (or her) too in
some way.

I'd like someone I can ask a lot of questions, show my software to,
and explain what I hope to achieve with it.

Basically, my first questions to you would be "how would I set this up
to be useful for someone?" and "how would I test its usefulness: what
would you need to know about my algorithm to decide to use it over
BLAST?"

It's sort of a vague question from me, like "what do you need me to
do?", but... well, that's where I am right now. Sort of a bit on the
outside, hoping someone on the inside will show me something.

So it's an opportunity to tell me what you want, basically!! Tell me,
and I might just make it.

Who knows? Maybe one day in a few years' time, everyone will be using
this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
might be part of something.

Thanks to anyone who replies!

--
http://elfdata.com/plugin/
"String processing, done right"


From marchywka at hotmail.com  Mon Feb 11 12:51:58 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 11 Feb 2008 12:51:58 -0500
Subject: [BiO BB] Inconsistent Blast Results
In-Reply-To: <bdd10c2a0802110928w6354d866x8366b412ff7bd939@mail.gmail.com>
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com>
	<79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com> 
	<bdd10c2a0802110928w6354d866x8366b412ff7bd939@mail.gmail.com>
Message-ID: <BAY108-W3B5AF55DE63D2F40EB7AEBE2A0@phx.gbl>


>> than 10^-10 are ignored. Suddenly at 10^-7 new hits with evals of
>> 10^-11 appear that weren't there before and even the relative strength
>> of different hits can change.
>>

I think someone else suggested using the score, not the e-value. I've
seen cases using a BLAST server where I got confusing results, so I
got in the habit of asking for a lot of marginal hits and then sorting
them out locally with text scripts.
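
A minimal sketch of that kind of local post-filtering, assuming the search was run with a loose cutoff and the 12-column tabular output (blastall -m 8). The function names and the sample rows are made up for illustration:

```python
import io

# Column order of the 12-column tabular BLAST output (blastall -m 8).
COLUMNS = ["query", "subject", "pct_identity", "length", "mismatches",
           "gap_opens", "q_start", "q_end", "s_start", "s_end",
           "evalue", "bit_score"]


def read_hits(handle):
    """Parse tabular BLAST lines into dicts, skipping blanks and comments."""
    hits = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(COLUMNS, line.split("\t")))
        hit["evalue"] = float(hit["evalue"])
        hit["bit_score"] = float(hit["bit_score"])
        hits.append(hit)
    return hits


def filter_and_rank(hits, max_evalue):
    """Keep hits at or below max_evalue, best bit score first."""
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(kept, key=lambda h: h["bit_score"], reverse=True)


# Invented sample rows standing in for a real output file.
sample = io.StringIO(
    "q1\tsubjB\t95.00\t80\t4\t0\t1\t80\t5\t84\t2e-08\t60.0\n"
    "q1\tsubjA\t98.00\t100\t2\t0\t1\t100\t1\t100\t1e-11\t71.9\n"
    "q1\tsubjC\t85.00\t40\t6\t0\t3\t42\t9\t48\t0.5\t30.2\n"
)
all_hits = read_hits(sample)
for hit in filter_and_rank(all_hits, max_evalue=1e-7):
    print(hit["subject"], hit["evalue"], hit["bit_score"])
```

Running one loose search and applying both thresholds to the same file means every comparison is made on the same set of final alignments.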


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.

> Date: Mon, 11 Feb 2008 09:28:40 -0800
> From: marty.gollery at gmail.com
> To: bbb at bioinformatics.org
> Subject: Re: [BiO BB] Inconsistent Blast Results
>
> Hi Rebekah,
> I believe you are seeing differences because of scores getting thrown
> out at an earlier step. What I think is happening is that the hits are
> being cut off with the 10^-10 threshold that would have given better
> results in the alignment regeneration phase. Then when you run the
> search with the 10^-7 cutoff, those hits are allowed into the final
> step and they are extended to yield better scores.
>
> Best Regards,
> Marty



From golharam at umdnj.edu  Mon Feb 11 17:28:18 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 11 Feb 2008 17:28:18 -0500
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
Message-ID: <47B0CC02.8010206@umdnj.edu>

Why don't you write up a paper describing the algorithm in detail and
submit it to a bioinformatics journal?  And why not make the executable
available with documentation so that people can download it and try it
out for themselves?

Do you have any test cases that show it runs faster/better than BLAST?
Describe them and make them available.




From marty.gollery at gmail.com  Mon Feb 11 17:49:10 2008
From: marty.gollery at gmail.com (Martin Gollery)
Date: Mon, 11 Feb 2008 14:49:10 -0800
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
Message-ID: <bdd10c2a0802111449u7cf1bde5h6a162d0085e5d604@mail.gmail.com>



-- 
-- 
Martin Gollery
Senior Bioinformatics Scientist
TimeLogic- a Division of Active Motif
775-833-9113
880 Northwood Blvd. Suite 7
Incline Village, NV 89451


From marty.gollery at gmail.com  Mon Feb 11 17:51:15 2008
From: marty.gollery at gmail.com (Martin Gollery)
Date: Mon, 11 Feb 2008 14:51:15 -0800
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
Message-ID: <bdd10c2a0802111451q60cbec27q80d4e115e2d98922@mail.gmail.com>

The first step is to implement it in C++ to see how fast it is. Once
you have an executable, testing it will be relatively straightforward.
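
A rough sketch of what such a timing test could look like, assuming each program can be run as a command line. The helper name `time_command` is hypothetical, and the placeholder command only exists so the sketch runs anywhere; the real BLAST and ElfDataFuzzy invocations would go in its place:

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run a command several times and return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Placeholder command; substitute the search programs being compared,
    # each run on the same query file and the same database.
    seconds = time_command([sys.executable, "-c", "pass"])
    print(f"median wall-clock time: {seconds:.3f} s")
```

Taking the median of a few runs reduces the effect of disk caching on the first run, which matters when the database is large.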

Marty




-- 
-- 
Martin Gollery
Senior Bioinformatics Scientist
TimeLogic- a Division of Active Motif
775-833-9113
880 Northwood Blvd. Suite 7
Incline Village, NV 89451


From akunthavai at yahoo.co.in  Mon Feb 11 22:28:45 2008
From: akunthavai at yahoo.co.in (A KUNTHAVAI)
Date: Tue, 12 Feb 2008 03:28:45 +0000 (GMT)
Subject: [BiO BB] Homological DNA sequences
Message-ID: <28421.18270.qm@web8904.mail.in.yahoo.com>

Sir,
          I would like a list of homologous rice gene sequences to
 give as input to the blastn, blastp, and bl2seq programs. Please
 provide an answer as early as possible.
  A.Kunthavai
  Research Scholar
  Anna University



From marchywka at hotmail.com  Tue Feb 12 08:42:35 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 12 Feb 2008 08:42:35 -0500
Subject: [BiO BB] Looking for researcher,
 to assist on blast-like invention
In-Reply-To: <bdd10c2a0802111451q60cbec27q80d4e115e2d98922@mail.gmail.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<bdd10c2a0802111451q60cbec27q80d4e115e2d98922@mail.gmail.com>
Message-ID: <BAY108-W75A33301126C985EC0FF7BE2B0@phx.gbl>


>> I also use many tricks to reuse computations across various proteins.
>> For example, the matrix for protein "ABCDE", is identical, at first

Have you gotten any BLAST source code? This would be a good thing to
start with for a number of reasons. But don't assume that a given
implementation is either well optimized or naive. Sure, they could have
code like

get_parameters();
metric=do_expensive_metric_thing();
if ( metric


From aalibes at gmail.com  Tue Feb 12 10:45:00 2008
From: aalibes at gmail.com (Andreu Alibés)
Date: Tue, 12 Feb 2008 16:45:00 +0100
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
Message-ID: <885c6c040802120745g7c55440ai8275e132d3932da2@mail.gmail.com>

Why not make the code available to everybody in an open-source
repository like SourceForge?

A



-- 
Andreu Alibés, PhD
Systems Biology Program - Center for Genomic Regulation
c/ Dr. Aiguader 88, 08003 Barcelona, Spain
Phone: +34 93 316 0258
http://aalibes.googlepages.com/


From bsmagic at gmail.com  Mon Feb 11 21:45:33 2008
From: bsmagic at gmail.com (Sheng Wang)
Date: Tue, 12 Feb 2008 10:45:33 +0800
Subject: [BiO BB] Homological DNA sequences
In-Reply-To: <152662.42187.qm@web8912.mail.in.yahoo.com>
References: <152662.42187.qm@web8912.mail.in.yahoo.com>
Message-ID: <793f8aed0802111845pc48d59bi5753281f37927c00@mail.gmail.com>

homology to what?

On 2/11/08, A KUNTHAVAI <akunthavai at yahoo.co.in> wrote:
>
> Sir,
>          I want to know the list of homological rice gene sequence to give
> as an input to Blastn, Blastp , blast2sq program. Please provide me the
> answer as early as possible.
> A.Kunthavai
> Research Scholar
> Anna University
>
>
>
>
>


-- 
Best Regards
Sheng Wang


From bsmagic at gmail.com  Mon Feb 11 21:48:03 2008
From: bsmagic at gmail.com (Sheng Wang)
Date: Tue, 12 Feb 2008 10:48:03 +0800
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <bdd10c2a0802111451q60cbec27q80d4e115e2d98922@mail.gmail.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<bdd10c2a0802111451q60cbec27q80d4e115e2d98922@mail.gmail.com>
Message-ID: <793f8aed0802111848u1ce88078hff246eaffe83218d@mail.gmail.com>

Maybe the BLAST package should be software for which users can develop
third-party add-ons.

On 2/12/08, Martin Gollery <marty.gollery at gmail.com> wrote:
>
> The first step is to implement it in C++ to see how fast it is. Once
> you have an executable, testing it will be relatively straightforward.
>
> Marty
>
>
> On Feb 11, 2008 8:21 AM, Theodore H. Smith <delete at elfdata.com> wrote:
> >
> > Hi everyone,
> >
> > So I've been working, on and off, on this algorithm for quite a while
> > now. It's basically an invention of mine. It is a "blast-like"
> > algorithm, in that it does "Fuzzy lookup" operations across a database
> > of letters. I am designing this algorithm to be useful for bio-
> > informatics, this is the main field I am initially targetting.
> >
> > The database will be filled with protein sequences, and the search
> > across the database will be another protein sequence. The algorithm
> > has a "scoring matrix", which can accept different protein replacement
> > scores. The cost of inserting letters (protein letters) can be
> > configured also.
> >
> > In this sense, it's no different to Smith-Waterman. The same input,
> > the same output!
> >
> > The real difference from Smith-Waterman, is it's speed. My algorithm
> > will be hugely faster. This is because I use many techniques to avoid
> > processing unnecessary parts of the Smith-Waterman matrix.
> >
> > I also use many tricks to reuse computations across various proteins.
> > For example, the matrix for protein "ABCDE", is identical, at first
> > anyhow, for the matrix for "ABCDEFG". This means if I have both
> > proteins "ABCDE", and "ABCDEFG" in my protein database, I can test
> > both of them against the search query, in almost half the time. My
> > algorithm also runs in logarithmic-time with respect to the size of
> > the database. Basically, bigger databases run disproportionately faster.
> >
> > I want to turn this algorithm, into something useful for people. My
> > first challenge here, is to answer the question "is this algorithm
> > faster, or better than BLAST". If it is not faster, my algorithm
> > basically has little use. But I have good hopes it will be faster! I
> > am very good with these sort of things, you see :) Speed is my strong-
> > point.
> >
> > Currently, I do not know about the speed, because I haven't
> > implemented a C++ version of my algorithm, or a good speed testing
> > framework.
> >
> > I do however know that my algorithm is more accurate than BLAST,
> > because it is just as accurate as SSEARCH, as mine uses the Smith-
> > Waterman algorithm. Whereas BLAST uses a heuristic, intelligent guess-
> > work basically. A fine heuristic, but still a heuristic. Mine is
> > methodological, not heuristic based.
> >
> > So here is what I am looking for!
> >
> > I am hoping, that someone in the field will be able to offer me
> > guidance, interest, enthusiasm, suggestions and maybe even do some
> > testing for me.
> >
> > Perhaps a student doing a bio-informatics related degree, who would
> > like to write a paper on an alternative way of processing protein
> > databases. My invention could be an interesting subject for a paper.
> >
> > Or perhaps a researcher who just has an interest in these sort of
> > things! Perhaps a researcher who feels there must be a better way of
> > doing these things. Or anyone really in this field with the time and
> > interest, and feels helping me could help him (or her) too in some way.
> >
> > I'd like someone I can ask a lot of questions to, and show my software
> > to, and explain my hopes what I can achieve with it.
> >
> > Basically, my first question to you, would be "how would I set this up
> > to be useful for someone", and "how would I test it's usefulness, what
> > would you need to know about my algorithm that you would decide to use
> > it over blast"
> >
> > It's sort of a vague question from me, like "what do you need me to
> > do", but... well that's where I am right now. Sort of a bit on the
> > outside hoping someone on the inside will show me something.
> >
> > So it's an opportunity to tell me what you want, basically!! Tell me,
> > and I might just make it.
> >
> > Who knows? Maybe one day in a few years time, everyone will be using
> > this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
> > might be part of something.
> >
> > Thanks to anyone who replies!
> >
> > --
> > http://elfdata.com/plugin/
> > "String processing, done right"
> >
> >
> >
> >
>
>
>
> --
> --
> Martin Gollery
> Senior Bioinformatics Scientist
> TimeLogic- a Division of Active Motif
> 775-833-9113
> 880 Northwood Blvd. Suite 7
> Incline Village, NV 89451
>
>


-- 
Best Regards
Sheng Wang


From bsmagic at gmail.com  Mon Feb 11 21:50:14 2008
From: bsmagic at gmail.com (Sheng Wang)
Date: Tue, 12 Feb 2008 10:50:14 +0800
Subject: [BiO BB] Ensembl and Gene Ontology terms
In-Reply-To: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
Message-ID: <793f8aed0802111850j63b44dd9s865ea1514ad2e649@mail.gmail.com>

It seems there is similar software listed on the official GO website.

On 2/12/08, Paco B C <mleczny at gmail.com> wrote:
>
> Hi!
> this is my first message in this list. My name is Paco and I'm doing my
> PhD.
> on Bioinformatics in University of Leuven, Belgium.
> I would like to build a java module that, given a list of Ensembl Gene
> Identifiers, it would give back their related Gene Ontology terms. I've
> accessed the GO database, but I can't find ENSG terms and I've read in the
> Ensembl website that they give the link to external databases for
> translation and transcript objects but not for genes (maybe in the future,
> but not now).
> My question is, do you know which database could I query in order to get
> this relation within Ensembl and GO terms?
> Thanks!
> Paco
>


-- 
Best Regards
Sheng Wang


From alen295200 at gmail.com  Tue Feb 12 06:49:20 2008
From: alen295200 at gmail.com (anil kumar)
Date: Tue, 12 Feb 2008 03:49:20 -0800
Subject: [BiO BB] Homological DNA sequences
In-Reply-To: <28421.18270.qm@web8904.mail.in.yahoo.com>
References: <28421.18270.qm@web8904.mail.in.yahoo.com>
Message-ID: <3d0047290802120349ic9f4e6emc4c8c8de3ce580b7@mail.gmail.com>

Hi,
You can get the rice genome from TIGR.
Your problem does not seem well defined to me. In any case, you can download
the data and work from there.


On 2/11/08, A KUNTHAVAI <akunthavai at yahoo.co.in> wrote:
>
> Sir,
>          I want to know the list of homological rice gene sequence to
> give as an input to Blastn, Blastp , blast2sq program. Please provide
> me the answer as early as possible.
> A.Kunthavai
> Research Scholar
> Anna University
>
>
>
>
>


From dan.bolser at gmail.com  Tue Feb 12 04:15:48 2008
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 12 Feb 2008 10:15:48 +0100
Subject: [BiO BB] Ensembl and Gene Ontology terms
In-Reply-To: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
Message-ID: <2c8757af0802120115s3c64ce62hf421b9e442b8804e@mail.gmail.com>

You could try SRS?

Not sure if it has what you need... Otherwise, I think UniProt has links to
GO and Ensembl ... I think. At least UniProt links to GO, and SRS
links UniProt and Ensembl! You would think we had all this sorted out
by now!


----

Talk to the experts;
irc://irc.freenode.net/#bioinformatics


On 11/02/2008, Paco B C <mleczny at gmail.com> wrote:
> Hi!
> this is my first message in this list. My name is Paco and I'm doing my PhD.
> on Bioinformatics in University of Leuven, Belgium.
> I would like to build a java module that, given a list of Ensembl Gene
> Identifiers, it would give back their related Gene Ontology terms. I've
> accessed the GO database, but I can't find ENSG terms and I've read in the
> Ensembl website that they give the link to external databases for
> translation and transcript objects but not for genes (maybe in the future,
> but not now).
> My question is, do you know which database could I query in order to get
> this relation within Ensembl and GO terms?
> Thanks!
> Paco
>


-- 
hello


From delete at elfdata.com  Mon Feb 11 18:56:41 2008
From: delete at elfdata.com (Theodore H. Smith)
Date: Mon, 11 Feb 2008 23:56:41 +0000
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <47B0CC02.8010206@umdnj.edu>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<47B0CC02.8010206@umdnj.edu>
Message-ID: <B84A0415-351D-49B8-9A83-506C870BCCA8@elfdata.com>


On 11 Feb 2008, at 22:28, Ryan Golhar wrote:

> Why don't you write up a paper describing the algorithm in detail and
> submit it to a bioinformatics journal?  And, why not make the  
> executable
> available with documentation so that people can download it and try it
> out for themselves.
>
> Do you have any test cases that show it runs faster/better than BLAST?
> Describe them and make them available.

The first thing I'd need to do is make a good test. I'm not sure what  
constitutes "a good test", in this case.

How big should the databanks be to make the test reasonable? Is
randomly generated data good enough, or is a randomly selected sample
better? If a sample is better, how large a dataset must I gather to do
the test?

Perhaps certain settings make my algorithm work better or worse  
relative to BLAST. But then how do I know which settings are more  
likely to be used and which aren't?

I think someone who uses BLAST frequently, and knows it well from a  
user's perspective... might have a better feel for creating a test  
than I might.

The worst thing that could happen is that I make a test which is unfairly
biased toward my algorithm :) The next thing that would happen is that
people would see my test has "suspiciously good" results, be
annoyed about that, and lose interest, even if it were an innocent
mistake on my end. I'd rather avoid that sort of mistake by getting a
knowledgeable eye on the design of the test!

Like I said, I haven't gotten all the code into C++ yet. I've already got a
framework in C++; I mean, I know how to write C++. And I know
what to do, as I've written it in a prototyping language.

The C++ version will come soon, though.

> Theodore H. Smith wrote:
>> Hi everyone,
>>
>> So I've been working, on and off, on this algorithm for quite a while
>> now. It's basically an invention of mine. It is a "blast-like"
>> algorithm, in that it does "Fuzzy lookup" operations across a  
>> database
>> of letters. I am designing this algorithm to be useful for bio-
>> informatics, this is the main field I am initially targetting.
>>
>> The database will be filled with protein sequences, and the search
>> across the database will be another protein sequence. The algorithm
>> has a "scoring matrix", which can accept different protein  
>> replacement
>> scores. The cost of inserting letters (protein letters) can be
>> configured also.
>>
>> In this sense, it's no different to Smith-Waterman. The same input,
>> the same output!
>>
>> The real difference from Smith-Waterman, is it's speed. My algorithm
>> will be hugely faster. This is because I use many techniques to avoid
>> processing unnecessary parts of the Smith-Waterman matrix.
>>
>> I also use many tricks to reuse computations across various proteins.
>> For example, the matrix for protein "ABCDE", is identical, at first
>> anyhow, for the matrix for "ABCDEFG". This means if I have both
>> proteins "ABCDE", and "ABCDEFG" in my protein database, I can test
>> both of them against the search query, in almost half the time. My
>> algorithm also runs in logarithmic-time with respect to the size of
>> the database. Basically, bigger databases run disproportionately  
>> faster.
>>
>> I want to turn this algorithm, into something useful for people. My
>> first challenge here, is to answer the question "is this algorithm
>> faster, or better than BLAST". If it is not faster, my algorithm
>> basically has little use. But I have good hopes it will be faster! I
>> am very good with these sort of things, you see :) Speed is my  
>> strong-
>> point.
>>
>> Currently, I do not know about the speed, because I haven't
>> implemented a C++ version of my algorithm, or a good speed testing
>> framework.
>>
>> I do however know that my algorithm is more accurate than BLAST,
>> because it is just as accurate as SSEARCH, as mine uses the Smith-
>> Waterman algorithm. Whereas BLAST uses a heuristic, intelligent  
>> guess-
>> work basically. A fine heuristic, but still a heuristic. Mine is
>> methodological, not heuristic based.
>>
>> So here is what I am looking for!
>>
>> I am hoping, that someone in the field will be able to offer me
>> guidance, interest, enthusiasm, suggestions and maybe even do some
>> testing for me.
>>
>> Perhaps a student doing a bio-informatics related degree, who would
>> like to write a paper on an alternative way of processing protein
>> databases. My invention could be an interesting subject for a paper.
>>
>> Or perhaps a researcher who just has an interest in these sort of
>> things! Perhaps a researcher who feels there must be a better way of
>> doing these things. Or anyone really in this field with the time and
>> interest, and feels helping me could help him (or her) too in some  
>> way.
>>
>> I'd like someone I can ask a lot of questions to, and show my  
>> software
>> to, and explain my hopes what I can achieve with it.
>>
>> Basically, my first question to you, would be "how would I set this  
>> up
>> to be useful for someone", and "how would I test it's usefulness,  
>> what
>> would you need to know about my algorithm that you would decide to  
>> use
>> it over blast"
>>
>> It's sort of a vague question from me, like "what do you need me to
>> do", but... well that's where I am right now. Sort of a bit on the
>> outside hoping someone on the inside will show me something.
>>
>> So it's an opportunity to tell me what you want, basically!! Tell me,
>> and I might just make it.
>>
>> Who knows? Maybe one day in a few years time, everyone will be using
>> this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
>> might be part of something.
>>
>> Thanks to anyone who replies!
>>
>> --
>> http://elfdata.com/plugin/
>> "String processing, done right"
>>
>>
>>

--
http://elfdata.com/plugin/
"String processing, done right"


From dorjetarap at googlemail.com  Tue Feb 12 09:30:18 2008
From: dorjetarap at googlemail.com (dorje tarap)
Date: Tue, 12 Feb 2008 14:30:18 +0000
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <BAY108-W75A33301126C985EC0FF7BE2B0@phx.gbl>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<bdd10c2a0802111451q60cbec27q80d4e115e2d98922@mail.gmail.com>
	<BAY108-W75A33301126C985EC0FF7BE2B0@phx.gbl>
Message-ID: <d6a0d8f10802120630i49a269f0o7469622432e83cbf@mail.gmail.com>

Hi Mike,

"Faster than blast" or even "more accurate than blast" type algorithms have
been around for some time now. Some interesting examples are PatternHunter
(commercial), http://www.bioinformaticssolutions.com/products/ph/, and YAST
(open source). Both claim to be significantly faster and more accurate than
BLAST; unfortunately, they are not as popular.

I suspect this is for a few reasons: BLAST has been around for a while and
has gained the confidence of the bioinformatics community; its reporting of
statistical significance (e-values) is easy to interpret; and it has a
large genomics database readily available. For an algorithm to replace
BLAST, it would have to tick a lot of these boxes.

Your approach seems pretty interesting, as you mention it is not a heuristic
algorithm, whereas the main approach recently seems to be the
"spaced seeds" concept introduced in PatternHunter. Your approach sounds
somewhat similar to the Four Russians speedup, and any way to speed up the
dynamic programming algorithm without sacrificing accuracy would benefit a
number of fields, not just bioinformatics.
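
To make the prefix-reuse idea quoted in this thread concrete, here is a
minimal sketch in Python. It is only my guess at what is meant, not
Theodore's actual algorithm: it assumes a linear gap cost, a toy
match/mismatch score instead of a real substitution matrix, and a trie
over the database sequences. The names sw_row, sw_score and
search_shared are made up for this example. Each Smith-Waterman row
depends only on the previous row and one database letter, so sequences
that share a prefix can share the rows for that prefix.

```python
END = None  # hypothetical end-of-sequence marker used as a trie key


def sw_row(prev_row, db_char, query, match=2, mismatch=-1, gap=-2):
    """Compute one Smith-Waterman row from the previous row (linear gaps)."""
    row = [0] * (len(query) + 1)
    for j in range(1, len(query) + 1):
        sub = match if db_char == query[j - 1] else mismatch
        row[j] = max(0,
                     prev_row[j - 1] + sub,  # align db_char with query[j-1]
                     prev_row[j] + gap,      # db_char aligned to a gap
                     row[j - 1] + gap)       # query[j-1] aligned to a gap
    return row


def sw_score(db_seq, query):
    """Plain Smith-Waterman best local score, one sequence at a time."""
    row, best = [0] * (len(query) + 1), 0
    for ch in db_seq:
        row = sw_row(row, ch, query)
        best = max(best, max(row))
    return best


def search_shared(seqs, query):
    """Score all sequences, computing each shared prefix row only once."""
    trie = {}
    for s in seqs:
        node = trie
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = s  # remember which sequence ends at this node

    scores, rows_computed = {}, 0
    stack = [(trie, [0] * (len(query) + 1), 0)]
    while stack:
        node, row, best = stack.pop()
        for key, child in node.items():
            if key is END:
                scores[child] = best
            else:
                new_row = sw_row(row, key, query)
                rows_computed += 1
                stack.append((child, new_row, max(best, max(new_row))))
    return scores, rows_computed
```

With the database ["ABCDE", "ABCDEFG"], search_shared computes 7 rows
where scoring the two sequences separately would compute 12, and the
scores are the same as those from sw_score. How much this saves on a
real protein database depends on how many prefixes are actually shared.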

I guess the first step would be to outline your algorithm in a draft
paper, so that others can get a better understanding of your approach.

HTH

Karma

On 12/02/2008, Mike Marchywka <marchywka at hotmail.com> wrote:
>
>
>
> >> I also use many tricks to reuse computations across various proteins.
> >> For example, the matrix for protein "ABCDE", is identical, at first
>
> Have you gotten any BLAST source code? This would be a good thing to
> start with for a number of reasons. But don't assume that a given
> implementation is either well optimized or naive. Sure, they could have
> code like
>
> get_parameters();
> metric=do_expensive_metric_thing();
> if ( metric
>


From dwrice at indiana.edu  Tue Feb 12 01:49:24 2008
From: dwrice at indiana.edu (Danny Rice)
Date: Tue, 12 Feb 2008 01:49:24 -0500
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
Message-ID: <47B14174.1080109@indiana.edu>

If you have a way to speed up the full S&W algorithm, it would be
interesting whether or not it is faster than BLAST. I would focus on
showing that it is significantly faster than current implementations of
Smith-Waterman. Such an algorithm could be incorporated into
BLAST or any other dynamic programming algorithm. You can test it, for
example, by searching the swissprot database
ftp://ftp.ncbi.nih.gov/blast/db/swissprot.tar.gz with a bunch of queries
pulled from this database. It would seem, however, that you could
calculate the time savings directly as a function of conditions. You
shouldn't need any help with this. Just show it is significantly faster
than S&W, while searching the entire matrix, and you are golden.
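
A rough sketch of such a timing test, in Python, might look like the
following. It is only an outline under assumptions: the database is
taken to be an unpacked FASTA file, `search` stands for whatever
function is being timed (a hypothetical callable taking a query and the
database), and read_fasta and benchmark are names invented here. The
fixed random seed is there so that two implementations see the same
queries.

```python
import random
import time


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def benchmark(search, database, n_queries=100, seed=0):
    """Return mean seconds per query for `search(query, database)`.

    Queries are sampled from the (non-empty) database itself, as
    suggested above; the seed keeps the sample identical between runs.
    """
    rng = random.Random(seed)
    queries = rng.sample(database, min(n_queries, len(database)))
    start = time.perf_counter()
    for _, seq in queries:
        search(seq, database)
    return (time.perf_counter() - start) / len(queries)
```

Running benchmark once with a plain Smith-Waterman search and once with
the new algorithm, on the same database list, gives two comparable
per-query times.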

Theodore H. Smith wrote:
> Hi everyone,
>
> So I've been working, on and off, on this algorithm for quite a while  
> now. It's basically an invention of mine. It is a "blast-like"  
> algorithm, in that it does "Fuzzy lookup" operations across a database  
> of letters. I am designing this algorithm to be useful for bio- 
> informatics, this is the main field I am initially targetting.
>
> The database will be filled with protein sequences, and the search  
> across the database will be another protein sequence. The algorithm  
> has a "scoring matrix", which can accept different protein replacement  
> scores. The cost of inserting letters (protein letters) can be  
> configured also.
>
> In this sense, it's no different to Smith-Waterman. The same input,  
> the same output!
>
> The real difference from Smith-Waterman, is it's speed. My algorithm  
> will be hugely faster. This is because I use many techniques to avoid  
> processing unnecessary parts of the Smith-Waterman matrix.
>
> I also use many tricks to reuse computations across various proteins.  
> For example, the matrix for protein "ABCDE", is identical, at first  
> anyhow, for the matrix for "ABCDEFG". This means if I have both  
> proteins "ABCDE", and "ABCDEFG" in my protein database, I can test  
> both of them against the search query, in almost half the time. My  
> algorithm also runs in logarithmic-time with respect to the size of  
> the database. Basically, bigger databases run disproportionately faster.
>
> I want to turn this algorithm, into something useful for people. My  
> first challenge here, is to answer the question "is this algorithm  
> faster, or better than BLAST". If it is not faster, my algorithm  
> basically has little use. But I have good hopes it will be faster! I  
> am very good with these sort of things, you see :) Speed is my strong- 
> point.
>
> Currently, I do not know about the speed, because I haven't  
> implemented a C++ version of my algorithm, or a good speed testing  
> framework.
>
> I do however know that my algorithm is more accurate than BLAST,  
> because it is just as accurate as SSEARCH, as mine uses the Smith- 
> Waterman algorithm. Whereas BLAST uses a heuristic, intelligent guess- 
> work basically. A fine heuristic, but still a heuristic. Mine is  
> methodological, not heuristic based.
>
> So here is what I am looking for!
>
> I am hoping, that someone in the field will be able to offer me  
> guidance, interest, enthusiasm, suggestions and maybe even do some  
> testing for me.
>
> Perhaps a student doing a bio-informatics related degree, who would  
> like to write a paper on an alternative way of processing protein  
> databases. My invention could be an interesting subject for a paper.
>
> Or perhaps a researcher who just has an interest in these sort of  
> things! Perhaps a researcher who feels there must be a better way of  
> doing these things. Or anyone really in this field with the time and  
> interest, and feels helping me could help him (or her) too in some way.
>
> I'd like someone I can ask a lot of questions to, and show my software  
> to, and explain my hopes what I can achieve with it.
>
> Basically, my first question to you, would be "how would I set this up  
> to be useful for someone", and "how would I test it's usefulness, what  
> would you need to know about my algorithm that you would decide to use  
> it over blast"
>
> It's sort of a vague question from me, like "what do you need me to  
> do", but... well that's where I am right now. Sort of a bit on the  
> outside hoping someone on the inside will show me something.
>
> So it's an opportunity to tell me what you want, basically!! Tell me,  
> and I might just make it.
>
> Who knows? Maybe one day in a few years time, everyone will be using  
> this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You  
> might be part of something.
>
> Thanks to anyone who replies!
>
> --
> http://elfdata.com/plugin/
> "String processing, done right"
>
>
>
>   


From marchywka at hotmail.com  Tue Feb 12 10:00:44 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 12 Feb 2008 10:00:44 -0500
Subject: [BiO BB] Looking for researcher,
 to assist on blast-like invention
In-Reply-To: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
Message-ID: <BAY108-W23248FDE60883DEC92F4ECBE2B0@phx.gbl>


[ This stupid Hotmail editor cut off my last message; I guess "plain text" still expects formatting... ]

> I am hoping, that someone in the field will be able to offer me
> guidance, interest, enthusiasm, suggestions and maybe even do some
> testing for me.

Anyway, the rest of my prior post wasn't all that interesting, but I would
also suggest reading the literature to find problems.

http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&&term=blast+limitation

Just skimming the hits, there are new lab techniques, and these may have artifacts or
quirks that need certain ID features. Alternatively, if you can find a specific confusing result
and sort it out with your technique, that would be a good proof of concept.

Don't immediately dismiss this approach: there is so much new literature these days that
there may be problems and solutions waiting to be matched, as research groups are too busy with
their own or different matters.


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.

> From: delete at elfdata.com
> To: bbb at bioinformatics.org
> Date: Mon, 11 Feb 2008 16:21:23 +0000
> Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
>
>
> Hi everyone,
>
> So I've been working, on and off, on this algorithm for quite a while
> now. It's basically an invention of mine. It is a "blast-like"
> algorithm, in that it does "Fuzzy lookup" operations across a database
> of letters. I am designing this algorithm to be useful for bio-
> informatics, this is the main field I am initially targetting.
>
> The database will be filled with protein sequences, and the search
> across the database will be another protein sequence. The algorithm
> has a "scoring matrix", which can accept different protein replacement
> scores. The cost of inserting letters (protein letters) can be
> configured also.
>
> In this sense, it's no different to Smith-Waterman. The same input,
> the same output!
>
> The real difference from Smith-Waterman, is it's speed. My algorithm
> will be hugely faster. This is because I use many techniques to avoid
> processing unnecessary parts of the Smith-Waterman matrix.
>
> I also use many tricks to reuse computations across various proteins.
> For example, the matrix for protein "ABCDE", is identical, at first
> anyhow, for the matrix for "ABCDEFG". This means if I have both
> proteins "ABCDE", and "ABCDEFG" in my protein database, I can test
> both of them against the search query, in almost half the time. My
> algorithm also runs in logarithmic-time with respect to the size of
> the database. Basically, bigger databases run disproportionately faster.
>
> I want to turn this algorithm, into something useful for people. My
> first challenge here, is to answer the question "is this algorithm
> faster, or better than BLAST". If it is not faster, my algorithm
> basically has little use. But I have good hopes it will be faster! I
> am very good with these sort of things, you see :) Speed is my strong-
> point.
>
> Currently, I do not know about the speed, because I haven't
> implemented a C++ version of my algorithm, or a good speed testing
> framework.
>
> I do however know that my algorithm is more accurate than BLAST,
> because it is just as accurate as SSEARCH, as mine uses the Smith-
> Waterman algorithm. Whereas BLAST uses a heuristic, intelligent guess-
> work basically. A fine heuristic, but still a heuristic. Mine is
> methodological, not heuristic based.
>
> So here is what I am looking for!
>
> I am hoping, that someone in the field will be able to offer me
> guidance, interest, enthusiasm, suggestions and maybe even do some
> testing for me.
>
> Perhaps a student doing a bio-informatics related degree, who would
> like to write a paper on an alternative way of processing protein
> databases. My invention could be an interesting subject for a paper.
>
> Or perhaps a researcher who just has an interest in these sort of
> things! Perhaps a researcher who feels there must be a better way of
> doing these things. Or anyone really in this field with the time and
> interest, and feels helping me could help him (or her) too in some way.
>
> I'd like someone I can ask a lot of questions to, and show my software
> to, and explain my hopes what I can achieve with it.
>
> Basically, my first question to you, would be "how would I set this up
> to be useful for someone", and "how would I test it's usefulness, what
> would you need to know about my algorithm that you would decide to use
> it over blast"
>
> It's sort of a vague question from me, like "what do you need me to
> do", but... well that's where I am right now. Sort of a bit on the
> outside hoping someone on the inside will show me something.
>
> So it's an opportunity to tell me what you want, basically!! Tell me,
> and I might just make it.
>
> Who knows? Maybe one day in a few years time, everyone will be using
> this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
> might be part of something.
>
> Thanks to anyone who replies!
>
> --
> http://elfdata.com/plugin/
> "String processing, done right"
>
>
>


From oceanhu at 126.com  Tue Feb 12 04:43:46 2008
From: oceanhu at 126.com (ocean)
Date: Tue, 12 Feb 2008 17:43:46 +0800 (CST)
Subject: [BiO BB] Ensembl and Gene Ontology terms
In-Reply-To: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
Message-ID: <23188614.237851202809426120.JavaMail.coremail@bj126app59.126.com>

 Hey, Paco
 
I think you can try BioMart (www.biomart.org). This database already does this.
You can contact them for help.
 
Good luck!
 
Huhaiyang
 
 
On 2008-02-12, "Paco B C" <mleczny at gmail.com> wrote:

Hi!
this is my first message in this list. My name is Paco and I'm doing my PhD.
on Bioinformatics in University of Leuven, Belgium.
I would like to build a java module that, given a list of Ensembl Gene
Identifiers, it would give back their related Gene Ontology terms. I've
accessed the GO database, but I can't find ENSG terms and I've read in the
Ensembl website that they give the link to external databases for
translation and transcript objects but not for genes (maybe in the future,
but not now).
My question is, do you know which database could I query in order to get
this relation within Ensembl and GO terms?
Thanks!
Paco
_______________________________________________
BBB mailing list
BBB at bioinformatics.org
http://www.bioinformatics.org/mailman/listinfo/bbb


From pace_john at hotmail.com  Tue Feb 12 10:50:46 2008
From: pace_john at hotmail.com (John Pace)
Date: Tue, 12 Feb 2008 09:50:46 -0600
Subject: [BiO BB] Inconsistent Blast Results
In-Reply-To: <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com>
	<79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
Message-ID: <BAY105-W1533DB23F7E6DD44407D72842B0@phx.gbl>

Rebekah,
 
The reason for this is the way Blast calculates e-values.  The e-value is a function of the score: the higher the score, the lower the e-value.  The score gets lower as the alignment gets worse and also depends on the length of the query sequence.  So, for a lower e-value to be obtained, say 1e-10, the alignment for the HSP must be better than the alignment for an HSP that generates an e-value of 1e-7.  If the alignment can be worse, chances are that more of the query sequence will show up in the HSPs, thus creating different output.

Also, the e-value is a function of the length of the sequence and the size of the database.  So a shorter query sequence that is 10% diverged from the hit will have a higher e-value than a query sequence that is 5 times longer with the same divergence.
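
As a rough illustration of the dependence described above, the Karlin-Altschul formula E = K * m * n * exp(-lambda * S) can be sketched in a few lines of Python. The function names and the default values of lambda and K are illustrative placeholders, not values from any particular Blast run (Blast prints the values it actually used in the statistics at the end of its report), and real Blast uses corrected "effective" lengths rather than the raw lengths assumed here.

```python
import math

# Placeholder statistical parameters (assumptions for illustration only).
LAMBDA = 0.267
K = 0.041


def evalue(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def min_score(cutoff, query_len, db_len, lam=LAMBDA, k=K):
    """Smallest raw score whose e-value is at or below the cutoff."""
    return math.log(k * query_len * db_len / cutoff) / lam


if __name__ == "__main__":
    db_len = 1_000_000_000   # hypothetical database size in residues
    per_residue = 1.5        # hypothetical average score per aligned residue
    for query_len in (100, 500):
        score = per_residue * query_len
        print(query_len, score, evalue(score, query_len, db_len))
    # A stricter cutoff demands a higher score from the same query.
    print(min_score(1e-10, 100, db_len), min_score(1e-7, 100, db_len))
```

With the same per-residue quality, the longer query ends up with the lower e-value because the exponential term in the score outweighs the linear term in the query length.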
 
I hope this helps. Unfortunately, comparing different e-values in Blast can be a little like comparing apples to oranges.  I have found that this can be circumvented by using a sliding e-value.  You can use this to make sure all query sequences, regardless of length, meet a certain criterion, such as at least 50% similarity over the entire length of the query sequence.  It gets a little more complicated, but at least it is comparing apples to apples.
 
Thanks,
John Pace
PhD Candidate
University of Texas at Arlington


> Date: Fri, 8 Feb 2008 20:56:41 -0500
> From: rebekah.rogers at gmail.com
> To: bbb at bioinformatics.org
> Subject: [BiO BB] Inconsistent Blast Results
>
> Hi:
>
> I'm currently running blast 2.2.14 locally on my mac. I've noticed
> that the printout from a blastn run at an E cutoff of 10^-10 reads
> differently than a blast run at an E cutoff of 10^-7 when hits worse
> than 10^-10 are ignored. Suddenly at 10^-7 new hits with evals of
> 10^-11 appear that weren't there before and even the relative strength
> of different hits can change.
>
> I'm not certain I understand why this is true and it has a huge impact
> on my results. I know that the Eval is dependent on certain constants
> taken from the compared sequences, but I don't understand how this
> could possibly change when I'm using the exact same input file and
> database.
>
> Does anyone have an explanation?
>
> -Rebekah
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb

From marchywka at hotmail.com  Tue Feb 12 13:57:06 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 12 Feb 2008 13:57:06 -0500
Subject: [BiO BB] Looking for researcher,
 to assist on blast-like invention
In-Reply-To: <B84A0415-351D-49B8-9A83-506C870BCCA8@elfdata.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<47B0CC02.8010206@umdnj.edu> 
	<B84A0415-351D-49B8-9A83-506C870BCCA8@elfdata.com> 
Message-ID: <BAY108-W3640A02ADC5DC7BF4E4339BE2B0@phx.gbl>


> I think someone who uses BLAST frequently, and knows it well from a
> user's perspective... might have a better feel for creating a test
> than I might.

It can be hard to solicit problems from people but
it helps if you have some idea a priori what you are trying to accomplish before you test :)
Certainly test what you expect to achieve and see if the tradeoff/"pathological"
cases are what you expect, and then test with pseudo random input
if you have some way to generate an expected result.

To give you an example, right now I'm writing
stuff that re-invents the wheel with a few things I'm ultimately hoping to identify
and improve. I have scripts to generate random numbers to obtain unknown pieces
of genome and "blast" / search against those assuming they will be "negative controls."
Of course, a "hit" would require further examination but you get the idea. On
genome, you get all kinds of "odd stuff" including discovering the unappreciated
but highly conserved sequence "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN" LOL.
You get the idea.
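
A minimal sketch of that kind of negative-control script, in Python: it picks random windows out of a genome string and skips windows containing runs of "N". The function name, the parameters and the toy genome are all made up for illustration; this is not the actual script referred to above.

```python
import random


def sample_windows(genome, length, count, rng, max_n_fraction=0.0):
    """Pick up to `count` random windows of `length` bases from `genome`.

    Windows whose fraction of 'N' characters exceeds max_n_fraction are
    rejected. Returns a list of (start, sequence) pairs.
    """
    if length > len(genome):
        raise ValueError("window is longer than the genome")
    windows = []
    attempts = 0
    # Bound the number of tries so an all-N genome cannot loop forever.
    while len(windows) < count and attempts < 100 * count:
        attempts += 1
        start = rng.randrange(len(genome) - length + 1)
        piece = genome[start:start + length]
        if piece.upper().count("N") / length <= max_n_fraction:
            windows.append((start, piece))
    return windows


if __name__ == "__main__":
    toy_genome = "ACGT" * 50 + "N" * 30 + "TTGCA" * 40   # hypothetical data
    for start, piece in sample_windows(toy_genome, 20, 5, random.Random(0)):
        print(start, piece)
```

Each window would then be searched against the database, with any "hit" flagged for a closer look.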


I also have some splice related code I'm working on and I eventually found out
about particular cases like DSCAM that make good tests. If you are aware of 
hits that normal blast doesn't turn up with typical search parameters, that would
obviously make a good test case.

Most of the papers that came up in the pubmed search probably list open issues
(or else their conclusion would translate to "therefore, we require no further funding
from our sponsor" LOL), so just reading the literature should give you some relevant ideas.

Also, besides the bio literature, check out computer/algorithm literature at places like citeseer.




From delete at elfdata.com  Tue Feb 12 11:13:03 2008
From: delete at elfdata.com (Theodore H. Smith)
Date: Tue, 12 Feb 2008 16:13:03 +0000
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <885c6c040802120745g7c55440ai8275e132d3932da2@mail.gmail.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<885c6c040802120745g7c55440ai8275e132d3932da2@mail.gmail.com>
Message-ID: <411D6BDA-6BCA-441D-8345-AF8D3BE36029@elfdata.com>

Hi Andreu,

I am definitely making my source code available to everyone, under an
open source license. I am not going the commercial route. And while
I will protect my intellectual property, I am not going the patent
route. I am not a believer in the whole aggressive "stop people doing
stuff" idea.

I should have said that I am making this open source, at the start.

The main reason I am delaying in making it open source is that I
don't have a C++ version yet, so I have nothing to offer. Also, I
find SourceForge awkward to use and a big waste of time compared to
just uploading the source code directly to my website and
putting up a notice saying "this is open source".

As for writing a paper... I don't really have the background in  
University to write a paper, meaning it would take me a lot longer to  
do than someone experienced in writing papers. And to be honest I feel  
it would distract me from my main goal, which is to spend my time  
doing something productive. I would rather someone else write a paper  
for me :) I think this would be a fair arrangement.

But I am happy to explain my algorithm.

I think I should write up a document explaining it, however. Maybe not
in academic style, more in software developer style.

Thanks for all the interest and suggestions, everyone. It's helping a lot.

On 12 Feb 2008, at 15:45, Andreu Alibés wrote:

> Why not making the code available to everybody in an Open Source
> repository like sourceforge?
>
> A
>
> On Feb 11, 2008 5:21 PM, Theodore H. Smith <delete at elfdata.com> wrote:
>>
>> Hi everyone,
>>
>> So I've been working, on and off, on this algorithm for quite a while
>> now. It's basically an invention of mine. It is a "blast-like"
>> algorithm, in that it does "Fuzzy lookup" operations across a  
>> database
>> of letters. I am designing this algorithm to be useful for bio-
>> informatics, this is the main field I am initially targetting.
>>
>> The database will be filled with protein sequences, and the search
>> across the database will be another protein sequence. The algorithm
>> has a "scoring matrix", which can accept different protein  
>> replacement
>> scores. The cost of inserting letters (protein letters) can be
>> configured also.
>>
>> In this sense, it's no different to Smith-Waterman. The same input,
>> the same output!
>>
>> The real difference from Smith-Waterman, is it's speed. My algorithm
>> will be hugely faster. This is because I use many techniques to avoid
>> processing unnecessary parts of the Smith-Waterman matrix.
>>
>> I also use many tricks to reuse computations across various proteins.
>> For example, the matrix for protein "ABCDE", is identical, at first
>> anyhow, for the matrix for "ABCDEFG". This means if I have both
>> proteins "ABCDE", and "ABCDEFG" in my protein database, I can test
>> both of them against the search query, in almost half the time. My
>> algorithm also runs in logarithmic-time with respect to the size of
>> the database. Basically, bigger databases run disproportionately  
>> faster.
>>
>> I want to turn this algorithm, into something useful for people. My
>> first challenge here, is to answer the question "is this algorithm
>> faster, or better than BLAST". If it is not faster, my algorithm
>> basically has little use. But I have good hopes it will be faster! I
>> am very good with these sort of things, you see :) Speed is my  
>> strong-
>> point.
>>
>> Currently, I do not know about the speed, because I haven't
>> implemented a C++ version of my algorithm, or a good speed testing
>> framework.
>>
>> I do however know that my algorithm is more accurate than BLAST,
>> because it is just as accurate as SSEARCH, as mine uses the Smith-
>> Waterman algorithm. Whereas BLAST uses a heuristic, intelligent  
>> guess-
>> work basically. A fine heuristic, but still a heuristic. Mine is
>> methodological, not heuristic based.
>>
>> So here is what I am looking for!
>>
>> I am hoping, that someone in the field will be able to offer me
>> guidance, interest, enthusiasm, suggestions and maybe even do some
>> testing for me.
>>
>> Perhaps a student doing a bio-informatics related degree, who would
>> like to write a paper on an alternative way of processing protein
>> databases. My invention could be an interesting subject for a paper.
>>
>> Or perhaps a researcher who just has an interest in these sort of
>> things! Perhaps a researcher who feels there must be a better way of
>> doing these things. Or anyone really in this field with the time and
>> interest, and feels helping me could help him (or her) too in some  
>> way.
>>
>> I'd like someone I can ask a lot of questions to, and show my  
>> software
>> to, and explain my hopes what I can achieve with it.
>>
>> Basically, my first question to you, would be "how would I set this  
>> up
>> to be useful for someone", and "how would I test it's usefulness,  
>> what
>> would you need to know about my algorithm that you would decide to  
>> use
>> it over blast"
>>
>> It's sort of a vague question from me, like "what do you need me to
>> do", but... well that's where I am right now. Sort of a bit on the
>> outside hoping someone on the inside will show me something.
>>
>> So it's an opportunity to tell me what you want, basically!! Tell me,
>> and I might just make it.
>>
>> Who knows? Maybe one day in a few years time, everyone will be using
>> this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
>> might be part of something.
>>
>> Thanks to anyone who replies!
>>
>> --
>> http://elfdata.com/plugin/
>> "String processing, done right"
>>
>>
>>
>> _______________________________________________
>> BBB mailing list
>> BBB at bioinformatics.org
>> http://www.bioinformatics.org/mailman/listinfo/bbb
>>
>
>
>
> -- 
> Andreu Alibés, PhD
> Systems Biology Program - Center for Genomic Regulation
> c/ Dr. Aiguader 88, 08003 Barcelona, Spain
> Phone: +34 93 316 0258
> http://aalibes.googlepages.com/
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb

--
http://elfdata.com/plugin/
"String processing, done right"


From dankoc at gmail.com  Tue Feb 12 11:15:10 2008
From: dankoc at gmail.com (Charles Danko)
Date: Tue, 12 Feb 2008 11:15:10 -0500
Subject: [BiO BB] Ensembl and Gene Ontology terms
In-Reply-To: <23188614.237851202809426120.JavaMail.coremail@bj126app59.126.com>
References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
	<23188614.237851202809426120.JavaMail.coremail@bj126app59.126.com>
Message-ID: <8adccabf0802120815y52942ca0ia0bbdd60cdccc627@mail.gmail.com>

Hi, Paco,

If you want to do it all programmatically from Java, I would suggest
Googling a protocol called DAS.  ENSEMBL has a DAS server from which
you should be able to pull GO annotations for ENSEMBL IDs quite
easily.

You can find java-based libraries to access a DAS connection, and
parse the resulting information here:
http://www.spice-3d.org/dasobert/.

Good luck!
Charles

2008/2/12 ocean <oceanhu at 126.com>:
>  Hey,Paco
>
> i think you can try BIOMART(www.biomart.org).  this database had done such thing already.
> you can  contract them for some help.
>
> good luck!
>
> Huhaiyang
>
>
>
>
> On 2008-02-12, "Paco B C" <mleczny at gmail.com> wrote:
>
>
> Hi!
> this is my first message in this list. My name is Paco and I'm doing my PhD.
> on Bioinformatics in University of Leuven, Belgium.
> I would like to build a java module that, given a list of Ensembl Gene
> Identifiers, it would give back their related Gene Ontology terms. I've
> accessed the GO database, but I can't find ENSG terms and I've read in the
> Ensembl website that they give the link to external databases for
> translation and transcript objects but not for genes (maybe in the future,
> but not now).
> My question is, do you know which database could I query in order to get
> this relation within Ensembl and GO terms?
> Thanks!
> Paco
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb
>


From theoriste at gmail.com  Tue Feb 12 11:44:34 2008
From: theoriste at gmail.com (DT)
Date: Tue, 12 Feb 2008 11:44:34 -0500
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <B84A0415-351D-49B8-9A83-506C870BCCA8@elfdata.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<47B0CC02.8010206@umdnj.edu>
	<B84A0415-351D-49B8-9A83-506C870BCCA8@elfdata.com>
Message-ID: <6e3504600802120844o7d040702rc7afea8ad4285be3@mail.gmail.com>

On Feb 11, 2008 6:56 PM, Theodore H. Smith <delete at elfdata.com> wrote:

>
> On 11 Feb 2008, at 22:28, Ryan Golhar wrote:
>
> > Why don't you write up a paper describing the algorithm in detail and
> > submit it to a bioinformatics journal?  And, why not make the
> > executable
> > available with documentation so that people can download it and try it
> > out for themselves.
> >
> > Do you have any test cases that show it runs faster/better than BLAST?
> > Describe them and make them available.
>
> The first thing I'd need to do is make a good test. I'm not sure what
> constitutes "a good test", in this case.


NR ALL VS ALL:  This will test speed and, to some extent, performance. The nr
database (non-redundant) from NCBI is a good place to start testing as a
template database. I'd run your algorithm all-against-all in nr: run BLAST
for each entry in nr versus all of nr, do the same with your algorithm,
and then compare performance. You can generate a ROC plot for BLAST
vs your algorithm against a known set of homologs and distant homologs,
based on a p-value or significance level cutoff.
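
A minimal sketch of how the points of such a ROC plot could be computed, in Python. It assumes every reported hit has already been labelled as a true or false homolog against a trusted reference set, and that a higher score means a more significant hit (for e-values one could use -log10(E)). The function names are made up for illustration, and tied scores get no special handling.

```python
def roc_points(scored_hits):
    """Return ROC points as (false positive rate, true positive rate).

    scored_hits: iterable of (score, is_true_homolog) pairs, where a
    higher score means a more significant hit.
    """
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    positives = sum(1 for _, is_true in ranked if is_true)
    negatives = len(ranked) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true and one false hit")
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the cutoff one hit at a time and record the rates at each step.
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    hits = [(9, True), (8, True), (7, False), (6, True), (5, False)]  # toy data
    curve = roc_points(hits)
    print(curve)
    print(area_under_curve(curve))
```

Running the same labelled query set through BLAST and through the new algorithm gives two curves that can be compared directly.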

A real randomization test of sensitivity and specificity would be this:
take known sequences in nr  -- all or some of them -- and scramble them by
"homologous recombination" -- create chimeras of known sequences  by
different randomization criteria  -- by domain (criteria based on domain
annotation)  or by individual sequence based on a known randomization
function, and then test the sensitivity and specificity of BLAST vs your
algorithm to detect the originating sequences that created the chimeras.
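
A minimal sketch of the chimera idea in Python, assuming the simplest possible randomization function: one random cut point in each parent. Domain-based cutting would need domain annotation and is not modelled here; the function names and toy sequences are made up for illustration.

```python
import random


def make_chimera(parent_a, parent_b, rng):
    """Join a random prefix of parent_a to a random suffix of parent_b.

    Returns (chimera, cut_a, cut_b) so the origin of every residue is
    known. Both parents must have at least two residues.
    """
    cut_a = rng.randrange(1, len(parent_a))   # keep parent_a[:cut_a]
    cut_b = rng.randrange(1, len(parent_b))   # keep parent_b[cut_b:]
    return parent_a[:cut_a] + parent_b[cut_b:], cut_a, cut_b


def build_test_set(sequences, count, rng):
    """sequences: dict of id -> sequence. Returns a list of labelled chimeras."""
    ids = sorted(sequences)
    test_set = []
    for _ in range(count):
        id_a, id_b = rng.sample(ids, 2)       # two distinct parents
        chimera, cut_a, cut_b = make_chimera(sequences[id_a], sequences[id_b], rng)
        test_set.append({"parents": (id_a, id_b),
                         "cuts": (cut_a, cut_b),
                         "seq": chimera})
    return test_set


if __name__ == "__main__":
    toy = {"p1": "MKTAYIAKQR", "p2": "GSHMLEDPVW", "p3": "ACDEFGHIKL"}
    for item in build_test_set(toy, 3, random.Random(42)):
        print(item)
```

Because the parents of every chimera are recorded, each search result can be scored as a true or false detection.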

You will also need to check the performance of your algorithm against
nucleotide sequences. There are already test cases in BLAST for
mouse-vs-human; that would be a good test case.

Deanne Taylor


From theoriste at gmail.com  Tue Feb 12 11:46:03 2008
From: theoriste at gmail.com (DT)
Date: Tue, 12 Feb 2008 11:46:03 -0500
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <6e3504600802120844o7d040702rc7afea8ad4285be3@mail.gmail.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<47B0CC02.8010206@umdnj.edu>
	<B84A0415-351D-49B8-9A83-506C870BCCA8@elfdata.com>
	<6e3504600802120844o7d040702rc7afea8ad4285be3@mail.gmail.com>
Message-ID: <6e3504600802120846y38283248m1afd57dfe0df7937@mail.gmail.com>

By the way, nr is ftp-able from NCBI and is a protein-based database if you
didn't know.



From theoriste at gmail.com  Tue Feb 12 11:49:02 2008
From: theoriste at gmail.com (DT)
Date: Tue, 12 Feb 2008 11:49:02 -0500
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <6e3504600802120846y38283248m1afd57dfe0df7937@mail.gmail.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<47B0CC02.8010206@umdnj.edu>
	<B84A0415-351D-49B8-9A83-506C870BCCA8@elfdata.com>
	<6e3504600802120844o7d040702rc7afea8ad4285be3@mail.gmail.com>
	<6e3504600802120846y38283248m1afd57dfe0df7937@mail.gmail.com>
Message-ID: <6e3504600802120849y2f242f95t60e2f78e10953dc3@mail.gmail.com>

One more thing --

If you do a homologous recombination function, I would also include an
additional mutator function to mimic genetic drift -- it can be
sophisticated in allowing mutations vs the codon table and can be
distributed by a known function of percent drift/difference, so you can
adjust that and not only catch originating sequences by domains but also by
drift criteria.
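
A minimal sketch of such a mutator in Python, under a strong simplification: each residue is replaced, with probability equal to the chosen drift fraction, by a different amino acid picked uniformly at random. The codon table is not modelled here, and the function name and alphabet constant are made up for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues


def mutate(sequence, drift, rng, alphabet=AMINO_ACIDS):
    """Substitute each residue with probability `drift` (0.0 to 1.0).

    A substituted residue is always replaced by a different letter, so
    the expected fraction of changed positions equals `drift`.
    """
    mutated = []
    for residue in sequence:
        if rng.random() < drift:
            choices = [letter for letter in alphabet if letter != residue]
            mutated.append(rng.choice(choices))
        else:
            mutated.append(residue)
    return "".join(mutated)


if __name__ == "__main__":
    original = "MKTAYIAKQRQISFVKSHFSRQ"   # toy sequence
    for drift in (0.05, 0.10, 0.30):
        print(drift, mutate(original, drift, random.Random(1)))
```

Sweeping the drift value shows at what divergence each search method stops recovering the originating sequences.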

D




From cupton at uvic.ca  Tue Feb 12 11:55:30 2008
From: cupton at uvic.ca (Chris Upton)
Date: Tue, 12 Feb 2008 08:55:30 -0800
Subject: [BiO BB] Looking for researcher,
	to assist on blast-like invention
In-Reply-To: <B84A0415-351D-49B8-9A83-506C870BCCA8@elfdata.com>
References: <FF1F53C3-0648-4F8A-AC70-6FE9B2B6F626@elfdata.com>
	<47B0CC02.8010206@umdnj.edu>
	<B84A0415-351D-49B8-9A83-506C870BCCA8@elfdata.com>
Message-ID: <9FBE25AE-D4BA-4A87-9863-C84F52E4D6AB@uvic.ca>

Hi,
We do a lot of searching of protein databases, searching for distant  
homologs.
If we send you protein sequences, can you search a protein database  
(NR)?

Chris
On Feb 11, 2008, at 3:56 PM, Theodore H. Smith wrote:

>
> On 11 Feb 2008, at 22:28, Ryan Golhar wrote:
>
>> Why don't you write up a paper describing the algorithm in detail and
>> submit it to a bioinformatics journal?  And, why not make the
>> executable
>> available with documentation so that people can download it and try  
>> it
>> out for themselves.
>>
>> Do you have any test cases that show it runs faster/better than  
>> BLAST?
>> Describe them and make them available.
>
> The first thing I'd need to do is make a good test. I'm not sure what
> constitutes "a good test", in this case.
>
> How big should the databanks be to make the test reasonable? Is
> randomly generated data good enough, or is a randomly selected sample
> better. If a sample is better, how large a dataset must I gather to do
> the test.
>
> Perhaps certain settings make my algorithm work better or worse
> relative to BLAST. But then how do I know which settings are more
> likely to be used and which aren't?
>
> I think someone who uses BLAST frequently, and knows it well from a
> user's perspective... might have a better feel for creating a test
> than I might.
>
> The worst thing that could happen is I make a test, which is unfairly
> prejudiced to my algorithm :) The next thing that would happen is
> people would see my test has "suspiciously good" results, and... be
> annoyed about that, and lose interest, even if it were an innocent
> mistake on my end. I'd rather avoid that sort of mistake by getting a
> knowledged eye in the designing of a test!
>
> Like I said, I haven't gotten all the code in C++ yet. I've got a
> framework in C++ already, I mean I know how to write C++. And I know
> what to do, as I've written it in a proto-typing language.
>
> The C++ version will come soon, though.
>
>> Theodore H. Smith wrote:
>>> Hi everyone,
>>>
>>> So I've been working, on and off, on this algorithm for quite a  
>>> while
>>> now. It's basically an invention of mine. It is a "blast-like"
>>> algorithm, in that it does "Fuzzy lookup" operations across a
>>> database
>>> of letters. I am designing this algorithm to be useful for bio-
>>> informatics, this is the main field I am initially targetting.
>>>
>>> The database will be filled with protein sequences, and the search
>>> across the database will be another protein sequence. The algorithm
>>> has a "scoring matrix", which can accept different protein
>>> replacement
>>> scores. The cost of inserting letters (protein letters) can be
>>> configured also.
>>>
>>> In this sense, it's no different to Smith-Waterman. The same input,
>>> the same output!
>>>
>>> The real difference from Smith-Waterman, is it's speed. My algorithm
>>> will be hugely faster. This is because I use many techniques to  
>>> avoid
>>> processing unnecessary parts of the Smith-Waterman matrix.
>>>
>>> I also use many tricks to reuse computations across various  
>>> proteins.
>>> For example, the matrix for protein "ABCDE", is identical, at first
>>> anyhow, for the matrix for "ABCDEFG". This means if I have both
>>> proteins "ABCDE", and "ABCDEFG" in my protein database, I can test
>>> both of them against the search query, in almost half the time. My
>>> algorithm also runs in logarithmic-time with respect to the size of
>>> the database. Basically, bigger databases run disproportionately
>>> faster.
>>>
>>> I want to turn this algorithm, into something useful for people. My
>>> first challenge here, is to answer the question "is this algorithm
>>> faster, or better than BLAST". If it is not faster, my algorithm
>>> basically has little use. But I have good hopes it will be faster! I
>>> am very good with these sort of things, you see :) Speed is my
>>> strong-
>>> point.
>>>
>>> Currently, I do not know about the speed, because I haven't
>>> implemented a C++ version of my algorithm, or a good speed testing
>>> framework.
>>>
>>> I do however know that my algorithm is more accurate than BLAST,
>>> because it is just as accurate as SSEARCH, as mine uses the Smith-
>>> Waterman algorithm. Whereas BLAST uses a heuristic, intelligent
>>> guess-
>>> work basically. A fine heuristic, but still a heuristic. Mine is
>>> methodological, not heuristic based.
>>>
>>> So here is what I am looking for!
>>>
>>> I am hoping, that someone in the field will be able to offer me
>>> guidance, interest, enthusiasm, suggestions and maybe even do some
>>> testing for me.
>>>
>>> Perhaps a student doing a bio-informatics related degree, who would
>>> like to write a paper on an alternative way of processing protein
>>> databases. My invention could be an interesting subject for a paper.
>>>
>>> Or perhaps a researcher who just has an interest in these sort of
>>> things! Perhaps a researcher who feels there must be a better way of
>>> doing these things. Or anyone really in this field with the time and
>>> interest, and feels helping me could help him (or her) too in some
>>> way.
>>>
>>> I'd like someone I can ask a lot of questions to, and show my
>>> software
>>> to, and explain my hopes what I can achieve with it.
>>>
>>> Basically, my first question to you, would be "how would I set this
>>> up
>>> to be useful for someone", and "how would I test it's usefulness,
>>> what
>>> would you need to know about my algorithm that you would decide to
>>> use
>>> it over blast"
>>>
>>> It's sort of a vague question from me, like "what do you need me to
>>> do", but... well that's where I am right now. Sort of a bit on the
>>> outside hoping someone on the inside will show me something.
>>>
>>> So it's an opportunity to tell me what you want, basically!! Tell  
>>> me,
>>> and I might just make it.
>>>
>>> Who knows? Maybe one day in a few years time, everyone will be using
>>> this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
>>> might be part of something.
>>>
>>> Thanks to anyone who replies!
>>>
>>> --
>>> http://elfdata.com/plugin/
>>> "String processing, done right"
>>>
>>>
>>>
>>> _______________________________________________
>>> BBB mailing list
>>> BBB at bioinformatics.org
>>> http://www.bioinformatics.org/mailman/listinfo/bbb
>>>
>>>
>>
>>
>> _______________________________________________
>> BBB mailing list
>> BBB at bioinformatics.org
>> http://www.bioinformatics.org/mailman/listinfo/bbb
>
> --
> http://elfdata.com/plugin/
> "String processing, done right"
>
>
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb


Chris Upton Ph.D.                                   Associate Professor
Biochemistry and Microbiology             Tel. 250-721-6507
University of Victoria                                Fax  250-721-8855
P.O. Box 3055 STN CSC
Victoria, BC  V8W 3P6
Canada

web.uvic.ca/~cupton
www.virology.ca
www.biodirectory.com/uptons_blog.html


From sdua at coes.latech.edu  Wed Feb 13 11:01:46 2008
From: sdua at coes.latech.edu (Sumeet Dua)
Date: Wed, 13 Feb 2008 10:01:46 -0600
Subject: [BiO BB] CfP: IEEE-CIBCB08 Special Session on Data Mining for
	Bioinformatics.
Message-ID: <52362CF6-4B26-481E-AF86-FD97BF4A4EE2@coes.latech.edu>

Apologies for any duplicate transmissions.

---------
Call for Papers:  2008 IEEE Symposium on Computational Intelligence in  
Bioinformatics and Computational Biology (IEEE-CIBCB08)
September 15-17, 2008, Sun Valley, Idaho, USA.
Sponsored by IEEE and the IEEE Computational Intelligence Society

Special Session on: Data Mining for Bioinformatics

The IEEE Symposium on Computational Intelligence in Bioinformatics and  
Computational Biology 2008 special session on Data Mining for  
Bioinformatics will focus on novel results in the topics of  
computational intelligence as they are employed for data mining  
applications in bioinformatics. The computational intelligence (CI)  
areas of interest include Genetic Algorithms, Neural Computation,  
Fuzzy systems, Hidden Markov models, Rough Set Theory, Support Vector  
Machines, Chaos Theory, Simulated Annealing, Bayesian Framework,  
Probabilistic models, Statistical models and other emerging  
Evolutionary Computing techniques. We are specifically interested in  
topics of CI that have been developed to address data mining  
challenges in several important problems of bioinformatics, including  
but not limited to:

  * Supervised and unsupervised methods for Microarray data analysis
  * Knowledge modeling in biomedical databases
  * Feature selection in biological data
  * Medical imaging and pattern recognition
  * Metabolic pathway analysis and Gene regulatory network modeling
  * Motif and pattern discovery
  * Protein, Enzyme, and RNA structure prediction and folding
  * Evolutionary Computing for Bioinformatics
  * Molecular sequence alignment and analysis
  * Fuzzy Modeling for gene sequence analysis
  * Cell simulation and modeling
  * Molecular computing
  * Bayesian Frameworks for Microarray data analysis
  * Ontologies and taxonomies
  * Neural Network Model for Genome sequence analysis
  * Wavelets based data mining
  * Fuzzy modeling in Bioinformatics
  * Information and data visualization
  * Automated text categorization and authority determination
  * Dimensionality reduction in bioinformatics

Paper Submission: Prospective authors are invited to submit papers of
no more than eight (8) pages in IEEE conference style, including
results, figures and references. Submission details can be found on
the symposium web site: www.cibcb.org. Accepted papers will appear in
the proceedings and will be indexed by IEEE Xplore.

Important Dates:
    Paper submission deadline:    March 31, 2008
    Author notification:          April 30, 2008
    Camera-ready paper deadline:  June 15, 2008
    Conference:                   September 15-17, 2008

Special session webpage: http://www.latech.edu/~sdua/CIBCB08-SS/

For further information regarding the special session please contact  
the Session Chair:
				Sumeet Dua, Ph.D.
				Upchurch Associate Professor, Coordinator of IT Research
				Department of Computer Science, Louisiana Tech University, LA, USA
				E-mail: sdua at coes.latech.edu; Phone: 318-257-2830

For further information regarding CIBCB-2008 please contact the  
Symposium Chair or the Program Chair:

Symposium chair: Scott Smith
Program chair: Gwenn Volkert
Technical Co-Chairs: Kay C. Wiese, Madhu Chetty, Elena Marchiori
Finance Chair: Gary B. Fogel
Publicity Chair: Lutz Hamel
Regional Publicity Chairs: Joshua Knowles - Europe, P.N. Suganthan -  
Asia
Proceedings Chair: Clare Bates Congdon
Special Sessions Chair: Jennifer Hallinan
Tutorials Chair: Dan Ashlock
Web Chair: Wendy Ashlock

---------


From mleczny at gmail.com  Thu Feb 14 03:00:52 2008
From: mleczny at gmail.com (Paco B C)
Date: Thu, 14 Feb 2008 09:00:52 +0100
Subject: [BiO BB] Ensembl and Gene Ontology terms
In-Reply-To: <8adccabf0802120815y52942ca0ia0bbdd60cdccc627@mail.gmail.com>
References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
	<23188614.237851202809426120.JavaMail.coremail@bj126app59.126.com>
	<8adccabf0802120815y52942ca0ia0bbdd60cdccc627@mail.gmail.com>
Message-ID: <604858190802140000w1163e5f6md214e95b059f2c57@mail.gmail.com>

Hey, the DAS protocol looks very interesting for what I want to do.
Thanks a lot to all of you!
Paco

2008/2/12 Charles Danko <dankoc at gmail.com>:

> Hi, Paco,
>
> If you want to do it all programmatically from Java, I would suggest
> Googling a protocol called DAS.  ENSEMBL has a DAS server from which
> you should be able to pull GO annotations for ENSEMBL IDs quite
> easily.
>
> You can find java-based libraries to access a DAS connection, and
> parse the resulting information here:
> http://www.spice-3d.org/dasobert/.
>
> Good luck!
> Charles
>
> 2008/2/12 ocean <oceanhu at 126.com>:
> >  Hey, Paco
> >
> > I think you can try BioMart (www.biomart.org). This database has already
> > done this kind of thing.
> > You can contact them for some help.
> >
> > good luck!
> >
> > Huhaiyang
> >
> >
> >
> >
> > On 2008-02-12, "Paco B C" <mleczny at gmail.com> wrote:
> >
> >
> > Hi!
> > This is my first message to this list. My name is Paco and I'm doing my
> > PhD in Bioinformatics at the University of Leuven, Belgium.
> > I would like to build a Java module that, given a list of Ensembl Gene
> > Identifiers, returns their related Gene Ontology terms. I've
> > accessed the GO database, but I can't find ENSG terms there, and I've
> > read on the Ensembl website that they give links to external databases
> > for translation and transcript objects but not for genes (maybe in the
> > future, but not now).
> > My question is: do you know which database I could query in order to get
> > this mapping between Ensembl gene IDs and GO terms?
> > Thanks!
> > Paco
> > _______________________________________________
> > BBB mailing list
> > BBB at bioinformatics.org
> > http://www.bioinformatics.org/mailman/listinfo/bbb
> >
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb
>


From dan.bolser at gmail.com  Sat Feb 16 05:46:03 2008
From: dan.bolser at gmail.com (Dan Bolser)
Date: Sat, 16 Feb 2008 11:46:03 +0100
Subject: [BiO BB] List of mailing lists?
Message-ID: <2c8757af0802160246r56fa71e2u4e4a7f879d3e2c0e@mail.gmail.com>

Hi,

Together with some friends we started to put together this page;

List of mailing lists for biologists;

http://biodatabase.org/index.php/List_of_mailing_lists_for_biologists


I am not sure if there is a better place for this kind of project
somewhere within the Bioinformatics.Org set of sites, which I know
maintains several related projects, or even if such lists already
exist elsewhere on the internet.

Sorry if I sent this before... It was hanging around in my drafts folder...

Can anyone point me towards related resources to integrate or transfer
effort to?

In the mean time please feel free to edit, add or update the existing
(rather short) list of mailing lists on that page.

Cheers,
Dan.

----

Talk to the experts?
irc://freenode.net/#bioinformatics


From marchywka at hotmail.com  Mon Feb 18 11:12:10 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 18 Feb 2008 11:12:10 -0500
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for tool
	testing
In-Reply-To: <b5bbbc970712211700l6a0ebd92wbd666df551a23ac1@mail.gmail.com>
References: <b5bbbc970712211700l6a0ebd92wbd666df551a23ac1@mail.gmail.com>
Message-ID: <BAY108-W32880870920C13BE056F6BBE210@phx.gbl>


Hi,
As I mentioned in previous posts, I'm using the drosophila DSCAM genes for testing some tools.
I assembled a fasta file composed of 3 fly entries,

$ cat all_fasta | grep ">"
>AF260530  Drosophila melanogaster Dscam gene, complete cds.
>DQ317106  Drosophila yakuba Dscam gene, exons 3 through 24.
>DQ317109  Drosophila pseudoobscura Dscam gene, exons 3 through 24.

and tried aligning them with clustalw, but minutes later I still didn't have a result. I was wondering if
someone could suggest a set of parameters or an alternative alignment tool to do a fast
alignment, even if a bit sloppy. I have always used the slow/accurate approach and don't
know what options may be available for faster work. These sequences are each about 50k long.


In the meantime, I was able to get a satisfactory result from exact string matches, using successively
shorter and shorter strings. This approach yields acceptable results in under a minute and, if needed, you
could segment the questionable areas and feed them to clustal or another tool for a "better" alignment.
It seems to be fast because sequences are only compared to a reference sequence ( O(n*l^2), but "l" can be smaller
than the sequence length, as unique features can be found in O(l*log(l)) ). There are, of course, likely to
be various pathological cases, but for sequences known to be similar it seems to work ok, and the indexing
feature allows extraction of substrings with particular distributions ( occurring only once in each sample, for example ).
I have aligned 2 E. coli strains in perhaps a few minutes and there weren't any obvious pathological
results ( I obviously didn't check the whole thing either by eye or programmatically ).
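
To make the idea concrete, here is a minimal Python sketch of anchoring on exact matches with successively shorter strings. It is not the string_test code: the function names, the default lengths, and the rule that an anchor must be unique in both sequences and keep the same order in both are all assumptions made for illustration.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map every k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i
            for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def consistent(anchors, rpos, opos, k):
    """True if the new hit lies wholly before or wholly after every accepted
    anchor in both sequences (no overlap, no crossing)."""
    for ar, ao, ak in anchors:
        before = rpos + k <= ar and opos + k <= ao
        after = rpos >= ar + ak and opos >= ao + ak
        if not (before or after):
            return False
    return True


def find_anchors(ref, other, lengths=(25, 16, 12)):
    """Collect (ref_pos, other_pos, k) exact matches, longest strings first.

    Shorter strings are only accepted in the gaps left by longer ones.
    """
    anchors = []
    for k in lengths:
        ref_index = unique_kmers(ref, k)
        other_index = unique_kmers(other, k)
        for kmer, rpos in ref_index.items():
            opos = other_index.get(kmer)
            if opos is not None and consistent(anchors, rpos, opos, k):
                anchors.append((rpos, opos, k))
        anchors.sort()
    return anchors
```

The regions between consecutive anchors are the "questionable areas" that could then be handed to a slower aligner. The consistency check here is a plain linear scan, so this sketch is only meant for reading, not for 50k sequences.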

Others have asked about testing method, I'd like to show how I'm going about this with the DSCAM example.
The alignment is only one part of more general interest in finding similar/different features between samples.
These sequences, it turns out, have exon locations in the ncbi entries. So, it was pretty easy to check the alignments
by examining the locations of the exons in the aligned composite. In this case, I aligned as follows,

$ string_test -fastas all_fasta -index 8 -length 25 -fix 12 -output 3 -filterN -filterID -status -fcompare_all> anchor_hits

and could feed the "anchor_hits" locations to an aligner that could start with these ( actually, it can now start on its own
but that is a developmental issue ) and refine subsections ( or spot gross transpositions perhaps, would have to check),

$ mm_align_tool -fastas all_fasta -pair_rules anchor_hits -ref 0 -all_samples -refine marlow -pair_params uniq -pair_align monotonize -doall -dest algned_fasta -output fasta

The aligned sequences can be output in several formats and the "algned_fasta" file can be presented along with various
rules or annotations using another tool to create bmp, html, or txt files :
( right now it requires some input parameters, hence the dummy echo)

$ echo | $progpath/annotater -source manno.src> testcomp5.html
$ echo | $progpath/annotater -source bmp.src
$ echo | $progpath/annotater -source text.src>text.txt 2>junk

where the ".src" file just contains command line parameters,
$ cat text.src
-width 120 -font $progpath/4x6-KOI8.pcf
 -mrules fixed_exons2
-merge_rules comp_5_hitss
-font /cygdrive/c/mydocs/scripts/cc/affx//4x6-KOI8-R.pcf
-acid_rank /cygdrive/c/mydocs/scripts/cc/affx//nmstrings
-acid_map 20
-xlate -inter -banner
-annotate algned_fasta

So, I could first look at a different alignment metric by outputting a table of correspondences between input and aligned locations,

$ $progpath/mm_align_tool -fastas all_fasta -pair_rules anchor_hits -ref 0 -all_samples -refine marlow -pair_params uniq -pair_align monotonize -doall -dest xxxx_raw  -output table

and using the table to move "absolute location" rules to their location in the  aligned sequence:

 $table_tool -v -table table_table  -table_rules ncbi_exons  | sed -e 's/exon/exon /g'| sed -e 's/[.:]/ /g' | sort -k 3 -g | more

This generates a cryptic feature comparison map which shows that most of the exons
end up in the same location on each sequence (but see the publication below);
even the differently named exons from the different species were aligned in most cases
( each exon rule is followed by {sequence number, aligned position, offset from first entry} ):
>exon |1|exon 3 {2,16121,0}{3,16121,0}{4,16157,36}
>exon |1|exon 4 1 {2,18582,0}{3,18582,0}{4,18582,0}
>exon |1|exon 4 10 {2,22532,0}{3,22532,0}{4,22532,0}
>exon |1|exon 4 11 {2,22836,0}{3,22836,0}{4,22845,9}
>exon |1|exon 4 12 {2,23217,0}{3,23217,0}{4,23217,0}
>exon |1|exon 4 2 {2,19006,0}{3,19006,0}{4,19006,0}
>exon |1|exon 4 3 {2,19736,0}{3,19736,0}{4,19736,0}
>exon |1|exon 4 4 {2,20545,0}{3,20545,0}{4,20545,0}
>exon |1|exon 4 5 {2,20872,0}{3,20872,0}{4,20872,0}
>exon |1|exon 4 6 {2,21269,0}{3,21269,0}{4,21269,0}
>exon |1|exon 4 7 {2,21597,0}{3,21597,0}{4,21597,0}
>exon |1|exon 4 8 {2,21895,0}{3,21895,0}{4,21895,0}
>exon |1|exon 4 9 {2,22229,0}{3,22229,0}{4,22212,-17}
>exon |1|exon 5 {2,25020,0}{3,25020,0}{4,25020,0}
>exon |1|exon 6 1 {2,27251,0}{3,27249,-2}{4,27251,0}
>exon |1|exon 6 10 {2,29659,0}{3,29826,167}{4,29451,-208}
>exon |1|exon 6 11 {2,29845,0}{3,30074,229}{4,29640,-205}
>exon |1|exon 6 12 {2,30074,0}{3,30614,540}{4,30114,40}
>exon |1|exon 6 13 {2,30614,0}{3,31054,440}{4,30438,-176}
>exon |1|exon 6 14 {2,30831,0}{3,31255,424}{4,30614,-217}
>exon |1|exon 6 15 {2,31054,0}{3,31475,421}{4,30831,-223}
>exon |1|exon 6 16 {2,31255,0}{3,31684,429}{4,31054,-201}
>exon |1|exon 6 17 {2,31475,0}{3,32161,686}{4,32160,685}
>exon |1|exon 6 18 {2,31688,0}{3,32437,749}{4,32376,688}
>exon |1|exon 6 19 {2,31926,0}{3,33251,1325}{4,32590,664}
>exon |1|exon 6 2 {2,27524,0}{3,27524,0}{4,27497,-27}
...etc...
>exon |1|exon 14 {2,66424,0}{3,66424,0}{4,66424,0}
>exon |1|exon 15 {2,66631,0}{3,66631,0}{4,66631,0}
>exon |1|exon 16 {2,66884,0}{3,66884,0}{4,66884,0}
>exon |1|exon 17 1 {2,70469,0}{3,70469,0}{4,70469,0}
>exon |1|exon 17 2 {2,71004,0}{3,71004,0}{4,71004,0}
>exon |1|exon 18 {2,72995,0}{3,72995,0}{4,72995,0}
>exon |1|exon 19 {2,74100,0}{3,74100,0}{4,74063,-37}
>exon |1|exon 20 {2,74948,0}{3,74948,0}{4,74948,0}
>exon |1|exon 21 {2,75334,0}{3,75334,0}{4,75334,0}
>exon |1|exon 22 {2,75594,0}{3,75594,0}{4,75594,0}
>exon |1|exon 23 {2,76979,0}{3,76979,0}{4,76997,18}
>exon |1|exon 24 {2,78233,0}{3,78233,0}{4,78233,0}

that can be reconciled with known inter-species exon similarities, for example 

http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1431710&blobtype=pdf
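
For anyone who wants to reproduce this kind of check without these tools, the two steps can be sketched in Python: moving an absolute location into alignment coordinates, and listing the rules whose aligned positions disagree. This is an illustration only. It assumes gaps are written as '-' in the aligned sequence, that positions are 0-based half-open, and that the rule lines look exactly like the ones printed above; the function names are made up.

```python
import re

# ">exon |1|exon 4 9 {2,22229,0}{3,22229,0}{4,22212,-17}"
RULE = re.compile(r"^>exon \|\d+\|exon ([\d ]+?) ((?:\{[^}]*\})+)$")
TRIPLE = re.compile(r"\{(\d+),(\d+),(-?\d+)\}")


def aligned_columns(gapped, gap_chars="-"):
    """Entry i is the alignment column of the i-th ungapped residue."""
    return [col for col, ch in enumerate(gapped) if ch not in gap_chars]


def move_rule(start, end, gapped):
    """Map a half-open [start, end) location on the raw sequence to columns."""
    cols = aligned_columns(gapped)
    return cols[start], cols[end - 1] + 1


def parse_rule(line):
    """Return (exon name, [(sequence, aligned position, offset), ...])."""
    m = RULE.match(line.strip())
    if m is None:
        return None
    # The sed step turned "4.9" into "4 9"; put the dot back (an assumption).
    name = "exon " + m.group(1).replace(" ", ".")
    hits = [tuple(int(x) for x in t) for t in TRIPLE.findall(m.group(2))]
    return name, hits


def shifted_exons(lines):
    """List (name, largest absolute offset) for rules that do not line up."""
    result = []
    for line in lines:
        parsed = parse_rule(line)
        if parsed is None:
            continue
        name, hits = parsed
        worst = max(abs(offset) for _, _, offset in hits)
        if worst:
            result.append((name, worst))
    return result
```

Run over the table above, shifted_exons would single out entries such as exon 3 and the exon 6 cluster, which are the ones worth inspecting by eye.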

It turns out that the exon3 offset for sequence 4 is probably due to a rule issue, not an alignment issue
( excerpt from the test alignment map including aligner diagnostics such as '{' in fasta file and 
likely translation products in all 3 fwd frames ) as the alignment in this high-similarity region appears good:

New Section : 16080 to 16200 section 134 of 669
8         9         0         1         2         3         4         5         6         7         8         9
>>AF260530  Drosophila melanogaster Dscam gene, complete cds.
.....................................}AGCTTGTGGTAGTCAGACCCTAGCTGCCAATCCCCCAGATGCCGACCAAAAAGGACCCGTCTTCCTCAAGGAACCCACCAAC
 **************************************SALLCVWGV*SVSQRDTPPL*SALCAPQNISPPPPQRDMCAPRDTPQKKKKRGDTPPRVSLFSPLSQKRGENTPPHTPQN
>>AF260530  Drosophila melanogaster Dscam gene, complete cds.
.....................................}AGCTTGTGGTAGTCAGACCCTAGCTGCCAATCCCCCAGATGCCGACCAAAAAGGACCCGTCTTCCTCAAGGAACCCACCAAC
 **************************************SALLCVWGV*SVSQRDTPPL*SALCAPQNISPPPPQRDMCAPRDTPQKKKKRGDTPPRVSLFSPLSQKRGENTPPHTPQN
                                         +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 >exon|1|exon3 XXX
>>DQ317106  Drosophila yakuba Dscam gene, exons 3 through 24.
TATTTACTAATTGGCGGCGTTGTTCTTGTTTCATTTC}AGCTGGTGGTAGTCAGACCCTGGCTGCCAATCCCCCCGATGCCGACCAAAAAGGACCCGTCTTTCTCAAGGAACCCACCAAC
 YIFLYTL*NILWGARGARVLCVFSLLCVFFSHIFF***SALWGVWGV*SVSQRDTPPLWGALCAPQNISPPPPPRDMCAPRDTPQKKKKRGDTPPRVSLFFSLSQKRGENTPPHTPQN
                                         +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 >exon|1|exon3 XXX
>>DQ317109  Drosophila pseudoobscura Dscam gene, exons 3 through 24.
??????????????????????????????????????AGCTTGTGGCAGTCAGACTTTGGCTGCCAATCCACCAGATGCCGACCAGAAGGGACCCGTCTTCCTCAAAGAGCCCACCAAC
 **************************************SALLCVWGAQSVSQRDTLFLWGALCAPQNISPHTPQRDMCAPRDTPQREKRGGDTPPRVSLFSPLSQKKRESAPPHTPQN
                                                                             +++++++++++++++++++++++++++++++++++++++++++
 >exon|1|exon3 XXX


I'm aware of the following related alignment literature, open to ideas:

$ string_test -about|unix2dos >/dev/clipboard

Contact: marchywka at hotmail.com Nov 2007
Comment: uses some indexing to get speed up, 
Comment: motivation for RC rules from this etc , 
Ref:http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1431710
Commment: and should work well on text or (modified slightly ) binary code too
Note: More code in mm_align_tool
Note: Based loosely on references such as these but 'common sense'
Note: seemed to work well as these are after-the-fact lookups
Ref: http://www.google.com/search?hl=en&safe=off&q=string+alignment+site%3Aciteseer.ist.psu.edu 
Ref: http://citeseer.ist.psu.edu/csuros05rapid.html
Comment: Csuros, M., Ma, B.: Rapid homology search with two-stage extension and
Comment: daughter seeds. In: Proc. 11th Int. Computing and Combinatorics Conf. (COCOON).
Comment: Volume 3595 of LNCS., Springer-Verlag (2005) 104-- 114
Ref: http://citeseer.ist.psu.edu/468459.html
Ref: http://citeseer.ist.psu.edu/kahveci04speeding.html
Feb  2 2008 09:35:40 string_test.h182


Thanks.


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.



From larye at info-engineering-svc.com  Mon Feb 18 16:47:00 2008
From: larye at info-engineering-svc.com (Larye Parkins)
Date: Mon, 18 Feb 2008 14:47:00 -0700
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for
 tool	testing
In-Reply-To: <BAY108-W32880870920C13BE056F6BBE210@phx.gbl>
References: <b5bbbc970712211700l6a0ebd92wbd666df551a23ac1@mail.gmail.com>
	<BAY108-W32880870920C13BE056F6BBE210@phx.gbl>
Message-ID: <47B9FCD4.1030002@info-engineering-svc.com>

Mike Marchywka wrote:
> Hi,
> As I mentioned in previous posts, I'm using the drosophila DSCAM genes for testing some tools.
> I assembled a fasta file composed of 3 fly entries,
> 
> $ cat all_fasta | grep ">"
> 
>>AF260530  Drosophila melanogaster Dscam gene, complete cds.
>>DQ317106  Drosophila yakuba Dscam gene, exons 3 through 24.
>>DQ317109  Drosophila pseudoobscura Dscam gene, exons 3 through 24.
> 
> 
> and tried aligning them with clustalw, but minutes later I still didn't have a result. I was wondering if
> someone could suggest a set of parameters or an alternative alignment tool to do a fast
> alignment, even if a bit sloppy. I have always used the slow/accurate approach and don't
> know what options may be available for faster work. These sequences are each about 50k long.
> 

We have been using MUMmer3 (http://mummer.sourceforge.net) for rapid 
alignments of whole genomes, genomes and contigs, and searching for 
repeats and inverted repeats in multiple sequences.  MUMmer is very fast 
and has nucleotide and translated protein modes, as well as scatterplot 
graphical output, so is very good for finding regions of high identity 
in large sequences and graphically highlighting areas of interest.
> 
> In the meantime, I was able to get a satisfactory result from exact string matches, using successively
> shorter and shorter strings. This approach yields acceptable results in under a minute and, if needed, you
> could segment the questionable areas and feed them to clustal or another tool for a "better" alignment.
> It seems to be fast because sequences are only compared to a reference sequence ( O(n*l^2), but "l" can be smaller
> than the sequence length, as unique features can be found in O(l*log(l)) ). There are, of course, likely to
> be various pathological cases, but for sequences known to be similar it seems to work ok, and the indexing
> feature allows extraction of substrings with particular distributions ( occurring only once in each sample, for example ).
> I have aligned 2 E. coli strains in perhaps a few minutes and there weren't any obvious pathological
> results ( I obviously didn't check the whole thing either by eye or programmatically ).
> 
> Others have asked about testing method, I'd like to show how I'm going about this with the DSCAM example.
> The alignment is only one part of more general interest in finding similar/different features between samples.
> These sequences, it turns out, have exon locations in the ncbi entries. So, it was pretty easy to check the alignments
> by examining the locations of the exons in the aligned composite. In this case, I aligned as follows,
> 
...
> I'm aware of the following related alignment literature, open to ideas:
> 
> $ string_test -about|unix2dos >/dev/clipboard
> 
> Contact: marchywka at hotmail.com Nov 2007
> Comment: uses some indexing to get speed up, 
> Comment: motivation for RC rules from this etc , 
> Ref:http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1431710
> Commment: and should work well on text or (modified slightly ) binary code too
> Note: More code in mm_align_tool
> Note: Based loosely on references such as these but 'common sense'
> Note: seemed to work well as these are after-the-fact lookups
> Ref: http://www.google.com/search?hl=en&safe=off&q=string+alignment+site%3Aciteseer.ist.psu.edu 
> Ref: http://citeseer.ist.psu.edu/csuros05rapid.html
> Comment: Csuros, M., Ma, B.: Rapid homology search with two-stage extension and
> Comment: daughter seeds. In: Proc. 11th Int. Computing and Combinatorics Conf. (COCOON).
> Comment: Volume 3595 of LNCS., Springer-Verlag (2005) 104-- 114
> Ref: http://citeseer.ist.psu.edu/468459.html
> Ref: http://citeseer.ist.psu.edu/kahveci04speeding.html
> Feb  2 2008 09:35:40 string_test.h182


-- 
--
Larye D. Parkins
Information Engineering Services
PMB 435, 610 N. 1st St., Ste 5
Hamilton, MT 59840
http://www.info-engineering-svc.com

Making IT work since 1965.
Member of: ACM, IEEE Computer Society, USENIX, SAGE, LOPSA


From landman at scalableinformatics.com  Mon Feb 18 19:27:13 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 18 Feb 2008 19:27:13 -0500
Subject: [BiO BB] MPI-HMMER mercurial repo now public
Message-ID: <47BA2261.2010902@scalableinformatics.com>

[forwarded]

The MPI-HMMER mercurial repository has been made publicly viewable.  In
addition, users may now download the most up-to-date snapshot of the
MPI-HMMER source code through the "Releases" link available on the
website.  The snapshot is generated each time a commit is made to the
mercurial repository.  Current updates include support for MPICH2 and a
couple of small memory leaks have been plugged.

JP

[ed

	uri:   		http://mpihmmer.org
	repository:	http://mpihmmer.org/hg
]
_______________________________________________
Mpihmmer mailing list
Mpihmmer at mail.scalableinformatics.com
http://lists.scalableinformatics.com/mailman/listinfo/mpihmmer

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From marchywka at hotmail.com  Tue Feb 19 11:51:39 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 19 Feb 2008 11:51:39 -0500
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for
 tool	testing
In-Reply-To: <47B9FCD4.1030002@info-engineering-svc.com>
References: <b5bbbc970712211700l6a0ebd92wbd666df551a23ac1@mail.gmail.com>
	<BAY108-W32880870920C13BE056F6BBE210@phx.gbl> 
	<47B9FCD4.1030002@info-engineering-svc.com>
Message-ID: <BAY108-W279EC90DA6756B20F2F34BBE220@phx.gbl>


> We have been using MUMmer3 (http://mummer.sourceforge.net) for rapid
> alignments of whole genomes, genomes and contigs, and searching for

Thanks- that looks like a good tool that I didn't know about. I noticed they advertise E. coli results,
prompting me to go back and check my own. I'd have to go check the suffix tree literature
to see what exactly they claim to do in 17 seconds on E. coli, but under cygwin I was able to
index all matching strings of length 25 or more in about 67 seconds,

$ date;$progpath/string_test -fastas both_fasta -index 8 -length 25 -fix 12 -output 3 -filterN -filterID -status -fcompare_all> anchors ;date
Sat Nov 10 18:45:23 EST 2007
string_test.cpp177 loaded 2 fastas
Sat Nov 10 18:46:30 EST 2007


and create a coarse alignment  in another 25 seconds,

$ date; $progpath/mm_align_tool -fastas both_fasta -v -pair_rules anchors  -doall -pair_align 0 -output text> align1 ;date
Sat Nov 10 18:50:01 EST 2007
mm_hit_classes.h389
annotation_model.h57 Loaded 33373 pair rules.
mm_align_tool.cpp309 Doing string PAIR align with cutoff 3
mm_align_tool.h227 do_all with only one rule, did you mean -mrules?
mm_align_tool.cpp318 doing 0 vs 1
mm_align_tool.cpp326 do hit dump rules
Sat Nov 10 18:50:26 EST 2007


Do you have actual timing tests for various complete tasks or is 17 seconds about it? 
So, ok 67+25=92 seconds is not real impressive compared to 17, and I'm not sure how
much I can blame cygwin for this :) I guess once I'm sure I have a useful algorithm,
I can subtract IO time which has been significant in many cases.
Someone also privately suggested blast's bl2seq and I would point out that this is quite fast on pairs
of 50k sequences.
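
Since "date; command; date" only resolves whole seconds, a finer wall-clock measurement is easy to script. The sketch below is a generic Python wrapper, not part of any of the tools mentioned; the helper name is hypothetical, and it measures total elapsed time, so program start-up and IO are included.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (elapsed wall-clock seconds, exit status)."""
    start = time.perf_counter()
    result = subprocess.run(argv,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, result.returncode


def best_of(argv, repeats=3):
    """Smallest elapsed time over several runs, to damp disk-cache effects."""
    return min(time_command(argv)[0] for _ in range(repeats))


# Example with a command that exists everywhere Python does:
# elapsed, status = time_command([sys.executable, "-c", "pass"])
```

Timing a trivial command the same way gives a rough floor for process start-up cost, which can then be subtracted when comparing short runs.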


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.




From marchywka at hotmail.com  Tue Feb 19 15:12:09 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 19 Feb 2008 15:12:09 -0500
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for
 tool	testing
In-Reply-To: <BAY108-W279EC90DA6756B20F2F34BBE220@phx.gbl>
References: <b5bbbc970712211700l6a0ebd92wbd666df551a23ac1@mail.gmail.com>
	<BAY108-W32880870920C13BE056F6BBE210@phx.gbl> 
	<47B9FCD4.1030002@info-engineering-svc.com> 
	<BAY108-W279EC90DA6756B20F2F34BBE220@phx.gbl>
Message-ID: <BAY108-W3026396593F82B30C18B42BE220@phx.gbl>


> So, ok 67+25=92 seconds is not real impressive compared to 17, and I'm not sure how
> much I can blame cygwin for this :) I guess once I'm sure I have a useful algorithm,
> I can subtract IO time which has been significant in many cases.

I wasn't going to bother to look, given the time difference is > 4x, but I did note they tested on
a 3 GHz Pentium 4 and I have something that comes up as "x86 Family 6 Model 8 Stepping 3",
which is probably ca. 1 GHz ( I never bothered to check since I thought a 2-3x factor wasn't
important ). I guess by the time you subtract IO it may be pretty close. It would
be hard to blame cygwin for the computational time, however :)
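
As a back-of-envelope check, the arithmetic can be written out as below. The 3:1 clock ratio is the guess stated above, and scaling run time linearly with clock speed is a crude assumption that ignores memory, cache and IO differences.

```python
def normalized_slowdown(my_seconds, their_seconds, clock_ratio):
    """Return (raw slowdown, slowdown after dividing out the clock ratio)."""
    raw = my_seconds / their_seconds
    return raw, raw / clock_ratio


# 67 s indexing + 25 s coarse alignment here, versus the quoted 17 s,
# with an assumed 3 GHz vs ~1 GHz clock difference.
raw, adjusted = normalized_slowdown(67 + 25, 17, 3.0 / 1.0)
print(f"raw {raw:.1f}x, clock-adjusted {adjusted:.1f}x")
```

On those assumptions the gap shrinks from roughly 5.4x to roughly 1.8x before any IO time is subtracted.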





From Sterten at aol.com  Tue Feb 19 11:11:38 2008
From: Sterten at aol.com (Sterten at aol.com)
Date: Tue, 19 Feb 2008 11:11:38 EST
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for
	tool testing
Message-ID: <c14.2fa96ae4.34ec59ba@aol.com>

I recommend this for alignment:
 
http://align.bmr.kyushu-u.ac.jp/mafft/online/server/
 

From ethan.strauss at promega.com  Tue Feb 19 11:29:25 2008
From: ethan.strauss at promega.com (Ethan Strauss)
Date: Tue, 19 Feb 2008 10:29:25 -0600
Subject: [BiO BB] database search/alignment with ciruclar molecules?
In-Reply-To: <47B9FCD4.1030002@info-engineering-svc.com>
References: <b5bbbc970712211700l6a0ebd92wbd666df551a23ac1@mail.gmail.com><BAY108-W32880870920C13BE056F6BBE210@phx.gbl>
	<47B9FCD4.1030002@info-engineering-svc.com>
Message-ID: <D8D8119118899D4A8EB5AD9BD24C1932034DD6B2@MADMSG003.promega.com>

Hi, 
	I have created a database which holds plasmid sequences, and I am
running into an issue with database similarity searches because the
molecules are circular but the alignment treats them as linear. Right
now, I am using a brute force approach where I pull each plasmid out of
the database and perform a Smith-Waterman alignment against it to find
similar sequences. I plan to move to BLAST sometime soon. For
Smith-Waterman, I know I can deal with circularity by treating the
sequences as dimers, but I don't know if that is the best way to
approach it, and it will make something which is already incredibly slow
even slower! I would appreciate any thoughts on this issue.
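
One way to see the cost of the dimer trick: for exact matching, a circular sequence does not need to be fully doubled, only extended by the query length minus one. The Python sketch below shows that for exact matches only. It is an illustration with made-up names, not a Smith-Waterman implementation; a gapped local alignment would need a somewhat longer extension, and how much longer is an assumption to tune.

```python
def find_circular(plasmid, query):
    """Return 0-based start positions on the circle of exact query matches.

    The plasmid is extended by len(query) - 1 bases from its own start, so
    matches that span the origin are found without a full dimer.
    """
    n, m = len(plasmid), len(query)
    if m == 0 or m > n:
        return []
    extended = plasmid + plasmid[:m - 1]
    hits = []
    pos = extended.find(query)
    while pos != -1:
        hits.append(pos)
        pos = extended.find(query, pos + 1)
    return hits


# "GGGA" spans the origin of this toy plasmid: it ends in GG and starts with GA.
print(find_circular("GATTACAGG", "GGGA"))
```

With queries much shorter than the plasmids, the searched length then grows by the query length rather than doubling.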
Thanks!
Ethan

Ethan Strauss Ph.D.
Bioinformatics Scientist
Promega Corporation
2800 Woods Hollow Rd.
Madison, WI 53711
608-274-4330
800-356-9526
ethan.strauss at promega.com
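
The dimer idea in the message above can be sketched in Python. This is only an illustration under stated assumptions: it uses exact substring matching as a stand-in for a real alignment, and the function name find_circular is hypothetical. It also shows that a full dimer is not strictly required for a fixed-length query: appending the first m-1 bases of the plasmid (m = query length) is enough to expose every match that wraps around the origin.

```python
def find_circular(plasmid, query):
    """Return 0-based start positions of exact query matches on a circular plasmid."""
    n, m = len(plasmid), len(query)
    if m == 0 or m > n:
        return []
    # Pad the linearised plasmid with its own first m-1 bases so that
    # matches spanning the origin become visible to a linear search.
    linear = plasmid + plasmid[: m - 1]
    hits = []
    pos = linear.find(query)
    while pos != -1:
        hits.append(pos)  # pos is always < n, so no modulo is needed
        pos = linear.find(query, pos + 1)
    return hits


if __name__ == "__main__":
    # "CAAC" only exists across the origin of this toy 8-base circle.
    print(find_circular("ACGTTGCA", "CAAC"))
```

The same padded sequence could in principle be handed to a local aligner instead of str.find; whether that is faster than aligning against a full dimer depends on the query lengths involved.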


From larye at info-engineering-svc.com  Tue Feb 19 13:46:48 2008
From: larye at info-engineering-svc.com (Larye Parkins)
Date: Tue, 19 Feb 2008 11:46:48 -0700 (MST)
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for
 tool	testing
In-Reply-To: <BAY108-W279EC90DA6756B20F2F34BBE220@phx.gbl>
Message-ID: <Pine.GSO.4.10.10802191051370.7491-100000@xavier>


On Tue, 19 Feb 2008, Mike Marchywka wrote:

...    
> Do you have actual timing tests for various complete tasks or is 17
> seconds about it?  

Aligning 172 sequences totaling 1.2MB with each other (average ~7000 bases
each, longest 22,830 bases):

real    3m17.458s
user    3m2.385s
sys     0m9.294s

MUMmer output generated 693 alignment files, out of the possible 29584
combinations of sequences, with alignment lengths ranging from about 90
bases to 22830 (the longest with itself).

The process used the 1.2MB multi-sequence file as both reference and
query for 'nucmer,' which generates the delta file, then ran
'show-coords' to summarize it.  The majority of the run-time was spent
parsing the delta file and generating the alignments from sequence pairs
with significant alignments, using 'show-aligns.'

The final step was to generate a Postscript scatterplot.  I later rewrote
that part to generate 16 separate plots with 43x43 sequences each to make
them readable on standard paper size.

---cut---
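#!/usr/bin/env perl
# Align a multi-FASTA file against itself with MUMmer and write one
# alignment file per sequence pair listed in the delta file.
use strict;
use warnings;

my $pref = "all";
my $ref  = $ARGV[0] or die "usage: $0 sequences.fasta\n";
my $qry  = $ref;    # self-comparison: same file is reference and query

system("nucmer --prefix=${pref} $ref $qry");
system("show-coords -rcl ${pref}.delta > ${pref}.coords");
open(DELTA, "<", "${pref}.delta") or die "cannot open ${pref}.delta: $!\n";

while (<DELTA>) {
    next if $_ !~ /^>/;    # only header lines name a sequence pair
    my @inlin = split(/ /, $_);
    $inlin[0] =~ y/>//d;   # strip the leading '>' from the reference ID
    print STDERR "Processing $inlin[0] -> $inlin[1]\n";
    # Keep each shell command on one line; a wrapped line breaks the redirect.
    system("show-aligns ${pref}.delta $inlin[0] $inlin[1] > $inlin[0]_$inlin[1].aligns");
}
close(DELTA);

system("delta-filter -q -r ${pref}.delta > ${pref}.filter");
system("mummerplot ${pref}.delta -R $ref -Q $qry --layout -p ${pref} -S -t postscript");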
---cut---

Example alignment file output:

-- BEGIN alignment [ +1 89 - 187 | -1 11054 - 10956 ]


89         gccatcgcagagcttcgctaagctcactgaacgacagcagcagtatgct
11054      gccatcgcagagcttcgctaagctcactgagcggcagcagcagtatgct
                                         ^  ^

138        acgttcctctccctcgccgcctttgctggagcccccgtcctcttcgatc
11005      acgttcctctccctcgccgcctttgctggagcccccgtcctcttcgatc


187        a
10956      a


--   END alignment [ +1 89 - 187 | -1 11054 - 10956 ]

In this case, the alignment is forward versus reverse strands, with two
SNPs detected.

System: Sun Blade 2000 (2x900MHz SPARC), Solaris 10;
MUMmer 3.20, compiled 32-bit.

--
Larye D. Parkins
Information Engineering Services
PMB 435, 610 N. 1st St., Ste 5
Hamilton, MT 59840
http://www.info-engineering-svc.com

Making IT work since 1965.
Member of: ACM, IEEE Computer Society, USENIX, SAGE, LOPSA

On Tue, 19 Feb 2008, Mike Marchywka wrote:

> 
> > We have been using MUMmer3 (http://mummer.sourceforge.net) for rapid
> > alignments of whole genomes, genomes and contigs, and searching for
> 
> Thanks- that looks like a good tool that I didn't know about. I
> noticed they advertise E. coli results, prompting me to go back and
> check my own. I'd have to go check the suffix tree literature to see
> what exactly they claim to do in 17 seconds on E. coli, but under
> cygwin, I was able to index all matching strings of length 25 or more
> in about 67 seconds,
> 
> $ date;$progpath/string_test -fastas both_fasta -index 8 -length 25 -fix 12 -output 3 -filterN -filterID -status -fcompare_all> anchors ;date
> Sat Nov 10 18:45:23 EST 2007
> string_test.cpp177 loaded 2 fastas
> Sat Nov 10 18:46:30 EST 2007
> 
> 
> and create a coarse alignment in another 25 seconds,
> 
> $ date; $progpath/mm_align_tool -fastas both_fasta -v -pair_rules anchors -doall -pair_align 0 -output text> align1 ;date
> Sat Nov 10 18:50:01 EST 2007
> mm_hit_classes.h389 annotation_model.h57 Loaded 33373 pair rules.
> mm_align_tool.cpp309 Doing string PAIR align with cutoff 3
> mm_align_tool.h227 do_all with only one rule, did you mean -mrules?
> mm_align_tool.cpp318 doing 0 vs 1
> mm_align_tool.cpp326 do hit dump rules
> Sat Nov 10 18:50:26 EST 2007
> 
> 
> Do you have actual timing tests for various complete tasks or is 17
> seconds about it?  So, ok 67+25=92 seconds is not real impressive
> compared to 17, and I'm not sure how much I can blame cygwin for this
> :) I guess once I'm sure I have a useful algorithm, I can subtract IO
> time which has been significant in many cases. Someone also privately
> suggested blast's bl2seq and I would point out that this is quite fast
> on pairs of 50k sequences.
> 
> 
> 
> 
> Mike Marchywka
> 586 Saint James Walk
> Marietta GA 30067-7165
> 404-788-1216 (C)<- leave message
> 989-348-4796 (P)<- emergency only
> marchywka at hotmail.com
> Note: Hotmail is blocking my mom's entire
> ISP claiming it is to reduce spam but probably
> to force users to use hotmail. Please DON'T
> assume I am ignoring you and try
> me on marchywka at yahoo.com if no reply
> here. Thanks.
> 
> 
> 
> 


From marchywka at hotmail.com  Wed Feb 20 07:39:14 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 20 Feb 2008 07:39:14 -0500
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM
	for	tool testing
In-Reply-To: <c14.2fa96ae4.34ec59ba@aol.com>
References: <c14.2fa96ae4.34ec59ba@aol.com>
Message-ID: <BAY108-W17F4DB3692D505AC881208BE230@phx.gbl>


Thanks. Do you know offhand if this has a memory-saving mode? It wasn't immediately obvious
from the ./mafft --help output.

Also, there may be some ability to use concepts from here if you want to optimize this: http://www.fftw.org/

I tried it on the E. coli strains and the memory usage went to 1 GB (the VM limit; I only have 256 MB physical)
and of course (well, it doesn't have to, but normally does) CPU usage dropped to low levels (presumably more VM
paging than computation). It seemed to know it should go to something called "memsave mode", but I'm
not sure it realized it was playing with virtual memory (do you know if it is supposed to
be cache-aware or otherwise know about memory?):

$ ./mafft --retree 1 /cygdrive/e/new/temp/both_fasta> xxxxx
reallocating...
done.
reallocating...
done.
generating 200PAM scoring matrix for nucleotides ... done
scoremtx = -1
Gap Penalty = -1.53, +0.00, -0.12


Making a distance matrix ..
    1 / 2
done.

Constructing a UPGMA tree ...
    0 / 2
done.

Progressive alignment ...
STEP     1 / 1
len1=4979619, len2=4639675, Switching to the memsave mode
fm FFT ...
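
As a rough illustration of why a memory-saving mode matters at these lengths, the sketch below estimates the storage for a full dynamic-programming matrix versus a two-row, score-only pass over the same two sequences. It is back-of-the-envelope arithmetic only and says nothing about how MAFFT works internally; the 4-byte cell size and both function names are assumptions made for the example.

```python
LEN1 = 4_979_619    # len1 as reported in the mafft log above
LEN2 = 4_639_675    # len2 as reported in the mafft log above
BYTES_PER_CELL = 4  # assumption: one 32-bit score per matrix cell


def full_matrix_bytes(n, m, cell=BYTES_PER_CELL):
    """Memory for a complete (n+1) x (m+1) score matrix."""
    return (n + 1) * (m + 1) * cell


def two_row_bytes(n, m, cell=BYTES_PER_CELL):
    """Memory when only the current and previous rows are kept."""
    return 2 * (min(n, m) + 1) * cell


if __name__ == "__main__":
    print(f"full matrix: {full_matrix_bytes(LEN1, LEN2) / 1e12:.1f} TB")
    print(f"two rows:    {two_row_bytes(LEN1, LEN2) / 1e6:.1f} MB")
```

Under these assumptions the full matrix is on the order of tens of terabytes while two rows need tens of megabytes, which is why linear-space strategies (for example Hirschberg-style divide and conquer) or anchoring are needed for genome-length pairs.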


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.



From bsmagic at gmail.com  Fri Feb 22 02:23:17 2008
From: bsmagic at gmail.com (Sheng Wang)
Date: Fri, 22 Feb 2008 15:23:17 +0800
Subject: [BiO BB] LINES SINES MICROSATELLITES and TFOs
In-Reply-To: <16381612.686151200151021279.JavaMail.coremail@bj126app66.126.com>
References: <E1JChAE-0001wN-FR@dallas0.bioinformatics.org>
	<dd4cef240801100331n2b1dc373i50c4d0a1cc52b356@mail.gmail.com>
	<16381612.686151200151021279.JavaMail.coremail@bj126app66.126.com>
Message-ID: <793f8aed0802212323r17818144s1d3c732616f2c4a3@mail.gmail.com>

Better to use WU-BLAST as the engine.

On 1/12/08, ocean <oceanhu at 126.com> wrote:
>
>
>   i think you can try ucsc, use the repeatmasker information of human
>
>   the following is RepeatMasker (rmsk) Track Description
>
> Short interspersed nuclear elements (SINE), which include ALUs
> Long interspersed nuclear elements (LINE)
> Long terminal repeat elements (LTR), which include retroposons
> DNA repeat elements (DNA)
> Simple repeats (micro-satellites)
> Low complexity repeats
> Satellite repeats
> RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)
> Other repeats, which includes class RC (Rolling Circle)
> Unknown
>
>
>
> On 2008-01-10, "Narendran GR" <narengr at gmail.com> wrote:
>
> Hi friends....
>
> > I want to collect information on LINES SINES MICROSATELLITES and TFOs in
> > Human genome...
> >
> > Where can i find the information???
> > Is there any information about them in NCBI???
>
> Please help me with the same....
> >
>
> --
> Regards
> Narendran G R
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb
>


-- 
Best Regards
Sheng Wang


From marchywka at hotmail.com  Fri Feb 22 06:47:57 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Fri, 22 Feb 2008 06:47:57 -0500
Subject: [BiO BB] LINES SINES MICROSATELLITES and TFOs
In-Reply-To: <793f8aed0802212323r17818144s1d3c732616f2c4a3@mail.gmail.com>
References: <E1JChAE-0001wN-FR@dallas0.bioinformatics.org>
	<dd4cef240801100331n2b1dc373i50c4d0a1cc52b356@mail.gmail.com>
	<16381612.686151200151021279.JavaMail.coremail@bj126app66.126.com> 
	<793f8aed0802212323r17818144s1d3c732616f2c4a3@mail.gmail.com>
Message-ID: <BAY108-W41AA5C0B20D08902A88D82BE1D0@phx.gbl>


On a related but more primitive topic, does anyone have test cases or
new tools for CRISPRs?
(These come up on PubMed, but I'll pass this link along for background
since it is all I have in front of me now:

http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=17537822 

)


I could run my modified tools to find suspects on the E. coli strains I happen to have,

$ date;$progpath/rules_annotater -fastas f2 -rci -regex "[\-1]{24,47}.{26,72}[\-1]{24,47}"> f2_crisps 2>f2_diags ;date
Wed Feb 20 14:04:42 EST 2008
Wed Feb 20 14:04:53 EST 2008

and in fact determine that there were some suspects regularly separated by about 120 bases, but
I have no idea if these are "right":

>gi|48994873|gb|U00096.2| Escherichia coli K12 MG1655, complete genome

$ cat f2_crisps | awk '{print $1" "$1-last" "$3; last=$1}'


[...]

2660373 94198 CACTGTAGGCCTGATAAGACGCATTACGCGTCGCATCAGGCAACGGCTGTCGGATGCGGCGTGAACGCCTTATCCGACCTACGGTTCTGTTCACTGTAGGCCTGATAAGACGCAT
2660488 115 TACGCGTCGCATCAGGCAACGGCTGTCGGATGCGGCGTGAACGCCTTATCCGACCTACGGTTCTGTTCACTGTAGGCCTGATAAGACGCATTACGCGTCGCATCAGGCAACGGCT
2875903 215415 CCCGGTTTATCCCCGCTGGCGCGGGGAACTCCCGGGGGATAATGTTTACGGTCATGCGCCCCCCGGTTTATCCCCGCTGGCGCGG
2876027 124 CGGTTTATCCCCGCTGGCGCGGGGAACTCAAGCTGGCTGGCAATCTCTTTCGGGGTGAGTCCGGTTTATCCCCGCTGGCGCGGGG
2876149 122 CGGTTTATCCCCGCTGGCGCGGGGAACTCGCAGGCGGCGACGCGCAGGGTATGCGCGATTCGCGGTTTATCCCCGCTGGCGCGGGG
2876273 124 CGGTTTATCCCCGCTGGCGCGGGGAACTCTCAACATTATCAATTACAACCGACAGGGAGCCCGGTTTATCCCCGCTGGCGCGGGG
2876394 121 GCGGTTTATCCCCGCTGGCGCGGGGAACTCTGCGTGAGCGTATCGCCGCGCGTCTGCGAAAGCGGTTTATCCCCGCTGGCGCGGG
2902035 25641 GGTTTATCCCCGCTGGCGCGGGGAACTCGACAGAACGGCCTCAGTAGTCTCGTCAGGCTCCGGTTTATCCCCGCTGGCGCGGGGA
2902155 120 TCGGTTTATCCCCGCTGGCGCGGGGAACACGGGCGCACGGAATACAAAGCCGTGTATCTGCTCGGTTTATCCCCGCTGGCGCGGG
2902279 124 GGTTTATCCCCGCTGGCGCGGGGAACACGAAATGCTGGTGAGCGTTAATGCCGCAAACACAGGTTTATCCCCGCTGGCGCGGGGA
2945406 43127 GACGCGGGGTGGAGCAGCCTGGTAGCTCGTCGGGCTCATAACCCGAAGGTCGTCGGTTCAAATCCGGCCCCCGCAACCAATTAAAATTTGATGAAGTAAAGCAGTACGGTGACGCGGGGTGGAGCAGCCTGGTAGCTCGTCGGGCTCA
2945554 148 TAACCCGAAGGTCGTCGGTTCAAATCCGGCCCCCGCAACCAATCAAATTTGATGAAGTAAAAGCAGTACGGTGACGCGGGGTGGAGCAGCCTGGTAGCTCGTCGGGCTCATAACCCGAAGGTCGTCGGTTCAAATCCGGCCCCCGCAA
3229358 283804 CTGCACCGCGCCACTGGCGGATGCGGCGTGAACGCCTTATCCGCCCTACATGTGTGTTCCCGTAGGTCGGATAAGACGCGACAAGCGTCGCATCCGGCATCTGCACCGCGCCACTGGCGGATGCGGCG

[...]
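
The search and the spacing calculation above can be sketched in Python. This is a hedged illustration, not the rules_annotater tool used in the message: reading the numbers in the -regex argument as a 24-47 base repeat separated by a 26-72 base spacer is an assumption, and find_spaced_repeats and spacings are hypothetical names. The spacings function mirrors the awk one-liner, where the first hit is reported relative to zero.

```python
import re


def find_spaced_repeats(seq, rep=(24, 47), gap=(26, 72)):
    """Return (start, repeat) for every position where a repeat of rep
    bases recurs exactly after a spacer of gap bases."""
    pattern = re.compile(
        r"(?=([ACGT]{%d,%d})[ACGT]{%d,%d}\1)" % (rep[0], rep[1], gap[0], gap[1])
    )
    # The lookahead lets overlapping candidates be reported at each start.
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


def spacings(positions):
    """Pair each hit position with its distance from the previous hit."""
    out = []
    last = 0  # awk treats an unset variable as 0, so the first delta is the position
    for p in positions:
        out.append((p, p - last))
        last = p
    return out


if __name__ == "__main__":
    toy = "TT" + "ACGTAC" + "GGGG" + "ACGTAC" + "TT"
    print(find_spaced_repeats(toy, rep=(6, 6), gap=(4, 4)))
    print(spacings([2875903, 2876027, 2876149]))
```

A backreference pattern like this is convenient on short test sequences but is likely to be slow on a whole genome, and it only finds exact repeats, so it is better suited to building small test cases than to a production scan.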

Wasn't sure if anyone knows about these things- known test cases or
open issues. 
Thanks.


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.



From idoerg at gmail.com  Sun Feb 24 18:18:03 2008
From: idoerg at gmail.com (Iddo Friedberg)
Date: Sun, 24 Feb 2008 15:18:03 -0800
Subject: [BiO BB] First call for talks: AFP / Biosapiens 2008 July 18-19
	Toronto, Canada
Message-ID: <b5bbbc970802241518k60ec2bb3g3e64ea16e9b56364@mail.gmail.com>

Joint AFP-Biosapiens SIG

http://2008.BioFunctionPrediction.org

The Automated Function Prediction (AFP) SIG and the Biosapiens European
network of excellence are teaming up to hold a two-day Special Interest
Group (SIG) meeting July 18-19, 2008 alongside ISMB 2008 in Toronto, Canada.

The deluge of genomic information is challenging biologists to annotate
this data, from locating genes in the raw data through predicting the
function from protein sequence and structure. AFP and Biosapiens share
many common goals, and this year we have decided to join forces for a
SIG that will deal with a wide scope of gene, protein, and genomic
annotations.

For more information:
http://2008.BioFunctionPrediction.org


Talks are sought in, but are not limited to:

    * Various aspects of gene and protein function prediction
          o Function prediction using sequence based methods. This
would include "classic" methods such as detection of functional motifs
and inferring function from sequence similarity.
          o Function from genomic information: prediction by genomic
location; locus comparison with other organisms; function gain and loss.
          o Phylogeny based methods
          o Function from molecular interactions
          o Function from structure
          o Function prediction using combined methods
          o "Meta-talks" discussing the limitations and horizons of
computational function prediction.
          o Assessing function prediction programs
    * Genomic annotation
          o Gene finding
          o Genome visualization
          o Collaborative annotation
          o Cooperation between experimental and computational biologists
          o Metagenomics


This year we are considering proposals for mini-tutorials. For more
information see the AFP / Biosapiens site.

Confirmed speakers include:

       *  Barry Honig, Columbia University and Howard Hughes Medical
Institute, USA
       *  Peer Bork, European Molecular Biology Laboratory, Germany
       *  Andrew Emili, University of Toronto, Canada
       *  Olga Troyanskaya, Princeton University, USA
       *  Kimmen Sjolander, University of California Berkeley, USA

Important dates:


   - April 20, 2008: Talk, tutorial and poster abstracts due.
   - May 16, 2008: notification of acceptance
   - May 25, 2008: final abstracts due
   - July 18-19, 2008: AFP-Biosapiens SIG alongside ISMB 2008 in Toronto,
   Canada.

For inquiries, including sponsorship opportunities, please email:

afpbiosap2008 at gmail.com


-- 

Iddo Friedberg, Ph.D.
CALIT2, mail code 0440
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0440, USA
T: +1 (858) 534-0570
T: +1 (858) 646-3100 x3516
http://iddo-friedberg.org


From bioinfosm at gmail.com  Mon Feb 25 13:26:28 2008
From: bioinfosm at gmail.com (Samantha Fox)
Date: Mon, 25 Feb 2008 12:26:28 -0600
Subject: [BiO BB] tissue specificity
Message-ID: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com>

Hi all.

Are there any tools or prediction software for tissue specificity? Given a
set of genes, what tissue are they most likely to be expressed in? Or, given
a set of expression data and tissues, integrate it all to determine
tissue-specific genes!

Thanks ..

~S
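
For the second case in the question above (expression values measured across several tissues), one simple scoring approach is a per-gene specificity index, often called tau: 0 means the gene is expressed evenly everywhere, 1 means it is expressed in a single tissue. The sketch below is a minimal illustration, not a tool mentioned in this thread; the function names, the table layout, and the 0.8 cutoff are all assumptions chosen for the example.

```python
def tau(expression):
    """Tissue-specificity index for one gene: 0 = uniform, 1 = one tissue only."""
    n = len(expression)
    x_max = max(expression)
    if n < 2 or x_max <= 0:
        raise ValueError("need at least two tissues and some positive expression")
    return sum(1 - x / x_max for x in expression) / (n - 1)


def tissue_specific_genes(table, tissues, cutoff=0.8):
    """Return (gene, tau, top tissue) for genes whose tau meets the cutoff.

    table maps a gene name to expression values ordered like `tissues`.
    """
    hits = []
    for gene, values in table.items():
        if max(values) <= 0:
            continue  # gene not expressed anywhere; tau is undefined
        score = tau(values)
        if score >= cutoff:
            top = tissues[values.index(max(values))]
            hits.append((gene, round(score, 3), top))
    return hits


if __name__ == "__main__":
    tissues = ["liver", "brain", "heart"]
    table = {"geneA": [0.0, 0.0, 10.0], "geneB": [5.0, 5.0, 5.0]}
    print(tissue_specific_genes(table, tissues))
```

A score like this ranks genes one at a time; asking whether a whole group of genes is enriched for one tissue is a different question and would need an over-representation test on top of it.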


From idoerg at gmail.com  Mon Feb 25 14:12:58 2008
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 25 Feb 2008 11:12:58 -0800
Subject: [BiO BB] First call for participation AFP / Biosapiens SIG:
	deadline dates correction
Message-ID: <b5bbbc970802251112v6ab94531ge7397372ecf1fd1@mail.gmail.com>

Please note important change in deadline dates.

Important dates:


   - March 31, 2008: Talk, tutorial and poster abstracts due.
   - April 20, 2008: notification of acceptance
   - May 5, 2008: final abstracts due
   - July 18-19, 2008: AFP-Biosapiens SIG alongside ISMB 2008 in Toronto,
   Canada.


Joint AFP-Biosapiens SIG

http://2008.BioFunctionPrediction.org

The Automated Function Prediction (AFP) SIG and the Biosapiens European
network of excellence are teaming up to hold a two-day Special Interest
Group (SIG) meeting July 18-19, 2008 alongside ISMB 2008 in Toronto, Canada.

The deluge of genomic information is challenging biologists to annotate
this data, from locating genes in the raw data through predicting the
function from protein sequence and structure. AFP and Biosapiens share
many common goals, and this year we have decided to join forces for a
SIG that will deal with a wide scope of gene, protein, and genomic
annotations.

This year we are also considering proposals for mini-tutorials. See the AFP
/ Biosapiens 2008 site for more details.

For more information:
http://2008.BioFunctionPrediction.org


Talks are sought in, but are not limited to:

    * Various aspects of gene and protein function prediction
          o Function prediction using sequence based methods. This
would include "classic" methods such as detection of functional motifs
and inferring function from sequence similarity.
          o Function from genomic information: prediction by genomic
location; locus comparison with other organisms; function gain and loss.
          o Phylogeny based methods
          o Function from molecular interactions
          o Function from structure
          o Function prediction using combined methods
          o "Meta-talks" discussing the limitations and horizons of
computational function prediction.
          o Assessing function prediction programs
    * Genomic annotation
          o Gene finding
          o Genome visualization
          o Collaborative annotation
          o Cooperation between experimental and computational biologists
          o Metagenomics



Confirmed speakers include:

       *  Barry Honig, Columbia University and Howard Hughes Medical
Institute, USA
       *  Peer Bork, European Molecular Biology Laboratory, Germany
       *  Andrew Emili, University of Toronto, Canada
       *  Olga Troyanskaya, Princeton University, USA
       *  Kimmen Sjolander, University of California Berkeley, USA


For inquiries, including sponsorship opportunities, please email:

afpbiosap2008 at gmail.com

-- 

Iddo Friedberg, Ph.D.
CALIT2, mail code 0440
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0440, USA
T: +1 (858) 534-0570
T: +1 (858) 646-3100 x3516
http://iddo-friedberg.org


From marchywka at hotmail.com  Mon Feb 25 15:39:12 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 25 Feb 2008 15:39:12 -0500
Subject: [BiO BB] tissue specificity
In-Reply-To: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com>
References: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com>
Message-ID: <BAY108-W428D5EF21A0B9F5F739442BE180@phx.gbl>


Normally when I post to this list I try to provide some background as I have no idea where
everyone else is interest-wise. I vaguely remember running into something related while
researching some other topic and went back to do a quick literature search. I'm not sure
of your immediate problem or state of knowledge but, from what I can find, this is generally an open
area. For example, you could try reading stuff like this,

http://www.ncbi.nlm.nih.gov/pubmed/18194723?ordinalpos=9&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum

Klee EW. Data mining for biomarker development: a review of tissue specificity analysis.
Clin Lab Med. 2008 Mar;28(1):127-43.
Division of Experimental Pathology, Department of Laboratory Medicine and Pathology, Mayo Clinic, 200 1st Street SW, Stabile 2-50, Rochester, MN 55905, USA


I was surprised to find many genes labelled, "tissue specific"

http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term="tissue%20specific"

and maybe you could look into some of the related publications. 

Do you have a specific thesis, problem, or objective?


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.



From bioinfosm at gmail.com  Mon Feb 25 15:54:53 2008
From: bioinfosm at gmail.com (Samantha Fox)
Date: Mon, 25 Feb 2008 14:54:53 -0600
Subject: [BiO BB] tissue specificity
In-Reply-To: <BAY108-W428D5EF21A0B9F5F739442BE180@phx.gbl>
References: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com>
	<BAY108-W428D5EF21A0B9F5F739442BE180@phx.gbl>
Message-ID: <726450810802251254y4a98ea83q8da04ca7ac86e000@mail.gmail.com>

Mike,

I appreciate your response. Well, my objective is to look at expression
data, with respect to tissue specificity. I needed specific help in this
case, as I am not able to find any help or tools/software to deal with
tissue specificity information.
I came across this tool GeneMerge (http://genemerge.bioteam.net/), and
tissuedb from HUSAR group (
http://genome.dkfz-heidelberg.de/menu/tissue_db/index.html) ... but have not
yet conquered them.

Anyone with experience on these or similar tools, do point me to FAQs or
other details, when using a group of genes to determine tissue specificity
of the group.

~S


From ryan at raaum.org  Mon Feb 25 20:19:10 2008
From: ryan at raaum.org (Ryan Raaum)
Date: Mon, 25 Feb 2008 20:19:10 -0500
Subject: [BiO BB] tissue specificity
In-Reply-To: <726450810802251254y4a98ea83q8da04ca7ac86e000@mail.gmail.com>
References: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com>
	<BAY108-W428D5EF21A0B9F5F739442BE180@phx.gbl>
	<726450810802251254y4a98ea83q8da04ca7ac86e000@mail.gmail.com>
Message-ID: <a33f26610802251719q4da61926yb2c052eeed71d47@mail.gmail.com>

Have you looked at the Novartis SymAtlas?

http://symatlas.gnf.org/SymAtlas/

The published reference is http://www.pnas.org/cgi/content/abstract/012025199v1

-Ryan

On Mon, Feb 25, 2008 at 3:54 PM, Samantha Fox <bioinfosm at gmail.com> wrote:
> Mike,
>
>  I appreciate your response. Well, my objective is to look at expression
>  data, with respect to tissue specificity. I needed specific help in this
>  case, as I am not able to find any help or tools/software to deal with
>  tissue specificity information.
>  I came across this tool GeneMerge (http://genemerge.bioteam.net/), and
>  tissuedb from HUSAR group (
>  http://genome.dkfz-heidelberg.de/menu/tissue_db/index.html) ... but have not
>  yet conquered them.
>
>  Anyone with experience on these or similar tools, do point me to FAQs or
>  other details, when using a group of genes to determine tissue specificity
>  of the group.
>
>  ~S
>  On Mon, Feb 25, 2008 at 2:39 PM, Mike Marchywka <marchywka at hotmail.com>
>  wrote:
>
>
>
>  >
>  > Normally when I post to this list I try to provide some background as I
>  > have no idea where
>  > everyone else is interest-wise. I vaguely remember running into something
>  > related while
>  > researching some other topic and went back to do a quick literature
>  > search. I'm not sure
>  > of your immediate problem or state of knowledge but, from what I can find,
>  > this is generally an open
>  > area. For example, you could try reading stuff like this,
>  >
>  >
>  > http://www.ncbi.nlm.nih.gov/pubmed/18194723?ordinalpos=9&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum
>  >
>  > Clin Lab Med. 2008 Mar;28(1):127-43. Links
>  > Data mining for biomarker development: a review of tissue specificity
>  > analysis.Klee EW.
>  > Division of Experimental Pathology, Department of Laboratory Medicine and
>  > Pathology, Mayo Clinic, 200 1st Street SW, Stabile 2-50, Rochester, MN
>  > 55905, USA
>  >
>  >
>  > I was surprised to find many genes labelled, "tissue specific"
>  >
>  > http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term="tissue%20specific"
>  >
>  > and maybe you could look into some of the related publications.
>  >
>  > Do you have a specific thesis, problem, or objective?
>  >
>  >
>  >
>  > Mike Marchywka
>  > 586 Saint James Walk
>  > Marietta GA 30067-7165
>  > 404-788-1216 (C)<- leave message
>  > 989-348-4796 (P)<- emergency only
>  > marchywka at hotmail.com
>  > Note: Hotmail is blocking my mom's entire
>  > ISP claiming it is to reduce spam but probably
>  > to force users to use hotmail. Please DON'T
>  > assume I am ignoring you and try
>  > me on marchywka at yahoo.com if no reply
>  > here. Thanks.
>  >
>  > > Date: Mon, 25 Feb 2008 12:26:28 -0600
>  > > From: bioinfosm at gmail.com
>  > > To: bio_bulletin_board at bioinformatics.org
>  > > Subject: [BiO BB] tissue specificity
>  >  >
>  > > Hi all.
>  > >
>  > > Are there any tools or prediction software for tissue specificity?
>  > > Given a set of genes, which tissue are they most likely to be expressed
>  > > in? Or, given a set of expression data and tissues, integrate it all to
>  > > determine tissue-specific genes!
>  > >
>  > > Thanks ..
>  > >
>  > > ~S
>  > > _______________________________________________
>  > > BBB mailing list
>  > > BBB at bioinformatics.org
>  > > http://www.bioinformatics.org/mailman/listinfo/bio_bulletin_board
>  >
>  >
>
>
>


-- 
Ryan Raaum
Anthropology
Lehman College
The City University of New York
250 Bedford Park Blvd W.
Bronx, NY 10468
e: ryan.raaum at lehman.cuny.edu
w: http://raaum.org
o: (718) 960-8845
f: (718) 960-8406


From jeff at bioinformatics.org  Fri Feb 29 18:32:53 2008
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Fri, 29 Feb 2008 18:32:53 -0500
Subject: [BiO BB] February '08 issue of the Bioinformatics.Org Newsletter is
 now available
Message-ID: <47C89625.3070902@bioinformatics.org>

The newsletter includes some of the best of our various online forums and details some of our internal (and external) activities.

IN THIS ISSUE:
- Unveiling Pipet (part one)
- Project spotlight
- Job search highlight
- Franklin Award laureate (first announcement)
- Upcoming events

URL: http://www.bioinformatics.org/newsletter/v01-n02.pdf

Cheers,
Jeff
-- 
J.W. Bizzaro
Bioinformatics Organization, Inc. (Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone:  +1 978 562 4800
--


From kanzure at gmail.com  Thu Feb 28 15:18:54 2008
From: kanzure at gmail.com (Bryan Bishop)
Date: Thu, 28 Feb 2008 14:18:54 -0600
Subject: [BiO BB]  Will this implementation of the lac operon work?
Message-ID: <200802281418.54503.kanzure@gmail.com>

Hi all,

I am designing a regulatory circuit and have been scratching my head 
over how to include enhancers, promoters and TATA boxes, etc. I have 
decided to make some progress by randomly guessing and getting feedback 
from the community. Here's my work:

http://heybryan.org/genetic-circuits.html

Basically, I've taken the sequences of BBa_I14032, BBa_R0011, and 
BBa_B0034 and attached the peptide I want to express. It is my 
understanding, then, that in a lactose-rich environment my repressible 
circuit should express the peptide, and that if I wanted to add a 
second circuit, I could express a repressor that would bind to prevent 
RNA polymerase II from manufacturing my peptide as frequently. Correct?

The documentation is rather sparse, so once I get my head around this 
I'll be sure to go back and fill in the gaps in the documents.

Thanks,
- Bryan
(I also sent this over to OWW to see what they have to say.)
________________________________________
Bryan Bishop
http://heybryan.org/


From T.Hulsen at cmbi.ru.nl  Fri Feb 29 13:23:25 2008
From: T.Hulsen at cmbi.ru.nl (Tim Hulsen)
Date: Fri, 29 Feb 2008 19:23:25 +0100
Subject: [BiO BB] MyJournals.org
Message-ID: <007f01c87b00$2f3380a0$6bfdfea9@Tim>

Do you want to have easy access to the latest issues of your favourite 
journals, from all over the world, just through your web browser? 

Please visit http://www.myjournals.org , create a login and make your 
pick from the 422 journals currently available.


From jeff at bioinformatics.org  Fri Feb 29 19:54:14 2008
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Fri, 29 Feb 2008 19:54:14 -0500
Subject: [BiO BB] List configuration
In-Reply-To: <476C1868.8060101@bioinformatics.org>
References: <476C1868.8060101@bioinformatics.org>
Message-ID: <47C8A936.2060206@bioinformatics.org>

Please note, for the reasons specified below, that email should be addressed to <bbb at bioinformatics.org> instead of the old address.

Thanks,
Jeff

J.W. Bizzaro wrote:
> 
> FYI, the Mailman mailing list system that we use no longer accepts
> underscore characters in the mailing list name and was printing
> "bio_bulletin_board-bounces at bioinformatics.org" as
> "bio_bulletin-bounces at board" in outgoing messages.  This was of
> course causing messages to be bounced back, leading to many
> subscribers being automatically unsubscribed.
> 

-- 
J.W. Bizzaro
Bioinformatics Organization, Inc. (Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone:  +1 978 562 4800
--