From mourad12345678 at yahoo.com Sun Feb 3 20:04:37 2008
From: mourad12345678 at yahoo.com (Mourad Elloumi)
Date: Sun, 3 Feb 2008 17:04:37 -0800 (PST)
Subject: [BiO BB] Call for Paper : Algorithms in Molecular Biology - ALBIO'08 (Vienna, July 2008 )
Message-ID: <910936.72662.qm@web31514.mail.mud.yahoo.com>

CALL FOR PAPERS

Higher School of Sciences and Technologies of Tunis (Tunisia)
Organizes
Algorithms in Molecular Biology (ALBIO'08)
Workshop held in parallel with
2nd International Conference on Bioinformatics Research and Development (BIRD'08)
www.birdconf.org
Technical University of Vienna, Austria
July 7-9, 2008

Computational Molecular Biology has emerged from the Human Genome Project as an important discipline for academic research and industrial application. The exponential growth of the size of biological databases, the complexity of biological problems and the necessity to deal with errors in biological sequences place heavy demands on running time and memory. The development of fast, memory-efficient, high-performance algorithms is thus increasingly important in Computational Molecular Biology.

We are interested in papers that deal with algorithms that solve fundamental and/or applied problems in Molecular Biology, that are computationally efficient, that have been implemented and tested on simulated and/or real biological sequences, and that provide interesting new results. The submitted papers should present recent research results and identify and explore directions for future research.

Topics include, but are not limited to: (i) string processing, (ii) biological sequence comparison, (iii) structure prediction, (iv) phylogeny reconstruction, (v) DNA sequence assembly, clustering, and mapping, (vi) molecular evolution, (vii) gene prediction/recognition, (viii) gene expression, (ix) haplotyping, (x) genome rearrangement, (xi) string barcoding.
You are invited to submit a draft paper in PDF format before March 1, 2008 to the Workshop Chair: Dr. Mourad Elloumi, E-mail: Mourad.Elloumi at fsegt.rnu.tn or Mourad12345678 at yahoo.com

Papers should not exceed 10 pages in Lecture Notes in Bioinformatics (LNBI) format. All accepted papers will be published in LNBI (www.springer.de/comp/lncs/authors.html) by Springer Verlag.

Program Committee:
. Mourad Elloumi, University of Tunis, Tunisia (Chair)
. Sami Khuri, San José State University, USA
. Alain Guénoche, Institute of Mathematics of Luminy, Marseille, France
. Nadia Pisanti, University of Pisa, Italy
. Gianluca Della Vedova, University of Milano-Bicocca, Italy
. Pierre Peterlongo, IRISA-INRIA, Rennes, France
. Jan Holub, Czech Technical University in Prague, Czech Republic

Important Dates:
Submission of Full Papers: March 1, 2008
Notification of Acceptance: April 1, 2008
Camera-ready Copies: April 15, 2008

From marchywka at hotmail.com Mon Feb 4 08:22:45 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 4 Feb 2008 08:22:45 -0500
Subject: [BiO BB] looking for reference on DSCAM exon locations.
In-Reply-To: <2c8757af0801300652x814edco13c2e5940148067e@mail.gmail.com>
References: <10f601c857c5$26607a90$0301a8c0@openhelia1076a> <2c8757af0801300652x814edco13c2e5940148067e@mail.gmail.com>
Message-ID:

Hi,

I'm using DSCAM, and mostly fly DSCAM, as a test case to develop more general tools for exploring base sequences. Some early results don't appear to be trivially wrong, but I am missing a few pieces of information that I need in order to explore the initial output further. If you could point me to a link that addresses these issues, that would be most helpful.
Essentially, I just need to know the exact location of each exon variant, preferably in as many species as possible, but so far I have only located this for exon 4 and otherwise had to guess from the results in ref [3]. I'm trying to generalize the results in [4] and [5] and search for DNA features that may suggest splice rules or answer some questions posed in [6]. I'm searching for stem-loop structures as in [4] and [5], as well as reverse-complement matches that may be well separated as in [8].

From [1], I gather that there are a certain number of exon variants for melanogaster: notably, 12 for exon 4, 48 for exon 6, and 33 for exon 9. I can get exact locations for the exon 4 and 5 starts from [2], but am stuck using ambiguous FlyBase exons. From [3], I end up with 98 exons, which is short of the 100+ I get from adding up the earlier variants or the 115 cited in [6] (or [7]).

I tried a sloppy version of the stem-loop in [4] that relates to the pseudoexon. Here it is in my bastardized regular-expression format (I'm using '[]' for a group, not the normal Perl convention, don't ask..., and implicitly match one group to its reverse complement; and, yes, the quantifiers are redundant. Otherwise, this is just a Perl regex):

[\1]{6,6}.{2,10}[\2]{2,3}.{1,8}[\3]{1,5}.{0,4}[\3]{1,5}.{1,8}[\2]{2,3}.{2,10}[\1]{6,6}.{6,11}[\4]{4,7}.{0,4}[\4]{4,7}> RC5|5|CFTR

I was rather excited that these "hits" are in many locations BUT are excluded from the range of the exon 4 variants. In particular, this mish-mash of hits shows where things seem to occur. It appears that exon 6 may or may not obey a similar distribution of hits.
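
[Editorial sketch] The rule format above pairs each numbered group with its reverse complement further downstream. The message does not show the rule engine itself, so the following is only a minimal Python illustration of the simplest case, one stem of fixed length separated from its exact reverse complement by a bounded loop. The function names (`reverse_complement`, `find_stems`) and the default lengths are assumptions chosen for illustration, not the author's code.

```python
# Illustrative sketch only; not the rule engine described in the message.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stems(seq, stem_len=6, min_loop=2, max_loop=10):
    """Yield (start, loop_len, stem) for every stem_len-mer that is followed,
    after min_loop..max_loop bases, by its exact reverse complement."""
    n = len(seq)
    for i in range(n - stem_len + 1):
        stem = seq[i:i + stem_len]
        target = reverse_complement(stem)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop          # start of the candidate partner
            if j + stem_len > n:
                break                        # partner would run off the end
            if seq[j:j + stem_len] == target:
                yield i, loop, stem


if __name__ == "__main__":
    # A toy hairpin: stem GGGAAA, loop CCTT, partner TTTCCC.
    for hit in find_stems("GGGAAACCTTTTTCCC"):
        print(hit)
```

A nested rule such as the one quoted above would need several such stems checked in combination, with inexact matching if desired; this sketch deliberately leaves that out.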
Each line is the location of some "rule hit" where the first number is location in genome, "Dscam" indicates the flybase exon number, "RC5" is my rule hit, and the other things are rule hits to known locations such as exon 4 starts: ( I tried to make the numbers useful to outside reader but this confuses things like the exon 4 rule hits that end where Dscam starts- my hits are leaders, the dscam labelled hits are where exon actually starts): $ cat flybase_exon_starts ffx fg | sort -g | awk '{$1=3269374-$1; print $0;}' | more> mish_mash.txt 3269374 Drosophila melanogaster chromosome 2R 3269374 Drosophila melanogaster chromosome 2R 3269375 Dscam:98 3268566 875 TATTTCATGCTACTTTTTATTTATAAATCGAGTTTTAGAGGAAATAATTGCAGTCCCTGAATTTTCAG> RC5|5|CFTR 3267836 1597 TAATTTCTGTTTACATTGATACTCCGCTTAATGTAAATTATTATACTTATTTTACAATAA> RC5|5|CFTR 3265322 4114 TTTATAAGCACAAAAGGAGTAGCCCCTATAAAAAATGTATAAACAAAATAAATCATATAATAT> RC5|5|CFTR 3265220 Dscam:97 3264108 5333 ATTTATTCCTCCATTTTACTTTTCCCTATTATCGTAATAATTGATAAATTGCATATGCAAACTATTTG> RC5|5|CFTR 3263246 6195 AATGCGATGTTTATGTTGTTGTTCCTGTCTCCGCTACAGTCGGACGCATTTAATTCGCAATTTCATTG> RC5|5|CFTR 3259916 9510 TTGCTTAAATTAATTAAAGCATTGGCTTAAAGAAGCAAAGAATCTATAATTAT> RC5|5|CFTR 3257467 11974 TTAAACTATTACTTTATAGATAAAAGTATATCCTCACAATAATTTTGTTTAACAAATGCATTCAAATG> RC5|5|CFTR 3257239 12192 AATTGTTCATTGCATTCACATTATTTAATTAACAATTAATAAATAATTTTATTTTAAA> RC5|5|CFTR 3257148 12285 TAAGAACATAACTATACTTATTCTGTGCCTTTGAGCTTTCTTATATTAATGGATTTAAAT> RC5|5|CFTR 3256238 Dscam:96 3255817 13612 TTAAAAAAGGATAGATATGAGCTTTATATATTTTTAAAAAGTTTAAAAAAATATTT> RC5|5|CFTR 3254485 14902 CGGCCTTTTCCCAG>local|i|DNA Fly DCAM Exon 4.1 3254472 Dscam:95 3254146 15241 TCCTACCTGTTTAG>local|i|DNA Fly DCAM Exon 4.2 3254133 Dscam:94 3253623 15764 CATTGCTGTTTTAG>local|i|DNA Fly DCAM Exon 4.3 3253610 Dscam:93 3253001 16386 GAACTCACCTTCAG>local|i|DNA Fly DCAM Exon 4.4 3252988 Dscam:92 3252698 16689 CTCTTGCTTTACAG>local|i|DNA Fly DCAM Exon 4.5 3252685 Dscam:91 3252412 16975 ATTTTAAATCGCAG>local|i|DNA Fly 
DCAM Exon 4.6 3252399 Dscam:90 3252136 17251 GCACACCTTTGCAG>local|i|DNA Fly DCAM Exon 4.7 3252123 Dscam:89 3251867 17520 TATTCGATTCAAAG>local|i|DNA Fly DCAM Exon 4.8 3251854 Dscam:88 3251567 17820 TTCTATCGACTCAG>local|i|DNA Fly DCAM Exon 4.9 3251554 Dscam:87 3251284 18103 CTGATTTCCTTCAG>local|i|DNA Fly DCAM Exon 4.10 3251271 Dscam:86 3251009 18378 CTCCCGTCTTGCAG>local|i|DNA Fly DCAM Exon 4.11 3250996 Dscam:85 3250713 18674 CGTACACTTTGCAG>local|i|DNA Fly DCAM Exon 4.12 3250700 Dscam:84 3249574 19855 ATTTTTGCACAATTAAAAGTAACACAAAATGAAAAATGATTACCAGCCATGTGGCT> RC5|5|CFTR 3249386 20001 TATCAAAATATCAG>local|i|DNA Fly DCAM Exon 5 3249373 Dscam:83 3248960 20486 TTTGTATCTTTTGGAGTTTTCTCATCTACAGCTCAAATAGAATAGATACAAATCAAGTATTAAAATACATATT> RC5|5|CFTR 3248760 20675 AATTTAAAACTTATCATATTTCAAATATTTTTGAACACATAAATTTAATGTCAAATTGTTTG> RC5|5|CFTR 3248545 20904 TTTACAAATATAAATATATATATAATTCAATATAAATATTGAAATATCAAAAATGTAAATATTTAAAATGATATTT> RC5|5|CFTR 3248155 Dscam:82 3247920 Dscam:81 3247711 Dscam:80 3247513 Dscam:79 3247296 Dscam:78 3247071 Dscam:77 3246851 Dscam:76 3246645 Dscam:75 3246436 Dscam:74 3246233 Dscam:73 3245845 Dscam:72 3245421 Dscam:71 3245220 Dscam:70 3245029 Dscam:69 3244602 Dscam:68 3244374 Dscam:67 3244156 Dscam:66 3243946 Dscam:65 3243736 Dscam:64 3243530 Dscam:63 3243315 Dscam:62 3242920 Dscam:61 3242716 Dscam:60 3242511 Dscam:59 3242315 Dscam:58 3242055 27370 ATAGAATACGTACGGCTGGGTGAAATCGTTTCTATAATGTGTCCTGCGCAGG> RC5|5|CFTR 3241906 Dscam:57 3241442 Dscam:56 3241198 Dscam:55 3240871 Dscam:54 3240528 Dscam:53 3239545 Dscam:52 3239328 Dscam:51 3238953 30482 ATATTTATGATACGGGAATGTTAGATTTGATATTCAAATATACTCCACTTCTTTATGTTAAA> RC5|5|CFTR 3238803 Dscam:50 3238210 Dscam:49 3238003 Dscam:48 3237466 31973 CTACAACATCAATAAGTCCCATAAGAAGCATATTGTTATTACTTTTGTAGAGCCAGTTGGCGCCAA> RC5|5|CFTR 3237417 Dscam:47 3237019 Dscam:46 3236481 Dscam:45 3235516 Dscam:44 3235203 Dscam:43 3234956 34477 CGTGTGTGGCCAGGAATGCGGCCGGGGTCATCTACCACACGGCAGAGCTGCGCGTTAACG> RC5|5|CFTR 3234817 34627 
CCTCGCCCTCCTCCGCAGTTCTGCCCCAGATCGTGCCCTTCGATTTTGGCGAGGAGACCGTCAACGAGTTG> RC5|5|CFTR 3234800 Dscam:42 3234435 Dscam:41 3234062 Dscam:40 3233672 Dscam:39 3233281 Dscam:38 3233199 36235 TCAAGGGGGACCTGCCCTTGAGAATCCACTGGACCTTGAATGGTGAGCCTGTGGCAACAGG> RC5|5|CFTR 3232857 Dscam:37 3232742 36707 CACTAAACTCGGCTCTCATTGTAAACGGTGAAATGGGATTCACGTTAGTGCGGCTGAATAAGCGAACCAGTTCGCT> RC5|5|CFTR 3232460 Dscam:36 3232075 Dscam:35 3231673 37750 ATATGATATTTGTGCTGAATGTCATATAAATCAGAAAAATTAGGTGTAAT> RC5|5|CFTR 3231128 Dscam:34 3230754 Dscam:33 3230387 Dscam:32 3229897 Dscam:31 3229501 Dscam:30 3229124 Dscam:29 3228738 Dscam:28 3228338 Dscam:27 3227948 Dscam:26 3227576 Dscam:25 3227196 Dscam:24 3226762 42662 AGTCTCTGTGACTTGTTTGATATCCAGTGGAGACTTACCCATCGATATCGA> RC5|5|CFTR 3226434 Dscam:23 3226043 Dscam:22 3225665 Dscam:21 3225287 Dscam:20 3225060 44371 TAGTTGCCGGGCAAAGAACTACGCAGCAGCCGTCAACTACAGCACTGAACTCATAGTT> RC5|5|CFTR 3224228 45215 CCCGTGGACATCACCTGGTTGTTCAATGACTATGCCATCAACGAGTATCACGGGGTCACCTCTTCCAAGA> RC5|5|CFTR 3223509 Dscam:19 3222724 Dscam:18 3222172 Dscam:17 3219886 Dscam:16 3219708 Dscam:15 3219235 50201 TCCGGAGATGCCATATGCTTTGAAGGTACTCGACAAATCCGGACGTTCCGTGCAGCTGAGCTG> RC5|5|CFTR 3218320 Dscam:14 3218106 Dscam:13 3217357 Dscam:12 3217195 Dscam:11 3217178 52257 GCTTCTGACATTTTGAACACCCGGACCAAGGGACAGAAGCCCAAGCTGCCCGAGAAACCTCG> RC5|5|CFTR 3216961 Dscam:10 3216459 52976 AACAAATTGCACAGTATATAAAATTATATTATTCCTATTTTTTGTTGTTCAAACCAAGCTTG> RC5|5|CFTR 3216293 53130 AAAATCATTAGTGTAAAATAATAATGATTTTTCTTACGTAAATGCAATTT> RC5|5|CFTR 3216028 53416 TTTTGTTCAGTTTTTCAGCTCACGTAAGGTTAAAAAAAAAAAAACAAAAGTAGAGCTTTCTTAAATTTTAA> RC5|5|CFTR 3214571 54860 CGAAAACGACTACATATCGACAAGTTAACCTTTGAATTTTTCGCCTGCCACAGTCTGT> RC5|5|CFTR 3214290 Dscam:9 3213870 Dscam:8 3213504 55919 TATTATCCTTTCATTTACAAAGATAATATTTTGCATCCAATTAACTAATT> RC5|5|CFTR 3212243 Dscam:7 3211474 Dscam:6 3211209 58231 GGCTTAATATGTCTGGATTAGCTAGTCTATAATCTATGTTAAGCCATACTGCCTCTACTCTTTGAGT> RC5|5|CFTR 3210838 Dscam:5 3210462 Dscam:4 3210224 Dscam:3 3209155 Dscam:2 
3208270 Dscam:1

References
==========
[1] Graveley 2004, http://www.rnajournal.org/cgi/reprint/10/10/1499
[2] Celotto and Graveley 2001, http://www.genetics.org/cgi/reprint/159/2/599.pdf
[3] http://flybase.bio.indiana.edu/reports/FBgn0033159.html
[4] Buratti 2007, http://nar.oxfordjournals.org/cgi/reprint/35/13/4369
[5] Kreahling and Graveley 2005, http://mcb.asm.org/cgi/content/full/25/23/10251
[6] Olson... Graveley 2007, http://www.nature.com/nsmb/journal/v14/n12/full/nsmb1339.html
[7] ref 5 in [6], Schmucker et al. 2000, http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6WSN-4194S59-F&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=d159aee1d55f9b955b8a9dc96344a5f4
[8] Anastassiou 2006, http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1431710&blobtype=pdf

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com

Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce spam but probably to force users to use hotmail. Please DON'T assume I am ignoring you and try me on marchywka at yahoo.com if no reply here. Thanks.

From jeff at bioinformatics.org Wed Feb 6 21:31:32 2008
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Wed, 06 Feb 2008 21:31:32 -0500
Subject: [BiO BB] Courses: Gene Expression Analysis and Biostatistics
Message-ID: <47AA6D84.6050101@bioinformatics.org>

Greetings,

The following courses are being offered at Bioinformatics.Org this month:

Gene Expression Analysis; Feb 18-22, 2008

This course helps to demystify Affymetrix analysis so that any researcher can take the basic steps to go from a chip image to a list of genes that are up- or down-regulated in an experiment.
Various tools will be covered, e.g. GCOS, Excel, MATLAB, and free tools like R and dChip. It is geared towards researchers who conduct microarray experiments to study genome-wide expression changes and understand the underlying mechanisms of gene regulation in samples of interest.

Most scientists are not able to analyze the resulting data themselves. They are not able to get the desired results using traditional tools like Microsoft Word and Excel, or with advanced software provided by commercial vendors. The freeware solutions come either with a steep learning curve or as black-box interfaces that provide limited functionality with little or no technical support. In the midst of all this is the fundamental lack of understanding among scientists of how the technology works and what the fundamental parts of the analysis are.

FOR MORE INFORMATION: http://wiki.bioinformatics.org/BI201A_Gene_Expression_Analysis

Biostatistics: Distributions, Tests and Graphics; Feb 25-29, 2008

The various statistical distributions covered will help you know when assumptions can be made about a normal distribution and how to test whether or not these assumptions are true. Essential descriptive statistics are reviewed and then used in various situations to calculate background, noise, normalization and thresholding. Additionally, hypothesis testing is introduced so that you can assess groups of observations for a particular parameter and calculate whether or not the difference between groups is significant. Data visualization using various graphs will also be reviewed.

Armed with these techniques, you will be better able to deal with the challenges of data analysis. Plus, you'll be able to understand and interpret data at a more fundamental level and draw the correct conclusions about them.

FOR MORE INFORMATION: http://wiki.bioinformatics.org/MA101A_Distributions,_Tests_and_Graphics

Cheers,
Jeff
--
J.W. Bizzaro
Bioinformatics Organization, Inc.
(Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone: +1 508 890 8600
--

From aao at fe.up.pt Thu Feb 7 04:49:26 2008
From: aao at fe.up.pt (alexandra)
Date: Thu, 7 Feb 2008 09:49:26 -0000
Subject: [BiO BB] First Announcement NN2008
Message-ID: <000b01c8696e$ba28bc50$a56aa8c0@ineb.fe.up.pt>

Apologies for multiple copies. We would appreciate it if you could forward this announcement to potential candidates.

=============================================================
SUMMER SCHOOL NN2008
NEURAL NETWORKS in CLASSIFICATION, REGRESSION and DATA MINING
July 7-11, 2008, Porto, Portugal
=============================================================
http://www.nn.isep.ipp.pt
email: nn-2008 at isep.ipp.pt

GENERAL INFORMATION

The Summer School will be held in Porto, Portugal, jointly organized by the Polytechnic School of Engineering of Porto (ISEP) and the Faculty of Engineering, Porto University (FEUP). Following last year's experience, this year's edition also includes a POSTER/WORKSHOP SESSION providing a discussion forum where the participants can obtain peer guidance for their projects.

PROGRAMME COMMITTEE

* Alexander Zien (Research Scientist at the Friedrich Miescher Laboratory, Germany)
* Carlos Soares (Assistant Professor, Faculty of Economy, University of Porto, Portugal)
* Christopher Bishop (Deputy Managing Director at Microsoft Research Laboratory in Cambridge and Chair of Computer Science at the University of Edinburgh, UK)
* Joaquim Marques de Sá (Full Professor, Dept. Electr. and Comp. Engineering, Fac. of Engineering, University of Porto, Portugal)
* Jorge Santos (Assistant Professor, Engineering Polytechnic Institute, Porto, Portugal)
* Mark Embrecht (Associate Professor, Rensselaer Polytechnic Institute, RPI Troy, New York, U.S.A.)
* Noelia Sánchez Maroño (Assistant Professor, Coruna University, Spain)
* Paulo Cortez (Assistant Professor, University of Minho, Portugal)
* Petia Georgieva (Assistant Professor, University of Aveiro, Portugal)
* Yann Guermeur (Scientific Director of the Laboratoire Lorrain de Recherche en Informatique et ses Applications, France)

COURSE CONTENTS

Neural networks (NN) have become a very important tool in classification and regression tasks. Applications are nowadays abundant, e.g. in engineering, economics and biology. The Summer School on NN is dedicated to explaining relevant NN paradigms, namely multilayer perceptrons (MLP), radial basis function networks (RBF) and support vector machines (SVM) used for classification and regression tasks, illustrated with applications to real data. Specific topics are also presented, namely Multi-Valued and UB Neurons, Functional Networks, MLPs with Entropic Criteria and Data Mining using NN.

Classes include practical sessions with appropriate software tools. The trainee therefore has the opportunity to apply the concepts taught and become conversant with a broad range of NN topics and applications. A special workshop session will provide a discussion forum where the participants can obtain peer guidance for their projects.

PRELIMINARY PROGRAMME

A preliminary programme and further information about the classes are available at the school webpage (http://www.nn.isep.ipp.pt)

IMPORTANT DEADLINES

Early Registration: 18 May 2008
Poster Submission: 15 June 2008
Hotel booking: 15 June 2008
Summer School: 7-11 July 2008

All participants are required to register prior to the start of the School, by June 15, even if they choose to pay the late registration fee at the registration desk. Please note that only a LIMITED number of participants can be accepted.

REGISTRATION

In order to attend the School you must fill in the registration form, available at the School web page.
Please note that if you have any guests who would like to take part in the social programme, you must register them as well, by filling in the corresponding field in the registration form.

SCHOOL FEES

The registration fee for participants amounts to:
- Early registration fee (paid before the 18th of May)
  * 350 Euro (students, ISEP and FEUP staff)
  * 400 Euro (all other participants)
- Late registration fee (paid after the 18th of May)
  * 400 Euro (students, ISEP and FEUP staff)
  * 450 Euro (all other participants)

The registration fee includes:
* school package (manuscripts, lecture notes, CD)
* coffee breaks
* daily lunch
* welcome reception
* school banquet

NOTE: The registration fee for those who attended previous editions amounts to 25/30 euro per lecture and includes the school package and coffee breaks. Please contact the LOC for further details.

LOCAL ORGANIZING COMMITTEE (LOC)

- Helena Brás Silva - Assistant Professor, Dept. Mathematics, ISEP, Portugal
- Jorge M. Santos - Assistant Professor, Dept. Mathematics, ISEP, Portugal
- Rui Chibante - Assistant Professor, Dept. Mathematics, ISEP, Portugal

CONTACT ADDRESS

Local Organizing Committee (LOC) - Summer School NN2008
A/C Jorge M. Santos
Departamento de Matemática
Instituto Superior de Engenharia do Porto
Rua Dr. António Bernardino de Almeida 431
4200-072 PORTO / PORTUGAL
Email: nn-2008 at isep.ipp.pt

NN2008 Secretariat
Ms. Gabriela Afonso
Email: gafonso at fe.up.pt

Programme Chair: Prof. Joaquim Marques de Sá Tel.
+351 225081828 - Email: jmsa at fe.up.pt ======================================== From isbra-l at engr.uconn.edu Thu Feb 7 22:42:51 2008 From: isbra-l at engr.uconn.edu (ISBRA Symposium Announcements) Date: Thu, 7 Feb 2008 22:42:51 -0500 (EST) Subject: [BiO BB] [ISBRA-L] ISBRA 2008 Call for Posters in Bioinformatics Message-ID: CALL FOR POSTERS IN BIOINFORMATICS ================================================ ISBRA 2008 International Symposium on Bioinformatics Research and Applications May 6-8, 2008 Georgia State University Atlanta, Georgia http://www.cs.gsu.edu/isbra08/ ================================================ The International Symposium on Bioinformatics Research and Applications (ISBRA) provides a forum for the exchange of ideas and results among researchers, developers, and practitioners working on all aspects of bioinformatics and computational biology and their applications. Authors are invited to submit posters that demonstrate original research in all areas of bioinformatics and computational biology, including the development of experimental or commercial systems. Topics of interest include but are not limited to: * Biomedical databases and data integration * Biomedical image processing * Bio-ontologies * Comparative genomics * Computational genetic epidemiology * Computational proteomics * Data mining and visualization * Gene expression analysis * Genome analysis * High-performance bio-computing * Molecular evolution and phylogenetics * Molecular modeling and simulation * Pattern discovery and classification * Population genetics * RNA and protein structure prediction * Sequence assembly * Software tools and applications * Systems biology SUBMISSION REQUIREMENTS Poster submission must be made electronically at: http://www.easychair.org/conferences/?conf=ISBRA08 Submissions must be formatted using the Springer LNCS style and must not exceed 4 pages. The accepted poster papers will be published on CD-ROM and the symposium website. 
Submission implies the willingness of at least one of the authors to register and present the poster at the symposium. One best poster award will be given at ISBRA08.

IMPORTANT DATES

Submission deadline: March 14, 2008
Notification of acceptance: March 21, 2008
Final version submission: March 31, 2008

LOCATION

ISBRA 2008 will be held at Georgia State University in Atlanta. Atlanta's major attractions (Centennial Olympic Park, Underground Atlanta, CNN Center, the World of Coca-Cola, and the Georgia Aquarium, the largest in the world) can all be reached by a ten-minute walk from the GSU campus.

GENERAL CHAIRS
Dan Gusfield, University of California, Davis
Yi Pan, Georgia State University

PROGRAM CHAIRS
Ion Mandoiu, University of Connecticut
Raj Sunderraman, Georgia State University
Alexander Zelikovsky, Georgia State University

POSTER CHAIRS
Gulsah Altun, Georgia State University
Stefan Gremalschi, Georgia State University

CONTACT INFORMATION

Please direct questions to Ion Mandoiu (ion at engr.uconn.edu), Alexander Zelikovsky (alexz at cs.gsu.edu), or Raj Sunderraman (raj at cs.gsu.edu).

CONFERENCE WEB SITE: http://www.cs.gsu.edu/isbra08/

_______________________________________________
ISBRA-L mailing list
ISBRA-L at dna.engr.uconn.edu
http://dna.engr.uconn.edu/mailman/listinfo/isbra-l

From marchywka at hotmail.com Sun Feb 10 13:47:16 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Sun, 10 Feb 2008 13:47:16 -0500
Subject: [BiO BB] looking for reference on DSCAM exon locations.
In-Reply-To: <2c8757af0801300652x814edco13c2e5940148067e@mail.gmail.com>
References: <10f601c857c5$26607a90$0301a8c0@openhelia1076a> <2c8757af0801300652x814edco13c2e5940148067e@mail.gmail.com>
Message-ID:

As it turns out, to answer most of my earlier question,

http://www.mail-archive.com/bbb at bioinformatics.org/msg00026.html

the exon locations are reasonably well described at NCBI, following the links contained here,

http://genomebiology.com/2006/7/1/R2

(or here, since I can't get the figures to render at the above link,

http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1431710&blobtype=pdf )

Variable window binding for mutually exclusive alternative splicing
Dimitris Anastassiou, Hairuo Liu and Vinay Varadan
Center for Computational Biology and Bioinformatics, and Department of Electrical Engineering, Columbia University, New York, NY 07670, USA
Genome Biology 2006, 7:R2 doi:10.1186/gb-2006-7-1-r2

"Because the Dscam gene of four out of the six Drosophila spp. had not previously been annotated [11], we first generated the missing annotations for all exons of cluster 6 using the existing annotations as benchmarks and ensuring that exons are located in open reading frames. The resulting annotated sequences for D. yakuba, D. ananassae, D. mojavensis and D. pseudoobscura have been deposited in GenBank under accession numbers DQ317106, DQ317107, DQ317108 and DQ317109, respectively. These can be accessed in addition to the previously available annotated sequences for D. melanogaster (accession number AF260530) and D. virilis (accession number AY686597)."

For melanogaster, this would be here:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=8072216

Not sure how I missed this earlier but, anyway, DSCAM does seem like a good test case and an example to follow in the splicing literature. And I was able to verify that I can use my reverse-complement and rule code to find the patterns previously reported by the authors.
I'm still trying to determine what, if any, significance there may be to the pattern I mentioned earlier. I did some surveys on random genome segments, and the pattern does come up pretty often, but it doesn't seem to be completely excluded from the DSCAM exon clusters.

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com

Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce spam but probably to force users to use hotmail. Please DON'T assume I am ignoring you and try me on marchywka at yahoo.com if no reply here. Thanks.

From akunthavai at yahoo.co.in Mon Feb 11 04:17:56 2008
From: akunthavai at yahoo.co.in (A KUNTHAVAI)
Date: Mon, 11 Feb 2008 09:17:56 +0000 (GMT)
Subject: [BiO BB] Homological DNA sequences
Message-ID: <152662.42187.qm@web8912.mail.in.yahoo.com>

Sir,

I would like a list of homologous rice gene sequences to use as input to the blastn, blastp and bl2seq (BLAST 2 Sequences) programs. Please provide me with an answer as early as possible.

A.Kunthavai
Research Scholar
Anna University

From rebekah.rogers at gmail.com Fri Feb 8 20:56:41 2008
From: rebekah.rogers at gmail.com (Rebekah Rogers)
Date: Fri, 8 Feb 2008 20:56:41 -0500
Subject: [BiO BB] Inconsistent Blast Results
In-Reply-To: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com>
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com>
Message-ID: <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>

Hi:

I'm currently running blast 2.2.14 locally on my mac.
I've noticed that the printout from a blastn run at an E cutoff of 10^-10 reads differently than a blast run at an E cutoff of 10^-7 when hits worse than 10^-10 are ignored. Suddenly at 10^-7 new hits with evals of 10^-11 appear that weren't there before and even the relative strength of different hits can change. I'm not certain I understand why this is true and it has a huge impact on my results. I know that the Eval is dependent on certain constants taken from the compared sequences, but I don't understand how this could possibly change when I'm using the exact same input file and database. Does anyone have an explanation? -Rebekah From marty.gollery at gmail.com Mon Feb 11 12:28:40 2008 From: marty.gollery at gmail.com (Martin Gollery) Date: Mon, 11 Feb 2008 09:28:40 -0800 Subject: [BiO BB] Inconsistent Blast Results In-Reply-To: <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com> References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com> <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com> Message-ID: Hi Rebekah, I believe you are seeing differences because of scores getting thrown out at an earlier step. What I think is happening is that the hits are being cut off with the 10^-10 threshold that would have given better results in the alignment regeneration phase. Then when you run the search with the 10^-7 cutoff, those hits are allowed into the final step and they are extended to yield better scores. Best Regards, Marty On Feb 8, 2008 5:56 PM, Rebekah Rogers wrote: > Hi: > > I'm currently running blast 2.2.14 locally on my mac. I've noticed > that the printout from a blastn run at an E cutoff of 10^-10 reads > differently than a blast run at an E cutoff of 10^-7 when hits worse > than 10^-10 are ignored. Suddenly at 10^-7 new hits with evals of > 10^-11 appear that weren't there before and even the relative strength > of different hits can change. 
> > I'm not certain I understand why this is true and it has a huge impact > on my results. I know that the Eval is dependent on certain constants > taken from the compared sequences, but I don't understand how this > could possibly change when I'm using the exact same input file and > database. > > Does anyone have an explanation? > > -Rebekah > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > -- -- Martin Gollery Senior Bioinformatics Scientist TimeLogic- a Division of Active Motif 775-833-9113 880 Northwood Blvd. Suite 7 Incline Village, NV 89451 From aey1531 at comcast.net Mon Feb 11 10:50:32 2008 From: aey1531 at comcast.net (aey1531) Date: Mon, 11 Feb 2008 10:50:32 -0500 Subject: [BiO BB] Inconsistent Blast Results In-Reply-To: <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com> References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com> <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com> Message-ID: <00d501c86cc5$d6b45a00$841d0e00$@net> Can you remove me from your email list thanks -----Original Message----- From: bbb-bounces at bioinformatics.org [mailto:bbb-bounces at bioinformatics.org] On Behalf Of Rebekah Rogers Sent: Friday, February 08, 2008 8:57 PM To: bbb at bioinformatics.org Subject: [BiO BB] Inconsistent Blast Results Hi: I'm currently running blast 2.2.14 locally on my mac. I've noticed that the printout from a blastn run at an E cutoff of 10^-10 reads differently than a blast run at an E cutoff of 10^-7 when hits worse than 10^-10 are ignored. Suddenly at 10^-7 new hits with evals of 10^-11 appear that weren't there before and even the relative strength of different hits can change. I'm not certain I understand why this is true and it has a huge impact on my results. 
I know that the Eval is dependent on certain constants taken from the compared sequences, but I don't understand how this could possibly change when I'm using the exact same input file and database.

Does anyone have an explanation?

-Rebekah
_______________________________________________
BBB mailing list
BBB at bioinformatics.org
http://www.bioinformatics.org/mailman/listinfo/bbb

From rzimmer at MPLNet.com Mon Feb 11 10:58:59 2008
From: rzimmer at MPLNet.com (Rob Zimmer)
Date: Mon, 11 Feb 2008 10:58:59 -0500
Subject: [BiO BB] Inconsistent Blast Results
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com> <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
Message-ID: <73ACA48AF9871543A87B5CF26C311B3ED159EE@MPLNMail.mplnet.com>

Please note that Apocom Genomics provides the GrailEXP software, which incorporates BLAST and specific exon and CpG island ID functions (as well as many other features). Anyone interested in learning more should e-mail me back.

Robin Zimmer

From mleczny at gmail.com Mon Feb 11 11:00:10 2008
From: mleczny at gmail.com (Paco B C)
Date: Mon, 11 Feb 2008 17:00:10 +0100
Subject: [BiO BB] Ensembl and Gene Ontology terms
Message-ID: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>

Hi!

This is my first message to this list. My name is Paco and I'm doing my PhD in Bioinformatics at the University of Leuven, Belgium.

I would like to build a Java module that, given a list of Ensembl gene identifiers, returns their related Gene Ontology terms. I've accessed the GO database, but I can't find ENSG identifiers there, and I've read on the Ensembl website that they provide links to external databases for translation and transcript objects but not for genes (maybe in the future, but not now).

My question is: do you know which database I could query to get the mapping between Ensembl gene identifiers and GO terms?

Thanks!
Paco

From delete at elfdata.com Mon Feb 11 11:21:23 2008
From: delete at elfdata.com (Theodore H. Smith)
Date: Mon, 11 Feb 2008 16:21:23 +0000
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
Message-ID:

Hi everyone,

So I've been working, on and off, on this algorithm for quite a while now. It's basically an invention of mine. It is a "blast-like" algorithm, in that it does "fuzzy lookup" operations across a database of letters. I am designing this algorithm to be useful for bioinformatics; this is the main field I am initially targeting.

The database will be filled with protein sequences, and the query searched against the database will be another protein sequence. The algorithm has a "scoring matrix", which can accept different protein replacement scores. The cost of inserting letters (protein letters) can be configured also.

In this sense, it's no different to Smith-Waterman. The same input, the same output!
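For readers who want a concrete reference point for "the same input, the same output", the following is a minimal sketch of plain Smith-Waterman local alignment scoring with a pluggable substitution score and a configurable gap cost. It is not the ElfDataFuzzy algorithm, whose internals are not described in this thread. The names `simple_score` and `smith_waterman_score`, the toy match/mismatch values, and the linear (per-letter) gap model are all assumptions made for illustration; a real protein search would more likely use a substitution matrix such as BLOSUM62 and affine gap costs.

```python
def simple_score(a, b, match=2, mismatch=-1):
    """Toy substitution score (hypothetical values standing in for a real matrix)."""
    return match if a == b else mismatch


def smith_waterman_score(query, subject, score=simple_score, gap=2):
    """Return the best local alignment score between query and subject.

    Assumes a linear gap cost: every inserted or deleted letter costs `gap`.
    Only two rows of the dynamic-programming matrix are kept, because this
    sketch reports the score only, not the alignment itself.
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(subject) + 1)
        for j in range(1, len(subject) + 1):
            curr[j] = max(
                0,                                                  # start a new local alignment
                prev[j - 1] + score(query[i - 1], subject[j - 1]),  # match or mismatch
                prev[j] - gap,                                      # gap in subject
                curr[j - 1] - gap,                                  # gap in query
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ABCDE", "XXABCDEXX"))
    print(smith_waterman_score("ABCDE", "ABDE"))
```

Every cell of the matrix is filled here, which is the exhaustive baseline that the speed claims in this message would have to be measured against.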
The real difference from Smith-Waterman is its speed. My algorithm will be hugely faster. This is because I use many techniques to avoid processing unnecessary parts of the Smith-Waterman matrix.

I also use many tricks to reuse computations across various proteins. For example, the matrix for protein "ABCDE" is identical, at first anyhow, to the matrix for "ABCDEFG". This means that if I have both proteins "ABCDE" and "ABCDEFG" in my protein database, I can test both of them against the search query in almost half the time. My algorithm also runs in logarithmic time with respect to the size of the database. Basically, bigger databases run disproportionately faster.

I want to turn this algorithm into something useful for people. My first challenge here is to answer the question "is this algorithm faster, or better, than BLAST?". If it is not faster, my algorithm basically has little use. But I have good hopes it will be faster! I am very good with this sort of thing, you see :) Speed is my strong point.

Currently, I do not know about the speed, because I haven't implemented a C++ version of my algorithm, or a good speed-testing framework.

I do however know that my algorithm is more accurate than BLAST, because it is just as accurate as SSEARCH, as mine uses the Smith-Waterman algorithm, whereas BLAST uses a heuristic: intelligent guesswork, basically. A fine heuristic, but still a heuristic. Mine is methodological, not heuristic-based.

So here is what I am looking for!

I am hoping that someone in the field will be able to offer me guidance, interest, enthusiasm, suggestions and maybe even do some testing for me.

Perhaps a student doing a bioinformatics-related degree, who would like to write a paper on an alternative way of processing protein databases. My invention could be an interesting subject for a paper.

Or perhaps a researcher who just has an interest in this sort of thing! Perhaps a researcher who feels there must be a better way of doing these things. Or anyone really in this field with the time and interest, who feels that helping me could help him (or her) too in some way.

I'd like someone I can ask a lot of questions, show my software to, and explain what I hope to achieve with it.

Basically, my first questions to you would be "how would I set this up to be useful for someone?" and "how would I test its usefulness; what would you need to know about my algorithm before you would decide to use it over BLAST?"

It's sort of a vague question from me, like "what do you need me to do", but... well, that's where I am right now. Sort of a bit on the outside, hoping someone on the inside will show me something.

So it's an opportunity to tell me what you want, basically!! Tell me, and I might just make it.

Who knows? Maybe one day in a few years' time, everyone will be using this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You might be part of something.

Thanks to anyone who replies!

--
http://elfdata.com/plugin/
"String processing, done right"

From marchywka at hotmail.com Mon Feb 11 12:51:58 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 11 Feb 2008 12:51:58 -0500
Subject: [BiO BB] Inconsistent Blast Results
In-Reply-To:
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com> <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
Message-ID:

>> than 10^-10 are ignored. Suddenly at 10^-7 new hits with evals of
>> 10^-11 appear that weren't there before and even the relative strength
>> of different hits can change.
>>

I think someone else suggested using the score, not the e-value. I'd seen cases using a blast server where I got confusing results, so I just got in the habit of asking for a lot of marginal hits and then sorting them out locally with text scripts.
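The "ask for marginal hits, then sort them out locally" habit can be sketched as a small script. This is only an illustration under stated assumptions: it assumes the search was run once with a lenient E cutoff and written in the 12-column tab-separated layout (the `-m 8` output of legacy `blastall`), where column 11 is the E-value and column 12 the bit score. The function names `parse_evalue` and `filter_hits` and the example rows are hypothetical; the rows are made-up values, not real BLAST output.

```python
import csv
import io


def parse_evalue(text):
    """Convert an E-value string to float, tolerating an abbreviated 'e-100' form."""
    return float("1" + text if text.startswith("e") else text)


def filter_hits(tabular_text, max_evalue=1e-10):
    """Keep hits with E-value <= max_evalue, sorted by bit score (best first).

    Expects 12 tab-separated columns per hit: query, subject, % identity,
    alignment length, mismatches, gap opens, q.start, q.end, s.start, s.end,
    E-value, bit score.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank lines, comment lines, and malformed rows
        evalue = parse_evalue(row[10].strip())
        bit_score = float(row[11])
        if evalue <= max_evalue:
            hits.append((row[0], row[1], evalue, bit_score))
    hits.sort(key=lambda hit: hit[3], reverse=True)
    return hits


# Made-up example rows (hypothetical IDs and numbers, for illustration only).
EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ["q1", "sA", "98.0", "120", "2", "0", "1", "120", "5", "124", "3e-11", "75.8"],
    ["q1", "sB", "91.0", "100", "9", "0", "1", "100", "9", "108", "2e-08", "60.2"],
    ["q1", "sC", "99.0", "150", "1", "0", "1", "150", "1", "150", "e-100", "290.0"],
])

if __name__ == "__main__":
    for hit in filter_hits(EXAMPLE):
        print(hit)
```

Because the strict threshold is applied after the search rather than inside it, the same set of extended alignments is compared at every cutoff, which sidesteps the kind of discrepancy described at the start of this thread.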
Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce spam but probably to force users to use hotmail. Please DON'T assume I am ignoring you and try me on marchywka at yahoo.com if no reply here. Thanks.

> Date: Mon, 11 Feb 2008 09:28:40 -0800
> From: marty.gollery at gmail.com
> To: bbb at bioinformatics.org
> Subject: Re: [BiO BB] Inconsistent Blast Results
>
> Hi Rebekah,
> I believe you are seeing differences because of scores getting thrown
> out at an earlier step. What I think is happening is that the hits are
> being cut off with the 10^-10 threshold that would have given better
> results in the alignment regeneration phase. Then when you run the
> search with the 10^-7 cutoff, those hits are allowed into the final
> step and they are extended to yield better scores.
>
> Best Regards,
> Marty
>
> On Feb 8, 2008 5:56 PM, Rebekah Rogers wrote:
>> Hi:
>>
>> I'm currently running blast 2.2.14 locally on my mac. [...]

From golharam at umdnj.edu Mon Feb 11 17:28:18 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 11 Feb 2008 17:28:18 -0500
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
In-Reply-To:
References:
Message-ID: <47B0CC02.8010206@umdnj.edu>

Why don't you write up a paper describing the algorithm in detail and submit it to a bioinformatics journal? And, why not make the executable available with documentation so that people can download it and try it out for themselves. Do you have any test cases that show it runs faster/better than BLAST? Describe them and make them available.

Theodore H. Smith wrote:
> Hi everyone,
>
> So I've been working, on and off, on this algorithm for quite a while
> now. [...]

From marty.gollery at gmail.com Mon Feb 11 17:49:10 2008
From: marty.gollery at gmail.com (Martin Gollery)
Date: Mon, 11 Feb 2008 14:49:10 -0800
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
In-Reply-To:
References:
Message-ID:

On Feb 11, 2008 8:21 AM, Theodore H.
Smith wrote:
>
> Hi everyone,
>
> So I've been working, on and off, on this algorithm for quite a while
> now. [...]

--
--
Martin Gollery
Senior Bioinformatics Scientist
TimeLogic- a Division of Active Motif
775-833-9113
880 Northwood Blvd. Suite 7
Incline Village, NV 89451

From marty.gollery at gmail.com Mon Feb 11 17:51:15 2008
From: marty.gollery at gmail.com (Martin Gollery)
Date: Mon, 11 Feb 2008 14:51:15 -0800
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
In-Reply-To:
References:
Message-ID:

The first step is to implement it in C++ to see how fast it is. Once you have an executable, testing it will be relatively straightforward.

Marty

On Feb 11, 2008 8:21 AM, Theodore H. Smith wrote:
>
> Hi everyone,
>
> So I've been working, on and off, on this algorithm for quite a while
> now. [...]

--
--
Martin Gollery
Senior Bioinformatics Scientist
TimeLogic- a Division of Active Motif
775-833-9113
880 Northwood Blvd. Suite 7
Incline Village, NV 89451

From akunthavai at yahoo.co.in Mon Feb 11 22:28:45 2008
From: akunthavai at yahoo.co.in (A KUNTHAVAI)
Date: Tue, 12 Feb 2008 03:28:45 +0000 (GMT)
Subject: [BiO BB] Homological DNA sequences
Message-ID: <28421.18270.qm@web8904.mail.in.yahoo.com>

Sir,

I would like a list of homologous rice gene sequences to use as input to the Blastn, Blastp, and blast2sq programs. Please provide an answer as early as possible.

A.Kunthavai
Research Scholar
Anna University
Click here From marchywka at hotmail.com Tue Feb 12 08:42:35 2008 From: marchywka at hotmail.com (Mike Marchywka) Date: Tue, 12 Feb 2008 08:42:35 -0500 Subject: [BiO BB] Looking for researcher, to assist on blast-like invention In-Reply-To: References: Message-ID: >> I also use many tricks to reuse computations across various proteins. >> For example, the matrix for protein "ABCDE", is identical, at first Have you gotten any blast source code? This would be a good thing to start with for a number of reasons. But, don't assume that a given implementation is either well optimized of naive. Sure, they could have code like get_parameters(); metric=do_expensive_metric_thing(); if ( metric _________________________________________________________________ Helping your favorite cause is as easy as instant messaging.?You IM, we give. http://im.live.com/Messenger/IM/Home/?source=text_hotmail_join From aalibes at gmail.com Tue Feb 12 10:45:00 2008 From: aalibes at gmail.com (=?ISO-8859-1?Q?Andreu_Alib=E9s?=) Date: Tue, 12 Feb 2008 16:45:00 +0100 Subject: [BiO BB] Looking for researcher, to assist on blast-like invention In-Reply-To: References: Message-ID: <885c6c040802120745g7c55440ai8275e132d3932da2@mail.gmail.com> Why not making the code available to everybody in an Open Source repository like sourceforge? A On Feb 11, 2008 5:21 PM, Theodore H. Smith wrote: > > Hi everyone, > > So I've been working, on and off, on this algorithm for quite a while > now. It's basically an invention of mine. It is a "blast-like" > algorithm, in that it does "Fuzzy lookup" operations across a database > of letters. I am designing this algorithm to be useful for bio- > informatics, this is the main field I am initially targetting. > > The database will be filled with protein sequences, and the search > across the database will be another protein sequence. The algorithm > has a "scoring matrix", which can accept different protein replacement > scores. 
The cost of inserting letters (protein letters) can be > configured also. > > In this sense, it's no different to Smith-Waterman. The same input, > the same output! > > The real difference from Smith-Waterman, is it's speed. My algorithm > will be hugely faster. This is because I use many techniques to avoid > processing unnecessary parts of the Smith-Waterman matrix. > > I also use many tricks to reuse computations across various proteins. > For example, the matrix for protein "ABCDE", is identical, at first > anyhow, for the matrix for "ABCDEFG". This means if I have both > proteins "ABCDE", and "ABCDEFG" in my protein database, I can test > both of them against the search query, in almost half the time. My > algorithm also runs in logarithmic-time with respect to the size of > the database. Basically, bigger databases run disproportionately faster. > > I want to turn this algorithm, into something useful for people. My > first challenge here, is to answer the question "is this algorithm > faster, or better than BLAST". If it is not faster, my algorithm > basically has little use. But I have good hopes it will be faster! I > am very good with these sort of things, you see :) Speed is my strong- > point. > > Currently, I do not know about the speed, because I haven't > implemented a C++ version of my algorithm, or a good speed testing > framework. > > I do however know that my algorithm is more accurate than BLAST, > because it is just as accurate as SSEARCH, as mine uses the Smith- > Waterman algorithm. Whereas BLAST uses a heuristic, intelligent guess- > work basically. A fine heuristic, but still a heuristic. Mine is > methodological, not heuristic based. > > So here is what I am looking for! > > I am hoping, that someone in the field will be able to offer me > guidance, interest, enthusiasm, suggestions and maybe even do some > testing for me. 
> > Perhaps a student doing a bio-informatics related degree, who would > like to write a paper on an alternative way of processing protein > databases. My invention could be an interesting subject for a paper. > > Or perhaps a researcher who just has an interest in these sort of > things! Perhaps a researcher who feels there must be a better way of > doing these things. Or anyone really in this field with the time and > interest, and feels helping me could help him (or her) too in some way. > > I'd like someone I can ask a lot of questions to, and show my software > to, and explain my hopes what I can achieve with it. > > Basically, my first question to you, would be "how would I set this up > to be useful for someone", and "how would I test it's usefulness, what > would you need to know about my algorithm that you would decide to use > it over blast" > > It's sort of a vague question from me, like "what do you need me to > do", but... well that's where I am right now. Sort of a bit on the > outside hoping someone on the inside will show me something. > > So it's an opportunity to tell me what you want, basically!! Tell me, > and I might just make it. > > Who knows? Maybe one day in a few years time, everyone will be using > this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You > might be part of something. > > Thanks to anyone who replies! > > -- > http://elfdata.com/plugin/ > "String processing, done right" > > > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > -- Andreu Alib?s, PhD Systems Biology Program - Center for Genomic Regulation c/ Dr. 
Aiguader 88, 08003 Barcelona, Spain Phone: +34 93 316 0258 http://aalibes.googlepages.com/ From bsmagic at gmail.com Mon Feb 11 21:45:33 2008 From: bsmagic at gmail.com (Sheng Wang) Date: Tue, 12 Feb 2008 10:45:33 +0800 Subject: [BiO BB] Homological DNA sequences In-Reply-To: <152662.42187.qm@web8912.mail.in.yahoo.com> References: <152662.42187.qm@web8912.mail.in.yahoo.com> Message-ID: <793f8aed0802111845pc48d59bi5753281f37927c00@mail.gmail.com> homology to what? On 2/11/08, A KUNTHAVAI wrote: > > Sir, > I want to know the list of homological rice gene sequence to give > as an input to Blastn, Blastp , blast2sq program. Please provide me the > answer as early as possible. > A.Kunthavai > Research Scholar > Anna University > > > > > --------------------------------- > Why delete messages? Unlimited storage is just a click away. > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > -- Best Regards Sheng Wang From bsmagic at gmail.com Mon Feb 11 21:48:03 2008 From: bsmagic at gmail.com (Sheng Wang) Date: Tue, 12 Feb 2008 10:48:03 +0800 Subject: [BiO BB] Looking for researcher, to assist on blast-like invention In-Reply-To: References: Message-ID: <793f8aed0802111848u1ce88078hff246eaffe83218d@mail.gmail.com> Maybe the BLAST package should be a software to which the user could develop 3rd-part addon. On 2/12/08, Martin Gollery wrote: > > The first step is to implement it in C++ to see how fast it is. Once > you have an executable, testing it will be relatively straightforward. > > Marty > > > On Feb 11, 2008 8:21 AM, Theodore H. Smith wrote: > > > > Hi everyone, > > > > So I've been working, on and off, on this algorithm for quite a while > > now. It's basically an invention of mine. It is a "blast-like" > > algorithm, in that it does "Fuzzy lookup" operations across a database > > of letters. 
I am designing this algorithm to be useful for bio- > > informatics, this is the main field I am initially targetting. > > > > The database will be filled with protein sequences, and the search > > across the database will be another protein sequence. The algorithm > > has a "scoring matrix", which can accept different protein replacement > > scores. The cost of inserting letters (protein letters) can be > > configured also. > > > > In this sense, it's no different to Smith-Waterman. The same input, > > the same output! > > > > The real difference from Smith-Waterman, is it's speed. My algorithm > > will be hugely faster. This is because I use many techniques to avoid > > processing unnecessary parts of the Smith-Waterman matrix. > > > > I also use many tricks to reuse computations across various proteins. > > For example, the matrix for protein "ABCDE", is identical, at first > > anyhow, for the matrix for "ABCDEFG". This means if I have both > > proteins "ABCDE", and "ABCDEFG" in my protein database, I can test > > both of them against the search query, in almost half the time. My > > algorithm also runs in logarithmic-time with respect to the size of > > the database. Basically, bigger databases run disproportionately faster. > > > > I want to turn this algorithm, into something useful for people. My > > first challenge here, is to answer the question "is this algorithm > > faster, or better than BLAST". If it is not faster, my algorithm > > basically has little use. But I have good hopes it will be faster! I > > am very good with these sort of things, you see :) Speed is my strong- > > point. > > > > Currently, I do not know about the speed, because I haven't > > implemented a C++ version of my algorithm, or a good speed testing > > framework. > > > > I do however know that my algorithm is more accurate than BLAST, > > because it is just as accurate as SSEARCH, as mine uses the Smith- > > Waterman algorithm. 
> > Whereas BLAST uses a heuristic, intelligent guesswork
> > basically. A fine heuristic, but still a heuristic. Mine is
> > methodological, not heuristic based.
> >
> > So here is what I am looking for!
> >
> > I am hoping that someone in the field will be able to offer me
> > guidance, interest, enthusiasm, suggestions and maybe even do some
> > testing for me.
> >
> > Perhaps a student doing a bio-informatics related degree, who would
> > like to write a paper on an alternative way of processing protein
> > databases. My invention could be an interesting subject for a paper.
> >
> > Or perhaps a researcher who just has an interest in this sort of
> > thing! Perhaps a researcher who feels there must be a better way of
> > doing these things. Or anyone really in this field with the time and
> > interest, who feels helping me could help him (or her) too in some way.
> >
> > I'd like someone I can ask a lot of questions, show my software
> > to, and explain what I hope to achieve with it.
> >
> > Basically, my first question to you would be "how would I set this up
> > to be useful for someone", and "how would I test its usefulness, what
> > would you need to know about my algorithm before you would decide to
> > use it over blast"
> >
> > It's sort of a vague question from me, like "what do you need me to
> > do", but... well that's where I am right now. Sort of a bit on the
> > outside hoping someone on the inside will show me something.
> >
> > So it's an opportunity to tell me what you want, basically!! Tell me,
> > and I might just make it.
> >
> > Who knows? Maybe one day in a few years' time, everyone will be using
> > this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You
> > might be part of something.
> >
> > Thanks to anyone who replies!
> >
> > --
> > http://elfdata.com/plugin/
> > "String processing, done right"
>
> --
> Martin Gollery
> Senior Bioinformatics Scientist
> TimeLogic- a Division of Active Motif
> 775-833-9113
> 880 Northwood Blvd. Suite 7
> Incline Village, NV 89451

--
Best Regards
Sheng Wang

From bsmagic at gmail.com Mon Feb 11 21:50:14 2008
From: bsmagic at gmail.com (Sheng Wang)
Date: Tue, 12 Feb 2008 10:50:14 +0800
Subject: [BiO BB] Ensembl and Gene Ontology terms
In-Reply-To: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
Message-ID: <793f8aed0802111850j63b44dd9s865ea1514ad2e649@mail.gmail.com>

It seems that similar software is listed on the official GO website.

On 2/12/08, Paco B C wrote:
>
> Hi!
> This is my first message to this list. My name is Paco and I'm doing my
> PhD in Bioinformatics at the University of Leuven, Belgium.
> I would like to build a Java module that, given a list of Ensembl Gene
> Identifiers, returns their related Gene Ontology terms. I've
> accessed the GO database, but I can't find ENSG terms and I've read on the
> Ensembl website that they give the link to external databases for
> translation and transcript objects but not for genes (maybe in the future,
> but not now).
> My question is, do you know which database I could query in order to get
> this relation between Ensembl and GO terms?
> Thanks!
> Paco

--
Best Regards
Sheng Wang

From alen295200 at gmail.com Tue Feb 12 06:49:20 2008
From: alen295200 at gmail.com (anil kumar)
Date: Tue, 12 Feb 2008 03:49:20 -0800
Subject: [BiO BB] Homological DNA sequences
In-Reply-To: <28421.18270.qm@web8904.mail.in.yahoo.com>
References: <28421.18270.qm@web8904.mail.in.yahoo.com>
Message-ID: <3d0047290802120349ic9f4e6emc4c8c8de3ce580b7@mail.gmail.com>

Hi,
You can get the rice genome from TIGR. Your problem does not seem well
defined to me. Anyway, download the data and work with it.

On 2/11/08, A KUNTHAVAI wrote:
>
> Sir,
> I want to know the list of homological rice gene sequence to
> give as an input to Blastn, Blastp, blast2sq program. Please provide
> me the answer as early as possible.
> A.Kunthavai
> Research Scholar
> Anna University

From dan.bolser at gmail.com Tue Feb 12 04:15:48 2008
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 12 Feb 2008 10:15:48 +0100
Subject: [BiO BB] Ensembl and Gene Ontology terms
In-Reply-To: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
Message-ID: <2c8757af0802120115s3c64ce62hf421b9e442b8804e@mail.gmail.com>

You could try SRS? Not sure if it has what you need...

Else I think Uniprot has links to GO and Ensembl ... I think.

At least Uniprot links to GO, and SRS links Uniprot and Ensembl!

You would think we had all this sorted out by now!

----
Talk to the experts; irc://irc.freenode.net/#bioinformatics

On 11/02/2008, Paco B C wrote:
> Hi!
> This is my first message to this list.
> My name is Paco and I'm doing my PhD in Bioinformatics at the
> University of Leuven, Belgium.
> I would like to build a Java module that, given a list of Ensembl Gene
> Identifiers, returns their related Gene Ontology terms. I've
> accessed the GO database, but I can't find ENSG terms and I've read on the
> Ensembl website that they give the link to external databases for
> translation and transcript objects but not for genes (maybe in the future,
> but not now).
> My question is, do you know which database I could query in order to get
> this relation between Ensembl and GO terms?
> Thanks!
> Paco

--
hello

From delete at elfdata.com Mon Feb 11 18:56:41 2008
From: delete at elfdata.com (Theodore H. Smith)
Date: Mon, 11 Feb 2008 23:56:41 +0000
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
In-Reply-To: <47B0CC02.8010206@umdnj.edu>
References: <47B0CC02.8010206@umdnj.edu>
Message-ID:

On 11 Feb 2008, at 22:28, Ryan Golhar wrote:
> Why don't you write up a paper describing the algorithm in detail and
> submit it to a bioinformatics journal? And, why not make the executable
> available with documentation so that people can download it and try it
> out for themselves.
>
> Do you have any test cases that show it runs faster/better than BLAST?
> Describe them and make them available.

The first thing I'd need to do is make a good test. I'm not sure what
constitutes "a good test" in this case. How big should the databanks
be to make the test reasonable? Is randomly generated data good enough,
or is a randomly selected sample better? If a sample is better, how large
a dataset must I gather to do the test?

Perhaps certain settings make my algorithm work better or worse
relative to BLAST. But then how do I know which settings are more
likely to be used and which aren't?
I think someone who uses BLAST frequently, and knows it well from a
user's perspective, might have a better feel for creating a test than
I would.

The worst thing that could happen is that I make a test which is
unfairly biased towards my algorithm :) The next thing that would
happen is people would see my test has "suspiciously good" results,
be annoyed about that, and lose interest, even if it were an innocent
mistake on my end.

I'd rather avoid that sort of mistake by having a knowledgeable eye on
the design of the test!

Like I said, I haven't gotten all the code in C++ yet. I've got a
framework in C++ already; I mean I know how to write C++. And I know
what to do, as I've written it in a prototyping language. The C++
version will come soon, though.

--
http://elfdata.com/plugin/
"String processing, done right"

From dorjetarap at googlemail.com Tue Feb 12 09:30:18 2008
From: dorjetarap at googlemail.com (dorje tarap)
Date: Tue, 12 Feb 2008 14:30:18 +0000
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
In-Reply-To:
References:
Message-ID:

Hi Mike,

"Faster than blast" or even "more accurate than blast" type algorithms
have been around for some time now. Some interesting examples are
patternhunter (commercial)
http://www.bioinformaticssolutions.com/products/ph/ and YAST
(opensource). Both claim to be significantly faster and more accurate
than BLAST; unfortunately they are not as popular.
I suspect this is for a few reasons: BLAST has been around for a while
and has gained some confidence in the bioinformatics sector; the
reporting of statistical significance (e-values) is easy to interpret;
and it has a large genomics database readily available. For an
algorithm to replace BLAST, it would have to tick a lot of these boxes.

Your approach seems pretty interesting, as you mention it is not a
heuristic algorithm, whereas the main approach recently seems to be
using the "spaced-seeds" concept introduced in PatternHunter. Your
approach sounds somewhat similar to the Four Russians speedup, and any
way to speed up the dynamic programming algorithm without sacrificing
accuracy would benefit a number of fields, not just bioinformatics.

I guess the first step would be to outline your algorithm in a draft
paper so that others can get a better understanding of your approach.

HTH
Karma

On 12/02/2008, Mike Marchywka wrote:
>
> >> I also use many tricks to reuse computations across various proteins.
> >> For example, the matrix for protein "ABCDE", is identical, at first
>
> Have you gotten any blast source code? This would be a good thing to
> start with for a number of reasons. But, don't assume that a given
> implementation is either well optimized or naive. Sure, they could have
> code like
>
> get_parameters();
> metric=do_expensive_metric_thing();
> if ( metric

From dwrice at indiana.edu Tue Feb 12 01:49:24 2008
From: dwrice at indiana.edu (Danny Rice)
Date: Tue, 12 Feb 2008 01:49:24 -0500
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
In-Reply-To:
References:
Message-ID: <47B14174.1080109@indiana.edu>

If you have a way to speed up the full S&W algorithm, it would be
interesting whether or not it is faster than BLAST. I would focus on
showing it is significantly faster than the current implementations of
Smith-Waterman. Such an algorithm could be incorporated into BLAST or
any other dynamic programming algorithm.

You can test it, for example, by searching the swissprot database
ftp://ftp.ncbi.nih.gov/blast/db/swissprot.tar.gz with a bunch of
queries pulled from this database. It would seem, however, that you
could calculate the time savings directly as a function of conditions.
You shouldn't need any help with this. Just show it is significantly
faster than S&W while searching the entire matrix, and you are golden.

From marchywka at hotmail.com Tue Feb 12 10:00:44 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 12 Feb 2008 10:00:44 -0500
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
In-Reply-To:
References:
Message-ID:

[ this stupid hotmail editor cut off my last message, I guess "plain
text" still expects formatting... ]

> I am hoping, that someone in the field will be able to offer me
> guidance, interest, enthusiasm, suggestions and maybe even do some
> testing for me.
Anyway, the rest of my prior post wasn't all that interesting, but I
would also suggest reading the literature to find problems.

http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&&term=blast+limitation

Just skimming the hits, there are new lab techniques, and these may have
artifacts or quirks that need certain ID features. Alternatively, if
you can find a specific confusing result and sort it out with your
technique, that would be a good proof of concept. Don't immediately
dismiss this approach: there is so much new literature these days
that there may be problems and solutions waiting to be matched, as
research groups are too busy with their own or different matters.

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce
spam but probably to force users to use hotmail. Please DON'T assume I
am ignoring you and try me on marchywka at yahoo.com if no reply here.
Thanks.

From oceanhu at 126.com Tue Feb 12 04:43:46 2008
From: oceanhu at 126.com (ocean)
Date: Tue, 12 Feb 2008 17:43:46 +0800 (CST)
Subject: [BiO BB] Ensembl and Gene Ontology terms
In-Reply-To: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com>
Message-ID: <23188614.237851202809426120.JavaMail.coremail@bj126app59.126.com>

Hey Paco,
I think you can try BioMart (www.biomart.org). This database has
already done this kind of thing. You can contact them for help.
Good luck!
Huhaiyang

On 2008-02-12, "Paco B C" wrote:
> Hi!
> This is my first message to this list. My name is Paco and I'm doing my
> PhD in Bioinformatics at the University of Leuven, Belgium.
> I would like to build a Java module that, given a list of Ensembl Gene
> Identifiers, returns their related Gene Ontology terms. I've
> accessed the GO database, but I can't find ENSG terms and I've read on the
> Ensembl website that they give the link to external databases for
> translation and transcript objects but not for genes (maybe in the future,
> but not now).
> My question is, do you know which database I could query in order to get
> this relation between Ensembl and GO terms?
> Thanks!
> Paco

From pace_john at hotmail.com Tue Feb 12 10:50:46 2008
From: pace_john at hotmail.com (John Pace)
Date: Tue, 12 Feb 2008 09:50:46 -0600
Subject: [BiO BB] Inconsistent Blast Results
In-Reply-To: <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
References: <79def59f0802081159v5472f566hba05582d4c4eae77@mail.gmail.com> <79def59f0802081756w632381ccscc1d996ec9041ee2@mail.gmail.com>
Message-ID:

Rebekah,

The reason for this is the way Blast calculates e-values. The e-value
is a function of the score. The higher the score, the lower the e-value.
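
To make the score/e-value relationship concrete, here is a minimal sketch assuming the standard Karlin-Altschul form E = K * m * n * exp(-lambda * S), with m the query length, n the database length and S the raw score. The function names, the LAMBDA and K values, the +1/-3 nucleotide scoring and the database length are all illustrative assumptions, not parameters taken from any actual BLAST run in this thread, and real BLAST also applies effective-length corrections that this sketch ignores.

```python
import math

# Illustrative placeholders only (assumed, not read from a real BLAST report).
LAMBDA = 1.374          # assumed scale parameter for the scoring system
K = 0.711               # assumed search-space constant
DB_LEN = 1_000_000_000  # assumed total database length in letters


def expected_raw_score(aligned_len, divergence, match=1, mismatch=-3):
    """Raw score of an ungapped alignment with the given fraction of mismatches."""
    identical = aligned_len * (1.0 - divergence)
    different = aligned_len * divergence
    return identical * match + different * mismatch


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Same 10% divergence, two query lengths: the longer query earns a
    # proportionally higher score, and the e-value falls exponentially.
    for query_len in (100, 500):
        score = expected_raw_score(query_len, 0.10)
        e = e_value(score, query_len, DB_LEN)
        print(f"query {query_len} letters, raw score {score:.0f}, E = {e:.3g}")
```

The query length enters twice: linearly through the search space (m * n), and much more strongly through the score, which grows with the aligned length and sits inside the exponent.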
The score gets lower as the alignment gets worse and also depends on
the length of the query sequence. So, for a lower e-value to be
obtained, say 10e-10, the alignment for the HSP must be better than the
alignment for the HSP that generates an e-value of 10e-7. If the
alignment can be worse, chances are that more of the query sequence
will show up in the HSPs, thus creating different output.

Also, the e-value is a function of the length of the sequence and the
size of the database. So a shorter query sequence that is 10% diverged
from the hit will have a higher e-value than a query sequence that is
5 times longer with the same divergence.

I hope this helps. Unfortunately, comparing different e-values in Blast
can be a little like comparing apples to oranges. I have found that
this can be circumvented by using a sliding e-value. You can use this
to make sure all query sequences, regardless of length, meet a certain
criterion, such as at least 50% similarity over the entire length of
the query sequence. It gets a little more complicated, but at least it
is comparing apples to apples.

Thanks,
John Pace
PhD Candidate
University of Texas at Arlington

> Date: Fri, 8 Feb 2008 20:56:41 -0500
> From: rebekah.rogers at gmail.com
> To: bbb at bioinformatics.org
> Subject: [BiO BB] Inconsistent Blast Results
>
> Hi:
>
> I'm currently running blast 2.2.14 locally on my mac. I've noticed
> that the printout from a blastn run at an E cutoff of 10^-10 reads
> differently than a blast run at an E cutoff of 10^-7 when hits worse
> than 10^-10 are ignored. Suddenly at 10^-7 new hits with evals of
> 10^-11 appear that weren't there before and even the relative strength
> of different hits can change.
>
> I'm not certain I understand why this is true and it has a huge impact
> on my results.
> I know that the Eval is dependent on certain constants
> taken from the compared sequences, but I don't understand how this
> could possibly change when I'm using the exact same input file and
> database.
>
> Does anyone have an explanation?
>
> -Rebekah

From marchywka at hotmail.com Tue Feb 12 13:57:06 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 12 Feb 2008 13:57:06 -0500
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
In-Reply-To:
References: <47B0CC02.8010206@umdnj.edu>
Message-ID:

> I think someone who uses BLAST frequently, and knows it well from a
> user's perspective... might have a better feel for creating a test
> than I might.

It can be hard to solicit problems from people, but it helps if you have
some idea a priori what you are trying to accomplish before you test :)
Certainly test what you expect to achieve and see if the
tradeoff/"pathological" cases are what you expect, and then test with
pseudo-random input if you have some way to generate an expected result.

To give you an example, right now I'm writing stuff that re-invents the
wheel with a few things I'm ultimately hoping to identify and improve.
I have scripts that generate random numbers to pick out unknown pieces
of genome, and I "blast" / search against those, assuming they will be
"negative controls." Of course, a "hit" would require further
examination, but you get the idea. On genome, you get all kinds of "odd
stuff", including discovering the unappreciated but highly conserved
sequence "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN" LOL. You get the idea.
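
The negative-control idea above can be sketched roughly as follows, assuming the genome is already loaded into memory as one string. The function names, the toy genome and the N threshold are hypothetical and are not Mike's actual scripts. The composition-preserving shuffle is an extra, commonly used control that the post does not describe.

```python
import random


def random_windows(genome, window, count, max_n_fraction=0.0, seed=0, max_tries=10_000):
    """Pick `count` random windows, skipping those dominated by N placeholders."""
    if window > len(genome):
        raise ValueError("window is longer than the genome")
    rng = random.Random(seed)  # fixed seed so a control set can be regenerated
    picked = []
    tries = 0
    while len(picked) < count and tries < max_tries:
        tries += 1
        start = rng.randrange(0, len(genome) - window + 1)
        piece = genome[start:start + window]
        # Runs of N are unsequenced filler, not real sequence, so reject them.
        if piece.upper().count("N") / window <= max_n_fraction:
            picked.append((start, piece))
    return picked


def shuffled(seq, seed=0):
    """Return seq with its letters permuted: same composition, no real order."""
    rng = random.Random(seed)
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


if __name__ == "__main__":
    # Toy stand-in for a genome, including a gap of N characters.
    toy_genome = "ACGT" * 50 + "N" * 30 + "TTGACA" * 40
    for start, piece in random_windows(toy_genome, window=25, count=3, seed=1):
        print(start, piece, shuffled(piece, seed=start))
```

Recording the start coordinate with each window makes it possible to go back and examine any control that unexpectedly produces a hit.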
I also have some splice-related code I'm working on, and I eventually
found out about particular cases like DSCAM that make good tests. If
you are aware of hits that normal blast doesn't turn up with typical
search parameters, that would obviously make a good test case.

Most of the papers that came up in the pubmed search probably list open
issues (or else their conclusion would be translated, "therefore, we
require no further funding from our sponsor" LOL), so just reading the
literature should give you some relevant ideas. Also, besides the bio
literature, check out the computer/algorithm literature at places like
citeseer.

From delete at elfdata.com Tue Feb 12 11:13:03 2008
From: delete at elfdata.com (Theodore H. Smith)
Date: Tue, 12 Feb 2008 16:13:03 +0000
Subject: [BiO BB] Looking for researcher, to assist on blast-like invention
In-Reply-To: <885c6c040802120745g7c55440ai8275e132d3932da2@mail.gmail.com>
References: <885c6c040802120745g7c55440ai8275e132d3932da2@mail.gmail.com>
Message-ID: <411D6BDA-6BCA-441D-8345-AF8D3BE36029@elfdata.com>

Hi Andreu,

I am definitely making my source code available to everyone, under an
open source agreement. I am not going the commercial route. And while
I will protect my intellectual property, I am not going the patent
route. I am not a believer in the whole aggressive "stop people doing
stuff" idea.

I should have said at the start that I am making this open source. The
main reason I am delaying is that I don't have a C++ version yet, so I
have nothing to offer. I also find SourceForge awkward to use, and it
wastes a lot of time compared to just uploading the source code
directly to my website with an agreement saying "this is open source".

As for writing a paper...
I don't really have the background in University to write a paper, meaning it would take me a lot longer to do than someone experienced in writing papers. And to be honest I feel it would distract me from my main goal, which is to spend my time doing something productive. I would rather someone else write a paper for me :) I think this would be a fair arrangement. But I am happy to explain my algorithm. I think I should write up a document however explaining it. Maybe not in academia, more in software developer style. Thanks for all the interest and suggestion everyone. It's helping a lot. On 12 Feb 2008, at 15:45, Andreu Alib?s wrote: > Why not making the code available to everybody in an Open Source > repository like sourceforge? > > A > > On Feb 11, 2008 5:21 PM, Theodore H. Smith wrote: >> >> Hi everyone, >> >> So I've been working, on and off, on this algorithm for quite a while >> now. It's basically an invention of mine. It is a "blast-like" >> algorithm, in that it does "Fuzzy lookup" operations across a >> database >> of letters. I am designing this algorithm to be useful for bio- >> informatics, this is the main field I am initially targetting. >> >> The database will be filled with protein sequences, and the search >> across the database will be another protein sequence. The algorithm >> has a "scoring matrix", which can accept different protein >> replacement >> scores. The cost of inserting letters (protein letters) can be >> configured also. >> >> In this sense, it's no different to Smith-Waterman. The same input, >> the same output! >> >> The real difference from Smith-Waterman, is it's speed. My algorithm >> will be hugely faster. This is because I use many techniques to avoid >> processing unnecessary parts of the Smith-Waterman matrix. >> >> I also use many tricks to reuse computations across various proteins. >> For example, the matrix for protein "ABCDE", is identical, at first >> anyhow, for the matrix for "ABCDEFG". 
This means if I have both >> proteins "ABCDE", and "ABCDEFG" in my protein database, I can test >> both of them against the search query, in almost half the time. My >> algorithm also runs in logarithmic-time with respect to the size of >> the database. Basically, bigger databases run disproportionately >> faster. >> >> I want to turn this algorithm, into something useful for people. My >> first challenge here, is to answer the question "is this algorithm >> faster, or better than BLAST". If it is not faster, my algorithm >> basically has little use. But I have good hopes it will be faster! I >> am very good with these sort of things, you see :) Speed is my >> strong- >> point. >> >> Currently, I do not know about the speed, because I haven't >> implemented a C++ version of my algorithm, or a good speed testing >> framework. >> >> I do however know that my algorithm is more accurate than BLAST, >> because it is just as accurate as SSEARCH, as mine uses the Smith- >> Waterman algorithm. Whereas BLAST uses a heuristic, intelligent >> guess- >> work basically. A fine heuristic, but still a heuristic. Mine is >> methodological, not heuristic based. >> >> So here is what I am looking for! >> >> I am hoping, that someone in the field will be able to offer me >> guidance, interest, enthusiasm, suggestions and maybe even do some >> testing for me. >> >> Perhaps a student doing a bio-informatics related degree, who would >> like to write a paper on an alternative way of processing protein >> databases. My invention could be an interesting subject for a paper. >> >> Or perhaps a researcher who just has an interest in these sort of >> things! Perhaps a researcher who feels there must be a better way of >> doing these things. Or anyone really in this field with the time and >> interest, and feels helping me could help him (or her) too in some >> way. 
>> >> I'd like someone I can ask a lot of questions to, and show my >> software >> to, and explain my hopes what I can achieve with it. >> >> Basically, my first question to you, would be "how would I set this >> up >> to be useful for someone", and "how would I test it's usefulness, >> what >> would you need to know about my algorithm that you would decide to >> use >> it over blast" >> >> It's sort of a vague question from me, like "what do you need me to >> do", but... well that's where I am right now. Sort of a bit on the >> outside hoping someone on the inside will show me something. >> >> So it's an opportunity to tell me what you want, basically!! Tell me, >> and I might just make it. >> >> Who knows? Maybe one day in a few years time, everyone will be using >> this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You >> might be part of something. >> >> Thanks to anyone who replies! >> >> -- >> http://elfdata.com/plugin/ >> "String processing, done right" >> >> >> >> _______________________________________________ >> BBB mailing list >> BBB at bioinformatics.org >> http://www.bioinformatics.org/mailman/listinfo/bbb >> > > > > -- > Andreu Alib?s, PhD > Systems Biology Program - Center for Genomic Regulation > c/ Dr. 
Aiguader 88, 08003 Barcelona, Spain > Phone: +34 93 316 0258 > http://aalibes.googlepages.com/ > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb -- http://elfdata.com/plugin/ "String processing, done right" From dankoc at gmail.com Tue Feb 12 11:15:10 2008 From: dankoc at gmail.com (Charles Danko) Date: Tue, 12 Feb 2008 11:15:10 -0500 Subject: [BiO BB] Ensembl and Gene Ontology terms In-Reply-To: <23188614.237851202809426120.JavaMail.coremail@bj126app59.126.com> References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com> <23188614.237851202809426120.JavaMail.coremail@bj126app59.126.com> Message-ID: <8adccabf0802120815y52942ca0ia0bbdd60cdccc627@mail.gmail.com> Hi, Paco, If you want to do it all programmically from Java, I would suggest Googling a protocol called DAS. ENSEMBL has a DAS server from which you should be able to pull GO annotations for ENSEMBL IDs quite easily. You can find java-based libraries to access a DAS connection, and parse the resulting information here: http://www.spice-3d.org/dasobert/. Good luck! Charles 2008/2/12 ocean : > Hey,Paco > > i think you can try BIOMART(www.biomart.org). this database had done such thing already. > you can contract them for some help. > > good luck! > > Huhaiyang > > > > > ?2008-02-12?"Paco B C" ??? > > > Hi! > this is my first message in this list. My name is Paco and I'm doing my PhD. > on Bioinformatics in University of Leuven, Belgium. > I would like to build a java module that, given a list of Ensembl Gene > Identifiers, it would give back their related Gene Ontology terms. I've > accessed the GO database, but I can't find ENSG terms and I've read in the > Ensembl website that they give the link to external databases for > translation and transcript objects but not for genes (maybe in the future, > but not now). 
> My question is, do you know which database could I query in order to get > this relation within Ensembl and GO terms? > Thanks! > Paco > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > From theoriste at gmail.com Tue Feb 12 11:44:34 2008 From: theoriste at gmail.com (DT) Date: Tue, 12 Feb 2008 11:44:34 -0500 Subject: [BiO BB] Looking for researcher, to assist on blast-like invention In-Reply-To: References: <47B0CC02.8010206@umdnj.edu> Message-ID: <6e3504600802120844o7d040702rc7afea8ad4285be3@mail.gmail.com> On Feb 11, 2008 6:56 PM, Theodore H. Smith wrote: > > On 11 Feb 2008, at 22:28, Ryan Golhar wrote: > > > Why don't you write up a paper describing the algorithm in detail and > > submit it to a bioinformatics journal? And, why not make the > > executable > > available with documentation so that people can download it and try it > > out for themselves. > > > > Do you have any test cases that show it runs faster/better than BLAST? > > Describe them and make them available. > > The first thing I'd need to do is make a good test. I'm not sure what > constitutes "a good test", in this case. NR ALL VS ALL: This will test speed and somehow test performance. The nr database (non-redundant) from NCBI is a good place to start testing as a template database. I'd use your algorithm all-against-all in nr. Test against BLAST and then use your algorithm for each entry in nr versus all of nr, and then compare performance. You can generate a ROC plot for BLAST vs your algorithm against a known set of homologs and distant homologs, based on a p-value or significance level cutoff. 
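[Editor's sketch] The ROC comparison suggested above can be sketched in a few lines. This is a minimal illustration only, assuming each search hit has already been labelled as a true homolog or not against a trusted reference set; the function names and the toy E-values are hypothetical, and nothing here comes from BLAST or from the algorithm under discussion.

```python
def roc_points(hits):
    """hits: list of (evalue, is_true_homolog) pairs for one search tool.

    Sweeps the significance cutoff from strictest to loosest and returns
    (false_positive_rate, true_positive_rate) points, starting at (0, 0).
    Hits with equal E-values are admitted together.
    """
    positives = sum(1 for _, is_hom in hits if is_hom)
    negatives = len(hits) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true homolog and one non-homolog")
    ordered = sorted(hits, key=lambda h: h[0])  # smallest E-value first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ordered):
        cutoff = ordered[i][0]
        while i < len(ordered) and ordered[i][0] == cutoff:
            if ordered[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Made-up labelled hits, for illustration only.
    toy_hits = [(1e-50, True), (1e-20, True), (1e-5, False), (1e-3, True), (1.0, False)]
    print(roc_points(toy_hits))
    print(auc(roc_points(toy_hits)))
```

Running both tools over the same labelled query set and comparing the two curves (or their areas) gives a cutoff-independent picture of sensitivity versus specificity.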
A real randomization test would be this to test sensitivity and specificity: take known sequences in nr -- all or some of them -- and scramble them by 'homologous recombination" -- create chimeras of known sequences by different randomization criteria -- by domain (criteria based on domain annotation) or by individual sequence based on a known randomization function, and then test the sensitivity and specificity of BLAST vs your algorithm to detect the originating sequences that created the chimeras. You will also need to check the performance of your algorithm against nucleotide sequences. There are already test cases in BLAST for mouse-vs-human, that would be a good test case. Deanne Taylor From theoriste at gmail.com Tue Feb 12 11:46:03 2008 From: theoriste at gmail.com (DT) Date: Tue, 12 Feb 2008 11:46:03 -0500 Subject: [BiO BB] Looking for researcher, to assist on blast-like invention In-Reply-To: <6e3504600802120844o7d040702rc7afea8ad4285be3@mail.gmail.com> References: <47B0CC02.8010206@umdnj.edu> <6e3504600802120844o7d040702rc7afea8ad4285be3@mail.gmail.com> Message-ID: <6e3504600802120846y38283248m1afd57dfe0df7937@mail.gmail.com> By the way, nr is ftp-able from NCBI and is a protein-based database if you didn't know. On Feb 12, 2008 11:44 AM, DT wrote: > > On Feb 11, 2008 6:56 PM, Theodore H. Smith wrote: > > > > > On 11 Feb 2008, at 22:28, Ryan Golhar wrote: > > > > > Why don't you write up a paper describing the algorithm in detail and > > > submit it to a bioinformatics journal? And, why not make the > > > executable > > > available with documentation so that people can download it and try it > > > out for themselves. > > > > > > Do you have any test cases that show it runs faster/better than BLAST? > > > Describe them and make them available. > > > > The first thing I'd need to do is make a good test. I'm not sure what > > constitutes "a good test", in this case. > > > > NR ALL VS ALL: This will test speed and somehow test performance. 
The nr > database (non-redundant) from NCBI is a good place to start testing as a > template database. I'd use your algorithm all-against-all in nr. Test > against BLAST and then use your algorithm for each entry in nr versus all > of nr, and then compare performance. You can generate a ROC plot for BLAST > vs your algorithm against a known set of homologs and distant homologs, > based on a p-value or significance level cutoff. > > A real randomization test would be this to test sensitivity and > specificity: take known sequences in nr -- all or some of them -- and > scramble them by 'homologous recombination" -- create chimeras of known > sequences by different randomization criteria -- by domain (criteria based > on domain annotation) or by individual sequence based on a known > randomization function, and then test the sensitivity and specificity of > BLAST vs your algorithm to detect the originating sequences that created the > chimeras. > > You will also need to check the performance of your algorithm against > nucleotide sequences. There are already test cases in BLAST for > mouse-vs-human, that would be a good test case. 
> > Deanne Taylor > > > From theoriste at gmail.com Tue Feb 12 11:49:02 2008 From: theoriste at gmail.com (DT) Date: Tue, 12 Feb 2008 11:49:02 -0500 Subject: [BiO BB] Looking for researcher, to assist on blast-like invention In-Reply-To: <6e3504600802120846y38283248m1afd57dfe0df7937@mail.gmail.com> References: <47B0CC02.8010206@umdnj.edu> <6e3504600802120844o7d040702rc7afea8ad4285be3@mail.gmail.com> <6e3504600802120846y38283248m1afd57dfe0df7937@mail.gmail.com> Message-ID: <6e3504600802120849y2f242f95t60e2f78e10953dc3@mail.gmail.com> One more thing -- If you do a homologous recombination function, I would also include an additional mutator function to mimic genetic drift -- it can be sophisticated in allowing mutations vs the codon table and can be distributed by a known function of percent drift/difference, so you can adjust that and not only catch originating sequences by domains but also by drift criteria. D On Feb 12, 2008 11:46 AM, DT wrote: > By the way, nr is ftp-able from NCBI and is a protein-based database if > you didn't know. > > > On Feb 12, 2008 11:44 AM, DT wrote: > > > > > On Feb 11, 2008 6:56 PM, Theodore H. Smith wrote: > > > > > > > > On 11 Feb 2008, at 22:28, Ryan Golhar wrote: > > > > > > > Why don't you write up a paper describing the algorithm in detail > > > and > > > > submit it to a bioinformatics journal? And, why not make the > > > > executable > > > > available with documentation so that people can download it and try > > > it > > > > out for themselves. > > > > > > > > Do you have any test cases that show it runs faster/better than > > > BLAST? > > > > Describe them and make them available. > > > > > > The first thing I'd need to do is make a good test. I'm not sure what > > > constitutes "a good test", in this case. > > > > > > > > NR ALL VS ALL: This will test speed and somehow test performance. The > > nr database (non-redundant) from NCBI is a good place to start testing as a > > template database. 
I'd use your algorithm all-against-all in nr. Test > > against BLAST and then use your algorithm for each entry in nr versus all > > of nr, and then compare performance. You can generate a ROC plot for BLAST > > vs your algorithm against a known set of homologs and distant homologs, > > based on a p-value or significance level cutoff. > > > > A real randomization test would be this to test sensitivity and > > specificity: take known sequences in nr -- all or some of them -- and > > scramble them by 'homologous recombination" -- create chimeras of known > > sequences by different randomization criteria -- by domain (criteria based > > on domain annotation) or by individual sequence based on a known > > randomization function, and then test the sensitivity and specificity of > > BLAST vs your algorithm to detect the originating sequences that created the > > chimeras. > > > > You will also need to check the performance of your algorithm against > > nucleotide sequences. There are already test cases in BLAST for > > mouse-vs-human, that would be a good test case. > > > > Deanne Taylor > > > > > > > From cupton at uvic.ca Tue Feb 12 11:55:30 2008 From: cupton at uvic.ca (Chris Upton) Date: Tue, 12 Feb 2008 08:55:30 -0800 Subject: [BiO BB] Looking for researcher, to assist on blast-like invention In-Reply-To: References: <47B0CC02.8010206@umdnj.edu> Message-ID: <9FBE25AE-D4BA-4A87-9863-C84F52E4D6AB@uvic.ca> Hi, We do a lot of searching of protein databases, searching for distant homologs. If we send you protein sequences, can you search a protein database (NR)? Chris On Feb 11, 2008, at 3:56 PM, Theodore H. Smith wrote: > > On 11 Feb 2008, at 22:28, Ryan Golhar wrote: > >> Why don't you write up a paper describing the algorithm in detail and >> submit it to a bioinformatics journal? And, why not make the >> executable >> available with documentation so that people can download it and try >> it >> out for themselves. 
>> >> Do you have any test cases that show it runs faster/better than >> BLAST? >> Describe them and make them available. > > The first thing I'd need to do is make a good test. I'm not sure what > constitutes "a good test", in this case. > > How big should the databanks be to make the test reasonable? Is > randomly generated data good enough, or is a randomly selected sample > better. If a sample is better, how large a dataset must I gather to do > the test. > > Perhaps certain settings make my algorithm work better or worse > relative to BLAST. But then how do I know which settings are more > likely to be used and which aren't? > > I think someone who uses BLAST frequently, and knows it well from a > user's perspective... might have a better feel for creating a test > than I might. > > The worst thing that could happen is I make a test, which is unfairly > prejudiced to my algorithm :) The next thing that would happen is > people would see my test has "suspiciously good" results, and... be > annoyed about that, and lose interest, even if it were an innocent > mistake on my end. I'd rather avoid that sort of mistake by getting a > knowledged eye in the designing of a test! > > Like I said, I haven't gotten all the code in C++ yet. I've got a > framework in C++ already, I mean I know how to write C++. And I know > what to do, as I've written it in a proto-typing language. > > The C++ version will come soon, though. > >> Theodore H. Smith wrote: >>> Hi everyone, >>> >>> So I've been working, on and off, on this algorithm for quite a >>> while >>> now. It's basically an invention of mine. It is a "blast-like" >>> algorithm, in that it does "Fuzzy lookup" operations across a >>> database >>> of letters. I am designing this algorithm to be useful for bio- >>> informatics, this is the main field I am initially targetting. >>> >>> The database will be filled with protein sequences, and the search >>> across the database will be another protein sequence. 
The algorithm >>> has a "scoring matrix", which can accept different protein >>> replacement >>> scores. The cost of inserting letters (protein letters) can be >>> configured also. >>> >>> In this sense, it's no different to Smith-Waterman. The same input, >>> the same output! >>> >>> The real difference from Smith-Waterman, is it's speed. My algorithm >>> will be hugely faster. This is because I use many techniques to >>> avoid >>> processing unnecessary parts of the Smith-Waterman matrix. >>> >>> I also use many tricks to reuse computations across various >>> proteins. >>> For example, the matrix for protein "ABCDE", is identical, at first >>> anyhow, for the matrix for "ABCDEFG". This means if I have both >>> proteins "ABCDE", and "ABCDEFG" in my protein database, I can test >>> both of them against the search query, in almost half the time. My >>> algorithm also runs in logarithmic-time with respect to the size of >>> the database. Basically, bigger databases run disproportionately >>> faster. >>> >>> I want to turn this algorithm, into something useful for people. My >>> first challenge here, is to answer the question "is this algorithm >>> faster, or better than BLAST". If it is not faster, my algorithm >>> basically has little use. But I have good hopes it will be faster! I >>> am very good with these sort of things, you see :) Speed is my >>> strong- >>> point. >>> >>> Currently, I do not know about the speed, because I haven't >>> implemented a C++ version of my algorithm, or a good speed testing >>> framework. >>> >>> I do however know that my algorithm is more accurate than BLAST, >>> because it is just as accurate as SSEARCH, as mine uses the Smith- >>> Waterman algorithm. Whereas BLAST uses a heuristic, intelligent >>> guess- >>> work basically. A fine heuristic, but still a heuristic. Mine is >>> methodological, not heuristic based. >>> >>> So here is what I am looking for! 
>>> >>> I am hoping, that someone in the field will be able to offer me >>> guidance, interest, enthusiasm, suggestions and maybe even do some >>> testing for me. >>> >>> Perhaps a student doing a bio-informatics related degree, who would >>> like to write a paper on an alternative way of processing protein >>> databases. My invention could be an interesting subject for a paper. >>> >>> Or perhaps a researcher who just has an interest in these sort of >>> things! Perhaps a researcher who feels there must be a better way of >>> doing these things. Or anyone really in this field with the time and >>> interest, and feels helping me could help him (or her) too in some >>> way. >>> >>> I'd like someone I can ask a lot of questions to, and show my >>> software >>> to, and explain my hopes what I can achieve with it. >>> >>> Basically, my first question to you, would be "how would I set this >>> up >>> to be useful for someone", and "how would I test it's usefulness, >>> what >>> would you need to know about my algorithm that you would decide to >>> use >>> it over blast" >>> >>> It's sort of a vague question from me, like "what do you need me to >>> do", but... well that's where I am right now. Sort of a bit on the >>> outside hoping someone on the inside will show me something. >>> >>> So it's an opportunity to tell me what you want, basically!! Tell >>> me, >>> and I might just make it. >>> >>> Who knows? Maybe one day in a few years time, everyone will be using >>> this "ElfDataFuzzy" algorithm that I invented, instead of BLAST! You >>> might be part of something. >>> >>> Thanks to anyone who replies! 
>>> >>> -- >>> http://elfdata.com/plugin/ >>> "String processing, done right" >>> >>> >>> >>> _______________________________________________ >>> BBB mailing list >>> BBB at bioinformatics.org >>> http://www.bioinformatics.org/mailman/listinfo/bbb >>> >>> >> >> >> _______________________________________________ >> BBB mailing list >> BBB at bioinformatics.org >> http://www.bioinformatics.org/mailman/listinfo/bbb > > -- > http://elfdata.com/plugin/ > "String processing, done right" > > > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb Chris Upton Ph.D. Associate Professor Biochemistry and Microbiology Tel. 250-721-6507 University of Victoria Fax 250-721-8855 P.O. Box 3055 STN CSC Victoria, BC V8W 3P6 Canada web.uvic.ca/~cupton www.virology.ca www.biodirectory.com/uptons_blog.html From sdua at coes.latech.edu Wed Feb 13 11:01:46 2008 From: sdua at coes.latech.edu (Sumeet Dua) Date: Wed, 13 Feb 2008 10:01:46 -0600 Subject: [BiO BB] CfP: IEEE-CIBCB08 Special Session on Data Mining for Bioinformatics. Message-ID: <52362CF6-4B26-481E-AF86-FD97BF4A4EE2@coes.latech.edu> Apologies for any duplicate transmissions. --------- Call for Papers: 2008 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (IEEE-CIBCB08) September 15-17, 2008, Sun Valley, Idaho, USA. Sponsored by IEEE and the IEEE Computational Intelligence Society Special Session on: Data Mining for Bioinformatics The IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology 2008 special session on Data Mining for Bioinformatics will focus on novel results in the topics of computational intelligence as they are employed for data mining applications in bioinformatics. 
The computational intelligence (CI) areas of interest include Genetic Algorithms, Neural Computation, Fuzzy systems, Hidden Markov models, Rough Set Theory, Support Vector Machines, Chaos Theory, Simulated Annealing, Bayesian Framework, Probabilistic models, Statistical models and other emerging Evolutionary Computing techniques. We are specifically interested in topics of CI that have been developed to address data mining challenges in several important problems of bioinformatics, including but not limited to:

- Supervised and unsupervised methods for Microarray data analysis
- Knowledge modeling in biomedical databases
- Feature selection in biological data
- Medical imaging and pattern recognition
- Metabolic pathway analysis and Gene regulatory network modeling
- Motif and pattern discovery
- Protein, Enzyme, and RNA structure prediction and folding
- Evolutionary Computing for Bioinformatics
- Molecular sequence alignment and analysis
- Fuzzy Modeling for gene sequence analysis
- Cell simulation and modeling
- Molecular computing
- Bayesian Frameworks for Microarray data analysis
- Ontologies and taxonomies
- Neural Network Model for Genome sequence analysis
- Wavelets based data mining
- Fuzzy modeling in Bioinformatics
- Information and data visualization
- Automated text categorization and authority determination
- Dimensionality reduction in bioinformatics

Paper Submission: Prospective authors are invited to submit papers of no more than eight (8) pages in IEEE conference style, including results, figures and references; submission details can be found on the symposium web site: www.cibcb.org

Accepted papers will appear in the proceedings and be indexed by IEEE Xplore.
Important Dates: Paper submission deadline: March 31, 2008 Author notification: April 30, 2008 Camera-ready paper deadline: June 15, 2008 Conference: September 15-17, 2008 Special session webpage: http://www.latech.edu/~sdua/CIBCB08-SS/ For further information regarding the special session please contact the Session Chair: Sumeet Dua, Ph.D. Upchurch Associate Professor, Coordinator of IT Research Department of Computer Science, Louisiana Tech University, LA, USA E-mail: sdua at coes.latech.edu; Phone: 318-257-2830 For further information regarding CIBCB-2008 please contact the Symposium Chair or the Program Chair: Symposium chair: Scott Smith Program chair: Gwenn Volkert Technical Co-Chairs: Kay C. Wiese, Madhu Chetty, Elena Marchiori Finance Chair: Gary B. Fogel Publicity Chair: Lutz Hamel Regional Publicity Chairs: Joshua Knowles - Europe, P.N. Suganthan - Asia Proceedings Chair: Clare Bates Congdon Special Sessions Chair: Jennifer Hallinan Tutorials Chair: Dan Ashlock Web Chair: Wendy Ashlock --------- From mleczny at gmail.com Thu Feb 14 03:00:52 2008 From: mleczny at gmail.com (Paco B C) Date: Thu, 14 Feb 2008 09:00:52 +0100 Subject: [BiO BB] Ensembl and Gene Ontology terms In-Reply-To: <8adccabf0802120815y52942ca0ia0bbdd60cdccc627@mail.gmail.com> References: <604858190802110800n30b6de09i61c64efaac377810@mail.gmail.com> <23188614.237851202809426120.JavaMail.coremail@bj126app59.126.com> <8adccabf0802120815y52942ca0ia0bbdd60cdccc627@mail.gmail.com> Message-ID: <604858190802140000w1163e5f6md214e95b059f2c57@mail.gmail.com> Ey, DAS protocol looks very interesting for what I want to do. Thanks a lot to all of you! Paco 2008/2/12 Charles Danko : > Hi, Paco, > > If you want to do it all programmically from Java, I would suggest > Googling a protocol called DAS. ENSEMBL has a DAS server from which > you should be able to pull GO annotations for ENSEMBL IDs quite > easily. 
> > You can find java-based libraries to access a DAS connection, and > parse the resulting information here: > http://www.spice-3d.org/dasobert/. > > Good luck! > Charles > > 2008/2/12 ocean : > > Hey,Paco > > > > i think you can try BIOMART(www.biomart.org). this database had done > such thing already. > > you can contract them for some help. > > > > good luck! > > > > Huhaiyang > > > > > > > > > > ?2008-02-12?"Paco B C" ??? > > > > > > Hi! > > this is my first message in this list. My name is Paco and I'm doing my > PhD. > > on Bioinformatics in University of Leuven, Belgium. > > I would like to build a java module that, given a list of Ensembl Gene > > Identifiers, it would give back their related Gene Ontology terms. I've > > accessed the GO database, but I can't find ENSG terms and I've read in > the > > Ensembl website that they give the link to external databases for > > translation and transcript objects but not for genes (maybe in the > future, > > but not now). > > My question is, do you know which database could I query in order to get > > this relation within Ensembl and GO terms? > > Thanks! > > Paco > > _______________________________________________ > > BBB mailing list > > BBB at bioinformatics.org > > http://www.bioinformatics.org/mailman/listinfo/bbb > > _______________________________________________ > > BBB mailing list > > BBB at bioinformatics.org > > http://www.bioinformatics.org/mailman/listinfo/bbb > > > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb > From dan.bolser at gmail.com Sat Feb 16 05:46:03 2008 From: dan.bolser at gmail.com (Dan Bolser) Date: Sat, 16 Feb 2008 11:46:03 +0100 Subject: [BiO BB] List of mailing lists? 
Message-ID: <2c8757af0802160246r56fa71e2u4e4a7f879d3e2c0e@mail.gmail.com>

Hi,

Together with some friends we started to put together this page:

List of mailing lists for biologists
http://biodatabase.org/index.php/List_of_mailing_lists_for_biologists

I am not sure if there is a better place for this kind of project somewhere within the Bioinformatics.Org set of sites, which I know maintains several related projects, or even if such lists already exist elsewhere on the internet.

Sorry if I sent this before... It was hanging around in my drafts folder...

Can anyone point me towards related resources to integrate or transfer effort to? In the meantime, please feel free to edit, add or update the existing (rather short) list of mailing lists on that page.

Cheers,
Dan.

----
Talk to the experts? irc://freenode.net/#bioinformatics

From marchywka at hotmail.com Mon Feb 18 11:12:10 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 18 Feb 2008 11:12:10 -0500
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for tool testing
In-Reply-To:
References:
Message-ID:

Hi,

As I mentioned in previous posts, I'm using the Drosophila DSCAM genes for testing some tools. I assembled a fasta file composed of 3 fly entries,

$ cat all_fasta | grep ">"
>AF260530 Drosophila melanogaster Dscam gene, complete cds.
>DQ317106 Drosophila yakuba Dscam gene, exons 3 through 24.
>DQ317109 Drosophila pseudoobscura Dscam gene, exons 3 through 24.

and tried aligning them with clustalw, but minutes later still didn't have a result. I was wondering if someone could suggest a set of parameters or an alternative alignment tool to do a fast alignment, even if a bit sloppy. I had always used the slow/accurate approach and don't know what options may be available for faster work; these sequences are each about 50k long.

In the meantime, I was able to get a satisfactory result using exact string matches with successively shorter and shorter strings.
This approach yields acceptable results in under a minute and, if needed, you could segment the questionable areas and feed them to clustal or another tool for "better" alignment. It seems to be fast because it only compares sequences to a reference sequence ( O(n*l^2), but "l" can be smaller than the sequence length, as unique features can be found in O(l*log(l)) ). There are, of course, likely to be various pathological cases, but for sequences known to be similar it seems to work OK, and the indexing feature allows extraction of substrings with particular distributions ( occurring only once in each sample, for example). I have aligned 2 E. coli strains in perhaps a few minutes and there weren't any obvious pathological results ( I obviously didn't check the whole thing either by eye or programmatically).

Others have asked about testing methods, so I'd like to show how I'm going about this with the DSCAM example. The alignment is only one part of a more general interest in finding similar/different features between samples. These sequences, it turns out, have exon locations in the NCBI entries. So, it was pretty easy to check the alignments by examining the locations of the exons in the aligned composite.
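[Editor's sketch] The anchoring idea described above (exact matches of strings that occur only once in each sequence, repeated with successively shorter strings inside the gaps left by the longer ones) can be illustrated as follows. This is a minimal sketch under stated assumptions and is not the poster's tool: all function names are hypothetical, the string lengths (25, 16, 12, 8) are arbitrary illustrative values, and the "keep a monotone chain" step is one plausible way to discard out-of-order matches.

```python
from bisect import bisect_left
from collections import Counter


def unique_kmers(seq, k, lo, hi):
    """Map each k-mer that occurs exactly once in seq[lo:hi] to its start."""
    starts = range(lo, hi - k + 1)
    counts = Counter(seq[i:i + k] for i in starts)
    return {seq[i:i + k]: i for i in starts if counts[seq[i:i + k]] == 1}


def monotone_chain(pairs):
    """Largest subset of (ref_pos, other_pos) pairs increasing in both
    coordinates (longest increasing subsequence on other_pos)."""
    pairs = sorted(pairs)
    tails, tails_idx = [], []          # best chain end for each chain length
    prev = [None] * len(pairs)
    for i, (_, o) in enumerate(pairs):
        j = bisect_left(tails, o)
        if j == len(tails):
            tails.append(o)
            tails_idx.append(i)
        else:
            tails[j] = o
            tails_idx[j] = i
        prev[i] = tails_idx[j - 1] if j > 0 else None
    chain = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:
        chain.append(pairs[i])
        i = prev[i]
    return chain[::-1]


def successive_anchors(ref, other, lengths=(25, 16, 12, 8)):
    """Return sorted (ref_pos, other_pos, k) exact-match anchors.

    Each pass looks only inside the gaps between anchors already found,
    so shorter strings can still be unique within their smaller window.
    """
    anchors = []
    for k in lengths:
        bounds = [(0, 0, 0)] + anchors + [(len(ref), len(other), 0)]
        found = []
        for (r0, o0, k0), (r1, o1, _) in zip(bounds, bounds[1:]):
            rlo, olo = r0 + k0, o0 + k0      # gap starts after previous anchor
            if r1 - rlo < k or o1 - olo < k:
                continue                     # gap too small for this length
            ref_u = unique_kmers(ref, k, rlo, r1)
            oth_u = unique_kmers(other, k, olo, o1)
            shared = [(ref_u[m], oth_u[m]) for m in ref_u.keys() & oth_u.keys()]
            found += [(r, o, k) for r, o in monotone_chain(shared)]
        anchors = sorted(anchors + found)
    return anchors
```

The regions left between anchors are the "questionable areas" that could then be handed to a slower, more accurate aligner.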
In this case, I aligned as follows,

$ string_test -fastas all_fasta -index 8 -length 25 -fix 12 -output 3 -filterN -filterID -status -fcompare_all> anchor_hits

and could feed the "anchor_hits" locations to an aligner that could start with these ( actually, it can now start on its own but that is a developmental issue ) and refine subsections ( or spot gross transpositions perhaps, would have to check),

$ mm_align_tool -fastas all_fasta -pair_rules anchor_hits -ref 0 -all_samples -refine marlow -pair_params uniq -pair_align monotonize -doall -dest algned_fasta -output fasta

The aligned sequences can be output in several formats and the "algned_fasta" file can be presented along with various rules or annotations using another tool to create bmp, html, or txt files ( right now it requires some input parameters, hence the dummy echo):

$ echo | $progpath/annotater -source manno.src> testcomp5.html
$ echo | $progpath/annotater -source bmp.src
$ echo |$progpath/annotater -source text.src>text.txt 2>junk

where the ".src" file just contains command line parameters,

$ cat text.src
-width 120 -font $progpath/4x6-KOI8.pcf -mrules fixed_exons2 -merge_rules comp_5_hitss -font /cygdrive/c/mydocs/scripts/cc/affx//4x6-KOI8-R.pcf -acid_rank /cygdrive/c/mydocs/scripts/cc/affx//nmstrings -acid_map 20 -xlate -inter -banner -annotate algned_fasta

So, I could first look at a different alignment metric by outputting a table of correspondences between input and aligned locations,

$ $progpath/mm_align_tool -fastas all_fasta -pair_rules anchor_hits -ref 0 -all_samples -refine marlow -pair_params uniq -pair_align monotonize -doall -dest xxxx_raw -output table

and using the table to move "absolute location" rules to their location in the aligned sequence:

$table_tool -v -table table_table -table_rules ncbi_exons | sed -e 's/exon/exon /g'| sed -e 's/[.:]/ /g' | sort -k 3 -g | more

This generates a cryptic feature comparison map which shows that most of the exons end up in the same
location on each sequence (but see the publication below); in most cases even the differently named exons from different species were aligned ( these exon rules are followed by {sequence number, aligned position, offset from first entry } ):

>exon |1|exon 3 {2,16121,0}{3,16121,0}{4,16157,36}
>exon |1|exon 4 1 {2,18582,0}{3,18582,0}{4,18582,0}
>exon |1|exon 4 10 {2,22532,0}{3,22532,0}{4,22532,0}
>exon |1|exon 4 11 {2,22836,0}{3,22836,0}{4,22845,9}
>exon |1|exon 4 12 {2,23217,0}{3,23217,0}{4,23217,0}
>exon |1|exon 4 2 {2,19006,0}{3,19006,0}{4,19006,0}
>exon |1|exon 4 3 {2,19736,0}{3,19736,0}{4,19736,0}
>exon |1|exon 4 4 {2,20545,0}{3,20545,0}{4,20545,0}
>exon |1|exon 4 5 {2,20872,0}{3,20872,0}{4,20872,0}
>exon |1|exon 4 6 {2,21269,0}{3,21269,0}{4,21269,0}
>exon |1|exon 4 7 {2,21597,0}{3,21597,0}{4,21597,0}
>exon |1|exon 4 8 {2,21895,0}{3,21895,0}{4,21895,0}
>exon |1|exon 4 9 {2,22229,0}{3,22229,0}{4,22212,-17}
>exon |1|exon 5 {2,25020,0}{3,25020,0}{4,25020,0}
>exon |1|exon 6 1 {2,27251,0}{3,27249,-2}{4,27251,0}
>exon |1|exon 6 10 {2,29659,0}{3,29826,167}{4,29451,-208}
>exon |1|exon 6 11 {2,29845,0}{3,30074,229}{4,29640,-205}
>exon |1|exon 6 12 {2,30074,0}{3,30614,540}{4,30114,40}
>exon |1|exon 6 13 {2,30614,0}{3,31054,440}{4,30438,-176}
>exon |1|exon 6 14 {2,30831,0}{3,31255,424}{4,30614,-217}
>exon |1|exon 6 15 {2,31054,0}{3,31475,421}{4,30831,-223}
>exon |1|exon 6 16 {2,31255,0}{3,31684,429}{4,31054,-201}
>exon |1|exon 6 17 {2,31475,0}{3,32161,686}{4,32160,685}
>exon |1|exon 6 18 {2,31688,0}{3,32437,749}{4,32376,688}
>exon |1|exon 6 19 {2,31926,0}{3,33251,1325}{4,32590,664}
>exon |1|exon 6 2 {2,27524,0}{3,27524,0}{4,27497,-27}
...etc...
>exon |1|exon 14 {2,66424,0}{3,66424,0}{4,66424,0}
>exon |1|exon 15 {2,66631,0}{3,66631,0}{4,66631,0}
>exon |1|exon 16 {2,66884,0}{3,66884,0}{4,66884,0}
>exon |1|exon 17 1 {2,70469,0}{3,70469,0}{4,70469,0}
>exon |1|exon 17 2 {2,71004,0}{3,71004,0}{4,71004,0}
>exon |1|exon 18 {2,72995,0}{3,72995,0}{4,72995,0}
>exon |1|exon 19 {2,74100,0}{3,74100,0}{4,74063,-37}
>exon |1|exon 20 {2,74948,0}{3,74948,0}{4,74948,0}
>exon |1|exon 21 {2,75334,0}{3,75334,0}{4,75334,0}
>exon |1|exon 22 {2,75594,0}{3,75594,0}{4,75594,0}
>exon |1|exon 23 {2,76979,0}{3,76979,0}{4,76997,18}
>exon |1|exon 24 {2,78233,0}{3,78233,0}{4,78233,0}

that can be reconciled with known inter-species exon similarities, for example
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1431710&blobtype=pdf

It turns out that the exon3 offset for sequence 4 is probably due to a rule issue, not an alignment issue ( excerpt from the test alignment map including aligner diagnostics such as '{' in fasta file and likely translation products in all 3 fwd frames ) as the alignment in this high-similarity region appears good:

New Section : 16080 to 16200 section 134 of 669
8 9 0 1 2 3 4 5 6 7 8 9
>>AF260530 Drosophila melanogaster Dscam gene, complete cds.
.....................................}AGCTTGTGGTAGTCAGACCCTAGCTGCCAATCCCCCAGATGCCGACCAAAAAGGACCCGTCTTCCTCAAGGAACCCACCAAC
**************************************SALLCVWGV*SVSQRDTPPL*SALCAPQNISPPPPQRDMCAPRDTPQKKKKRGDTPPRVSLFSPLSQKRGENTPPHTPQN
>>AF260530 Drosophila melanogaster Dscam gene, complete cds.
.....................................}AGCTTGTGGTAGTCAGACCCTAGCTGCCAATCCCCCAGATGCCGACCAAAAAGGACCCGTCTTCCTCAAGGAACCCACCAAC
**************************************SALLCVWGV*SVSQRDTPPL*SALCAPQNISPPPPQRDMCAPRDTPQKKKKRGDTPPRVSLFSPLSQKRGENTPPHTPQN
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >exon|1|exon3 XXX
>>DQ317106 Drosophila yakuba Dscam gene, exons 3 through 24.
TATTTACTAATTGGCGGCGTTGTTCTTGTTTCATTTC}AGCTGGTGGTAGTCAGACCCTGGCTGCCAATCCCCCCGATGCCGACCAAAAAGGACCCGTCTTTCTCAAGGAACCCACCAAC
YIFLYTL*NILWGARGARVLCVFSLLCVFFSHIFF***SALWGVWGV*SVSQRDTPPLWGALCAPQNISPPPPPRDMCAPRDTPQKKKKRGDTPPRVSLFFSLSQKRGENTPPHTPQN
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >exon|1|exon3 XXX
>>DQ317109 Drosophila pseudoobscura Dscam gene, exons 3 through 24.
??????????????????????????????????????AGCTTGTGGCAGTCAGACTTTGGCTGCCAATCCACCAGATGCCGACCAGAAGGGACCCGTCTTCCTCAAAGAGCCCACCAAC
**************************************SALLCVWGAQSVSQRDTLFLWGALCAPQNISPHTPQRDMCAPRDTPQREKRGGDTPPRVSLFSPLSQKKRESAPPHTPQN
+++++++++++++++++++++++++++++++++++++++++++ >exon|1|exon3 XXX

I'm aware of the following related alignment literature, open to ideas:

$ string_test -about|unix2dos >/dev/clipboard

Contact: marchywka at hotmail.com Nov 2007
Comment: uses some indexing to get speed up,
Comment: motivation for RC rules from this etc ,
Ref:http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1431710
Commment: and should work well on text or (modified slightly ) binary code too
Note: More code in mm_align_tool
Note: Based loosely on references such as these but 'common sense'
Note: seemed to work well as these are after-the-fact lookups
Ref: http://www.google.com/search?hl=en&safe=off&q=string+alignment+site%3Aciteseer.ist.psu.edu
Ref: http://citeseer.ist.psu.edu/csuros05rapid.html
Comment: Csuros, M., Ma, B.: Rapid homology search with two-stage extension and
Comment: daughter seeds. In: Proc. 11th Int. Computing and Combinatorics Conf. (COCOON).
Comment: Volume 3595 of LNCS., Springer-Verlag (2005) 104-- 114
Ref: http://citeseer.ist.psu.edu/468459.html
Ref: http://citeseer.ist.psu.edu/kahveci04speeding.html
Feb 2 2008 09:35:40 string_test.h182

Thanks.
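For anyone wanting to check a feature map like the one earlier in this message programmatically, a minimal Python sketch follows. It assumes each rule line has the form shown there: a label followed by {sequence number, aligned position, offset from first entry} groups. The names `parse_rule` and `shifted_exons` and the `tolerance` parameter are hypothetical and not part of table_tool.

```python
import re

# One {sequence number, aligned position, offset} group.
HIT = re.compile(r"\{(\d+),(\d+),(-?\d+)\}")


def parse_rule(line):
    """Split a rule line into its label and a list of (seq, pos, offset)."""
    label = line.split("{", 1)[0].strip()
    hits = [tuple(int(field) for field in match) for match in HIT.findall(line)]
    return label, hits


def shifted_exons(lines, tolerance=0):
    """Return (label, largest absolute offset) for rules exceeding tolerance."""
    flagged = []
    for line in lines:
        label, hits = parse_rule(line)
        worst = max((abs(offset) for _, _, offset in hits), default=0)
        if worst > tolerance:
            flagged.append((label, worst))
    return flagged


if __name__ == "__main__":
    sample = [
        ">exon |1|exon 3 {2,16121,0}{3,16121,0}{4,16157,36}",
        ">exon |1|exon 5 {2,25020,0}{3,25020,0}{4,25020,0}",
    ]
    for label, worst in shifted_exons(sample):
        print(label, worst)
```

Raising `tolerance` would hide small shifts such as the exon 3 offset and leave only the larger rearrangements in the exon 6 cluster.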
Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce spam but probably to force users to use hotmail. Please DON'T assume I am ignoring you and try me on marchywka at yahoo.com if no reply here. Thanks.

From larye at info-engineering-svc.com Mon Feb 18 16:47:00 2008
From: larye at info-engineering-svc.com (Larye Parkins)
Date: Mon, 18 Feb 2008 14:47:00 -0700
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for tool testing
In-Reply-To:
References:
Message-ID: <47B9FCD4.1030002@info-engineering-svc.com>

Mike Marchywka wrote:
> Hi,
> As I mentioned in previous posts, I'm using the drosophila DSCAM genes for testing some tools.
> I assembled a fasta file composed of 3 fly entries,
>
> $ cat all_fasta | grep ">"
>
>>AF260530 Drosophila melanogaster Dscam gene, complete cds.
>>DQ317106 Drosophila yakuba Dscam gene, exons 3 through 24.
>>DQ317109 Drosophila pseudoobscura Dscam gene, exons 3 through 24.
>
> and tried aligning them with clustalw but minutes later still didn't have a result. I was wondering if
> someone could suggest a set of parameters or alternative alignment tool to do a fast
> alignment, even if a bit sloppy. I had always used to slow/accurate approach and don't
> know what options may be available for faster work- these sequences are each about 50k long.
>

We have been using MUMmer3 (http://mummer.sourceforge.net) for rapid alignments of whole genomes, genomes and contigs, and searching for repeats and inverted repeats in multiple sequences.
MUMmer is very fast and has nucleotide and translated protein modes, as well as scatterplot graphical output, so is very good for finding regions of high identity in large sequences and graphically highlighting areas of interest.

--
Larye D. Parkins
Information Engineering Services
PMB 435, 610 N.
1st St., Ste 5
Hamilton, MT 59840
http://www.info-engineering-svc.com
Making IT work since 1965.
Member of: ACM, IEEE Computer Society, USENIX, SAGE, LOPSA

From landman at scalableinformatics.com Mon Feb 18 19:27:13 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 18 Feb 2008 19:27:13 -0500
Subject: [BiO BB] MPI-HMMER mercurial repo now public
Message-ID: <47BA2261.2010902@scalableinformatics.com>

[forwarded]

The MPI-HMMER mercurial repository has been made publicly viewable. In addition, users may now download the most up-to-date snapshot of the MPI-HMMER source code through the "Releases" link available on the website. The snapshot is generated each time a commit is made to the mercurial repository.

Current updates include support for MPICH2, and a couple of small memory leaks have been plugged.

JP

[ed uri: http://mpihmmer.org repository: http://mpihmmer.org/hg ]

_______________________________________________
Mpihmmer mailing list
Mpihmmer at mail.scalableinformatics.com
http://lists.scalableinformatics.com/mailman/listinfo/mpihmmer

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

From marchywka at hotmail.com Tue Feb 19 11:51:39 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 19 Feb 2008 11:51:39 -0500
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for tool testing
In-Reply-To: <47B9FCD4.1030002@info-engineering-svc.com>
References: <47B9FCD4.1030002@info-engineering-svc.com>
Message-ID:

> We have been using MUMmer3 (http://mummer.sourceforge.net) for rapid
> alignments of whole genomes, genomes and contigs, and searching for

Thanks- that looks like a good tool that I didn't know about. I noticed they advertise E. coli results, prompting me to go back and check my own.
I'd have to go check the suffix tree literature to see what exactly they claim to do in 17 seconds on E. coli, but under cygwin I was able to index all matching strings of length 25 or more in about 67 seconds,

$ date;$progpath/string_test -fastas both_fasta -index 8 -length 25 -fix 12 -output 3 -filterN -filterID -status -fcompare_all> anchors ;date
Sat Nov 10 18:45:23 EST 2007
string_test.cpp177 loaded 2 fastas
Sat Nov 10 18:46:30 EST 2007

and create a coarse alignment in another 25 seconds,

$ date; $progpath/mm_align_tool -fastas both_fasta -v -pair_rules anchors -doall -pair_align 0 -output text> align1 ;date
Sat Nov 10 18:50:01 EST 2007
mm_hit_classes.h389
annotation_model.h57 Loaded 33373 pair rules.
mm_align_tool.cpp309 Doing string PAIR align with cutoff 3
mm_align_tool.h227 do_all with only one rule, did you mean -mrules?
mm_align_tool.cpp318 doing 0 vs 1
mm_align_tool.cpp326 do hit dump rules
Sat Nov 10 18:50:26 EST 2007

Do you have actual timing tests for various complete tasks or is 17 seconds about it? So, ok, 67+25=92 seconds is not real impressive compared to 17, and I'm not sure how much I can blame cygwin for this :) I guess once I'm sure I have a useful algorithm, I can subtract IO time, which has been significant in many cases. Someone also privately suggested blast's bl2seq, and I would point out that this is quite fast on pairs of 50k sequences.

Mike Marchywka

From marchywka at hotmail.com Tue Feb 19 15:12:09 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 19 Feb 2008 15:12:09 -0500
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for tool testing
In-Reply-To:
References: <47B9FCD4.1030002@info-engineering-svc.com>
Message-ID:

> So, ok 67+25=92 seconds is not real impressive compared to 17, and I'm not sure how
> much I can blame cygwin for this :) I guess once I'm sure I have a useful algorithm,
> I can subtract IO time which has been significant in many cases.

I wasn't going to bother to look given the time differences are > 4x, but I did note they tested on a 3 GHz Pentium 4 and I have something that comes up as "x86 Family 6 Model 8 Stepping 3", which is probably ca. 1 GHz ( I never bothered to check since I thought a 2-3x factor wasn't important ). I guess by the time you subtract IO it may be pretty close. It would be hard to blame cygwin for the computational time, however :)

From Sterten at aol.com Tue Feb 19 11:11:38 2008
From: Sterten at aol.com (Sterten at aol.com)
Date: Tue, 19 Feb 2008 11:11:38 EST
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for tool testing
Message-ID:

I recommend this for alignment:

http://align.bmr.kyushu-u.ac.jp/mafft/online/server/

From ethan.strauss at promega.com Tue Feb 19 11:29:25 2008
From: ethan.strauss at promega.com (Ethan Strauss)
Date: Tue, 19 Feb 2008 10:29:25 -0600
Subject: [BiO BB] database search/alignment with ciruclar molecules?
In-Reply-To: <47B9FCD4.1030002@info-engineering-svc.com>
References: <47B9FCD4.1030002@info-engineering-svc.com>
Message-ID:

Hi,
I have created a database which holds plasmid sequences, and I am running into an issue with database similarity searches because the molecules are circular but the alignment treats them as linear. Right now, I am using a brute force approach where I pull each plasmid out of the database and perform a Smith-Waterman alignment against it to find similar sequences. I plan to go to BLAST sometime soon. For Smith-Waterman, I know I can deal with circularity by treating the sequences as dimers, but I don't know if that is the best way to approach it, and it will make something which is already incredibly slow even slower! I would appreciate any thoughts on this issue.
Thanks!
Ethan

Ethan Strauss Ph.D.
Bioinformatics Scientist
Promega Corporation
2800 Woods Hollow Rd.
Madison, WI 53711
608-274-4330
800-356-9526
ethan.strauss at promega.com

From larye at info-engineering-svc.com Tue Feb 19 13:46:48 2008
From: larye at info-engineering-svc.com (Larye Parkins)
Date: Tue, 19 Feb 2008 11:46:48 -0700 (MST)
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for tool testing
In-Reply-To:
Message-ID:

On Tue, 19 Feb 2008, Mike Marchywka wrote:
...
> Do you have actual timing tests for various complete tasks or is 17
> seconds about it?

Aligning 172 sequences totaling 1.2MB with each other (average ~7000 bases each, longest 22,830 bases):

real 3m17.458s
user 3m2.385s
sys  0m9.294s

MUMmer output generated 693 alignment files, out of the possible 29584 combinations of sequences, with alignment lengths ranging from about 90 bases to 22830 (the longest with itself).

The process used the 1.2MB multi-sequence file as both reference and query for 'nucmer', which generates a delta file, then ran 'show-coords' on that delta file to produce a coordinates summary. The majority of the run-time was spent parsing the delta file and generating the alignments from sequence pairs with significant alignments, using 'show-aligns.' The final step was to generate a Postscript scatterplot. I later rewrote that part to generate 16 separate plots with 43x43 sequences each to make them readable on standard paper size.
---cut---
#!/usr/bin/env perl
use warnings;

$pref="all";
$ref=$ARGV[0];
$qry=$ref;    # align the multi-sequence file against itself

# all-vs-all alignment, then a coordinate summary of the delta file
system("nucmer --prefix=${pref} $ref $qry");
system("show-coords -rcl ${pref}.delta > ${pref}.coords");

# each header line in the delta file (">ref qry ...") names one aligned pair
open(DELTA, "<", "${pref}.delta") or die "Cannot open ${pref}.delta: $!";
while (<DELTA>) {
    next if $_ !~ /^>/;
    @inlin = split(/ /,$_);
    $inlin[0] =~ y/>//d;
    print STDERR "Processing $inlin[0] -> $inlin[1]\n";
    system("show-aligns ${pref}.delta $inlin[0] $inlin[1] > ${inlin[0]}_${inlin[1]}.aligns");
}
close(DELTA);

system("delta-filter -q -r ${pref}.delta > ${pref}.filter");
system("mummerplot ${pref}.delta -R $ref -Q $qry --layout -p ${pref} -S -t postscript");
---cut---

Example alignment file output:

-- BEGIN alignment [ +1 89 - 187 | -1 11054 - 10956 ]

89         gccatcgcagagcttcgctaagctcactgaacgacagcagcagtatgct
11054      gccatcgcagagcttcgctaagctcactgagcggcagcagcagtatgct
                                         ^  ^
138        acgttcctctccctcgccgcctttgctggagcccccgtcctcttcgatc
11005      acgttcctctccctcgccgcctttgctggagcccccgtcctcttcgatc

187        a
10956      a

-- END alignment [ +1 89 - 187 | -1 11054 - 10956 ]

In this case, the alignment is forward versus reverse strands, with two SNPs detected.

System: Sun Blade 2000 (2x900MHz SPARC), Solaris 10; MUMmer 3.20, compiled 32-bit.

--
Larye D. Parkins

From marchywka at hotmail.com Wed Feb 20 07:39:14 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 20 Feb 2008 07:39:14 -0500
Subject: [BiO BB] Need fair alignment tool comparison/ using DSCAM for tool testing
In-Reply-To:
References:
Message-ID:

Thanks. Do you know off hand if this has a memory saving mode? It wasn't immediately obvious from the ./mafft --help output. Also, there may be some ability to use concepts from here if you want to optimize this: http://www.fftw.org/

I tried it on the E. coli strains and the memory usage went to 1 GB ( VM limit, I only have 256M physical ) and of course ( well, it doesn't have to but normally does ) CPU went to low levels ( more VM action than computation, presumably ). It seemed to know it should go to something called "memsave mode" but I'm not sure it realized it was playing with virtual memory ( do you know if it is supposed to be cache aware or otherwise know about memory? ):

$ ./mafft --retree 1 /cygdrive/e/new/temp/both_fasta> xxxxx
reallocating... done.
reallocating... done.
generating 200PAM scoring matrix for nucleotides ... done
scoremtx = -1
Gap Penalty = -1.53, +0.00, -0.12
Making a distance matrix .. 1 / 2 done.
Constructing a UPGMA tree ... 0 / 2 done.
Progressive alignment ... STEP 1 / 1
len1=4979619, len2=4639675, Switching to the memsave mode
fm FFT ...

Mike Marchywka
From bsmagic at gmail.com Fri Feb 22 02:23:17 2008
From: bsmagic at gmail.com (Sheng Wang)
Date: Fri, 22 Feb 2008 15:23:17 +0800
Subject: [BiO BB] LINES SINES MICROSATELLITES and TFOs
In-Reply-To: <16381612.686151200151021279.JavaMail.coremail@bj126app66.126.com>
References: <16381612.686151200151021279.JavaMail.coremail@bj126app66.126.com>
Message-ID: <793f8aed0802212323r17818144s1d3c732616f2c4a3@mail.gmail.com>

better use WU-BLAST as engine.

On 1/12/08, ocean wrote:
>
> i think you can try ucsc, use the repeatmasker information of human
>
> the following is RepeatMasker (rmsk) Track Description
>
> Short interspersed nuclear elements (SINE), which include ALUs
> Long interspersed nuclear elements (LINE)
> Long terminal repeat elements (LTR), which include retroposons
> DNA repeat elements (DNA)
> Simple repeats (micro-satellites)
> Low complexity repeats
> Satellite repeats
> RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)
> Other repeats, which includes class RC (Rolling Circle)
> Unknown
>
> On 2008-01-10, "Narendran GR" wrote:
>
> > Hi friends....
> > I want to collect information on LINES SINES MICROSATELLITES and TFOs in
> > Human genome...
> >
> > Where can i find the information???
> > Is there any information about them in NCBI???
> > Please help me with the same....
> >
> > --
> > Regards
> > Narendran G R

--
Best Regards
Sheng Wang

From marchywka at hotmail.com Fri Feb 22 06:47:57 2008
From: marchywka at hotmail.com (Mike Marchywka)
Date: Fri, 22 Feb 2008 06:47:57 -0500
Subject: [BiO BB] LINES SINES MICROSATELLITES and TFOs
In-Reply-To: <793f8aed0802212323r17818144s1d3c732616f2c4a3@mail.gmail.com>
References: <16381612.686151200151021279.JavaMail.coremail@bj126app66.126.com> <793f8aed0802212323r17818144s1d3c732616f2c4a3@mail.gmail.com>
Message-ID:

On a related but more primitive topic, does anyone have test cases or new tools for CRISPRs? ( These come up on pubmed, but I'll pass this link along for background since it is all I have in front of me now: http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=17537822 )

I could run my modified tools to find suspects on the E. coli strains I happen to have,

$ date;$progpath/rules_annotater -fastas f2 -rci -regex "[\-1]{24,47}.{26,72}[\-1]{24,47}"> f2_crisps 2>f2_diags ;date
Wed Feb 20 14:04:42 EST 2008
Wed Feb 20 14:04:53 EST 2008

and in fact determine that there were some suspects regularly separated by about 120 bases, but have no idea if these are "right":

>gi|48994873|gb|U00096.2| Escherichia coli K12 MG1655, complete genome

$ cat f2_crisps | awk '{print $1" "$1-last" "$3; last=$1}'
[...]
2660373 94198 CACTGTAGGCCTGATAAGACGCATTACGCGTCGCATCAGGCAACGGCTGTCGGATGCGGCGTGAACGCCTTATCCGACCTACGGTTCTGTTCACTGTAGGCCTGATAAGACGCAT
2660488 115 TACGCGTCGCATCAGGCAACGGCTGTCGGATGCGGCGTGAACGCCTTATCCGACCTACGGTTCTGTTCACTGTAGGCCTGATAAGACGCATTACGCGTCGCATCAGGCAACGGCT
2875903 215415 CCCGGTTTATCCCCGCTGGCGCGGGGAACTCCCGGGGGATAATGTTTACGGTCATGCGCCCCCCGGTTTATCCCCGCTGGCGCGG
2876027 124 CGGTTTATCCCCGCTGGCGCGGGGAACTCAAGCTGGCTGGCAATCTCTTTCGGGGTGAGTCCGGTTTATCCCCGCTGGCGCGGGG
2876149 122 CGGTTTATCCCCGCTGGCGCGGGGAACTCGCAGGCGGCGACGCGCAGGGTATGCGCGATTCGCGGTTTATCCCCGCTGGCGCGGGG
2876273 124 CGGTTTATCCCCGCTGGCGCGGGGAACTCTCAACATTATCAATTACAACCGACAGGGAGCCCGGTTTATCCCCGCTGGCGCGGGG
2876394 121 GCGGTTTATCCCCGCTGGCGCGGGGAACTCTGCGTGAGCGTATCGCCGCGCGTCTGCGAAAGCGGTTTATCCCCGCTGGCGCGGG
2902035 25641 GGTTTATCCCCGCTGGCGCGGGGAACTCGACAGAACGGCCTCAGTAGTCTCGTCAGGCTCCGGTTTATCCCCGCTGGCGCGGGGA
2902155 120 TCGGTTTATCCCCGCTGGCGCGGGGAACACGGGCGCACGGAATACAAAGCCGTGTATCTGCTCGGTTTATCCCCGCTGGCGCGGG
2902279 124 GGTTTATCCCCGCTGGCGCGGGGAACACGAAATGCTGGTGAGCGTTAATGCCGCAAACACAGGTTTATCCCCGCTGGCGCGGGGA
2945406 43127 GACGCGGGGTGGAGCAGCCTGGTAGCTCGTCGGGCTCATAACCCGAAGGTCGTCGGTTCAAATCCGGCCCCCGCAACCAATTAAAATTTGATGAAGTAAAGCAGTACGGTGACGCGGGGTGGAGCAGCCTGGTAGCTCGTCGGGCTCA
2945554 148 TAACCCGAAGGTCGTCGGTTCAAATCCGGCCCCCGCAACCAATCAAATTTGATGAAGTAAAAGCAGTACGGTGACGCGGGGTGGAGCAGCCTGGTAGCTCGTCGGGCTCATAACCCGAAGGTCGTCGGTTCAAATCCGGCCCCCGCAA
3229358 283804 CTGCACCGCGCCACTGGCGGATGCGGCGTGAACGCCTTATCCGCCCTACATGTGTGTTCCCGTAGGTCGGATAAGACGCGACAAGCGTCGCATCCGGCATCTGCACCGCGCCACTGGCGGATGCGGCG
[...]

Wasn't sure if anyone knows about these things- known test cases or open issues. Thanks.

Mike Marchywka
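The spacing column above can also be turned into candidate arrays automatically. Here is a minimal Python sketch, assuming only the hit start positions are used: hits are grouped whenever the gap to the previous hit is small. The names `cluster_hits` and `regular_arrays` and the `max_gap` and `min_hits` thresholds are arbitrary choices for illustration, not values from rules_annotater, and a cluster found this way is only a suspect, not a confirmed CRISPR.

```python
def cluster_hits(positions, max_gap=200):
    """Group hit positions into runs whose consecutive gaps are <= max_gap."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def regular_arrays(positions, max_gap=200, min_hits=3):
    """Keep only the clusters that contain at least min_hits hits."""
    return [c for c in cluster_hits(positions, max_gap) if len(c) >= min_hits]


if __name__ == "__main__":
    # Start positions copied from the first column of the listing above.
    hits = [
        2660373, 2660488, 2875903, 2876027, 2876149, 2876273, 2876394,
        2902035, 2902155, 2902279, 2945406, 2945554, 3229358,
    ]
    for cluster in regular_arrays(hits):
        gaps = [b - a for a, b in zip(cluster, cluster[1:])]
        print(cluster[0], cluster[-1], gaps)
```

With these thresholds the two runs starting at 2875903 and 2902035 survive, which matches the groups that share the same repeated motif in the listing.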
> Date: Fri, 22 Feb 2008 15:23:17 +0800 > From: bsmagic at gmail.com > To: bbb at bioinformatics.org > Subject: Re: [BiO BB] LINES SINES MICROSATELLITES and TFOs > > better use WU-BLAST as enginee. > > On 1/12/08, ocean wrote: >> >> >> i think you can try ucsc, use the repeatmasker information of human >> >> the following is RepeatMasker (rmsk) Track Description >> >> Short interspersed nuclear elements (SINE), which include ALUs >> Long interspersed nuclear elements (LINE) >> Long terminal repeat elements (LTR), which include retroposons >> DNA repeat elements (DNA) >> Simple repeats (micro-satellites) >> Low complexity repeats >> Satellite repeats >> RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA) >> Other repeats, which includes class RC (Rolling Circle) >> Unknown >> >> >> >> On 2008-01-10, "Narendran GR" wrote: >> >> Hi friends.... >> >>> I want to collect information on LINES SINES MICROSATELLITES and TFOs in >>> Human genome... >>> >>> Where can i find the information??? >>> Is there any information about them in NCBI??? >> >> Please help me with the same.... >>> >> >> -- >> Regards >> Narendran G R >> _______________________________________________ >> BBB mailing list >> BBB at bioinformatics.org >> http://www.bioinformatics.org/mailman/listinfo/bbb >> _______________________________________________ >> BBB mailing list >> BBB at bioinformatics.org >> http://www.bioinformatics.org/mailman/listinfo/bbb >> > > > > -- > Best Regards > Sheng Wang > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bbb 
From idoerg at gmail.com Sun Feb 24 18:18:03 2008 From: idoerg at gmail.com (Iddo Friedberg) Date: Sun, 24 Feb 2008 15:18:03 -0800 Subject: [BiO BB] First call for talks: AFP / Biosapiens 2008 July 18-19 Toronto, Canada Message-ID: Joint AFP-Biosapiens SIG http://2008.BioFunctionPrediction.org The Automated Function Prediction (AFP) SIG and the Biosapiens European network of excellence are teaming up to hold a two-day Special Interest Group (SIG) meeting July 18-19, 2008 alongside ISMB 2008 in Toronto, Canada. The deluge of genomic information is challenging biologists to annotate this data, from locating genes in the raw data through predicting function from protein sequence and structure. AFP and Biosapiens share many common goals, and this year we have decided to join forces for a SIG that will deal with a wide scope of gene, protein, and genomic annotations. For more information: http://2008.BioFunctionPrediction.org Talks are sought in, but are not limited to: * Various aspects of gene and protein function prediction o Function prediction using sequence based methods. This would include "classic" methods such as detection of functional motifs and inferring function from sequence similarity. o Function from genomic information: prediction by genomic location; locus comparison with other organisms; function gain and loss. o Phylogeny based methods o Function from molecular interactions o Function from structure o Function prediction using combined methods o "Meta-talks" discussing the limitations and horizons of computational function prediction. o Assessing function prediction programs * Genomic annotation o Gene finding o Genome visualization o Collaborative annotation o Cooperation between experimental and computational biologists o Metagenomics This year we are considering proposals for mini-tutorials. For more information see the AFP / Biosapiens site. 
Confirmed speakers include: * Barry Honig, Columbia University and Howard Hughes Medical Institute, USA * Peer Bork, European Molecular Biology Laboratory, Germany * Andrew Emili, University of Toronto, Canada * Olga Troyanskaya, Princeton University, USA * Kimmen Sjolander, University of California Berkeley, USA Important dates: - April 20, 2008: Talk, tutorial and poster abstracts due. - May 16, 2008: notification of acceptance - May 25, 2008: final abstracts due - July 18-19, 2008: AFP-Biosapiens SIG alongside ISMB 2008 in Toronto, Canada. For inquiries, including sponsorship opportunities, please email: afpbiosap2008 at gmail.com -- Iddo Friedberg, Ph.D. CALIT2, mail code 0440 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0440, USA T: +1 (858) 534-0570 T: +1 (858) 646-3100 x3516 http://iddo-friedberg.org From bioinfosm at gmail.com Mon Feb 25 13:26:28 2008 From: bioinfosm at gmail.com (Samantha Fox) Date: Mon, 25 Feb 2008 12:26:28 -0600 Subject: [BiO BB] tissue specificity Message-ID: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com> Hi all. Are there any tools or prediction software for tissue specificity? Given a set of genes, which tissue are they most likely to be expressed in? Or, given a set of expression data and tissues, how can it all be integrated to determine tissue-specific genes? Thanks .. ~S From idoerg at gmail.com Mon Feb 25 14:12:58 2008 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 25 Feb 2008 11:12:58 -0800 Subject: [BiO BB] First call for participation AFP / Biosapiens SIG: deadline dates correction Message-ID: Please note important change in deadline dates. Important dates: - March 31, 2008: Talk, tutorial and poster abstracts due. - April 20, 2008: notification of acceptance - May 5, 2008: final abstracts due - July 18-19, 2008: AFP-Biosapiens SIG alongside ISMB 2008 in Toronto, Canada. 
Joint AFP-Biosapiens SIG http://2008.BioFunctionPrediction.org The Automated Function Prediction (AFP) SIG and the Biosapiens European network of excellence are teaming up to hold a two-day Special Interest Group (SIG) meeting July 18-19, 2008 alongside ISMB 2008 in Toronto, Canada. The deluge of genomic information is challenging biologists to annotate this data, from locating genes in the raw data through predicting function from protein sequence and structure. AFP and Biosapiens share many common goals, and this year we have decided to join forces for a SIG that will deal with a wide scope of gene, protein, and genomic annotations. This year we are also considering proposals for mini-tutorials. See the AFP / Biosapiens 2008 site for more details. For more information: http://2008.BioFunctionPrediction.org Talks are sought in, but are not limited to: * Various aspects of gene and protein function prediction o Function prediction using sequence based methods. This would include "classic" methods such as detection of functional motifs and inferring function from sequence similarity. o Function from genomic information: prediction by genomic location; locus comparison with other organisms; function gain and loss. o Phylogeny based methods o Function from molecular interactions o Function from structure o Function prediction using combined methods o "Meta-talks" discussing the limitations and horizons of computational function prediction. o Assessing function prediction programs * Genomic annotation o Gene finding o Genome visualization o Collaborative annotation o Cooperation between experimental and computational biologists o Metagenomics This year we are considering proposals for mini-tutorials. For more information see the AFP / Biosapiens site. 
Confirmed speakers include: * Barry Honig, Columbia University and Howard Hughes Medical Institute, USA * Peer Bork, European Molecular Biology Laboratory, Germany * Andrew Emili, University of Toronto, Canada * Olga Troyanskaya, Princeton University, USA * Kimmen Sjolander, University of California Berkeley, USA Important dates: - March 31, 2008: Talk, tutorial and poster abstracts due. - April 20, 2008: notification of acceptance - May 5, 2008: final abstracts due - July 18-19, 2008: AFP-Biosapiens SIG alongside ISMB 2008 in Toronto, Canada. For inquiries, including sponsorship opportunities, please email: afpbiosap2008 at gmail.com -- Iddo Friedberg, Ph.D. CALIT2, mail code 0440 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0440, USA T: +1 (858) 534-0570 T: +1 (858) 646-3100 x3516 http://iddo-friedberg.org From marchywka at hotmail.com Mon Feb 25 15:39:12 2008 From: marchywka at hotmail.com (Mike Marchywka) Date: Mon, 25 Feb 2008 15:39:12 -0500 Subject: [BiO BB] tissue specificity In-Reply-To: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com> References: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com> Message-ID: Normally when I post to this list I try to provide some background, as I have no idea where everyone else is interest-wise. I vaguely remember running into something related while researching some other topic and went back to do a quick literature search. I'm not sure of your immediate problem or state of knowledge but, from what I can find, this is generally an open area. For example, you could try reading stuff like this, http://www.ncbi.nlm.nih.gov/pubmed/18194723?ordinalpos=9&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum Clin Lab Med. 2008 Mar;28(1):127-43. Data mining for biomarker development: a review of tissue specificity analysis. Klee EW. 
Division of Experimental Pathology, Department of Laboratory Medicine and Pathology, Mayo Clinic, 200 1st Street SW, Stabile 2-50, Rochester, MN 55905, USA I was surprised to find many genes labelled, "tissue specific" http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term="tissue%20specific" and maybe you could look into some of the related publications. Do you have a specific thesis, problem, or objective? Mike Marchywka 586 Saint James Walk Marietta GA 30067-7165 404-788-1216 (C)<- leave message 989-348-4796 (P)<- emergency only marchywka at hotmail.com Note: Hotmail is blocking my mom's entire ISP claiming it is to reduce spam but probably to force users to use hotmail. Please DON'T assume I am ignoring you and try me on marchywka at yahoo.com if no reply here. Thanks. > Date: Mon, 25 Feb 2008 12:26:28 -0600 > From: bioinfosm at gmail.com > To: bio_bulletin_board at bioinformatics.org > Subject: [BiO BB] tissue specificity > > Hi all. > > Are there any tools, or prediction software for tissue specificity; given a > set of genes... what tissue they are most likely to be, or given a set of > expression data and tissues.. integrate it all to determing tissue specific > genes! > > Thanks .. > > ~S > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bio_bulletin_board From bioinfosm at gmail.com Mon Feb 25 15:54:53 2008 From: bioinfosm at gmail.com (Samantha Fox) Date: Mon, 25 Feb 2008 14:54:53 -0600 Subject: [BiO BB] tissue specificity In-Reply-To: References: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com> Message-ID: <726450810802251254y4a98ea83q8da04ca7ac86e000@mail.gmail.com> Mike, I appreciate your response. 
Well, my objective is to look at expression data, with respect to tissue specificity. I needed specific help in this case, as I am not able to find any help or tools/software to deal with tissue specificity information. I came across this tool GeneMerge (http://genemerge.bioteam.net/), and tissuedb from HUSAR group ( http://genome.dkfz-heidelberg.de/menu/tissue_db/index.html) ... but have not yet conquered them. Anyone with experience on these or similar tools, do point me to FAQs or other details, when using a group of genes to determine tissue specificity of the group. ~S On Mon, Feb 25, 2008 at 2:39 PM, Mike Marchywka wrote: > > Normally when I post to this list I try to provide some background as I > have no idea where > everyone else is interest-wise. I vaguely remember running into something > related while > researching some other topic and went back to do a quick literature > search. I'm not sure > of your immediate problem or state of knowledge but, from what I can find, > this is generally an open > area. For example, you could try reading stuff like this, > > > http://www.ncbi.nlm.nih.gov/pubmed/18194723?ordinalpos=9&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum > > Clin Lab Med. 2008 Mar;28(1):127-43. Links > Data mining for biomarker development: a review of tissue specificity > analysis.Klee EW. > Division of Experimental Pathology, Department of Laboratory Medicine and > Pathology, Mayo Clinic, 200 1st Street SW, Stabile 2-50, Rochester, MN > 55905, USA > > > I was surprised to find many genes labelled, "tissue specific" > > http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term="tissue%20specific" > > and maybe you could look into some of the related publications. > > Do you have a specific thesis, problem, or objective? 
> > > > Mike Marchywka > 586 Saint James Walk > Marietta GA 30067-7165 > 404-788-1216 (C)<- leave message > 989-348-4796 (P)<- emergency only > marchywka at hotmail.com > Note: Hotmail is blocking my mom's entire > ISP claiming it is to reduce spam but probably > to force users to use hotmail. Please DON'T > assume I am ignoring you and try > me on marchywka at yahoo.com if no reply > here. Thanks. > > > Date: Mon, 25 Feb 2008 12:26:28 -0600 > > From: bioinfosm at gmail.com > > To: bio_bulletin_board at bioinformatics.org > > Subject: [BiO BB] tissue specificity > > > > Hi all. > > > > Are there any tools, or prediction software for tissue specificity; > given a > > set of genes... what tissue they are most likely to be, or given a set > of > > expression data and tissues.. integrate it all to determing tissue > specific > > genes! > > > > Thanks .. > > > > ~S > > _______________________________________________ > > BBB mailing list > > BBB at bioinformatics.org > > http://www.bioinformatics.org/mailman/listinfo/bio_bulletin_board > > _________________________________________________________________ > Need to know the score, the latest news, or you need your Hotmail(R)-get > your "fix". > http://www.msnmobilefix.com/Default.aspx > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bio_bulletin_board > From ryan at raaum.org Mon Feb 25 20:19:10 2008 From: ryan at raaum.org (Ryan Raaum) Date: Mon, 25 Feb 2008 20:19:10 -0500 Subject: [BiO BB] tissue specificity In-Reply-To: <726450810802251254y4a98ea83q8da04ca7ac86e000@mail.gmail.com> References: <726450810802251026p6175ab57k7a8ae290f57d40a9@mail.gmail.com> <726450810802251254y4a98ea83q8da04ca7ac86e000@mail.gmail.com> Message-ID: Have you looked at the Novartis SymAtlas? 
http://symatlas.gnf.org/SymAtlas/ The published reference is http://www.pnas.org/cgi/content/abstract/012025199v1 -Ryan On Mon, Feb 25, 2008 at 3:54 PM, Samantha Fox wrote: > Mike, > > I appreciate your response. Well, my objective is to look at expression > data, with respect to tissue specificity. I needed specific help in this > case, as I am not able to find any help or tools/software to deal with > tissue specificity information. > I came across this tool GeneMerge (http://genemerge.bioteam.net/), and > tissuedb from HUSAR group ( > http://genome.dkfz-heidelberg.de/menu/tissue_db/index.html) ... but have not > yet conquered them. > > Anyone with experience on these or similar tools, do point me to FAQs or > other details, when using a group of genes to determine tissue specificity > of the group. > > ~S > On Mon, Feb 25, 2008 at 2:39 PM, Mike Marchywka > wrote: > > > > > > > Normally when I post to this list I try to provide some background as I > > have no idea where > > everyone else is interest-wise. I vaguely remember running into something > > related while > > researching some other topic and went back to do a quick literature > > search. I'm not sure > > of your immediate problem or state of knowledge but, from what I can find, > > this is generally an open > > area. For example, you could try reading stuff like this, > > > > > > http://www.ncbi.nlm.nih.gov/pubmed/18194723?ordinalpos=9&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum > > > > Clin Lab Med. 2008 Mar;28(1):127-43. Links > > Data mining for biomarker development: a review of tissue specificity > > analysis.Klee EW. 
> > Division of Experimental Pathology, Department of Laboratory Medicine and > > Pathology, Mayo Clinic, 200 1st Street SW, Stabile 2-50, Rochester, MN > > 55905, USA > > > > > > I was surprised to find many genes labelled, "tissue specific" > > > > http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term="tissue%20specific" > > > > and maybe you could look into some of the related publications. > > > > Do you have a specific thesis, problem, or objective? > > > > > > > > Mike Marchywka > > 586 Saint James Walk > > Marietta GA 30067-7165 > > 404-788-1216 (C)<- leave message > > 989-348-4796 (P)<- emergency only > > marchywka at hotmail.com > > Note: Hotmail is blocking my mom's entire > > ISP claiming it is to reduce spam but probably > > to force users to use hotmail. Please DON'T > > assume I am ignoring you and try > > me on marchywka at yahoo.com if no reply > > here. Thanks. > > > > > Date: Mon, 25 Feb 2008 12:26:28 -0600 > > > From: bioinfosm at gmail.com > > > To: bio_bulletin_board at bioinformatics.org > > > Subject: [BiO BB] tissue specificity > > > > > > Hi all. > > > > > > Are there any tools, or prediction software for tissue specificity; > > given a > > > set of genes... what tissue they are most likely to be, or given a set > > of > > > expression data and tissues.. integrate it all to determing tissue > > specific > > > genes! > > > > > > Thanks .. > > > > > > ~S > > > _______________________________________________ > > > BBB mailing list > > > BBB at bioinformatics.org > > > http://www.bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > > _________________________________________________________________ > > Need to know the score, the latest news, or you need your Hotmail(R)-get > > > your "fix". 
> > http://www.msnmobilefix.com/Default.aspx > > _______________________________________________ > > BBB mailing list > > BBB at bioinformatics.org > > http://www.bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > > > _______________________________________________ > BBB mailing list > BBB at bioinformatics.org > http://www.bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- Ryan Raaum Anthropology Lehman College The City University of New York 250 Bedford Park Blvd W. Bronx, NY 10468 e: ryan.raaum at lehman.cuny.edu w: http://raaum.org o: (718) 960-8845 f: (718) 960-8406 From jeff at bioinformatics.org Fri Feb 29 18:32:53 2008 From: jeff at bioinformatics.org (J.W. Bizzaro) Date: Fri, 29 Feb 2008 18:32:53 -0500 Subject: [BiO BB] February '08 issue of the Bioinformatics.Org Newsletter is now available Message-ID: <47C89625.3070902@bioinformatics.org> The newsletter includes some of the best of our various online forums and details some of our internal (and external) activities. IN THIS ISSUE: - Unveiling Pipet (part one) - Project spotlight - Job search highlight - Franklin Award laureate (first announcement) - Upcoming events URL: http://www.bioinformatics.org/newsletter/v01-n02.pdf Cheers, Jeff -- J.W. Bizzaro Bioinformatics Organization, Inc. (Bioinformatics.Org) E-mail: jeff at bioinformatics.org Phone: +1 978 562 4800 -- From kanzure at gmail.com Thu Feb 28 15:18:54 2008 From: kanzure at gmail.com (Bryan Bishop) Date: Thu, 28 Feb 2008 14:18:54 -0600 Subject: [BiO BB] Will this implementation of the lac operon work? Message-ID: <200802281418.54503.kanzure@gmail.com> Hi all, I am designing a regulatory circuit and have been scratching my head over how to include enhancers, promoters and TATA boxes, etc. I have decided to make some progress by randomly guessing and getting feedback from the community. 
Here's my work: http://heybryan.org/genetic-circuits.html Basically I've taken the sequences of BBa_I14032, BBa_R0011, and BBa_B0034, attached the peptide I want to express, and have at it. It is my understanding, then, that given a lactose-full environment, my repressible circuit should express the peptide, and if I wanted to add a second circuit I could express a repressor that would bond to prevent RNA polymerase II from manufacturing my peptide as frequently. Correct? The documentation is rather sparse, so once I get my head around this I'll be sure to go back and fill in the gaps in the documents. Thanks, - Bryan (I also sent this over to OWW to see what they have to say.) ________________________________________ Bryan Bishop http://heybryan.org/ From T.Hulsen at cmbi.ru.nl Fri Feb 29 13:23:25 2008 From: T.Hulsen at cmbi.ru.nl (Tim Hulsen) Date: Fri, 29 Feb 2008 19:23:25 +0100 Subject: [BiO BB] MyJournals.org Message-ID: <007f01c87b00$2f3380a0$6bfdfea9@Tim> Do you want to have easy access to the latest issues of your favourite journals, from all over the world, just through your web browser? Please visit http://www.myjournals.org , create a login and make your pick from the 422 journals currently available. From jeff at bioinformatics.org Fri Feb 29 19:54:14 2008 From: jeff at bioinformatics.org (J.W. Bizzaro) Date: Fri, 29 Feb 2008 19:54:14 -0500 Subject: [BiO BB] List configuration In-Reply-To: <476C1868.8060101@bioinformatics.org> References: <476C1868.8060101@bioinformatics.org> Message-ID: <47C8A936.2060206@bioinformatics.org> Please note, for the reasons specified below, that email should be addressed to instead of the old address. Thanks, Jeff J.W. Bizzaro wrote: > > FYI, the Mailman mailing list system that we use no longer accepts > underscore characters in the mailing list name and was printing > "bio_bulletin_board-bounces at bioinformatics.org" as > "bio_bulletin-bounces at board" in outgoing messages. 
This was of > course causing messages to be bounced back, leading to many > subscribers being automatically unsubscribed. > -- J.W. Bizzaro Bioinformatics Organization, Inc. (Bioinformatics.Org) E-mail: jeff at bioinformatics.org Phone: +1 978 562 4800 --