From cannino2424 at yahoo.gr Tue Dec 6 04:54:15 2005 From: cannino2424 at yahoo.gr (Eleni Argyropoulou) Date: Tue, 6 Dec 2005 09:54:15 +0000 (GMT) Subject: [BiO BB] Bioinformatics Message-ID: <20051206095416.48090.qmail@web25512.mail.ukl.yahoo.com> Dear all, I am a student from Greece and I would like to attend an MS program in Bioinformatics in America. I am looking for a list of good universities, ones that would prepare me well for the job market. I hope you can help me. Thank you anyway. Regards, Eleni Argyropoulou From biomed06 at diee.unica.it Mon Dec 5 11:38:24 2005 From: biomed06 at diee.unica.it (BIOMED'06) Date: Mon, 5 Dec 2005 17:38:24 +0100 Subject: [BiO BB] CFP: Second International Workshop on Agents in Medicine, Computational Biology and Bioinformatics Message-ID: <091901c5f9ba$4fa8c120$c01310ac@Piccione> [We apologize if you receive multiple copies of this message] ******************************************************************************************************* MAS*BIOMED'06 Second International Workshop on Agents in Medicine, Computational Biology, and Bioinformatics (http://www.diee.unica.it/biomed06) To be held at AAMAS'06 Fifth International Joint Conference on Autonomous Agents and Multiagent Systems May 9, 2006 ******************************************************************************************************* Call for papers / participation *Motivation and Description* There is growing evidence that agent technology can be useful in designing and implementing solutions aimed at supporting the automation of medical and biological procedures.
While it is essential for computer scientists and software developers to understand the specific problems in medicine and biology, it is equally essential for experts in either domain to understand the capabilities of agent technology. The main purpose of this workshop is to bring together researchers in these fields and create synergies between them, in order to discuss relevant issues and approaches aimed at assessing and promoting the adoption of agent technology. The workshop will focus mainly on the benefits of adopting agent technology in: (i) storing, accessing, and distributing relevant medical or biological data, (ii) automating information-gathering and information-inference processes in medical and biological settings, (iii) supporting e-Health, (iv) simulating and modelling biological systems. This workshop aims to attract both theoretically and practically oriented papers in these thriving areas at the intersection of medicine and biology with computer science. The workshop will also focus on supporting infrastructures, such as agent-based systems, tools, languages, ontologies, and networking facilities for medicine, bioinformatics and computational biology. Papers on the application and evaluation of agent approaches to problems in these fields are particularly sought. We envision this workshop as a mixture of presentations and open discussions among the attendees. A key open discussion could focus on the strengths and limitations of software agents in medical and biological domains. Possible questions to be addressed include: Why can agent technology provide a better solution than existing ones? What are the limitations of agent technology with respect to medical and biological problems? Why can software agents succeed where traditional Artificial Intelligence technology/expert systems have reached their limits?
What should the priorities be when introducing this technology in the medical and biological fields? _Medicine and Health Care_ There is hardly a country that is not suffering from the ever-increasing costs of its health care system, driven not least by the steadily growing number of drugs, diagnostic tools and methods, and treatment procedures that continuously appear on the market. On the other hand, in today's globalized world, fast and reliable medical prevention, diagnosis and treatment are of paramount importance, as the recent problems with SARS and bird flu show. Such highly contagious and lethal diseases can threaten the whole globe if they are not fought immediately with the highest level of efficiency and reliability. This requires, first of all, fast and reliable early detection and diagnosis, regardless of where in the world the affected person happens to be. These situations are among the many that show that the medical domain is marked by requirements for high dynamicity, distribution, flexibility, scalability, extensibility, efficiency, cooperative work and collaboration, proactivity and autonomy. Most of these features are particular strengths of multi-agent systems and autonomous agent technology. It is therefore fair to say that agent technology will play an increasingly important role in medicine and health care, now and in the future, and will significantly enhance the ability to model, design, and build complex, inherently distributed software systems in medical and health care domains. _Computational Biology and Bioinformatics_ With the exponential growth of data being produced and made available to genetics researchers via the Internet, there are several challenges in using this information effectively to further genetics and biomedical research.
For instance, sequence and structural information exists in databases, along with various tools, distributed throughout the world in various formats and on various platforms. Also, while the GO (Gene Ontology) project is addressing the problem, new and sometimes conflicting terminology and vocabulary are emerging for phenotyping and annotating sequences. There is a need for autonomous and semi-autonomous methods for learning and discovering relational and conceptual knowledge, as well as for intelligently combining these distributed data and information sources. To harness the benefits of this continuously growing amount of information, new information and communication technologies and approaches should be adopted. *Topics of interest include but are not limited to:* * Multi-Agent Interaction in Medical or Biological Settings - Coordination of tasks and data - Collaboration in peer-to-peer networks - Multi-strategy learning and meta-learning for cooperative information agents * Analysis and Modelling of Data and Tasks - Multi-agent systems for medical early detection, diagnosis, and treatment - Multi-agent systems for patient scheduling, transplant management, community care, information access, training, internal hospital and clinic tasks, etc.
- Integrated genotyping and gene linkage analysis - Multi-agent approaches to gene expression analysis - Modelling of biological processes * Architectures, Languages, Tools, and Applications - Agent-based architectures and frameworks tailored for medical or biological domains - Customization of agent-based tools, languages, and libraries for medical or biological domains * Knowledge management - Agent-based integration of biological knowledge - Agent-based data mining and knowledge discovery in medical or biological domains - Multi-agent information gathering in medical or biological settings - Integration of heterogeneous data sources and/or services * Ontologies for Medical or Biological Domains - Collaborative ontology construction - Distributed ontology management
*Submission Procedure* Those wishing to participate in the workshop are requested to submit an original research paper that has not been published or submitted elsewhere. Papers will be peer reviewed by at least two referees from the workshop's program committee based on technical relevance, quality, clarity of presentation, objective analysis of the reported experiences, and novelty. High-profile survey papers may also be considered for publication. Papers must not exceed 15 single-spaced A4 pages, including figures, tables, and references. Papers should be formatted using the Springer LNCS style; templates are available from Springer. The language of the workshop is English. At least one author of each accepted paper is expected to register for and attend the workshop and present the paper. All submissions should be sent by email, in either PDF or PostScript format, to biomed06 AT diee.unica.it. The subject of the email should contain "medicine" or "biology" within square brackets to help the organizing committee assign papers to referees. All accepted papers will be printed in the workshop proceedings. A selection of the best papers presented at this workshop will be considered for a special issue on the same theme of "Multiagent and Grid Systems - an International Journal" by IOS Press. *Important Dates* Submission Deadline: January 15, 2006 Notification of Acceptance: February 19, 2006 Camera ready: March 5, 2006 Workshop: May 9, 2006 *Organizing Committee* Giuliano Armano (giuliano.armano at diee.unica.it) Dept. of Electrical and Electronic Engineering - Univ. of Cagliari (Italy) Piazza D'Armi - 09123 Cagliari (Italy) Andrew Martin (martin at biochem.ucl.ac.uk) Dept. of Biochemistry & Molecular Biology - Univ.
College London (UK) Darwin Building - Gower Street London WC1E 6BT - England Huaglory Tianfield (h.tianfield at gcal.ac.uk) Glasgow Caledonian University, School of Computing and Mathematical Sciences, SRIF/SHEFC Centre for Virtual Organization Technology Enabling Research (VOTER) - Director 70 Cowcaddens Road, Glasgow, G4 0BA, UK Rainer Unland (UnlandR at informatik.uni-essen.de) Inst. for Computer Science and Business Information Systems (ICB), University of Duisburg-Essen Schuetzenbahn 70, 45117 Essen, Germany *Publishing Chair* Eloisa Vargiu (vargiu at diee.unica.it) Dept. of Electrical and Electronic Engineering - Univ. of Cagliari (Italy) Piazza D'Armi - 09123 Cagliari (Italy) -------------------------------------------------------------------------------- Eloisa Vargiu, PhD IASC Group - Intelligent Agents and Soft-Computing DIEE, Dept. of Electrical and Electronic Engineering email: vargiu at diee.unica.it tel: +39 070-675.5785 fax: +39 070-6755782 mobile: +39 320-4373038 web: http://iasc.diee.unica.it -------------------------------------------------------------------------------- The problem with humanity is that the stupid are cocksure, while the intelligent are full of doubts. (B. Russell) -------------------------------------------------------------------------------- From tho at ifs.tuwien.ac.at Tue Dec 6 07:57:37 2005 From: tho at ifs.tuwien.ac.at (Nguyen Manh Tho) Date: Tue, 6 Dec 2005 13:57:37 +0100 Subject: [BiO BB] CFP: The 1st Int.
Workshop on Bioinformatics and Security (BIOS 06), Vienna, Austria, April 20-22 / 2006 Message-ID: <006601c5fa64$a2922ed0$35a8a8c0@manhtho> ------------------------------------------------------------------------ The 1st International Workshop on Bioinformatics and Security (BIOS 06) ------------------------------------------------------------------------ www.ares-conf.org/?q=bios In Conjunction with ARES 2006 Vienna, Austria 20-22 April, 2006 Call for Papers ---------------- Technological advances in high-throughput techniques and efficient data-gathering methods, coupled with a worldwide effort in computational biology, have resulted in a vast amount of life science data, often held in distributed and heterogeneous repositories. These repositories contain valuable information such as sequence and structure data, annotations for biological data, results of complex and expensive computations, biomedical publications, and so on. Newer and more sophisticated computational techniques to analyze such data are also being developed and made available for public use at a rapid pace. Nonetheless, the diversity of objectives, methods, representations, and platforms among these data sources and analysis tools has created an urgent need for research in resource integration and in platform-independent processing of investigative queries involving heterogeneous data sources and analysis tools. It is now widely recognized that a database approach to the analysis and management of biological data offers a convenient, high-level, and efficient alternative for high-volume biological data processing and management. Advances in database integration, query processing, web technology, workflow systems, and object orientation can be leveraged to develop novel high-performance data management systems for biological applications. Security is also very important in this area, so this workshop has two focuses: biological data management and security.
Topics of interest lie at the intersection of general bioinformatics and security research. The following is a non-exhaustive list of topics of interest for this year. * Security for grid computing in the field of bioinformatics * Information security development processes for sensitive (medical) data * Securing medical data (information, data and system integrity) * Digital Rights Management for stored medical data * Complex relational database management systems with object-oriented extensions and numerous application-driven enhancements * Web and wireless security in bioinformatics and medical diagnosis * Usability and security needs of complex systems * Innovative approaches for securing medical data (access management, authentication, data protection, etc.) * Information security management Important Dates --------------- Submission of papers: 20 December 2005 Notification of acceptance: 20 January 2006 Camera-ready copies: 1 February 2006 Submission Details ------------------ Your contributions should be formatted according to the IEEE Computer Society Press Proceedings Author Guidelines: 10-point Times, single-spaced, two-column format. Each contribution should not exceed 8 pages. Organisational Committee ------------------------- Workshop Chair --------------- K?sef, University of Linz, FAW Austria Mazuran Petra, FAW, Austria Wagner Roland, University of Linz, FAW Austria Program Committee ------------------ Eisenacher Martin, University of M?, Germany Hochreiter Sepp, TU Berlin, Germany Hof Sonja, (DWS) AG, Switzerland Kramer Stefan, TUM, Germany Marik Vladimir, Technical University Prague, Czech Republic Mazuran Petra, FAW, Austria Palkoska J? FAW Austria Retschitzegger Werner, University of Linz, Austria Revell Norman, Middlesex University, UK Tjoa A Min, Technical University of Vienna, Austria
From gully at usc.edu Wed Dec 7 21:25:00 2005 From: gully at usc.edu (Gully Burns) Date: Wed, 07 Dec 2005 18:25:00 -0800 Subject: [BiO BB] The latest release of the NeuroScholar System (1.4b11) Message-ID: <0IR500MDEQPZKL80@msg-mx5.usc.edu> ++++++++++++++++++++++++ + ANNOUNCEMENT + ++++++++++++++++++++++++ The latest NeuroScholar system is now released and available for download from http://chasseur.usc.edu/website/neuroscholar_demo.html. Please feel free to send us any comments or to participate in the software development. -------------------------------------- WHAT IS NEUROSCHOLAR? -------------------------------------- The NeuroScholar system is a knowledge management system for the neuroscientific literature, allowing users to build an organized library of PDF files and then make and manage free-form notes based on the articles. This simple functionality is the first phase in the creation of a system that enables bench neuroscientists to construct knowledge bases of what they know. It is an attempt to introduce an informatics framework into the lab in order to bring an increased level of formalism to the subject. -------------------------------------- CHANGES IN THIS RELEASE -------------------------------------- - The user manual for the NeuroScholar system is included in this distribution. The document is located in the 'docs' directory under your NeuroScholar installation directory. - A Linux version of NeuroScholar is now available. Both RPM and binary distributions are provided for download. - The right-click feature in the Fragmenter for Mac PowerBooks (Mac OS X Panther) now works. To perform a right-click, hold down the 'Ctrl' or 'Alt' key while clicking the mouse. - Some out-of-memory problems have been fixed. The default heap size is now set to 512MB. - Some bugs in creating or deleting a knowledge base have been fixed.
- The view form panel has been improved: fields that are part of the index string are now highlighted, so that users can see which fields are required when entering data with the standard form. - The printing function in the Fragmenter has been fixed. The document is scaled to fit the printed page if it is wider than the printable area. - We now use the PubMed ELink utility in the ArticleRobot to speed up retrieval of full-text files. - Previously, the History function did not work when the recorded view instance had a source view instance connected. The definition of the source view instance needs to be removed before the recorded view instance is serialized; this has now been fixed. - A problem with the drag-and-drop function in the local tree panel has been fixed. Drag-and-drop is now enabled only in 'DISPLAY' mode. - The data uploading process in the NeuARt standalone application has been improved. Multiple data maps or brain volumes can be created and uploaded into the database. The description of each data map or brain volume is editable, so that users can annotate them. - The knowledge capture feature in the Fragmenter has been fixed. Users can now capture data using the questionnaire for tract-tracing or physiology experiments. -------------------------------------- SUPPORT STRUCTURES -------------------------------------- - Video Documentation: http://chasseur.usc.edu/website/video.html - Discussion group: https://sourceforge.net/forum/?group_id=99564 - Bug report: https://sourceforge.net/tracker/?group_id=99564&atid=624587 - Mailing list: http://lists.sourceforge.net/lists/listinfo/neuroscholar-announce Thank you for your attention. The NeuroScholar Team University of Southern California From delete at elfdata.com Fri Dec 9 06:54:42 2005 From: delete at elfdata.com (Theodore H. Smith) Date: Fri, 9 Dec 2005 11:54:42 +0000 Subject: [BiO BB] WU-Blast vs BLAST?
In-Reply-To: <006601c5fa64$a2922ed0$35a8a8c0@manhtho> References: <006601c5fa64$a2922ed0$35a8a8c0@manhtho> Message-ID: I have a question regarding the use of WU-BLAST vs BLAST. If WU-BLAST is much faster and slightly more accurate than BLAST, why aren't more people using WU-BLAST? Why doesn't NCBI offer a WU-BLAST search in place of its BLAST search? Is it: 1) Inertia, because systems would have to be upgraded and adapted to use WU-BLAST? 2) WU-BLAST not offering something BLAST does? 3) Having to retrain people, or people being resistant to change? 4) Some kind of legal restriction? 5) Something else? I am curious about the matter. I am just an ordinary researcher, not associated with Washington University in any way; actually I live in London, UK :) From christoph.gille at charite.de Fri Dec 9 08:08:19 2005 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Fri, 9 Dec 2005 14:08:19 +0100 (CET) Subject: [BiO BB] WU-Blast vs BLAST? In-Reply-To: References: <006601c5fa64$a2922ed0$35a8a8c0@manhtho> Message-ID: <42509.192.168.220.204.1134133699.squirrel@webmail.charite.de> I do agree. WU-BLAST for amino acid sequences is also included in the program package STRAP, which I maintain. The advantage over using an ordinary Web browser is that the result is cached on the hard disk and is available immediately the next time. Furthermore, one can search against "non-redundant" databases. From ahmed at users.sourceforge.net Fri Dec 9 08:29:32 2005 From: ahmed at users.sourceforge.net (Ahmed Moustafa) Date: Fri, 09 Dec 2005 07:29:32 -0600 Subject: [BiO BB] Java API for BLAST? Message-ID: <439986BC.9070704@users.sourceforge.net> Hi All! Is there some sort of Java package to do BLAST searches (remote or local)? I went through BioJava and could not find anything like that; it seemed there was only a parser for the BLAST XML output. Thanks in advance! Ahmed From christoph.gille at charite.de Fri Dec 9 10:39:23 2005 From: christoph.gille at charite.de (Dr.
Christoph Gille) Date: Fri, 9 Dec 2005 16:39:23 +0100 (CET) Subject: [BiO BB] Java API for BLAST? In-Reply-To: <439986BC.9070704@users.sourceforge.net> References: <439986BC.9070704@users.sourceforge.net> Message-ID: <42617.192.168.220.204.1134142763.squirrel@webmail.charite.de> Hi Ahmed, I have just finished one, but I need three more days for the documentation. It might already work for you. Please look at: http://www.charite.de/bioinf/strap/javadoc/charite/christo/interfaces/SequenceBlaster.html http://www.charite.de/bioinf/strap/Scripting.html There are three implementations: 1. via HTTP at EBI 2. WU-BLAST, local 3. NCBI BLAST, local Since a BLAST search usually takes several seconds, the results are cached on the hard disk. Please tell me if you need assistance. Christoph From landman at scalableinformatics.com Fri Dec 9 10:45:15 2005 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 09 Dec 2005 10:45:15 -0500 Subject: [BiO BB] Java API for BLAST? In-Reply-To: <439986BC.9070704@users.sourceforge.net> References: <439986BC.9070704@users.sourceforge.net> Message-ID: <4399A68B.7080808@scalableinformatics.com> Hi Ahmed: You might look into hooking into the BioPerl bits from Java (via RMI, I think). This shouldn't be too difficult. Joe Ahmed Moustafa wrote: > Hi All! > > Is there some sort of a Java package to do BLAST search (remote or > local)? I went through BioJava and I could not find something like that, > it seemed there was only a parser for the BLAST XML output. > > Thanks in advance!
> > Ahmed > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From ahmed at users.sourceforge.net Fri Dec 9 11:33:09 2005 From: ahmed at users.sourceforge.net (Ahmed Moustafa) Date: Fri, 09 Dec 2005 10:33:09 -0600 Subject: [BiO BB] Java API for BLAST? In-Reply-To: <4399A68B.7080808@scalableinformatics.com> References: <439986BC.9070704@users.sourceforge.net> <4399A68B.7080808@scalableinformatics.com> Message-ID: <4399B1C5.4020505@users.sourceforge.net> Hi Joe, I thought of switching my project to Perl because I found that BioPerl has more functionality and more documentation than BioJava, but I am trying to stay with Java first; otherwise I will move to BioPerl. Thanks! Ahmed On 12/9/2005 9:45 AM, Joe Landman wrote: > Hi Ahmed: > > You might look into hooking into the BioPerl bits from Java (RMI I > think). This shouldn't be too difficult. > > Joe > > Ahmed Moustafa wrote: > >> Hi All! >> >> Is there some sort of a Java package to do BLAST search (remote or >> local)? I went through BioJava and I could not find something like >> that, it seemed there was only a parser for the BLAST XML output. >> >> Thanks in advance! >> >> Ahmed > From ahmed at users.sourceforge.net Fri Dec 9 11:41:27 2005 From: ahmed at users.sourceforge.net (Ahmed Moustafa) Date: Fri, 09 Dec 2005 10:41:27 -0600 Subject: [BiO BB] Java API for BLAST?
In-Reply-To: <42617.192.168.220.204.1134142763.squirrel@webmail.charite.de> References: <439986BC.9070704@users.sourceforge.net> <42617.192.168.220.204.1134142763.squirrel@webmail.charite.de> Message-ID: <4399B3B7.8030207@users.sourceforge.net> Hi Christoph, I am going to try the first implementation (via HTTP/EBI) and will let you know how it goes. BTW, I thought NCBI had some Java APIs but could not find any; have you heard of them? Thanks! Ahmed On 12/9/2005 9:39 AM, Dr. Christoph Gille wrote: >Hi Ahmed, > >I have just findished one but need some more three days for documentation. >It might already work for you. >Please look at: >http://www.charite.de/bioinf/strap/javadoc/charite/christo/interfaces/SequenceBlaster.html >http://www.charite.de/bioinf/strap/Scripting.html >You have three implementations: > 1. via HTTP @ ebi > 2. WU-Blast local > 3. NCBI-Blast local > >Since Blast usually takes some sec the results are cached on HD. > >Please tell me if you need assistance. > >Christoph > From christoph.gille at charite.de Fri Dec 9 11:42:51 2005 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Fri, 9 Dec 2005 17:42:51 +0100 (CET) Subject: [BiO BB] Java API for BLAST? In-Reply-To: <4399B1C5.4020505@users.sourceforge.net> References: <439986BC.9070704@users.sourceforge.net> <4399A68B.7080808@scalableinformatics.com> <4399B1C5.4020505@users.sourceforge.net> Message-ID: <34246.192.168.220.204.1134146571.squirrel@webmail.charite.de> Ahmed, please consider that Java has many advantages over Perl. For example, Java is as fast as C/C++ and it is type-safe! From martin_jambon at emailuser.net Fri Dec 9 12:52:44 2005 From: martin_jambon at emailuser.net (Martin Jambon) Date: Fri, 9 Dec 2005 09:52:44 -0800 (PST) Subject: [BiO BB] WU-Blast vs BLAST?
In-Reply-To: <42509.192.168.220.204.1134133699.squirrel@webmail.charite.de> References: <006601c5fa64$a2922ed0$35a8a8c0@manhtho> <42509.192.168.220.204.1134133699.squirrel@webmail.charite.de> Message-ID: Feel free to add your suggestions to the BLAST page at Wikiomics: http://wikiomics.org/index.php?title=BLAST Martin On Fri, 9 Dec 2005, Dr. Christoph Gille wrote: > I do agree. > WU-blast for AA sequences is also included in the program package STRAP > which is maintained by myself. > The advantage over using an ordinary Web browser is that the result is > cached on HD and is available the next time immediately. > Further one can search against "non-redundant" databases. -- Martin Jambon, PhD http://martin.jambon.free.fr Store and share your bioinformatics tips at http://wikiomics.org From delete at elfdata.com Fri Dec 9 14:03:06 2005 From: delete at elfdata.com (Theodore H.Smith) Date: Fri, 9 Dec 2005 19:03:06 +0000 Subject: [BiO BB] Downloading PDB40D-B? Message-ID: <2C5B5BBD-18A2-46A2-A9C1-2C0482CD796F@elfdata.com> Hi people, I am a researcher testing some new algorithms for protein and DNA searches, measuring their performance in various areas (accuracy, time, RAM usage). I have read that the PDB40D-B protein database is useful for testing BLAST-like algorithms for accuracy: http://www.pnas.org/cgi/content/full/95/11/6073 Is PDB40D-B actually a real database, or must I construct it myself from other data using my own software? I'm basically looking for a protein database to download that I can test some BLAST-like software on. It doesn't need to be PDB40D-B, just any database that would allow meaningful analysis of the results.
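A note on how SCOP-derived benchmarks such as PDB40D-B are commonly scored, offered only as a minimal sketch: one convention (assumed here; check the PNAS paper above for the exact rules) counts two domains as homologous when they share a SCOP superfamily, as unrelated when they lie in different folds, and leaves same-fold, different-superfamily pairs unscored. The sketch assumes a tab-separated SCOP classification file whose first column is the domain id and whose fourth column is the sccs string (e.g. `a.1.1.1`); the helper names `read_classification`, `label_hit` and `count_at_threshold`, and the `(query, hit, score)` input format, are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def superfamily(sccs: str) -> str:
    """'a.1.1.2' -> 'a.1.1' (class.fold.superfamily)."""
    return ".".join(sccs.split(".")[:3])


def fold(sccs: str) -> str:
    """'a.1.1.2' -> 'a.1' (class.fold)."""
    return ".".join(sccs.split(".")[:2])


def read_classification(lines: Iterable[str]) -> Dict[str, str]:
    """Map domain id -> sccs from tab-separated classification lines.

    Assumed layout: domain id in column 1, sccs in column 4.
    Blank lines and '#' comment lines are skipped.
    """
    table = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        table[fields[0]] = fields[3]
    return table


def label_hit(query: str, hit: str, sccs_of: Dict[str, str]) -> Optional[bool]:
    """True: same superfamily; False: different fold; None: not scored."""
    q, h = sccs_of[query], sccs_of[hit]
    if superfamily(q) == superfamily(h):
        return True
    if fold(q) != fold(h):
        return False
    return None  # same fold, different superfamily: relationship unclear


def count_at_threshold(hits: Iterable[Tuple[str, str, float]],
                       sccs_of: Dict[str, str],
                       threshold: float) -> Tuple[int, int]:
    """Count (true, false) hits scoring at or above `threshold`.

    Assumes a higher score means a more significant hit; self-hits are ignored.
    """
    true_hits = false_hits = 0
    for query, hit, score in hits:
        if query == hit or score < threshold:
            continue
        label = label_hit(query, hit, sccs_of)
        if label is True:
            true_hits += 1
        elif label is False:
            false_hits += 1
    return true_hits, false_hits
```

With E-values, where smaller means more significant, the negated E-value can be passed as the score. Sweeping `threshold` and recording the two counts gives a simple curve for comparing search tools on the same benchmark.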
Apparently, PDB40D-B was "created" from SCOP, meaning that PDB40D-B isn't actually a downloadable database; you need to create it yourself. Now I just need to figure out which files on the PDB FTP site ( http://www.rcsb.org/pdb/cgi/ftpd.cgi/ ) I need to download, and then what to do with those files :) Unless anyone else has a simpler suggestion? From delete at elfdata.com Fri Dec 9 14:09:54 2005 From: delete at elfdata.com (Theodore H. Smith) Date: Fri, 9 Dec 2005 19:09:54 +0000 Subject: [BiO BB] How to FTP 34,000 protein files? Is it possible? Message-ID: I'm trying to download the files at: ftp://ftp.rcsb.org/pub/pdb/data/structures/all/pdb There are 34,142 files in this directory as of today. I have at my disposal a Mac, a Linux box, and a Windows PC. OS X's Finder tells me "server disconnected us" when it tries to list the contents of this directory, because it can't handle that many files over FTP. I tried wget on Linux. It also told me "invalid server response". Fetch on the Mac fared better, in that it could list the whole contents of this directory. But trying to download results either in a single README file being downloaded, or in the entire Mac crashing if I attempt to drag and drop 34,142 files to the Finder! Any suggestions for downloading? Like I said, I have a Mac, a Unix box, and a Windows PC. Honestly, I am surprised at wget; I've never seen it fail before. From christoph.gille at charite.de Fri Dec 9 14:19:54 2005 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Fri, 9 Dec 2005 20:19:54 +0100 (CET) Subject: [BiO BB] How to FTP 34,000 protein files? Is it possible? In-Reply-To: References: Message-ID: <42859.192.168.220.204.1134155994.squirrel@webmail.charite.de> wget is not a good idea, because it connects and disconnects for every protein file. There are FTP clients to which you can give a list of the files you want to download. I have forgotten the name of the FTP client I used for the same purpose.
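To illustrate the single-connection approach described above, here is a minimal sketch using Python's standard `ftplib`: it logs in once, lists the directory once, and then retrieves every file over the same control connection, skipping files already on disk so that an interrupted run can be resumed. The host and directory come from the question; the function names and the local `pdb` directory are illustrative assumptions, and the sketch has not been tried against the RCSB server.

```python
import ftplib
import os
from typing import Iterable, List

# Taken from the question above; verify before use.
PDB_HOST = "ftp.rcsb.org"
PDB_DIR = "/pub/pdb/data/structures/all/pdb"


def download_over_one_connection(ftp, names: Iterable[str], dest_dir: str) -> List[str]:
    """Download each file in `names` through one already-open FTP session.

    `ftp` is anything with an ftplib-style retrbinary(cmd, callback) method.
    Files already present in `dest_dir` are skipped, so a run that was
    interrupted can simply be restarted. Returns the names fetched this run.
    """
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for name in names:
        target = os.path.join(dest_dir, os.path.basename(name))
        if os.path.exists(target):
            continue  # downloaded by an earlier run
        partial = target + ".part"
        with open(partial, "wb") as fh:
            # Reuses the existing control connection; no new login per file.
            ftp.retrbinary("RETR " + name, fh.write)
        os.replace(partial, target)  # rename only once the file is complete
        fetched.append(name)
    return fetched


def main() -> None:
    with ftplib.FTP(PDB_HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(PDB_DIR)
        names = ftp.nlst()  # a single listing of the directory
        download_over_one_connection(ftp, names, "pdb")
```

Calling `main()` starts the download. If the server refuses to list the whole directory in one go (as described above), the file names could instead be read from a text file and passed to `download_over_one_connection` unchanged.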
There are also a bunch of FTP mirror programs that will work. The mirror programs, however, are more difficult to use. Just search your software manager for the terms "ftp" and "mirror", or "ftp" and "scriptable". From lutfullah.kakakhel at gmail.com Fri Dec 9 14:47:29 2005 From: lutfullah.kakakhel at gmail.com (Lutfullah Kakakhel) Date: Sat, 10 Dec 2005 00:47:29 +0500 Subject: [BiO BB] How to FTP 34,000 protein files? Is it possible? In-Reply-To: <42859.192.168.220.204.1134155994.squirrel@webmail.charite.de> References: <42859.192.168.220.204.1134155994.squirrel@webmail.charite.de> Message-ID: <477b582e0512091147p38621668p59bbf74fe1b4bca@mail.gmail.com> Try rsync on your Linux box. --Lutfullah On 12/10/05, Dr. Christoph Gille wrote: > > wget is not a good idea 'cos it connects and disconnect for every protein > file. > There are ftp clients where you can give a list of the files you > want to download. > I have forgotten the name of the ftp client I used for the same purpose. > > There are also bunch of ftp mirror programs which will work. > The mirror programs however are more difficult to use. > Just search your software manager for the terms ftp and mirror or ftp and > scriptable. From logan at cacs.louisiana.edu Fri Dec 9 18:01:38 2005 From: logan at cacs.louisiana.edu (Raja Loganantharaj) Date: Fri, 09 Dec 2005 17:01:38 -0600 Subject: [BiO BB] Last CFP: Special Session on Bioinformatics (June 27-30) Message-ID: <439A0CD2.5060804@cacs.louisiana.edu> You are invited to submit a paper to the Bioinformatics track of the IEA/AIE 2006 conference in Annecy, France, June 27-30.
The goal of the special track is to facilitate collaboration between AI researchers and biologists by presenting cutting-edge algorithms, ideas and applications of AI technology that solve interesting problems in computational biology. We plan to accept up to 10 papers for presentation (after a formal peer review process), and these will be included in the IEA/AIE 2006 conference proceedings, which are published in a bound volume by Springer-Verlag in their 'Lecture Notes in Artificial Intelligence' series. All the accepted papers will be considered for a special issue of the Journal of Applied Intelligence. Topics of interest include, but are not limited to:
* Sequence Alignment
* Gene Discovery
* Protein/RNA structure Prediction and Modeling
* Comparative Genomics
* Regulatory modules and Pathway analysis
* Evolution and Phylogenetics
* Molecular Structures
* Protein-protein and protein-DNA interactions
* Microarray data analysis and high-dimensional data reduction
* Application of machine learning to molecular biology
View the details at http://esia2.univ-savoie.fr/conf-iea-aie/index.php?id=18 The important dates are:
Electronic paper submission deadline (extended): December 15, 2005
Author(s) notification: January 23, 2006
Camera-ready copy deadline: February 24, 2006
IEA/AIE-06 Special session on Bioinformatics: June 27-30, 2006
To submit a paper, go to the author access area of the main conference site and register before submitting your complete paper: up to 12 pages for a full presentation or up to 6 pages for a short presentation. Please follow the formatting instructions given at the conference site at http://esia2.univ-savoie.fr/conf-iea-aie/ Thank you Raja Loganantharaj -- Dr.
Raja Loganantharaj Director of Bioinformatics Research Lab Center for Advanced Computer Studies University of Louisiana PO Box 44330 Lafayette, LA 70504-4330 ------------------------------- Voice: 337-482-5345 http://www.cacs.louisiana.edu/~logan/ From landman at scalableinformatics.com Fri Dec 9 19:02:02 2005 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 09 Dec 2005 19:02:02 -0500 Subject: [BiO BB] Java API for BLAST? In-Reply-To: <4399B1C5.4020505@users.sourceforge.net> References: <439986BC.9070704@users.sourceforge.net> <4399A68B.7080808@scalableinformatics.com> <4399B1C5.4020505@users.sourceforge.net> Message-ID: <439A1AFA.9050605@scalableinformatics.com> Hi Ahmed: Ahmed Moustafa wrote: > Hi Joe, > > I thought of switching my project to Perl because I found BioPerl had > more functionality and more "documentation" than BioJava, but I am trying > first to stay with Java; otherwise I will go to BioPerl. My apologies for not being clearer. I am not suggesting a port to Perl, just hooking in the previously developed modules through some sort of inter-language call. In Perl it is fairly easy to do with the Inline::* modules, and I believe it is fairly easy with Python and Java. That's it... I am not advocating reinventing your program in a new language, just using what might be available in another language that can help. Joe > > Thanks! > > Ahmed > > On 12/9/2005 9:45 AM, Joe Landman wrote: > >> Hi Ahmed: >> >> You might look into hooking into the BioPerl bits from Java (RMI I >> think). This shouldn't be too difficult. >> >> Joe >> >> Ahmed Moustafa wrote: >> >>> Hi All! >>> >>> Is there some sort of a Java package to do BLAST search (remote or >>> local)? I went through BioJava and I could not find something like >>> that; it seemed there was only a parser for the BLAST XML output. >>> >>> Thanks in advance!
>>> >>> Ahmed >> >> -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From hlapp at gmx.net Sat Dec 10 22:26:31 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 10 Dec 2005 19:26:31 -0800 Subject: [BiO BB] How to FTP 34,000 protein files? Is it possible? In-Reply-To: <42859.192.168.220.204.1134155994.squirrel@webmail.charite.de> References: <42859.192.168.220.204.1134155994.squirrel@webmail.charite.de> Message-ID: <657ff5a73b34cc0d1d26ae6a115400a3@gmx.net> Why not use mget and a wildcard expression with any vanilla or fancy ftp client? (Obviously, turn off the prompt beforehand, or you'll be asked 34,000 times to confirm the local filename ...) On Dec 9, 2005, at 11:19 AM, Dr. Christoph Gille wrote: > wget is not a good idea 'cos it connects and disconnects for every > protein > file. > There are ftp clients where you can give a list of the files you > want to download. > I have forgotten the name of the ftp client I used for the same > purpose. > > There are also a bunch of ftp mirror programs which will work. > The mirror programs, however, are more difficult to use. > Just search your software manager for the terms ftp and mirror or ftp > and > scriptable. -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca.
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From christoph.gille at charite.de Sun Dec 11 05:43:52 2005 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Sun, 11 Dec 2005 11:43:52 +0100 (CET) Subject: [BiO BB] How to FTP 34,000 protein files? Is it possible? In-Reply-To: <657ff5a73b34cc0d1d26ae6a115400a3@gmx.net> References: <42859.192.168.220.204.1134155994.squirrel@webmail.charite.de> <657ff5a73b34cc0d1d26ae6a115400a3@gmx.net> Message-ID: <61611.84.190.20.190.1134297832.squirrel@webmail.charite.de> I believe that mget *.* works only if the files reside in the same directory. From lutfullah.kakakhel at gmail.com Sun Dec 11 08:44:57 2005 From: lutfullah.kakakhel at gmail.com (Lutfullah Kakakhel) Date: Sun, 11 Dec 2005 18:44:57 +0500 Subject: Fwd: [BiO BB] How to FTP 34,000 protein files? Is it possible? In-Reply-To: <477b582e0512110528l3f5cd297l5a1d500759c30f91@mail.gmail.com> References: <42859.192.168.220.204.1134155994.squirrel@webmail.charite.de> <657ff5a73b34cc0d1d26ae6a115400a3@gmx.net> <61611.84.190.20.190.1134297832.squirrel@webmail.charite.de> <477b582e0512110528l3f5cd297l5a1d500759c30f91@mail.gmail.com> Message-ID: <477b582e0512110544l610a3c2fgc7cb2318bbacea9e@mail.gmail.com> I just had a look at the ftp site given in the original message. The filenames are actually symbolic links. $ wget -m -c --retr-symlinks ftp://ftp.rcsb.org/pub/pdb/data/structures/all/pdb will retrieve them all in one go: ================================================= --18:04:34-- ftp://ftp.rcsb.org/pub/pdb/data/structures/all/pdb/pdb160d.ent.Z => `ftp.rcsb.org/pub/pdb/data/structures/all/pdb/pdb160d.ent.Z' ==> CWD not required. ==> PASV ... done. ==> RETR pdb160d.ent.Z ... done.
Length: 34 100%[====================================>] 14,097 15.93K/s 18:04:37 (15.89 KB/s) - `ftp.rcsb.org/pub/pdb/data/structures/all/pdb/pdb160d.ent.Z' saved [14,097] --18:04:37-- ftp://ftp.rcsb.org/pub/pdb/data/structures/all/pdb/pdb160l.ent.Z => `ftp.rcsb.org/pub/pdb/data/structures/all/pdb/pdb160l.ent.Z' ==> CWD not required. ==> PASV ... done. ==> RETR pdb160l.ent.Z ... done. Length: 34 100%[====================================>] 39,930 9.37K/s ETA 00:00 The directory structure saved is your_home_directory/ftp.rcsb.org/pub/pdb/data/structures/all/pdb, in which you should see all the downloaded files: pdb05c1.noc.Z pdb0in3.noc.Z pdb100d.ent.Z pdb114d.ent.Z pdb131d.ent.Z --Lutfullah From delete at elfdata.com Sun Dec 11 09:40:57 2005 From: delete at elfdata.com (Theodore H. Smith) Date: Sun, 11 Dec 2005 14:40:57 +0000 Subject: [BiO BB] How to FTP 34,000 protein files? Is it possible? In-Reply-To: <477b582e0512110544l610a3c2fgc7cb2318bbacea9e@mail.gmail.com> References: <42859.192.168.220.204.1134155994.squirrel@webmail.charite.de> <657ff5a73b34cc0d1d26ae6a115400a3@gmx.net> <61611.84.190.20.190.1134297832.squirrel@webmail.charite.de> <477b582e0512110528l3f5cd297l5a1d500759c30f91@mail.gmail.com> <477b582e0512110544l610a3c2fgc7cb2318bbacea9e@mail.gmail.com> Message-ID: Aha, so that was what I was missing: --retr-symlinks. Thanks! I also updated my wget to version 1.10.2 from the old 1.8.2, so hopefully this version can handle a large number of files. On 11 Dec 2005, at 13:44, Lutfullah Kakakhel wrote: > I just had a look at the ftp site given in the original message. > The filenames are actually symbolic links.
-- http://elfdata.com/plugin/ What does our work achieve, if it's not making the world a happier place? http://www.whatnextjournal.co.uk/Pages/Next/Happiness.html When's the last time you thought deeply about how to improve our lives? From lutfullah.kakakhel at gmail.com Sun Dec 11 11:28:24 2005 From: lutfullah.kakakhel at gmail.com (Lutfullah Kakakhel) Date: Sun, 11 Dec 2005 21:28:24 +0500 Subject: [BiO BB] How to FTP 34,000 protein files? Is it possible?
In-Reply-To: <477b582e0512110823v130ac83cl60d937bd158f0724@mail.gmail.com> References: <42859.192.168.220.204.1134155994.squirrel@webmail.charite.de> <657ff5a73b34cc0d1d26ae6a115400a3@gmx.net> <61611.84.190.20.190.1134297832.squirrel@webmail.charite.de> <477b582e0512110528l3f5cd297l5a1d500759c30f91@mail.gmail.com> <477b582e0512110544l610a3c2fgc7cb2318bbacea9e@mail.gmail.com> <477b582e0512110823v130ac83cl60d937bd158f0724@mail.gmail.com> Message-ID: <477b582e0512110828k71067929t19792279cfaefb38@mail.gmail.com> It has nothing to do with the number of files or the version; I don't think the older wget has any such restriction either. The -c switch is for continuing (resuming), in case you drop the connection. The -m switch is for mirroring the entire directory tree structure and is a combination of several switches, including recursive downloading. The switch --retr-symlinks will follow the link to fetch the actual file. Your original posting, however, mentioned 'invalid server response'. I am not sure what caused that error. If you omit the switch for symbolic links, you should still be able to get into the server, but you will get a directory full of symbolic links only and not the actual files. Maybe you have problems accessing the server (proxy, firewall, or whatever else), but wget should not fail to retrieve a mere 34,000 files in a directory, nor should it require giving a filename for every protein. Enjoy! --Lutfullah From bioinfosm at gmail.com Mon Dec 12 11:38:03 2005 From: bioinfosm at gmail.com (Samantha Fox) Date: Mon, 12 Dec 2005 11:38:03 -0500 Subject: [BiO BB] clustering short sequences In-Reply-To: <437CFE72.8030702@fiserlab.org> References: <437CFE72.8030702@fiserlab.org> Message-ID: <726450810512120838n2bf2519fo4b5a2ec2e5878083@mail.gmail.com> I had a similar problem, but with DNA sequences. I finally used align0 from the FASTA package. What was the workaround you used?
align0 worked pretty well for me, as I did not want to penalize end gaps. Sam On 11/17/05, Narcis Fernandez-Fuentes wrote: > > > Hi all, > > Does anybody know a program to sequentially cluster short protein > fragments, between 8 and 14 residues long? I tried cd-hit; it works fine > in the range of 12 to 14 but crashes below that. Any suggestions? > > Thanks! > > Narcis From narcis at fiserlab.org Mon Dec 12 13:17:34 2005 From: narcis at fiserlab.org (Narcis Fernandez-Fuentes) Date: Mon, 12 Dec 2005 13:17:34 -0500 Subject: [BiO BB] clustering short sequences In-Reply-To: <726450810512120838n2bf2519fo4b5a2ec2e5878083@mail.gmail.com> References: <437CFE72.8030702@fiserlab.org> <726450810512120838n2bf2519fo4b5a2ec2e5878083@mail.gmail.com> Message-ID: <439DBEBE.6010601@fiserlab.org> I wrote my own greedy algorithm. But I am going to use the program you used and compare the results. Narcis Samantha Fox wrote: > I had a similar problem, but with DNA sequences. I finally used align0 > from the FASTA package. What was the workaround you used? > align0 worked pretty well for me, as I did not want to penalize end gaps. > > Sam > > On 11/17/05, *Narcis Fernandez-Fuentes* > wrote: > > > Hi all, > > Does anybody know a program to sequentially cluster short protein > fragments, between 8 and 14 residues long? I tried cd-hit; it works fine > in the range of 12 to 14 but crashes below that. Any suggestions? > > Thanks!
> > Narcis -- Narcis Fernandez-Fuentes, PhD Seaver Center for Bioinformatics Albert Einstein College of Medicine 1300 Morris Park Ave, Bronx, NY 10461, USA phone: (718)430-3233 fax: (718) 430-8565 mailto:narcis at fiserlab.org (http://www.fiserlab.org) From gigi at biocomp.unibo.it Tue Dec 13 09:40:23 2005 From: gigi at biocomp.unibo.it (gigi) Date: Tue, 13 Dec 2005 15:40:23 +0100 Subject: [BiO BB] 7th Bologna Winter School: APPLIED BIOINFORMATICS: the Test Case of the Human Genome Message-ID: <439EDD57.6060504@biocomp.unibo.it> APPLIED BIOINFORMATICS: the Test Case of the Human Genome Bologna Winter School 2006 Italy - Bologna Feb 13-17, 2006 The Bologna Winter Schools are unique international forums for debating the state of the art on complex problems at the forefront of Bioinformatics, Computational Biology and Modern Biology. The 7th Edition focuses on the applications of Bioinformatics to Human Genome analysis. Fundamental problems that are still a matter of debate will be addressed, such as the annotation, expression and regulation of the Human Genome, the relationship between expression and diseases, the variability between populations, and the applications to molecular medicine.
The growing interest in Biodiversity, Aging, Forensic Genomics and early diagnosis of genetic diseases shows that Bioinformatics is essential not only for managing and analyzing the data but also for identifying typical markers of different populations and individuals, with the common goal of highlighting the relationship between genotype and phenotype. The school will be devoted to exploring which ideas and tools from Bioinformatics can effectively help in the large-scale analysis of the Human Genome, and will cover the following sessions:
How many genes?
Genome annotation
Transcription regulation
Genome variability
SNPs, aging and diseases
Forensic genomics
A round table discussion and an industry corner will give all participants the opportunity to briefly present their work and to discuss different applications. TEACHERS:
David Balding, Imperial College, London, UK
Guido Barbujani, University of Ferrara, IT
Jaume Bertranpetit, University "Pompeu Fabra", Barcelona, ES
Claudio Franceschi, University of Bologna, IT
Roderic Guigó, University "Pompeu Fabra", Barcelona, ES
Uta-Dorothee Immel, Martin Luther University, Halle (Saale), DE
David T. Jones, University College, London, UK
Arthur Lesk, Cambridge University, UK
Marion Nagy, Humboldt University/Charité, Berlin, DE
Peter Nürnberg, University of Cologne, Köln, DE
Walther Parson, Medical University of Innsbruck, AT
Giovanni Perini, University of Bologna, IT
Graziano Pesole, University of Milan, IT
Lutz Roewer, Humboldt University/Charité, Berlin, DE
Janet Thornton, EBI-EMBL, Cambridge, UK
Anna Tramontano, University "La Sapienza", Roma, IT
Alfonso Valencia, CNB-CSIC, Cantoblanco, ES
------------------------------------------------------------------------ Additional Information http://www.biocomp.unibo.it/~school2006/ To apply, send a short C.V.
to school2006 at biocomp.unibo.it. The application deadline is January 10, 2006. From landman at scalableinformatics.com Tue Dec 13 10:59:29 2005 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 13 Dec 2005 10:59:29 -0500 Subject: [BiO BB] [BLAST-Announce #053] BLAST 2.2.13 released] Message-ID: <439EEFE1.6080302@scalableinformatics.com> -------- Original Message -------- Subject: [blast-announce] [BLAST-Announce #053] BLAST 2.2.13 released Date: Tue, 13 Dec 2005 10:53:34 -0500 Notes for the 2.2.13 release Standalone BLAST 2.2.13 is now available from the BLAST download page. Major changes include: * New engine now available in blastall * Statistical parameter change * Bug fixes New engine available in blastall Blastall now has support for a new version of the BLAST engine that can be enabled by adding "-V F" to the blastall command line. This option will probably be the default in future versions. There are a few situations where it is very advantageous to use the new engine: 1. Large word sizes with a BLASTN search. The new engine uses the "stride" idea of AGBLAST, and this can lead to a considerable speedup for large word sizes. For a run of a typical mRNA sequence (u00001) with a word size of 25, the new code runs about twice as fast as the old code. Note that the AG "stride" has been available in megablast since the 2.2.10 release. This enhancement is platform-independent. 2. Searching multiple queries at once. The new engine will search multiple queries by scanning the database once, rather than once for each query. The speedup will depend upon the queries being searched and what part of the time is spent scanning the databases vs. actual computations (e.g., extensions). Typically this feature is most important if a number of short queries (e.g., mRNAs or ESTs) are being searched with blastn or if a tblastn search is performed. This feature is partially supported in the old code with the -B option as well as by megablast. 3.
For very large queries. The memory management (especially during the dynamic programming phase) has been improved, and this may allow searches that used to fail because of many matches or large queries to run to completion. Statistical parameter change Megablast, blastall and bl2seq have until now allowed users to select arbitrary gap existence and extension penalties for a blastn-type search. This has been convenient for users but has led to the unfortunate situation that searches with some parameter sets were significantly overestimating the statistical significance of matches. To address this problem, the proper statistical parameters for a number of reward/penalty/gap existence/gap extension values have been calculated. The parameters that might cause an issue here are -r (match reward), -q (mismatch penalty), -G (gap existence cost), and -E (gap extension cost). If you do not change these, then nothing will change for you. Please email blast-help at ncbi.nlm.nih.gov with any questions, bug reports, or requests for different parameter sets. The supported combinations are listed below. Note that above a certain gap existence and extension penalty any value is permitted, as the statistics for ungapped searches can be used. These are marked as "ungapped threshold" below.
For each match/mismatch pair, the supported gap existence (G) and gap extension (E) combinations, written as (G,E), are:

match  mismatch  ungapped threshold (G,E)  other supported (G,E)
-----  --------  ------------------------  -----------------------------------------------
1      -4        (2,2)                     (1,2) (0,2) (2,1) (1,1)
2      -7        (4,4)                     (2,4) (0,4) (4,2) (2,2)
1      -3        (2,2)                     (1,2) (0,2) (2,1) (1,1)
2      -5        (4,4)                     (2,4) (0,4) (4,2) (2,2)
1      -2        (2,2)                     (1,2) (0,2) (3,1) (2,1) (1,1)
2      -3        (6,4)                     (4,4) (2,4) (0,4) (3,3) (6,2) (5,2) (4,2) (2,2)
1      -1        (4,2)                     (3,2) (2,2) (1,2) (0,2) (4,1) (3,1) (2,1)
5      -4        (25,10)                   (10,6) (8,6)
4      -5        (12,8)                    (6,5) (5,5) (4,5) (3,5)

Bug fixes * A bug has been fixed in formatdb. This bug occurred when the -o option was not used, meaning that the FASTA definition lines of the input file were not parsed, and multiple database volumes were generated. The bug normally did not become apparent to the user until the BLAST run, at which point the BLAST binary (e.g., blastall) would produce messages containing "ObjMgrChoice: pointer [0] type [1] not found".
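Since only the combinations in the table are listed as supported, a quick check of a parameter set before a large blastn run can catch combinations whose statistics were not calibrated. The Python sketch below simply encodes that table as data. The name `is_supported` is hypothetical, and the rule used for the ungapped threshold (any G and E that are each at least the threshold values) is an assumed reading of "above a certain gap existence and extension penalty any value is permitted"; blast-help remains the authority on what blastall actually accepts.

```python
# Supported (G, E) pairs per (match reward, mismatch penalty), transcribed from
# the BLAST 2.2.13 release notes. The LAST pair in each list is the one marked
# "ungapped threshold".
SUPPORTED = {
    (1, -4): [(1, 2), (0, 2), (2, 1), (1, 1), (2, 2)],
    (2, -7): [(2, 4), (0, 4), (4, 2), (2, 2), (4, 4)],
    (1, -3): [(1, 2), (0, 2), (2, 1), (1, 1), (2, 2)],
    (2, -5): [(2, 4), (0, 4), (4, 2), (2, 2), (4, 4)],
    (1, -2): [(1, 2), (0, 2), (3, 1), (2, 1), (1, 1), (2, 2)],
    (2, -3): [(4, 4), (2, 4), (0, 4), (3, 3), (6, 2), (5, 2), (4, 2), (2, 2), (6, 4)],
    (1, -1): [(3, 2), (2, 2), (1, 2), (0, 2), (4, 1), (3, 1), (2, 1), (4, 2)],
    (5, -4): [(10, 6), (8, 6), (25, 10)],
    (4, -5): [(6, 5), (5, 5), (4, 5), (3, 5), (12, 8)],
}


def is_supported(reward, penalty, gap_open, gap_extend):
    """Return True if (G, E) is listed for this reward/penalty pair, or lies at or
    above its ungapped threshold (assumed reading: G and E each at least as large)."""
    combos = SUPPORTED.get((reward, penalty))
    if combos is None:
        return False  # reward/penalty pair not covered by the release notes
    *listed, (g_min, e_min) = combos
    if (gap_open, gap_extend) in listed:
        return True
    return gap_open >= g_min and gap_extend >= e_min
```

For example, `is_supported(1, -3, 2, 1)` is True because (2,1) is listed, while `is_supported(1, -3, 3, 1)` is False: (3,1) is not listed and its E of 1 is below the threshold E of 2.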
From landman at scalableinformatics.com Tue Dec 13 15:30:50 2005 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 13 Dec 2005 15:30:50 -0500 Subject: [BiO BB] BLAST 2.2.13 RPMs are out Message-ID: <439F2F7A.8010307@scalableinformatics.com> http://downloads.scalableinformatics.com/downloads/ncbi have at 'em From oliviero.carugo at univie.ac.at Wed Dec 14 08:03:35 2005 From: oliviero.carugo at univie.ac.at (oliviero.carugo at univie.ac.at) Date: Wed, 14 Dec 2005 14:03:35 +0100 Subject: [BiO BB] International PhD School in Bioinformatics Message-ID: <200512141303.jBED3ZgB376510@imap.univie.ac.at> The consortium of the "Bioinformatics Integration Network" (a network within the Austrian genome research programme, www.gen-au.at) is organizing an international PhD programme covering the following research topics:
- Databases / Analytical Tools for Genomics and Proteomics (Zlatko Trajanoski)
- Biomolecular Sequence Analysis and MS Data Interpretation (Frank Eisenhaber)
- Dynamics of Sequence Evolution (Arndt von Haeseler)
- RNA-related Bioinformatics Tools (Ivo Hofacker)
- Computational Structural Genomics (Kristina Djinovic and Oliviero Carugo)
- Data Mining in Proteomics (Bernhard Tilg)
- Pathway Analysis (Georg Casari)
The PhD courses will be common throughout the network, and the candidates will be supervised by two PIs (supervisor and co-supervisor). We intend to include additional members from other EU countries in the doctorate committee in order to meet the requirements for the "European Doctorate".
An "Outstanding Scholar" program will be established to identify new bioinformaticians who have demonstrated research excellence. The programme aims to rotate the students for three months through one of the participating labs or the labs of our international collaborators, so that they gain exposure to other bioinformatics techniques. Admitted PhD students will receive fellowships starting March 1st, 2006. More information is available at genome.tugraz.at/binphd.php. From tcan at ceng.metu.edu.tr Wed Dec 14 12:40:00 2005 From: tcan at ceng.metu.edu.tr (Tolga CAN) Date: Wed, 14 Dec 2005 19:40:00 +0200 Subject: [BiO BB] Workshop on "Emerging Topics in Human Functional Genomics and Proteomics" - March 2006 Message-ID: <20051214173744.M61709@ceng.metu.edu.tr> WORKSHOP ON "EMERGING TOPICS IN HUMAN FUNCTIONAL GENOMICS AND PROTEOMICS" 26 - 31 MARCH 2006, ANTALYA, TURKEY http://www.i-cancer.fen.bilkent.edu.tr/omics Supported by ICGEB Genomics, proteomics and systems biology are emerging fields as a result of recent advances in molecular biology that produce large-scale data. Computational analysis techniques that provide tools for the functional annotation of these data are much needed in the post-genomic era. The primary objective of this workshop is to give a head start to researchers who work or want to work in these areas, and to bring together young scientists with leading scientists in the "-omics" fields.
TOPICS Topics include, but are not limited to:
* Human Genome annotation
* Large-scale gene expression analysis
* Functional Genomics and Transcriptomics
* Proteomics
* Systems Biology
INVITED SPEAKERS
Charles Auffray, CNRS, Villejuif, France
Peer Bork, EMBL, Heidelberg, Germany
Soren Brunak, Technical University of Denmark, Denmark
Burak Erman, Koc University, Turkey
Doron Lancet, Weizmann Institute of Science, Israel
Mehmet Ozturk, Bilkent University, Turkey
Yitzhak Pilpel, Weizmann Institute of Science, Israel
Benno Schwikowski, Institut Pasteur, Paris
IMPORTANT DATES and APPLICATIONS
15 November 2005: Applications start
15 January 2006: Applications end
Online application is now open: http://bioinfo.ceng.metu.edu.tr/Omics2006/Submission/openconf.php Participants: Participants must be currently involved in research on the subject of the meeting. Preference will be given to students and junior scientists. Funding: Please note that there is no registration fee. Nationals of ICGEB Member States who are selected to participate on an ICGEB grant will receive their accommodation (twin share) and local hospitality for the duration of the meeting; travel is NOT funded. Applicants should submit:
* a resume (short CV) and a short list of publications (if any)
* an abstract of current research
If you wish to be eligible for financial support, then in addition to the above you need to indicate the name, surname and e-mail address of an academic who will write a short reference letter for you. From molvisions at mac.com Wed Dec 14 13:41:57 2005 From: molvisions at mac.com (Timothy Driscoll) Date: Wed, 14 Dec 2005 13:41:57 -0500 Subject: [BiO BB] overlapping/shadow genes in prokaryotes Message-ID: <4152CE23-5221-4DD0-8341-512D398593BF@mac.com> hi, how prevalent is gene overlap in prokaryote genomes? not just a few codons at one end, but large regions of overlap, sometimes as long as the complete gene. I have recently begun curating the genome of a Gram-negative bacterium.
I am finding a lot of predicted overlapping genes (GeneMark, Glimmer, RefSeq), often with roughly equivalent supporting evidence in favor of both genes (BLAST hits, RBS, codon bias, etc.). I have been told that overlapping genes are uncommon, and never more than about 30 codons in the overlap. but I am unable to find any data to support this. can anyone please provide some helpful references? many thanks, tim -- Timothy Driscoll em: molvisions at mac.com molvisions - see. grasp. learn. ph: 919-368-2667 im: molvisions usa:virginia:blacksburg tx: molvisions at vtext.com From golharam at umdnj.edu Wed Dec 14 13:38:30 2005 From: golharam at umdnj.edu (Ryan Golhar) Date: Wed, 14 Dec 2005 13:38:30 -0500 Subject: [BiO BB] Semantic meaning of N in genomic sequences Message-ID: <004701c600dd$94936d60$2f01a8c0@GOLHARMOBILE1> I'm performing some analysis on different chromosomes of different species and noticed that some chromosomes contain spans of N. Some of these spans are the same length, and others are of different lengths. A span of Ns in the genomic sequence (as I understand it) could represent one of three things:
1. A region between two syntenic sequences on a contig/scaffold/etc. with unknown size and/or sequence
2. A region on a contig of known size that is difficult to sequence
3. A region on a contig of unknown size that is difficult to sequence
I'm trying to determine whether certain spans of N represent any of the categories above, and which one in particular. Is there any standard for how many Ns should be in place to represent anything in particular? How can you determine what a span of Ns represents?
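A first step toward answering this is simply to tabulate where the N runs are and how long they are. Runs that recur at exactly the same length hint that a fixed placeholder size was used, but the sequence alone cannot say which of the three categories a run belongs to; the assembly's own gap annotation (for example an AGP file, where one is distributed) is the better source for that. The Python sketch below is illustrative only: the function names `read_fasta`, `n_runs` and `run_length_histogram` are hypothetical, coordinates are 0-based and half-open, and lowercase `n` is counted as well as `N`.

```python
import re
from collections import Counter

N_RUN = re.compile(r"[Nn]+")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_runs(seq, min_len=1):
    """Return (start, end, length) for each maximal run of N/n (0-based, end exclusive)."""
    return [(m.start(), m.end(), m.end() - m.start())
            for m in N_RUN.finditer(seq)
            if m.end() - m.start() >= min_len]


def run_length_histogram(seq, min_len=1):
    """Count how many N runs of each length occur; repeated lengths stand out."""
    return Counter(length for _, _, length in n_runs(seq, min_len))
```

For instance, `n_runs("ACGTNNNNACnnGT")` gives `[(4, 8, 4), (10, 12, 2)]`. Looping over `read_fasta(open("chr1.fa"))` (a hypothetical file name) and printing `run_length_histogram(seq, min_len=10).most_common(5)` per record would show whether a few gap sizes dominate.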
Ryan -- Ryan Golhar - golharam at umdnj.edu The Informatics Institute of UMDNJ From marty.gollery at gmail.com Wed Dec 14 15:23:03 2005 From: marty.gollery at gmail.com (Martin Gollery) Date: Wed, 14 Dec 2005 12:23:03 -0800 Subject: [BiO BB] overlapping/shadow genes in prokaryotes In-Reply-To: <4152CE23-5221-4DD0-8341-512D398593BF@mac.com> References: <4152CE23-5221-4DD0-8341-512D398593BF@mac.com> Message-ID: hi Tim, Are these predicted genes in the same reading frame, or is there a frameshift between them? Best regards, Marty On 12/14/05, Timothy Driscoll wrote: > > hi, > > how prevalent is gene overlap in prokaryote genomes? not just a few > codons at one end, but large regions of overlap, sometimes as long as > the complete gene. I have recently begun curating the genome of a > Gram-negative bacterium. I am finding a lot of predicted overlapping > genes (GeneMark, Glimmer, RefSeq), often with roughly equivalent > supporting evidence in favor of both genes (BLAST hits, RBS, codon > bias, etc.). > > I have been told that overlapping genes are uncommon, and never more > than about 30 codons in the overlap. but I am unable to find any data > to support this. can anyone please provide some helpful references? > > many thanks, > > tim > -- > Timothy Driscoll em: molvisions at mac.com > molvisions - see. grasp. learn. ph: 919-368-2667 > im: molvisions > usa:virginia:blacksburg tx: molvisions at vtext.com > -- -- Martin Gollery Associate Director Center For Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS330 775-784-7042 ----------- From molvisions at mac.com Wed Dec 14 16:31:57 2005 From: molvisions at mac.com (Timothy Driscoll) Date: Wed, 14 Dec 2005 16:31:57 -0500 Subject: [BiO BB] overlapping/shadow genes in prokaryotes In-Reply-To: References: <4152CE23-5221-4DD0-8341-512D398593BF@mac.com> Message-ID: <101BD45B-3DC4-4417-AE56-7192A38B88E9@mac.com> On Dec 14, 2005, at 3:23 p, Martin Gollery wrote: > hi Tim, > > Are these predicted genes in the same reading frame, or is there a > frameshift between them? > > Best regards, > Marty > hi Marty, thanks for the response. actually, I should have been more specific - the predicted genes run on opposite strands. so far, I have curated approximately 125 genes (not counting the overlappers) on one megaplasmid of this genome (admittedly, a very small fraction of the total genome). 23% of the curated genes are paired with predicted overlapping genes. all of the pairs are on opposing strands. all overlaps are substantial - generally the entire gene, or most of it. 38% of the pairs are in the +1/-2 orientation. 28% of the pairs are in the +1/-1 orientation. AFAIK, overlapping genes in this fashion is a common mechanism for genome compaction in prokaryotes. so there would be no reason to rule out one of the two overlapping genes simply because it overlaps another gene - unless, of course, the supporting evidence is particularly poor. IOW, both predictions could very well be correct, and the fact that they overlap has no bearing on whether they are 'real' genes. is that a fair statement? thanks, tim > On 12/14/05, Timothy Driscoll wrote: > hi, > > how prevalent is gene overlap in prokaryote genomes? not just a few > codons at one end, but large regions of overlap, sometimes as long as > the complete gene. I have recently begun curating the genome of a > Gram-negative bacterium.
I am finding a lot of predicted overlapping > genes (Genemark, Glimmer, RefSeq), often with roughly equivalent > supporting evidence in favor of both genes (Blast hits, RBS, codon > bias, etc.). > > I have been told that overlapping genes are uncommon, and never more > than about 30 codons in the overlap. but I am unable to find any data > to support this. can anyone please provide some helpful references? > > many thanks, > > tim > -- > Timothy Driscoll em: molvisions at mac.com > molvisions - see. grasp. learn. ph: 919-368-2667 > im: molvisions > usa:virginia:blacksburg tx: > molvisions at vtext.com > > > > > > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > > -- > -- > Martin Gollery > Associate Director > Center For Bioinformatics > University of Nevada at Reno > Dept. of Biochemistry / MS330 > 775-784-7042 > ----------- > _______________________________________________ > Bioinformatics.Org general forum - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board -- Timothy Driscoll em: molvisions at mac.com molvisions - see. grasp. learn. 
ph: 919-368-2667
im: molvisions
usa:virginia:blacksburg tx: molvisions at vtext.com

From oliviero.carugo at univie.ac.at  Thu Dec 15 03:13:03 2005
From: oliviero.carugo at univie.ac.at (oliviero.carugo at univie.ac.at)
Date: Thu, 15 Dec 2005 09:13:03 +0100
Subject: [BiO BB] Post Doctoral Position in Macromolecular Crystallography
Message-ID: <200512150813.jBF8D3gB422738@imap.univie.ac.at>

*Postdoctoral Position in Macromolecular Crystallography*:
*Radiation damage on metal centers and Soft X-ray Studies*

*Commencing date:* As soon as possible, after closing date

*Job description:* BIOXHIT (Biocrystallography on a Highly Integrated Technology Platform for European Structural Genomics) is an integrated project within the EU's 6th Framework Programme. It brings together scientists at all European synchrotrons and leading software developers to produce a highly effective technology platform for Structural Genomics. It aims to consolidate the process of bio-macromolecular structure solution from the bottom up.

The successful applicant will work in a stimulating scientific environment in the Macromolecular Crystallography group at the Department of Biomolecular Structural Chemistry, a member of Max F. Perutz Laboratories - University Departments at the Vienna Biocenter, University of Vienna. The Department is equipped to carry out all steps from gene cloning to structural analysis, including a recently installed Bruker X8 PROTEUM system. The Group additionally has access to the high-throughput nano-drop crystallisation robotics operational at the Institute of Molecular Pathology (IMP), based at Campus Vienna Biocenter, as well as to major European synchrotron high-brilliance X-ray sources.

The selected candidate will focus on two related projects. One will be the investigation of radiation damage on the active centers of a selected set of metallo-enzymes, and the study of the effect of free-radical and electron scavengers with the aim of reducing such damage to these metal centers.
The other will be the use of soft X-rays in macromolecular crystallography for phasing as well as functional characterization purposes, with the focus on the design of the diffraction experiment.

*Qualifications and experience:* In addition to a PhD in a relevant area, experience in protein expression and purification and in macromolecular crystallography techniques (including crystal growth and data collection), as well as familiarity with standard crystallographic software, are required.

*Contract:* A 2-year contract will be offered to the successful candidate.

*Closing date:* 15 January 2006

*Web pages:* www.bioxhit.org ; http://www.mfpl.ac.at/content/index.php?cid=58 ; http://www.univie.ac.at/biolchem

Further information can be obtained from Kristina Djinovic Carugo, kristina.djinovic at univie.ac.at.

University of Vienna is an equal opportunity employer.

*To apply for this position*, please email a detailed CV with a concise description of research experience and the names and addresses of at least two referees to kristina.djinovic at univie.ac.at, *AND* register at the University of Vienna: http://gerda.univie.ac.at/personal/artikel.php?Sprach_ID=1&Art_ID=668

From oliviero.carugo at univie.ac.at  Thu Dec 15 03:22:34 2005
From: oliviero.carugo at univie.ac.at (oliviero.carugo at univie.ac.at)
Date: Thu, 15 Dec 2005 09:22:34 +0100
Subject: [BiO BB] Post Doctoral Position in Structural Bioinformatics
Message-ID: <200512150822.jBF8MYgB072958@imap.univie.ac.at>

POSTDOCTORAL POSITION IN STRUCTURAL BIOINFORMATICS

COMMENCING DATE: As soon as possible, after closing date.

JOB DESCRIPTION: The Bioinformatics Integration Networks (BIN-II) is a research initiative funded by the Austrian Federal Ministry for Education, Science and Culture within the GEN-AU Programme. It brings together scientists from seven bioinformatics laboratories that organize an International PhD programme in Bioinformatics and Computational Biology and promote interdisciplinary research.
The successful applicant will work in a stimulating scientific environment in the Structural Biology group of the Department of Biomolecular Structural Chemistry, a member of Max F. Perutz Laboratories - University Departments at the Vienna Biocenter, University of Vienna. The Department is equipped with up-to-date computational facilities and additionally has access to the facilities of the other BIN-II laboratories. The selected candidate is expected to develop new strategies to evaluate the degree of similarity between three-dimensional macromolecular structures; these strategies should be fast enough to scan large databases and allow automated classification.

QUALIFICATIONS AND EXPERIENCE: In addition to a PhD in a relevant area, experience in computer science or bioinformatics is required.

CONTRACT: A 30-month contract will be offered to the successful candidate.

Application deadline: 31st January 2006.

WEB PAGES:
http://www.gen-au.at/
http://genome.tugraz.at/binphd.php
http://www.mfpl.ac.at/content/index.php?cid=58
http://www.univie.ac.at/biolchem

University of Vienna is an equal opportunity employer. Further information can be obtained from Oliviero Carugo: oliviero.carugo at univie.ac.at.

To apply for this position, please email a detailed CV with a concise description of research experience and the names and addresses of at least two referees to oliviero.carugo at univie.ac.at AND register at the University Homepage: http://gerda.univie.ac.at/personal/artikel.php?Sprach_ID=1&Art_ID=668 (click on "registrieren" and fill in the form).
-------------------------------------------------
Oliviero Carugo
Department of Biomolecular Structural Chemistry
University of Vienna
Campus Vienna Biocenter 5
A-1030 Vienna
Austria
e-mail: oliviero.carugo at univie.ac.at
Phone: +43-1-4277-52208
Fax: +43-1-4277-9522
-------------------------------------------------

From pmr at ebi.ac.uk  Thu Dec 15 05:26:27 2005
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 15 Dec 2005 10:26:27 -0000 (GMT)
Subject: [BiO BB] Semantic meaning of N in genomic sequences
In-Reply-To: <004701c600dd$94936d60$2f01a8c0@GOLHARMOBILE1>
References: <004701c600dd$94936d60$2f01a8c0@GOLHARMOBILE1>
Message-ID: <1293.86.137.131.158.1134642387.squirrel@webmail.ebi.ac.uk>

Hi Ryan,

> I'm trying to determine if certain spans of N represent any of the
> categories above, and which one in particular. Is there any standard
> for how many N's should be in place to represent anything in particular?
> How can you determine what a span of Ns represents?

Hope you are not relying only on a FASTA format file of sequences :-)

"If all else fails, read the documentation" - in other words, you should be able to find out most of what you need from the full EMBL or GenBank entry (worth checking both formats to see which is easier to parse).

Even if you are using a FASTA file, you can retrieve the full entry to check when you need more information.

Of course some of us would 'cheat' by using the ID and species to guess :-)

Hope that helps a bit,

Peter Rice

From christoph.gille at charite.de  Thu Dec 15 11:15:13 2005
From: christoph.gille at charite.de (Dr.
Christoph Gille)
Date: Thu, 15 Dec 2005 17:15:13 +0100 (CET)
Subject: [BiO BB] please tell me your opinion about the new BLAST API
Message-ID: <56747.192.168.220.204.1134663313.squirrel@webmail.charite.de>

Though many things like loops and string processing in Java are still not as easy or as performant as in other languages, Java has, in my opinion, gained much attractiveness as a scripting language since version 1.5. Java (SUN) is surprisingly as fast as GNU-C/G++ on Intel/AMD and hence can be used for number crunching (it is really true).

I have just added BLAST functionality to the open source toolbox STRAP and would like you to have a look at the API. These are wrappers for the EBI server and the local blast programs NCBI-blast and WU-blast. I could add more in the future.

http://www.charite.de/bioinf/strap/Scripting.html

Please tell me whether it is concise and well structured or whether there are methods missing. It is important that I fix problems now, before other people use the BLAST interface in their projects.

Biojava has so far been lacking a wrapper for BLAST searches, but there are parsers for the XML output in Biojava. Thus the STRAP toolbox is complementary to Biojava and both can work together. Please also send suggestions for even tighter integration of STRAP and Biojava.

Life scientists using the GUI for blasting have two advantages over using a web form:

1. Multiple queries: a series of blast jobs for a number of query sequences can be started.
2. Cache: Blast results are stored in a hard-disk cache and are computed only once. Subsequently, an identical query yields the result immediately.
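The caching idea in point 2 can be sketched in a few lines. The Python below is not STRAP's implementation (STRAP is written in Java, and its cache layout is not described in this message); it is a minimal illustration, assuming that a result is fully determined by the program name, the database name and the query sequence. `BlastCache`, `get_or_run` and the `run` callback are hypothetical names; `run` stands in for whatever actually invokes NCBI-blast, WU-blast or the EBI server.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

# Signature of the (hypothetical) function that actually runs BLAST and
# returns its report, e.g. as XML text: run(program, database, query) -> str
BlastRunner = Callable[[str, str, str], str]


class BlastCache:
    """Disk cache holding one result file per (program, database, query)."""

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def _key(program: str, database: str, query: str) -> str:
        # Strip whitespace so that line-wrapped and unwrapped copies of the
        # same sequence map to the same cache entry.
        sequence = "".join(query.split())
        payload = json.dumps([program, database, sequence])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_run(self, program: str, database: str, query: str,
                   run: BlastRunner) -> str:
        path = self.directory / (self._key(program, database, query) + ".xml")
        if path.exists():                       # computed before: return at once
            return path.read_text(encoding="utf-8")
        result = run(program, database, query)  # first time: run the search
        path.write_text(result, encoding="utf-8")
        return result
```

Any other parameter that changes the output (an E-value cutoff, a substitution matrix, filtering options) would also have to go into the key; otherwise two different searches could share one cache entry.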
Please send your comments and suggestions
Christoph

From ssal at intracom.gr  Thu Dec 15 13:26:12 2005
From: ssal at intracom.gr (Sotiris Salloum)
Date: Thu, 15 Dec 2005 19:26:12 +0100
Subject: [BiO BB] please tell me your opinion about the new BLAST API
In-Reply-To: <56747.192.168.220.204.1134663313.squirrel@webmail.charite.de>
Message-ID: <200512151711.jBFHBLq03118@hal.intranet.gr>

Dear Christoph,

Actually, Java string processing is very easy compared to other languages; see http://www.sitepoint.com/article/java-regex-api-explained . Regarding your contribution, it looks very nice.

Regarding performance: with the new 64-bit processors, the performance differences between Java and other languages will be minor; the major thing is productivity and structured, reusable bioinformatics software.

Regards
Sotiris

From kani at mbu.iisc.ernet.in  Tue Dec 13 01:01:23 2005
From: kani at mbu.iisc.ernet.in (kani at mbu.iisc.ernet.in)
Date: Tue, 13 Dec 2005 11:31:23 +0530 (IST)
Subject: [BiO BB] reg protein codes
Message-ID: <3393.10.51.30.15.1134453683.squirrel@10.51.30.15>

hi,
I'm a bioinformatics student. I'm trying to write a C program for calculating the interatomic distances from protein coordinate files.
But I'm unable to get it right. Could anyone please help me finish the code?
thanks
regards
raghul

From Nadia.Bolshakova at cs.tcd.ie  Tue Dec 13 04:15:59 2005
From: Nadia.Bolshakova at cs.tcd.ie (Nadia Bolshakova)
Date: Tue, 13 Dec 2005 09:15:59 -0000
Subject: [BiO BB] Call for Special Track "BIOINFORMATICS and its MEDICAL APPLICATIONS"
Message-ID: <007201c5ffc5$d4b99720$6e26e286@DBNJK90J>

The 19th IEEE Symposium on Computer-Based Medical Systems
Special Track: BIOINFORMATICS and its MEDICAL APPLICATIONS
Special Track Webpage: http://www.cs.tcd.ie/Nadia.Bolshakova/CBMS_Bioinformatics06.html

CALL FOR PAPERS

Major computational challenges have been raised in the post-genomic era. Novel computational methods and approaches are required to acquire, store, organize, archive, analyse and visualize the large amount of biological and biomedical data. The goal of the track is to share ideas related to bioinformatics challenges among biological, biomedical and computer scientists. Authors are invited to submit original papers addressing any computational biology issue. Papers are invited on the following key themes (but are not limited to them):

* Biomedical Research
* Evolution and Phylogenetics
* Data Mining in Bioinformatics
* Microarray Analysis
* RNAi Analysis
* Sequence Alignment
* Pathways, Networks, Systems Biology
* Functional Genomics
* Visualization
* Protein Structure and Analysis
* Comparative Genomics
* Pattern Recognition
* Ontologies
* Software Systems

Unlike workshops, where position papers and reports on initial and intended work are appropriate, papers selected for a special track should report on significant unpublished work suitable for publication as a conference paper. More information about the symposium, registration fees and venue can be found here: http://cbms2006.ece.byu.edu.
IMPORTANT DATES
January 31, 2006   Submission of paper (6 pages maximum)
March 1, 2006      Notification of acceptance
April 5, 2006      Final camera-ready paper due
April 8, 2006      Pre-registration deadline
May 22, 2006       Hotel room reservations due

You must pre-register to have your paper published in the proceedings. If you only plan to attend and are not submitting a paper, pre-registration is still strongly encouraged. This conference is space-limited, and registration may not be available on-site.

SUBMISSION PROCEDURES FOR PAPERS

No hardcopy submissions are being accepted. Electronic submissions of original technical research papers will only be accepted in PDF format. File size is limited to 2 MB. Use a maximum of six A4 pages, including figures and references. Include one cover sheet stating the track title (Special Track on Bioinformatics and its Medical Applications), paper title, authors, technical area(s) covered in the article, corresponding author's information (telephone, fax, mailing address, e-mail address), and your preference for oral or poster presentation. Author names should appear only on the cover sheet, not on the paper.

Submit your manuscript no later than January 31, 2006. Authors will be notified of acceptance by March 1, 2006, after a review process by three independent experts. Each paper accepted to the Special Track on Bioinformatics and its Medical Applications will be published in the conference proceedings by IEEE CS Press, conditional upon the author's advance registration. Papers not accepted by the Program Committee of the track can be considered for publication as regular submissions by the General Program Committee of IEEE CBMS 2006.

Please note that the format of IEEE CBMS 2006 proceedings will be the IEEE Computer Science Press 8.5x11-inch format. Submission is encouraged in this format. For more details please see the website of IEEE CBMS 2006: (http://cbms2006.ece.byu.edu/how.html#submission).
All submissions will be done electronically via the CBMS web submission system.

TRACK CHAIR
Nadia Bolshakova - Trinity College Dublin, Ireland

TRACK PROGRAM COMMITTEE
Francisco Azuaje, University of Ulster, Northern Ireland
Nadia Bolshakova, Trinity College Dublin, Ireland
Fernando Martin-Sanchez, Institute of Health "Carlos III", Spain
James McInerney, National University of Ireland
Heather Ruskin, Dublin City University, Ireland

For further questions, please contact Nadia.Bolshakova at cs.tcd.ie

From idoerg at burnham.org  Wed Dec 14 21:18:01 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 14 Dec 2005 18:18:01 -0800
Subject: [BiO BB] Announcement: JAFA, a function prediction metaserver
Message-ID: <43A0D259.9090909@burnham.org>

Hi,

The Godzik lab at the Burnham Institute for Medical Research is happy to announce a new function prediction meta-server: Joint Assembly of Functional Annotation (JAFA).

JAFA accepts a protein sequence, queries several function prediction servers, and presents the results in an easy-to-read display. Function predictions are shown as Gene Ontology (GO) terms. You can see at a glance which program predicted which function, together with a prediction score. You can further click through to the predicting servers themselves, or get an overall graphical view of the location of the predicted terms in the GO graph. A downloadable XML file enables the incorporation of JAFA into your own workflow.

Please go here: http://jafa.burnham.org

We hope you will find JAFA useful and interesting. It is still quite beta-ish, so feature requests and bug reports are more than welcome. Please email jafateam at gmail.com with your praises and rants.

In the next few weeks we aim to add more servers to the JAFA set of queried servers. Suggestions are most welcome. In the future, we will enable the querying of function to structure predictors as well.
Please pass this email along to your friends, and post it on mailing lists you deem relevant.

Regards,
Iddo Friedberg and Tim Harder

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://ffas.ljcrf.edu/~iddo

From sourangshu at csa.iisc.ernet.in  Fri Dec 16 03:36:33 2005
From: sourangshu at csa.iisc.ernet.in (Sourangshu Bhattacharya)
Date: Fri, 16 Dec 2005 14:06:33 +0530 (IST)
Subject: [BiO BB] reg protein codes
In-Reply-To: <3393.10.51.30.15.1134453683.squirrel@10.51.30.15>
References: <3393.10.51.30.15.1134453683.squirrel@10.51.30.15>
Message-ID:

you can take it from me. Just drop into my lab: machine learning lab, room 251, CSA...

Sourangshu

Sourangshu Bhattacharya
PhD Student, Dept. of Computer Science & Automation,
IISc, Bangalore.
http://people.csa.iisc.ernet.in/sourangshu

From bader at cbio.mskcc.org  Fri Dec 16 14:28:03 2005
From: bader at cbio.mskcc.org (Gary Bader)
Date: Fri, 16 Dec 2005 14:28:03 -0500
Subject: [BiO BB] Announcing Cytoscape Version 2.2
Message-ID: <43A31543.4020603@cbio.mskcc.org>

Greetings,

Cytoscape is an open-source bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data.
http://www.cytoscape.org

New version 2.2 features:

For users:
* Improved node/edge attribute browsing and editing
* Cytoscape Graph Editor v1.0
* Support for loading Gene Ontology OBO and gene annotation (association) files
* Cytoscape panels (CytoPanels) to ease window management
* New GML visual style to manage visual attributes from GML files
* Independent internal windows for easy network comparison
* Simplified mechanism for saving Visual Styles between sessions
* Improved performance
* Many bugs fixed

For programmers:
* Improved Attribute API (CyAttributes)

Best regards,
The Cytoscape Collaboration
http://cytoscape.org/people.php

From drlivesay at csupomona.edu  Fri Dec 16 15:33:23 2005
From: drlivesay at csupomona.edu (Dr. Dennis R. Livesay)
Date: Fri, 16 Dec 2005 12:33:23 -0800
Subject: [BiO BB] 2 Postdoctoral positions in computational biophysics and bioinformatics
Message-ID: <0884387A56F94E469988DFD7D01097BF3ECB43@EXCH01.win.csupomona.edu>

(2) Postdoctoral Research Associate Positions
Computational Biophysics, Biophysical Chemistry, Bioinformatics

Applications are invited for two postdoctoral Research Associate positions starting in March 2006. A challenge of biophysics is to understand protein folding, stability, flexibility, and function in terms of structure and solvent/thermodynamic conditions. This project builds upon prior success in using graph-rigidity algorithms to identify flexible and rigid regions in proteins, and on the development of the Distance Constraint Model (DCM). The DCM is based on the hypothesis that network rigidity is an underlying mechanism for enthalpy-entropy compensation, yielding an algorithm that accounts for non-additivity in free energy decompositions. A prototype DCM will be extended to include explicit modeling of essential enthalpy-entropy compensation mechanisms, including hydration, hydrophobic and electrostatic interactions, and residue-specific component enthalpy and entropy parameterization.
These improvements will enable us to predict protein stability in mixed solvent conditions, including heat and cold denaturation.

There are three distinct project goals.
Goal (i): Development and release of a fast computational tool that harmoniously quantifies stability and flexibility in practical computing times.
Goal (ii): Construction of a publicly accessible QSFR database providing users access to protein stability and flexibility data in conjunction with user-friendly analysis tools to investigate Quantified Stability/Flexibility Relationships.
Goal (iii): Application of the DCM to investigate protein flexibility and stability. For example, the local details of protein flexibility can be quantified to identify correlated atomic motions, important for induced fit and allosteric conformational changes, at a given thermodynamic condition. Moreover, this software tool will be used to elucidate evolutionarily conserved QSFR within protein families and superfamilies.

We are looking for highly motivated candidates with a Ph.D. in Biophysics, Physical Chemistry, Physics, Computer Science, or a related discipline. Candidates should have prior experience in Computational Biophysics and/or Bioinformatics. The successful candidates must have a demonstrated ability to conduct research independently, and the ability to work in an interdisciplinary environment that requires team effort. Prior postdoctoral experience is desirable. The pay scale will be based on standard NIH rates, commensurate with experience.

Both candidates must have strong skills and a record of accomplishment in C/C++ programming. Knowledge of UNIX/LINUX operating systems is essential. Both Research Associates will routinely work with high performance (grid and cluster) computing at the operational and algorithmic levels. They will be working closely with one another, Dr. Donald J. Jacobs (DJJ) at the University of North Carolina at Charlotte, Dr. Dennis R.
Livesay (DRL) at Cal Poly Pomona, students and collaborators.

One Associate will reside in the group of DJJ with emphasis on biophysical modeling. The candidate interested in this position must have a strong background in statistical mechanics and chemical thermodynamics. The other Associate will reside in the group of DRL with emphasis on Computational Biology as it pertains to protein modeling. The candidate interested in this position must have experience with database design, MySQL and CGI programming. Knowledge of basic sequence and structure analysis techniques is essential. Impressive computing resources are available at both sites for both Associates.

Further information can be obtained at www.physics.uncc.edu/PhysStaff/djacobs/djacobs.html or by e-mail to DJJ (djacobs1 at email.uncc.edu), and at http://www.csupomona.edu/~drlivesay or by e-mail to DRL (drlivesay at csupomona.edu).

Applicants interested in biophysical modeling should send a curriculum vitae, graduate transcripts and a research statement, and arrange for three letters of reference to be sent to Prof. Donald J. Jacobs, Dept. of Physics and Optical Science, UNC Charlotte, 9201 University City Blvd., Charlotte, NC 28223. Applicants interested in computational biology should send their materials to Prof. Dennis R. Livesay, Dept. of Chemistry, Cal Poly Pomona, 3801 W. Temple Ave, Pomona, CA 91768.

Both positions are supported by the recently funded NIH project entitled "Predicting Protein Stability and Flexibility." Full consideration will be given to applications received before February 12, 2006. UNC Charlotte and Cal Poly Pomona are EOE/AA employers.
From muratem at eng.uah.edu  Fri Dec 16 16:24:32 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Fri, 16 Dec 2005 15:24:32 -0600 (CST)
Subject: [BiO BB] Re: [Bioclusters] [BLAST-Announce #053] BLAST 2.2.13 released]
In-Reply-To: <439EEFE1.6080302@scalableinformatics.com>
References: <439EEFE1.6080302@scalableinformatics.com>
Message-ID:

Greetings

I ran a test on the 2.2.13 BLAST that I thought the list might find interesting. (Thanks to Joe Landman for putting it on the list; I thought I was on that particular NCBI distribution list but never saw the message until Joe put it here.)

I made a database out of the TIGR gene index for pig, SSGI.021105.

I made a test query out of 200 70-mers based on 5 sequences from SSGI.021105.

I ran a simple test on a Solaris/Sparc64 machine using

    time blastall [-V F] -p blastn -i test.fa -d SSGI.021105 > test.blast

yielding

            2.2.12      2.2.13 no -V     2.2.13 with -V F
    user    1m 39.8s    1m 50.3s         0m 9.2s
    real    1m 46.9s    1m 57.7s         0m 9.5s
    sys     0m 6.3s     0m 6.3s          0m 0.3s

A diff on the output between v.12 and v.13 (no -V F) produces only differences in the header, because the version numbers are different.

The diff between v.13 (no -V F) and v.13 (with -V F) produces small differences in the scores and e-values. For example:

    < NP1115201 GB|AY277629.1|AAQ24490.1 type XVII collagen [Sus scrofa]  139  4e-33
    < AJ656543  34  0.18
    < BP160342  32  0.73
    < TC237977  32  0.73
    < TC212301  32  0.73
    < CJ009554  30  2.9
    < AJ657806  30  2.9
    < BI344938  30  2.9
    < AJ651764  30  2.9
    < TC220684 UP|S3A3_HUMAN (Q12874) Splicing factor 3A subunit 3 (Sp...  30  2.9
    ---
    > NP1115201 GB|AY277629.1|AAQ24490.1 type XVII collagen [Sus scrofa]  138  6e-33
    > AJ656543  34  0.19
    > BP160342  32  0.75
    > TC237977  32  0.75
    > TC212301  32  0.75
    > CJ009554  30  3.0
    > AJ657806  30  3.0
    > BI344938  30  3.0
    > AJ651764  30  3.0
    > TC220684 UP|S3A3_HUMAN (Q12874) Splicing factor 3A subunit 3 (Sp...  30  3.0

So it looks to be significantly faster.
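Mike's timings can be turned into speedup factors by dividing the baseline time by the new time. The Python below is a small sketch that does this for the numbers in the table; it assumes the `time` fields have the form `Xm Y.Ys` (a missing trailing `s`, as in one entry of the original post, is tolerated), and the helper names are illustrative only. The data are a single run of one query set on one machine, so the factors describe this test, not BLAST in general.

```python
import re

# Timings from the post (one run each; Solaris/Sparc64, 200 query 70-mers).
TIMINGS = {
    "2.2.12":           {"user": "1m 39.8s", "real": "1m 46.9s", "sys": "0m 6.3s"},
    "2.2.13 no -V":     {"user": "1m 50.3s", "real": "1m 57.7s", "sys": "0m 6.3s"},
    "2.2.13 with -V F": {"user": "0m 9.2s",  "real": "0m 9.5s",  "sys": "0m 0.3s"},
}

# Optional "<minutes>m", then "<seconds>" with an optional trailing "s".
_TIME = re.compile(r"\s*(?:(\d+)m)?\s*(\d+(?:\.\d+)?)s?\s*")


def seconds(field: str) -> float:
    """Convert a `time` field such as '1m 39.8s' to seconds."""
    match = _TIME.fullmatch(field)
    if match is None:
        raise ValueError(f"unrecognised time field: {field!r}")
    minutes = int(match.group(1) or 0)
    return 60 * minutes + float(match.group(2))


def speedup(baseline: str, candidate: str, column: str = "real") -> float:
    """How many times faster `candidate` ran than `baseline` (>1 means faster)."""
    return seconds(TIMINGS[baseline][column]) / seconds(TIMINGS[candidate][column])


if __name__ == "__main__":
    for column in ("user", "real"):
        factor = speedup("2.2.12", "2.2.13 with -V F", column)
        print(f"{column}: {factor:.1f}x faster")   # user: 10.8x, real: 11.3x
```

For this run that is roughly an 11-fold reduction in both user and wall-clock time with `-V F`, while 2.2.13 without `-V F` was about 10% slower than 2.2.12.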
Cheers
Mike

From marty.gollery at gmail.com  Sun Dec 18 15:04:00 2005
From: marty.gollery at gmail.com (Martin Gollery)
Date: Sun, 18 Dec 2005 12:04:00 -0800
Subject: [BiO BB] Re: [Bioclusters] [BLAST-Announce #053] BLAST 2.2.13 released]
In-Reply-To:
References: <439EEFE1.6080302@scalableinformatics.com>
Message-ID:

Wow, that is a significant speedup! I don't mind the minor score differences either. Does anyone else?

Marty

--
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042

From muratem at eng.uah.edu  Mon Dec 19 13:34:29 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Mon, 19 Dec 2005 12:34:29 -0600 (CST)
Subject: [BiO BB] mpiblast error ObjMgrNextAvailEntityID failed
In-Reply-To: <439EEFE1.6080302@scalableinformatics.com>
References: <439EEFE1.6080302@scalableinformatics.com>
Message-ID:

Greetings

This question appears to have come up before on these lists, but I don't see an answer. I recall we've posted about this problem before, but I don't think we ever got the problem resolved.

I created two databases from the Ensembl human and mouse databases using the commands below.
I tried to run a query with:

    setenv data_root /scratch/uahmam
    setenv blast_root /opt/asn/apps/mpiBlast_1.3.0/latest/bin
    setenv MPI_REQUEST_MAX 128000
    setenv MPI_TYPE_MAX 128000
    echo "blast_root is "$blast_root
    echo "data_root is "$data_root
    cd /scratch/uahmam
    time /usr/bin/mpirun -np 8 $blast_root/mpiblast --remove -p blastn -d Homo_sapiens.NCBI35.nov.cdna.fa -i novel_v11.fa -W 9 -r 1 -q -1 -G 1 -E 3 -o novel_v_Hsen.blast

where my .ncbirc file is as below:

    [mpiBLAST]
    Shared=/scratch/uahmam
    Local=/scratch/uahmam
    [NCBI]
    Data= /opt/asn/apps/ncbi_6.1/data
    [BLAST]
    BLASTDB=/scratch/uahmam
    BLATMAT=/opt/asn/apps/ncbi_6.1/data

    mpiformatdb --nfrags=6 -t enHs -i Homo_sapiens.NCBI35.nov.cdna.fa -p F -n enHs
    mpiformatdb --nfrags=6 -t enMm -i Mus_musculus.NCBIM34.nov.cdna.fa -p F -n enMm

I get nothing but a million+ lines of:

    ObjMgrNextAvailEntityID failed with idx 2048

It fills up my disk quota and crashes, so I get nothing else. I have found this line in the source code, but I can't figure out what's going on from that. mpiblast seems to work OK for other databases/queries. Can anybody offer any suggestions?

Thanks
Mike

From angshu96 at gmail.com  Mon Dec 19 17:47:46 2005
From: angshu96 at gmail.com (Angshu Kar)
Date: Mon, 19 Dec 2005 17:47:46 -0500
Subject: [BiO BB] arabidopsis + biosql
Message-ID:

Hi,

I'm not fully sure whether this question belongs in this community, but I feel those who are working in plant genomics using bioperl can possibly answer it.

I'm trying to use load_seqdatabase.pl to load data into the biosql schema. Can anyone please suggest an arabidopsis data file source that has all the additional information (probably GenBank format) but only holds the CDSs?

Thanks,
Angshu

From christoph.gille at charite.de  Mon Dec 19 21:07:26 2005
From: christoph.gille at charite.de (Dr.
Christoph Gille) Date: Tue, 20 Dec 2005 03:07:26 +0100 (CET) Subject: [BiO BB] BLAST with multiple queries Message-ID: <61204.84.190.6.230.1135044446.squirrel@webmail.charite.de> From recent posts I learned that modern BLAST implementations might be faster if many queries are processed in a single run, since the database is then scanned only once. I wonder whether I should take this into account in the Java API SequenceBlaster http://www.charite.de/bioinf/strap/javadoc/charite/christo/interfaces/SequenceBlaster.html and would like to ask for your opinion. The disadvantage is that the programming interface would become more complicated, even though most users will not use this newer multiple-query feature. The following changes would be necessary:

setAAQuerySequence(String s) would become setAAQuerySequences(String[] ss).
In the same way, getQuerySequence() and setNTQuerySequence(String s) would become plural.
Each BLAST result would be retrieved by passing its result number as a parameter: String getResultXml(int number)
Currently I delete the cache entry with CacheResult.remove(SequenceBlaster). To delete the cache entry of a single result, I would need a new instance method, something like SequenceBlaster#removeInCache(int resultNumber);

From hamid at ibb.ut.ac.ir Tue Dec 20 12:21:57 2005 From: hamid at ibb.ut.ac.ir (hamid) Date: Tue, 20 Dec 2005 20:51:57 +0330 Subject: [BiO BB] Comparing whole genome sequences Message-ID: I want to compare (align) the whole genomes of several organisms. How can I do that? There are several tools, such as mVISTA, that can align multiple preprocessed sequences, but I want to compare my own organisms, for example 3 strains of Brucella. Thanks From christoph.gille at charite.de Tue Dec 20 14:04:05 2005 From: christoph.gille at charite.de (Dr.
Christoph Gille) Date: Tue, 20 Dec 2005 20:04:05 +0100 (CET) Subject: [BiO BB] BALiBASE is a benchmark for alignment procedures Message-ID: <43184.192.168.220.204.1135105445.squirrel@webmail.charite.de> The performance of different sequence alignment methods can be assessed using BALiBASE. I have embedded BALiBASE in STRAP and started to compute a table of score values for different alignment procedures: ClustalW, Mafft5, T_Coffee, Muscle, Align_m, Dialign2, DialignT, Probcons2, MultiAlignerNeoBio, JAligner http://www.charite.de/bioinf/strap/dataFiles/BALiBASE_result.rsc Using the GUI it is very simple to compute such a table for any set of alignment methods, but it takes quite a lot of time (about 48 h) and the complete table is not finished yet. All alignment methods are run with their default parameters. NeoBio and JAligner are pairwise alignment methods that were turned into multiple sequence alignment methods using a very primitive approach, so it is not completely fair to compare them with programs that are primarily multiple sequence aligners. From fiedler at cshl.edu Wed Dec 21 12:10:20 2005 From: fiedler at cshl.edu (Tristan Fiedler) Date: Wed, 21 Dec 2005 12:10:20 -0500 Subject: [BiO BB] Re: Comparing Whole Genome Sequences In-Reply-To: <20051221170109.BF03B368F98@primary.bioinformatics.org> References: <20051221170109.BF03B368F98@primary.bioinformatics.org> Message-ID: <294947f49a1d64b41663a7e90bd6dc1a@cshl.edu> The Blastz algorithm has been optimized for genome-scale alignments. However, the sequences may first need to be pre-processed to mask low-complexity regions, repeats, tandem repeats, vector contamination, poor-quality sequence, etc. References: http://pipmaker.bx.psu.edu/dist/blastz.pdf http://www.pnas.org/cgi/content/abstract/100/20/11484 Cheers, Tristan --- Tristan J.
Fiedler Postdoctoral Fellow - Stein Lab Cold Spring Harbor Laboratory On Dec 21, 2005, at 12:01 PM, bio_bulletin_board-request at bioinformatics.org wrote: > > > Message: 1 > Date: Tue, 20 Dec 2005 20:51:57 +0330 > From: "hamid" > Subject: [BiO BB] Comparing whole genome sequences > To: bio_bulletin_board at bioinformatics.org > Message-ID: > Content-Type: text/plain > > I want to compare (align) whole genome of several organisms. How can I > do > that? there are several tools such as mVISTA that can align multiple > preprocessed sequences but I want to compare my own organisms. for > example > if I want to compare 3 strains of brusella. > Thanks > From sulakhe at mcs.anl.gov Wed Dec 21 01:07:51 2005 From: sulakhe at mcs.anl.gov (Dinanath Sulakhe) Date: Wed, 21 Dec 2005 00:07:51 -0600 Subject: [BiO BB] Re: [Bioclusters] [BLAST-Announce #053] BLAST 2.2.13 released] In-Reply-To: References: <439EEFE1.6080302@scalableinformatics.com> Message-ID: <6.2.1.2.2.20051221000125.0431f6a0@pop.mcs.anl.gov> Hi Mike, I tried running 2.2.13 with the -V F option, and one thing I noticed is that it seems to keep all the results in memory and write them to the file only at the end. Has anyone else noticed this? If that is true, then it is difficult to use -V F on large input files (thousands of sequences) against a huge database. I ran blastall on 1000 protein sequences against NCBI's non-redundant (nr) database using -p blastp and -V F, but the output file was still empty after about 15 minutes. If I run the same job without the -V F option, I start seeing output. I formatted the nr database with the -v 200 option. Please let me know if you have seen similar behavior. Also, is it possible to flush the output to the file so as to reduce memory usage? Thanks -dina At 03:24 PM 12/16/2005, Mike Muratet wrote: >Greetings > >I ran a test on the 2.2.13 BLAST that I thought the list might find >interesting. (Thanks to Joe Landman for putting it on the list, I thought >I was on that particular NCBI distribution but never saw the message until >Joe put it here.) > >I made a database out of the TIGR gene index for pig, SSGI.021105 > >I made a test query out of 200 70mers based on 5 sequences from SSGI.021105. > >I ran a simple test on a Solaris/Sparc64 using
>
>time blastall [-V F] -p blastn -i test.fa -d SSGI.021105 > test.blast
>
>yielding
>
>        2.2.12      2.2.13 no -V   2.2.13 with -V F
>user    1m 39.8s    1m 50.3s       0m 9.2s
>real    1m 46.9s    1m 57.7s       0m 9.5s
>sys     0m 6.3s     0m 6.3s        0m 0.3s
>
>A diff on the output between v.12 and v.13 (no -V F) produces only differences in the header, because the version numbers are different.
>
>The diff between v.13 (no -V F) and v.13 (-V F) produces small differences in the scores and e-values. For example:
>
>< NP1115201 GB|AY277629.1|AAQ24490.1 type XVII collagen [Sus scrofa]  139  4e-33
>< AJ656543  34  0.18
>< BP160342  32  0.73
>< TC237977  32  0.73
>< TC212301  32  0.73
>< CJ009554  30  2.9
>< AJ657806  30  2.9
>< BI344938  30  2.9
>< AJ651764  30  2.9
>< TC220684 UP|S3A3_HUMAN (Q12874) Splicing factor 3A subunit 3 (Sp...  30  2.9
>---
>> NP1115201 GB|AY277629.1|AAQ24490.1 type XVII collagen [Sus scrofa]  138  6e-33
>> AJ656543  34  0.19
>> BP160342  32  0.75
>> TC237977  32  0.75
>> TC212301  32  0.75
>> CJ009554  30  3.0
>> AJ657806  30  3.0
>> BI344938  30  3.0
>> AJ651764  30  3.0
>> TC220684 UP|S3A3_HUMAN (Q12874) Splicing factor 3A subunit 3 (Sp...  30  3.0
>
>So, it looks to be significantly faster.
>
>Cheers
>
>Mike
>_______________________________________________
>Bioclusters maillist - Bioclusters at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bioclusters
From ediths at unizh.ch Wed Dec 21 03:51:56 2005 From: ediths at unizh.ch (Edith Schlagenhauf) Date: Wed, 21 Dec 2005 09:51:56 +0100 Subject: [BiO BB] Comparing whole genome sequences In-Reply-To: References: Message-ID: Hi, you may have a look at the MUMmer software (tigr.org).
In particular, look at the NUCmer alignment script, which allows multiple reference and multiple query sequences to be aligned in a many-vs.-many fashion. HTH, Edith ****************************************** Dr Edith Schlagenhauf Bioinformatics Institute of Plant Biology University of Zurich Zollikerstrasse 107 CH-8008 Zurich SWITZERLAND e-mail: ediths AT botinst DOT unizh DOT ch Tel.: +41 1 634 82 78 Fax : +41 1 634 82 04 ****************************************** On Tue, 20 Dec 2005, hamid wrote: > I want to compare (align) whole genome of several organisms. How can I do > that? there are several tools such as mVISTA that can align multiple > preprocessed sequences but I want to compare my own organisms. for example > if I want to compare 3 strains of brusella. > Thanks > From hamid at ibb.ut.ac.ir Fri Dec 23 07:39:05 2005 From: hamid at ibb.ut.ac.ir (hamid) Date: Fri, 23 Dec 2005 16:09:05 +0330 Subject: [BiO BB] Re: Comparing Whole Genome Sequences In-Reply-To: <294947f49a1d64b41663a7e90bd6dc1a@cshl.edu> References: <20051221170109.BF03B368F98@primary.bioinformatics.org> <294947f49a1d64b41663a7e90bd6dc1a@cshl.edu> Message-ID: I have downloaded both MUMmer and Blastz, but both of them must be compiled on Unix-based operating systems. Is there any program that compiles under Microsoft platforms? Thanks a lot From Yannick.Wurm at unil.ch Fri Dec 23 11:11:08 2005 From: Yannick.Wurm at unil.ch (Yannick Wurm) Date: Fri, 23 Dec 2005 17:11:08 +0100 Subject: [BiO BB] Blast report viewer Message-ID: <92C33925-10D0-4F60-B89E-F811248A26CF@unil.ch> Hi, for one of my projects, I need to manually browse the BLAST results I have generated. I would like to do this in the most ergonomic way possible. Do you know of a software tool that would load my BLAST report file and display the different results visually? A visualization similar to the one you get when running BLAST at NCBI would be great. I stumbled onto http://www.korilog.org/pub/klblaster.php, which looks great... except that it's Windows-only. Any ideas? Thanks & Season's greetings, yannick __________________________________ yannick.wurm at unil.ch - Doctoral student Department of Ecology and Evolution http://www.unil.ch/dee/page28685.html #3106, Biophore, Universite de Lausanne 1015 Lausanne, Switzerland land: +41.21.692.4182 fax: +41.21.692.4165 cell: +41.78.87.87.001 From aloraine at gmail.com Fri Dec 23 12:02:48 2005 From: aloraine at gmail.com (Ann Loraine) Date: Fri, 23 Dec 2005 11:02:48 -0600 Subject: [BiO BB] Blast report viewer In-Reply-To: <92C33925-10D0-4F60-B89E-F811248A26CF@unil.ch> References: <92C33925-10D0-4F60-B89E-F811248A26CF@unil.ch> Message-ID: <83722dde0512230902t23d5ec9cm9c174dabd1817354@mail.gmail.com> The Web page says the program is written in Java and that the reason the developer distributes it for Windows only is: "I could only find one one freely available and convenient tool to design an installation wizard: NIS (Nullsoft Install System). As soon as I find a tool to create such a wizard for Linux and MacOS-X, I will design installation procedures for these OS." It seems like Java Web Start would solve this distribution problem. It works pretty well on all major platforms, and all the user has to do is visit a Web page, as with: http://www.affymetrix.com/support/developer/tools/download_igb.affx Maybe you could ask the developer to provide a Java Web Start page for the program? It does look quite useful. -Ann On 12/23/05, Yannick Wurm wrote: > Hi, > > for one of my projects, I need to manually browse the blast results I have > generated. I would like to do this in the most ergonomic way possible. Might > you know of a software tool which would load my blast report file, and > visually display the different results? A visualization kind of like that > you get when blasting at ncbi would be great. > > I stumbled onto http://www.korilog.org/pub/klblaster.php > which looks great... except that its windows-only. > > Any ideas? > > Thanks & Season's greetings, > > yannick > > > > __________________________________ > yannick.wurm at unil.ch - Doctoral student > Department of Ecology and Evolution > http://www.unil.ch/dee/page28685.html > > #3106, Biophore, Universite de Lausanne > 1015 Lausanne, Switzerland > land: +41.21.692.4182 fax: +41.21.692.4165 > cell: +41.78.87.87.001 > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From dr_zafi at yahoo.com Sat Dec 24 01:50:36 2005 From: dr_zafi at yahoo.com (Dr Zafar) Date: Fri, 23 Dec 2005 22:50:36 -0800 (PST) Subject: [BiO BB] Inquiry. Message-ID: <20051224065036.65895.qmail@web33411.mail.mud.yahoo.com> Dear Fellows, Greetings! I am very much interested in doing an MS in bioinformatics, but as a physician, am I eligible for this? After taking courses in molecular biology, I found myself very interested in this wonderful field, so it would be very kind if someone could send me a list of universities and tell me about the eligibility requirements. Thanks in advance, Dr Zafar From leser at informatik.hu-berlin.de Mon Dec 26 07:35:19 2005 From: leser at informatik.hu-berlin.de (Ulf Leser) Date: Mon, 26 Dec 2005 13:35:19 +0100 (MET) Subject: [BiO BB] CfP: 3rd Int.
Workshop on Data Integration for the Life Science Message-ID:

1st Call for Papers; we apologize for cross-posting.

===========================================================
DILS 2006
3rd International Workshop on Data Integration for the Life Sciences
European Bioinformatics Institute, UK
20-22 July 2006
http://www.informatik.hu-berlin.de/dils2006
Proceedings published by Springer LNBI
===========================================================

Important Dates
===============
* Abstract Submission: 2 March 2006
* Paper Submission: 9 March 2006
* Author Notification: 24 April 2006
* Camera-Ready version: 16 May 2006

Aims and Scope
==============
Data management and data integration are pertinent problems in the Life Sciences. Today's advances in molecular biology and molecular medicine are almost always underpinned by enormous efforts in data management, automatic data quality assurance, and computational data analysis. Many existing research directions, such as drug development, systems biology, or personalized medicine, critically depend on integrating data sets produced by different experimental methods, in different groups, and at different levels of granularity. Despite more than a decade of intensive research in these areas, many problems in data management and data integration remain unsolved. Indeed, these problems are becoming worse, due both to continuous increases in data volumes and to the growing diversity of the types of data that need to be managed. As an example, integrating data from proteomics mass spectrometry or 2-hybrid assays, experimental techniques that have only been carried out at a high-throughput scale for the past 2-3 years, is already a pressing requirement in many projects. And the next big challenge is already upon us: the need to integrate genomics and proteomics data with the vast amounts of medical data collected daily in thousands of hospitals world-wide.
DILS 2006 is the third in a workshop series that aims to foster discussion, exchange, and innovation in research and development in the areas of data integration and data management for the Life Sciences. DILS 2004 in Leipzig and DILS 2005 in San Diego each attracted around 100 researchers from all over the world. We invite researchers, professionals, and practitioners to participate and share their knowledge in this forum. Apart from the presentation of peer-reviewed papers, DILS 2006 will have two keynote talks, progress reports from important projects in the field, and a poster session with flash presentations.

Topics of Interest
==================
Papers must address challenges for data integration and data management in the life sciences. A special focus of DILS 2006 lies on papers that address data management issues at the interface between molecular biology and medicine. Topics of interest include, but are not limited to:

* Integration of medical and biological data
* Data management and data analysis for personalized medicine
* Management and modelling of phenotype data
* Modelling, storing, and querying biological network data
* Managing complex annotations
* Support for representing and searching biological and medical data types
* Data management for biodiversity
* Knowledge representation and reasoning
* Algorithms and architectures for data integration problems at all levels
* Query processing for biomedical applications
* Data mining and data-driven analysis of Life Science data sets
* Ontology-based data integration and analysis
* Scientific workflows and annotation pipelines
* Laboratory information management systems
* Methods for modelling and assuring data quality
* Integrating bioinformatics and chemoinformatics data
* Data management issues in pharmacogenomics

Note that papers must have a clear focus on biological or biomedical issues to be considered for acceptance.
Paper Submission
================
DILS 2006 invites two types of papers:

1. Research papers describe novel methods, algorithms, or models for solving problems relevant to the scope of DILS. The main focus of such papers must be the advancement of research on data integration or data management. Research papers must not be longer than 16 pages.

2. Systems or experience papers describe innovative realizations of systems solving problems relevant to DILS. These papers need not present a new method, but the systems described must use cutting-edge technology and must be up and running. Papers describing commercial solutions are also welcome, as long as sufficient detail of the problem and its technical solution is provided. Systems papers must not be longer than 12 pages.

As for previous DILS conferences, accepted papers will be published by Springer in the LNCS (Lecture Notes in Computer Science) / LNBI (Lecture Notes in Bioinformatics) series. All papers must be previously unpublished and may not be under parallel review at any other journal, conference, or workshop. Submissions should be formatted according to the LNCS guidelines (www.springer.de/comp/lncs/authors.html). All submissions will be handled electronically. Each submission will be evaluated by at least three expert reviewers.

DILS 2006 will also have a poster session. Posters are published on the DILS web site, but are not part of the proceedings. Selected posters will have the chance to be presented in a flash presentation session.

Venue
=====
DILS 2006 will take place at the European Bioinformatics Institute (EBI) in Hinxton, UK (near Cambridge). The EBI is one of the largest centers world-wide for hosting, managing, and developing data management and data analysis systems for the Life Sciences.
It hosts and contributes to the maintenance of a number of very important databases in this area, such as UniProt, Ensembl, InterPro, and the European branch of the International Nucleotide Sequence Database. Furthermore, the institute is situated side-by-side with the Wellcome Trust Sanger Institute, one of the biggest genome research institutes in the world. Together, the EBI and the Sanger Centre have created an inspiring atmosphere that has turned Cambridge into one of the largest centers for biotech companies in the United Kingdom.

Organization
============
* Ulf Leser, Humboldt-Universitaet zu Berlin, Germany
* Barbara Eckman, IBM Healthcare & Life Sciences, USA
* Felix Naumann, Humboldt-Universitaet zu Berlin, Germany
* Paul Kersey, European Bioinformatics Institute, Hinxton, UK

Program Committee
=================
Amarnath Gupta, San Diego Supercomputer Center, USA
Anthony Kosky, Axiope Inc, USA
Arek Kasprzyk, European Bioinformatics Institute, UK
Bertram Ludaescher, UC Davis, USA
Bertram Weiss, Schering AG, Germany
Christian Piepenbrock, Epigenomics AG, Germany
David Benton, GlaxoSmithKline, USA
Dennis Paul Wall, Harvard Medical School, USA
Emmanuel Barillot, Institut Curie, Paris, France
Erhard Rahm, Universität Leipzig, Germany
Floris Geerts, University of Edinburgh and Limburgs Universitair Centrum, UK
H V Jagadish, University of Michigan, USA
Hasan Jamil, Wayne State University, USA
Henning Hermjakob, European Bioinformatics Institute, UK
Jacob Koehler, Rothamsted Research, UK
Jamie Cuticchia, Research Triangle Institute, USA
Jignesh Patel, University of Michigan, USA
Joachim Hammer, University of Florida, USA
Juergen Eils, Deutsches Krebsforschungszentrum DKFZ, Germany
Laure Berti-Equille, Universitaire de Beaulieu, France
Limsoon Wong, Institute for Infocomm Research, Singapore
Louiqa Raschid, University of Maryland, USA
Michael Schroeder, Technical University Dresden, Germany
Mike Hogarth, UC Davis, USA
Monica Scannapieco, University of Rome "La
Sapienza", Italy
Nick Murall, LION Bioscience, Germany/UK
Norman Paton, University of Manchester, UK
Otto Ritter, AstraZeneca, USA
Paula Matuszek, GlaxoSmithKline Beecham, USA
Peter Buneman, University of Edinburgh, UK
Peter Karp, SRI International, USA
Peter Mork, Univ. of Washington, Seattle, USA
Sharon Wang, IBM Healthcare and Life Sciences, USA
Stefan Jablonski, Univ. Erlangen-Nuernberg, Germany
Terence Critchlow, Lawrence Livermore National Laboratory, USA
Vipul Kashyap, Partners HealthCare System, USA

From rajat at isical.ac.in Tue Dec 27 02:44:22 2005 From: rajat at isical.ac.in (Rajat K. De) Date: Tue, 27 Dec 2005 13:14:22 +0530 Subject: [BiO BB] Invitation for paper submission Message-ID: <43B0F0D6.8040306@isical.ac.in> (Please ignore if you have already received this mail.) Dear Colleague, I have been invited to organize a Session titled "Applications of Pattern Recognition and Machine Learning to Functional Genomics" at the 10th World Multi-Conference on Systemics, Cybernetics and Informatics: WMSCI 2006 (http://www.iiisci.org/wmsci2006/invitedsession/InvitedSessionPre.asp), to be held in Orlando, Florida, USA, during July 16-19, 2006. In this context, I would like to invite you to submit paper(s) to this Session. The goal of the Session is to present the current state of the art in applying pattern recognition and machine learning approaches to problems of bioinformatics and computational biology. This will facilitate collaboration among pattern recognition and machine learning researchers and biologists, which will further help them solve complex problems of computational biology as well as enrich the literature of pattern recognition and machine learning. Papers submitted to this Session will be peer reviewed, and the accepted papers will be included in the Conference Proceedings. Authors may submit contributions in areas such as the following (this list is not exhaustive):
Normalization of microarray gene expression data
Gene clustering/classification for finding co-expressed/coregulated genes
Classification/clustering of diseased vs. normal samples
Identification of the function of a gene
Identification of genes responsible for a particular disease
Determination of protein-protein and protein-gene interactions (protein-protein networks and gene-protein networks)
Comparative genomics
Evolution and phylogenetics
Identification of molecular structures
Protein folding problems
Classification of proteins
Identification of active sites of proteins
Pathway analysis related to metabolism and signal transduction mechanisms

Chairs of invited sessions will select the best paper presented at their session. The best papers from the sessions will be reviewed by reviewers of the Journal of Systemics, Cybernetics, and Informatics (JSCI), who will select the best 30% of them for publication in the Journal. Some important points to note while submitting your contributions:

1. Size of the paper: Paper drafts should have 2000 to 5000 words, in English. The name(s) of the author(s), with addresses, telephone and fax numbers, and e-mail addresses, should be included. Each author making a submission must suggest at least one and at most three reviewers.
2. How to submit: Please e-mail the PDF/Word version of the paper to me (rajat at isical.ac.in).
3. Deadline for submission: January 31, 2006

I look forward to receiving your contributions and to seeing you at the Conference. Best regards, Rajat K. De Organizer, Session on Applications of Pattern Recognition and Machine Learning to Functional Genomics, WMSCI 2006 **************************************************** * Rajat K. De, Ph.D. | Tel * * Assistant Professor | off : +91-33-25753105/3100 * * Machine Intelligence Unit | * * Indian Statistical Institute | res : +91-33-25796860 * * 203 B.T. Road | Fax : +91-33-25783357 * * Calcutta 700108 | +91-33-25773035 * * INDIA.
| mobile : +91-9433008009 * **************************************************** From Jason.H.Moore at Dartmouth.EDU Fri Dec 30 15:10:06 2005 From: Jason.H.Moore at Dartmouth.EDU (Jason H. Moore) Date: 30 Dec 2005 15:10:06 EST Subject: [BiO BB] 2006 BioGEC Workshop Message-ID: <54208086@newdancer.Dartmouth.EDU> WORKSHOP ON BIOLOGICAL APPLICATIONS OF GENETIC AND EVOLUTIONARY COMPUTING (BioGEC) http://www.epistasis.org/biogec2006.html To be held as part of the 2006 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO-2006) July 8-12, 2006 (Saturday-Wednesday) Renaissance Seattle Hotel http://marriott.com/property/propertypage/SEASM Seattle, Washington, USA Organized by ACM SIG-EVO http://www.sigevo.org/gecco-2006/ SUMMARY The field of Genetic and Evolutionary Computation (GEC) has greatly benefited from borrowing ideas from the biological sciences. Recently, it has become clear that GEC can help solve biological problems, and thereby "repay its debt". The fifth annual workshop on Biological Applications of Genetic and Evolutionary Computation (BioGEC), organized in connection with the 2006 Genetic and Evolutionary Computation Conference (GECCO-2006) in Seattle, Washington, USA, is intended to explore and critically evaluate the application of GEC to biological problems. Specifically, the goal is to bring biologists and computer scientists together to foster an exchange of ideas that will yield emergent properties that will move the field forward in unpredictable ways. The 2006 BioGEC workshop will span two four-hour sessions. The first session will feature a community analysis of a real biological dataset. An important feature of this session is that the biologist who generated the data will be present to provide feedback on the results. The second session will feature poster presentations of new or incomplete work in the BioGEC domain.
The goal of this session is to provide a forum for receiving critical feedback on ideas and research results that might not yet be mature. More details on both sessions are provided below in the Request for Papers section. REQUEST FOR PAPERS Session 1 (4 hours): Community Analysis of a Biological Dataset Papers are requested that report results from the analysis of a published human scleroderma microarray dataset using genetic algorithms (GA), genetic programming (GP), evolutionary computing (EC), grammatical evolution (GE), ant colony optimization (ACO), estimation of distribution algorithms (EDA), artificial immune systems, or any other biologically-inspired algorithm or method. The data were originally published by Whitfield et al. in PNAS (100:12319-24, 2003). The PNAS paper is freely available from http://www.pubmedcentral.gov/picrender.fcgi?artid=218756&blobtype=pdf. The raw data and the filtered/normalized data are freely available from http://genome-www.stanford.edu/scleroderma/data.shtml. The scleroderma question addressed and the biologically-inspired approach used are open-ended and up to the authors. However, all papers should include a section at the end titled "Biological Inference" that summarizes the biological interpretations and inferences drawn from the analyses. Dr. Whitfield will attend the BioGEC workshop to provide feedback about the results and to answer questions about the disease endpoints and the data. Please send 4-8 page papers (PDF or Word) to Jason H. Moore (jason.h.moore at dartmouth.edu) and Marylyn D. Ritchie (ritchie at chgr.mc.vanderbilt.edu) by March 31, 2006. The format for manuscripts submitted to the workshop is that used for ACM SIG proceedings. Paper templates can be found at http://www.acm.org/sigs/pubs/proceed/template.html. Please see the GECCO-2006 website for additional details. Authors of accepted papers will be invited to give an oral presentation at the Workshop.
There are also plans to publish the best papers in an online journal. Session 2 (4 hours): Critical Feedback Poster Session Papers are requested that report new or otherwise preliminary results and ideas. Papers presenting new hypotheses without experimental results are encouraged. For example, students could present thesis or dissertation ideas. Authors of accepted papers will be expected to present a poster at the workshop. This will provide a forum for critical feedback from participants. Please send 1-3 page papers (PDF or Word) to Jason H. Moore (jason.h.moore at dartmouth.edu) and Marylyn D. Ritchie (ritchie at chgr.mc.vanderbilt.edu) by March 31, 2006. The format for manuscripts submitted to the workshop is that used for ACM SIG proceedings. Paper templates can be found at http://www.acm.org/sigs/pubs/proceed/template.html. Please see the GECCO-2006 website for additional details.

Important Dates
March 31, 2006: papers due
April 5, 2006: acceptance notices
April 19, 2006: camera ready revisions due
July 8 or 9, 2006: BioGEC workshop

WORKSHOP ORGANIZERS
Jason H. Moore, Ph.D. Dartmouth College http://www.epistasis.org jason.h.moore at dartmouth.edu
Marylyn D. Ritchie, Ph.D. Vanderbilt University http://chgr.mc.vanderbilt.edu ritchie at chgr.mc.vanderbilt.edu