From codeshepherd at gmail.com Mon Jul 2 07:10:53 2007
From: codeshepherd at gmail.com (=?ISO-8859-1?Q?Dee=FE=E0n_Chakravarth=FF?=)
Date: Mon, 02 Jul 2007 19:10:53 +0800
Subject: [BiO BB] parallel clustalw
Message-ID: <4688DD3D.5020305@gmail.com>

Hi All,

I am trying to compile the parallel version of clustalw on CentOS 4. I downloaded the package from
ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/ParClustal0.2.tar.gz

I have the NCBI toolkit installed on my machine. When I run make, I get the following error:

: undefined reference to `seek_si_sj_to_calculate'
slaves.o(.text+0x687): In function `main_processes_slaves':
: undefined reference to `subpairalign'
collect2: ld returned 1 exit status
make: *** [clustalx] Error 1

The complete output of make, uname and gcc -v is included below.

$ uname -a
Linux panther5.nus.edu.sg 2.6.9-42.0.10.ELsmp #1 SMP Tue Feb 27 10:11:19 EST 2007 i686 i686 i386 GNU/Linux

$ gcc -v
Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-java-awt=gtk --host=i386-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)

$ make
cc -c -O -I/usr/include/openmpi/ init.c
cc -c -O -I/usr/include/openmpi/ interface.c
cc -c -O -I/usr/include/openmpi/ readseq.c
cc -c -O -I/usr/include/openmpi/ writeseq.c
cc -c -O -I/usr/include/openmpi/ showpair.c
cc -c -O -I/usr/include/openmpi/ malign.c
cc -c -O -I/usr/include/openmpi/ util.c
cc -c -O -I/usr/include/openmpi/ trees.c
cc -c -O -I/usr/include/openmpi/ gcgcheck.c
cc -c -O -I/usr/include/openmpi/ prfalign.c
cc -c -O -I/usr/include/openmpi/ pairalign.c
cc -c -O -I/usr/include/openmpi/ calcgapcoeff.c
cc -c -O -I/usr/include/openmpi/ calcprf1.c
cc -c -O -I/usr/include/openmpi/ calcprf2.c
cc -c -O -I/usr/include/openmpi/ readtree.c
cc -c -O -I/usr/include/openmpi/ seqweight.c
cc -c -O -I/usr/include/openmpi/ readmat.c
cc -c -O -I/usr/include/openmpi/ alnscore.c
cc -c -O -I/usr/include/openmpi/ random.c
cc -c -O -I/usr/include/openmpi/ motifs.c
cc -c -O -I/usr/include/openmpi/ bionj.c
cc -c -O -I/usr/include/openmpi/ slaves.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF -I/home/griduser/clustalw/toolbox/ncbi/include xutils.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF -I/home/griduser/clustalw/toolbox/ncbi/include xmenu.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF -I/home/griduser/clustalw/toolbox/ncbi/include xcolor.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF -I/home/griduser/clustalw/toolbox/ncbi/include xdisplay.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF -I/home/griduser/clustalw/toolbox/ncbi/include xscore.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF -I/home/griduser/clustalw/toolbox/ncbi/include clustalx.c
cc -o clustalx init.o interface.o readseq.o writeseq.o showpair.o malign.o util.o trees.o gcgcheck.o prfalign.o pairalign.o calcgapcoeff.o calcprf1.o calcprf2.o readtree.o seqweight.o readmat.o alnscore.o random.o motifs.o bionj.o slaves.o xutils.o xmenu.o xcolor.o xdisplay.o xscore.o clustalx.o -O -lm -lmpi -lrt -pthread -L/home/griduser/clustalw/toolbox/ncbi/lib -L/usr/lib/openmpi/ -L/usr/X11R6/lib -lvibrant -lncbi -lpthread -lXm -lXmu -lXt -lX11 -lm
/home/griduser/clustalw/toolbox/ncbi/lib/libncbi.a(ncbifile.o)(.text+0x945): In function `Nlm_TmpNam':
: warning: the use of `tempnam' is dangerous, better use `mkstemp'
writeseq.o(.text+0x284): In function `open_output_file':
: warning: the `gets' function is dangerous and should not be used.
slaves.o(.text+0x5f0): In function `main_processes_slaves':
: undefined reference to `seek_si_sj_to_calculate'
slaves.o(.text+0x687): In function `main_processes_slaves':
: undefined reference to `subpairalign'
collect2: ld returned 1 exit status
make: *** [clustalx] Error 1

Thanks
Deepan

From wiiat at kis-lab.com Sun Jul 1 23:40:16 2007
From: wiiat at kis-lab.com (WI-IAT07)
Date: Mon, 2 Jul 2007 12:40:16 +0900
Subject: [BiO BB] CFP: Workshop on Biomedicine Applications of Web technologies (BMWT 2007) <10 July>
Message-ID: <20070702034705.8FDEE368807@primary.bioinformatics.org>

================================================================================
Call for Papers
International Workshop on BioMedicine applications of Web Technologies
in conjunction with WI/IAT 2007
Silicon Valley, CA, USA, November 2-5, 2007
http://chunnan.iis.sinica.edu.tw/BMWT2007.html
================================================================================

=========================
Topics of the workshop
=========================

BMWT 2007 will be part of the WI-IAT conference events and will be scheduled for a date during November 2-5. This workshop will cover applications of intelligent Web technologies in bio-medicine, including Web Ontology, Semantic Web, Web Usage Mining, Web Search and Intelligent Agents.
Original contributions will be solicited in the following subjects (but not necessarily limited to):

* Web information extraction and wrapper generation
* Applications of Web taxonomies and ontologies in bio-medicine
* Web-based service-oriented architecture in bio-medicine
* Integration and maintenance of bio-medicine taxonomies and ontologies
* Web content and structure mining, Web information retrieval and filtering
* Web-based data collection, curation and analysis
* Text mining for metadata creation
* Multimedia contents in bio-medicine on the Web
* Web search engines, meta-search engines and inference engines
* Semantic Web
* Intelligent Web agents
* Knowledge community formation and support

This workshop intends to bring together researchers and practitioners to foster the exchange of ideas and the dissemination of emerging techniques on intelligent Web technology in bio-medicine applications. The workshop will capture current important developments of new models, new methodologies and new tools for building a variety of embodiments of scalable, effective and intelligent Web-based information systems for the ever-increasing needs of bio-medicine applications.

==================
Type of workshop
==================

The workshop will run as a half-day workshop for approximately 5 hours with 1~2 keynote speeches and 8~12 paper presentations.

==================
Important dates
==================

July 10, 2007: Due date for full workshop papers submission
August 1, 2007: Final acceptance by Workshop Co-Chairs
August 2, 2007: Notification of paper acceptance to authors
August 17, 2007: Camera-ready of accepted papers
November 2-5, 2007: Workshops

==========================
Paper submission guideline
==========================

(1) All submitted papers will be reviewed on the basis of technical quality, relevance, significance, and clarity by at least two reviewers for each paper.
(2) We will use the WI-IAT 2007 Cyberchair system for the on-line paper submission and review process. Details to be announced.
(3) The length of accepted papers should NOT exceed 4 pages (IEEE-CS format; one extra page is available for an additional payment).
(4) We will not have a separate workshop registration fee this year (i.e., a single conference registration covers everything).

=================
Program committee
=================

(Tentative)
Howard CT Ho, IBM Almaden, USA
Chun-Nan Hsu, Academia Sinica, Taiwan
Chung-Yen Lin, Academia Sinica, Taiwan
Wen-Hsiang Lu, NCKU, Taiwan
Louiqa Raschid, U of Maryland, USA
Shin-Mu Tseng, NCKU, Taiwan
Samson Tu, Stanford, USA
Qiang Yang, HKUST, Hong Kong, China
Ueng-Cheng Yang, NYMU, Taiwan

====================
Organizing committee
====================

Chun-Nan Hsu (Co-Chair)
Institute of Information Science
Academia Sinica, Taipei, Taiwan
chunnan at iis.sinica.edu.tw

Vincent Shin-Mu Tseng (Co-Chair)
Department of Computer Science and Information Engineering
National Cheng Kung University, Tainan, Taiwan
tsengsm at mail.ncku.edu.tw

Wen-Hsiang Lu (Co-Chair)
Department of Computer Science and Information Engineering
National Cheng Kung University, Tainan, Taiwan
whlu at mail.ncku.edu.tw

From wongls at comp.nus.edu.sg Wed Jul 11 10:54:23 2007
From: wongls at comp.nus.edu.sg (Limsoon Wong)
Date: Wed, 11 Jul 2007 22:54:23 +0800
Subject: [BiO BB] GIW2007 --- Call for posters
Message-ID: <001201c7c3cb$5f3f2660$1e1015ac@comp.nus.edu.sg>

CALL FOR POSTERS

The 18th International Conference on Genome Informatics (GIW 2007)
Biopolis, Singapore. December 3-5 2007.
http://www.comp.nus.edu.sg/~giw2007

The 18th International Conference on Genome Informatics (GIW 2007) will be held at the Biopolis in Singapore on December 3-5, 2007. The GIW is the longest-running international bioinformatics conference, and has provided unique opportunities to bridge theory and experiments, academia and industry, and East and West.
SUBMISSIONS:
Poster or software demonstration abstracts are limited to 2 pages, including title, figures, tables, text, and bibliography. Please see below for Abstract Templates. All abstracts should be submitted at the site http://www.easychair.org/GIWPoster2007.

Accepted posters and software demonstration abstracts will be compiled into "The 18th International Conference on Genomic Informatics, Posters and Software Demonstrations". Additionally, a number of notable abstracts will be selected for oral presentations.

IMPORTANT DATES:
Poster submission deadline: 16 September 2007
Poster decision: 14 October 2007
Poster templates: http://www.comp.nus.edu.sg/~giw2007/poster.html

From orbitz at ezabel.com Wed Jul 11 21:46:33 2007
From: orbitz at ezabel.com (orbitz at ezabel.com)
Date: Wed, 11 Jul 2007 21:46:33 -0400
Subject: [BiO BB] Masters Program Advice
Message-ID: <83FAF617-8E05-413A-B729-1846BFE7A307@ezabel.com>

Hello,

I am looking for some advice on bioinformatics masters programs. My background is mostly as a computer scientist. I currently work as a programmer. I completed an undergraduate bioinformatics program and have been out of school for about a year. I would like to get a masters.

I have started by looking for masters programs in the US. I have found three schools so far that look interesting. I'm not sure how one chooses a grad school. I have been told that I should decide on what aspect of bioinformatics I would like to research and find a school with someone who specializes in that, although I honestly don't know enough about any bioinformaticians or the schools at the moment. I know I would like to work with infectious diseases currently, but I'm not sure about beyond that. I am not sure if any of the schools I have been looking at are good for this, or if I can get into them currently.

The undergraduate program I was part of also did not have all of the classes which are pre-reqs in some of the programs I have looked at. For instance, my program did not involve Organic Chemistry, and at least one of the programs requires two semesters of it as a pre-req. I am not sure how this situation is dealt with.

The three schools I have looked at so far are Johns Hopkins, Stanford, and University of Wisconsin.

Any advice would be appreciated.

Thanks,
M

From phoebe at deakin.edu.au Fri Jul 13 06:49:15 2007
From: phoebe at deakin.edu.au (Phoebe Chen)
Date: Fri, 13 Jul 2007 20:49:15 +1000
Subject: [BiO BB] APBC2008 Deadline Approaching
Message-ID: <20070713204915.oyzifcv28w8ssowc@mail.deakin.edu.au>

APBC2008 Deadline Approaching - Full Paper Submission on 20 July (7 more days)

CALL FOR PAPERS - APBC 2008

The Sixth Asia-Pacific Bioinformatics Conference, APBC2008, will be held in Kyoto, Japan, during 14-17 January 2008. See http://bic.kyoto-u.ac.jp/apbc2008/index.html

The Asia-Pacific Bioinformatics Conference series is an annual forum for exploring research, development, and novel applications of bioinformatics.

IMPORTANT DATES
Submission of papers                      20 July 2007 ***
Notification of paper acceptance          17 September 2007
Submission of posters                     30 September 2007
Camera-ready copy & Author registration   30 September 2007
Notification of poster acceptance         20 October 2007
Conference                                14-17 January 2008

From swhwang10 at yahoo.com Sun Jul 15 00:42:22 2007
From: swhwang10 at yahoo.com (Seungwoo Hwang)
Date: Sat, 14 Jul 2007 21:42:22 -0700 (PDT)
Subject: [BiO BB] [Announcement]The 2nd International BioWiki Contest
Message-ID: <57012.18896.qm@web43138.mail.sp1.yahoo.com>

The 2nd International BioWiki Contest

[Objective]
Wiki is a web technology designed to allow multiple authors to freely add and edit website contents. Wiki is thus well suited for developing collaborative online knowledge bases, whose best known example is Wikipedia (http://wikipedia.org). The objective of this contest is to adopt the wiki paradigm towards collaborative development of biological knowledge bases that anyone can contribute to.
In this contest, each participant will develop a wiki-based web site that will serve as a useful knowledge source for a biological subject of his/her choice. Web sites created from previous contests are shown at http://biowiki.net/index.php/Biowiki_Site_List as examples.

[Schedule]
- Registration: 2007/07/21 - 2007/08/20
- Final Result Due: 2007/08/26
- Award Notification: After 2007/08/31

[How to register]
Obtain the registration form from the contest website and send the completed form to swhwang at kribb.re.kr

[Venue]
This is an online contest. All contest activities (registration, submission, and award notification) take place on the Internet. In addition to the online processes, a presentation and an announcement of the event will also be given during the 3rd ISCB Student Council Symposium in Vienna, Austria on 2007/07/21. Evaluation will also be done during the 6th International Conference on Bioinformatics (INCOB 2007) in Hong Kong on 2007/08/31.

[Awards]
The following prizes will be awarded:
- 1st place (1 team): $500 USD
- 2nd place (1 team): $300 USD
- 3rd place (3 teams): $200 USD per team
Prizes will be paid through wire transfer.

[Evaluation]
Each web site will be evaluated on the following criteria:
- Goal-orientedness: It should serve as a useful biological knowledge base.
- Contents: It should contain high quality contents.
- Expandability: It should be well organized so that other users can easily figure out how to contribute.
The evaluation will be done on-site at the 6th International Conference on Bioinformatics (INCOB 2007) by a panel of experts from eight countries in the Asia-Pacific region. The evaluation result will be announced on-line.

[Sponsor]
This contest is organized by APBioNet (Asia Pacific Bioinformatics Network) and KOBIC (Korean Bioinformation Center, http://www.kobic.re.kr). KOBIC provides funds and computing infrastructure for this contest.

[Copyright]
- Copyrighted contents should not be used without the permission of the owners. Once entered in the contest, all the contents will lose copyrights. Participants should not copy and paste copyrighted text contents from other web sites without the removal of copyrights.
- All the images and other non-text materials are under the same principle as text.
- All the contents created and uploaded by participants will be openly and freely shared under BioLicense (http://biolicense.org/).

[Contact]
Jong Bhak, Ph.D. jong at kribb.re.kr
Seungwoo Hwang, Ph.D. swhwang at kribb.re.kr

[Contest URL]
http://biowiki.net/index.php/Current_Contests

From me.lixue at gmail.com Tue Jul 17 14:33:59 2007
From: me.lixue at gmail.com (Xue Li)
Date: Tue, 17 Jul 2007 13:33:59 -0500
Subject: [BiO BB] hat kinds of data mining techniques have been using in drug discovery and drug delivery
Message-ID: <62ed16460707171133j7d2c79evcf8bb0b366d01776@mail.gmail.com>

Hello all,

I was wondering what kinds of data mining techniques have been used in drug discovery and drug delivery. It would be much appreciated if you could point me to some resources on this. Many thanks.

As far as I know, classification techniques are used in protein-protein interface prediction, and in RNA- and DNA-interface prediction. Are optimization techniques used? How about regression techniques?
--
Li Xue
Bioinformatics and Computational Biology program @ ISU
515-520-1676
Ames, IA 50010

From marchywka at hotmail.com Tue Jul 17 19:08:03 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 17 Jul 2007 19:08:03 -0400
Subject: [BiO BB] hat kinds of data mining techniques have been using in drug discovery and drug delivery
In-Reply-To: <62ed16460707171133j7d2c79evcf8bb0b366d01776@mail.gmail.com>
Message-ID:

I don't really have anything insightful to say directly regarding your question, but I will point out that there are good case studies on PubMed: protease and polymerase targeting, or kinases if you prefer, should keep you busy for a while :) As far as delivery goes, there is lots of nano stuff; I am not sure about tools.

I did want to mention that I have been amazed at the (apparent) lack of some simple tools, however. While it is quite possible I missed some, I have had to write a lot of scripts and C++ code to augment the Affymetrix annotations. Much of this is just the novel idea of using a computer to automate data processing rather than requiring a user to appreciate someone's nice web page for every protein he wants to investigate. Then there are things like string correlators that execute in reasonable time, programmable ribosomes, format converters, etc.

I suppose I could have looked more carefully at conserved domain servers or BioPerl packages to address various parts of the problem, but so far I think I've done better with my own approach. I'm not sure I have exploited all the online tools (I only really use BLAST and eutils to download batches of proteins or nucleotides), but I do know that Perl, at least under Cygwin, was simply too slow to do anything of any size. I ended up writing my own C++ string manipulation code (even here, under Cygwin the STL string classes were just too slow, and I created objects that manipulate C-style strings). Even using grep+sed to convert to FASTA files was quite slow (although I think this turned out to be mostly a problem with using Cygwin to pipe results; the console buffering seems to be the cause). I don't want to sound negative about Cygwin: you just can't do this with BAT files, and it is hard to get reasonable performance on top of windoze, but I'm not sure if that is creating some of the limitations.

Don't know if any of that helps, but I am curious whether anyone has similar or contrasting observations.

Thanks.

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com

From bader at cbio.mskcc.org Wed Jul 18 09:36:03 2007
From: bader at cbio.mskcc.org (Gary Bader)
Date: Wed, 18 Jul 2007 09:36:03 -0400
Subject: [BiO BB] First announcement: 5th Annual Cytoscape Public Symposium and Developers Retreat]
Message-ID: <469E1743.8060505@cbio.mskcc.org>

5th Annual Cytoscape Public Symposium and Developers Retreat
Amsterdam - Netherlands

November 8, 2007: Public Symposium:
Integrative Bioinformatics: At the cutting edge of network analysis and biological data integration

Leroy Hood - Institute of Systems Biology
Ewan Birney - European Bioinformatics Institute
Peter Sorger - Harvard University
Trey Ideker - University of California, San Diego
Chris Sander - Memorial-Sloan-Kettering Cancer Center
Benno Schwikowski - Institute Pasteur
Andrew Hopkins - Pfizer Global Research & Development

November 6, 7 and 9, 2007: Developers retreat
Department of Human Genetics, Academic Medical Center
University of Amsterdam, Netherlands

We are pleased to announce the forthcoming 2007 Cytoscape Symposium and Retreat to be held at the Academic Medical Center, University of Amsterdam, Netherlands on 6-9 Nov 2007. More information, updates and registration will be available at: http://www.cytoscape.org/retreat2007

This year's meeting is particularly exciting since it is the first time it will be held in Europe, specifically in the vibrant historic city of Amsterdam.
'Floating' on a web of canals, with its unique combination of old-world charm and cosmopolitan culture, Amsterdam is one of the most popular European cities for international visitors. To reach Cytoscape's large European user base, a Public Symposium will be held on November 8th, for which there is a formidable list of confirmed speakers. Apart from the Symposium, the retreat consists of hands-on demos, user-training sessions and informal, technically focused developer meetings:

* Tues 6th Nov: Cytoscape Developer's Discussions
* Wed 7th Nov: Application showcase, hands-on sessions, tutorials
* Thur 8th Nov: Public Symposium
* Fri 9th Nov: Development of Cytoscape Roadmap for 2007, 2008

The Symposium on the 8th is of general interest to both biologists and informaticians. Current and future users of Cytoscape are also invited to visit the Application showcase on the 7th. The developers days are targeted at the core developers of Cytoscape and plugins.

We hope that you can join us!

The Organizing Committee, 5th Cytoscape Retreat 2007
Annette Adler - Agilent Technologies
Piet Molenaar - Human Genetics Department AMC
Guy Warner - Unilever

Contact: cytoretreat at cytoscape.org

The retreat is supported by:
Unilever (www.unilever.com)
Netherlands Bioinformatics Center (NBIC) (www.nbic.nl)
Agilent (www.agilent.com)

About Cytoscape:
Cytoscape (www.cytoscape.org) is an open source bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data. The software architecture enables adaptation of Cytoscape functionality to the specific needs of biologists and bioinformaticians.
It is jointly developed by the groups of Benno Schwikowski (Pasteur Institute, Paris), Trey Ideker (University of California San Diego), Chris Sander (Memorial Sloan-Kettering Cancer Center), Lee Hood (Institute of Systems Biology, Seattle), Annette Adler (Agilent Technologies, Santa Clara, CA) and Bruce Conklin (Gladstone/UCSF, GenMAPP).

PS: Excuse us for cross-posting; we're trying to avoid this as much as possible. However, to reach a large audience we decided to include several relevant mailing lists.

From me.lixue at gmail.com Wed Jul 18 13:52:29 2007
From: me.lixue at gmail.com (Xue Li)
Date: Wed, 18 Jul 2007 12:52:29 -0500
Subject: [BiO BB] hat kinds of data mining techniques have been using in drug discovery and drug delivery
In-Reply-To:
References: <62ed16460707171133j7d2c79evcf8bb0b366d01776@mail.gmail.com>
Message-ID: <62ed16460707181052h32656144j635dabbde8269b64@mail.gmail.com>

Thank you, Mike, for such a long and helpful reply :P

I am just a beginner in data mining and bioinformatics, so I am not sure about the problems with Cygwin and Perl. I know there is Bio++, where some C++ libraries for bioinformatics can be found:
http://162.38.181.25/BioPP/

Hope it will help.

Li

--
Li Xue
Bioinformatics and Computational Biology program @ ISU
515-520-1676
Ames, IA 50010

From veredcc at bgu.ac.il Thu Jul 19 15:52:57 2007
From: veredcc at bgu.ac.il (Vered Caspi)
Date: Thu, 19 Jul 2007 19:52:57 GMT
Subject: [BiO BB] Genbank file conversion to GCG format
Message-ID:

Hello,
I am looking for free software (Unix) for mass conversion of GenBank files to GCG format.
If anyone has experience with this, I would be happy to hear about it.
Vered

===
Vered Caspi, Ph.D.
Bioinformatics Support Unit, Head
National Institute for Biotechnology in the Negev,
Building 39, room 214
Ben-Gurion University of the Negev
Beer-Sheva 84105, Israel
Email: veredcc at bgu.ac.il
Tel: 08-6479034 054-7915969

From pmr at ebi.ac.uk Fri Jul 20 03:48:18 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 20 Jul 2007 08:48:18 +0100
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To:
References:
Message-ID: <46A068C2.9010306@ebi.ac.uk>

Vered Caspi wrote:
> Hello,
> I am looking for free software (Unix) for mass conversion of GenBank files to GCG format.
> If anyone has experience with this, I would be happy to hear about it.

Assuming you mean GCG sequence files (not a GCG sequence database): EMBOSS can convert many sequence formats, including GenBank and GCG, using the program "seqret".

One warning: you need the -ossingle option on the command line to write each sequence to a separate file (or process one GenBank sequence at a time if they are already in separate files). EMBOSS can read GCG files with more than one sequence, but other applications may assume only one sequence per file.

regards,

Peter Rice

From Sterten at aol.com Fri Jul 20 04:19:55 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Fri, 20 Jul 2007 04:19:55 EDT
Subject: [BiO BB] Genbank file conversion to GCG format
Message-ID:

What is GCG?

From pmr at ebi.ac.uk Fri Jul 20 04:44:27 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 20 Jul 2007 09:44:27 +0100
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To:
References:
Message-ID: <46A075EB.9040206@ebi.ac.uk>

Sterten at aol.com wrote:
> What is GCG?

"The Wisconsin Package" from Accelrys: originally academic software from the University of Wisconsin Genetics Computer Group, but a commercial package for many years.

Those of us who remember the old days still call it GCG, partly because the package has changed name so many times. Its home page is now http://www.accelrys.com/products/gcg/ (so they do still use the old name too :-)

They invented their own file formats, with a "checksum" line marked by ".." at the end. Documentation is above this line, and only sequence below.

regards,

Peter Rice

From Sterten at aol.com Fri Jul 20 04:53:26 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Fri, 20 Jul 2007 04:53:26 EDT
Subject: [BiO BB] Genbank file conversion to GCG format
Message-ID:

No checksum is needed these days; data storage is reliable.

It should be easy to write a short program to convert the formats...
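
The "checksum" line mentioned in this thread can be illustrated with a short sketch. The weighting used here (position weights cycling from 1 to 57, total taken modulo 10000) is the scheme commonly attributed to GCG sequence files; it is an assumption on my part and is not stated anywhere in this thread, and the function name gcg_checksum is made up for illustration.

```python
def gcg_checksum(sequence: str) -> int:
    """Position-weighted checksum in the style commonly attributed to GCG files.

    Assumption: weights cycle 1..57 and the total is reduced modulo 10000.
    """
    total = 0
    for index, residue in enumerate(sequence):
        weight = (index % 57) + 1           # 1, 2, ..., 57, then back to 1
        total += weight * ord(residue.upper())
    return total % 10000


print(gcg_checksum("ACGT"))   # 748
print(gcg_checksum("ACTG"))   # 735: swapping two residues changes the value
```

Because each residue is weighted by its position, a dropped, inserted or swapped character will usually change the value, which is the kind of conversion slip a checksum of this sort is meant to catch.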
From skhadar at gmail.com Fri Jul 20 08:40:35 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Fri, 20 Jul 2007 18:10:35 +0530
Subject: [BiO BB] How to rank active site(s) ?
Message-ID:

Dear All,

Is there any computational tool or protocol available to rank the active site(s) present in a protein? I am looking into a couple of proteins which have multiple sites (say, active sites / binding sites), and I am looking for a method to rank the binding/active sites. I have done an extended Google search but could not find anything suitable. I would appreciate it if anyone could point me to any similar work, paper, tool, etc.

Many thanks in advance,
Shameer Khadar
NCBS-TIFR

From christoph.gille at charite.de Fri Jul 20 09:02:18 2007
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri, 20 Jul 2007 15:02:18 +0200 (CEST)
Subject: [BiO BB] How to rank active site(s) ?
In-Reply-To:
References:
Message-ID: <59689.141.42.56.114.1184936538.squirrel@webmail.charite.de>

The active site atlas of the Thornton group may be useful. They collected data from publications and provide links to those publications.

Christoph

From ykalidas at gmail.com Fri Jul 20 09:28:12 2007
From: ykalidas at gmail.com (Kalidas Yeturu)
Date: Fri, 20 Jul 2007 18:58:12 +0530
Subject: [BiO BB] How to rank active site(s) ?
In-Reply-To:
References:
Message-ID: <5632703b0707200628q17d427d9lec99dcf00f0b0b1f@mail.gmail.com>

Hi Shameer,

There are many ways of ranking binding sites:
(1) Size
(2) Cumulative electrostatic potential of probe positions in the site
(3) Occurrence of the most likely binding site residues, etc.

Please refer to the CASTp, Q-SiteFinder and LigsiteCSC algorithms.

With Regards
Kalidas. Y

--
Kalidas Y
http://ssl.serc.iisc.ernet.in/~kalidas

From operon at cbiot.ufrgs.br Thu Jul 19 17:45:14 2007
From: operon at cbiot.ufrgs.br (Marcos de Carvalho)
Date: Thu, 19 Jul 2007 18:45:14 -0300
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To:
References:
Message-ID:

Hi Vered,

Maybe you would like to try Readseq:

http://www.ebi.ac.uk/cgi-bin/readseq.cgi (web based)
http://bioinformatics.org/~thomas/mol_lin/readseq/ (linux elf binary)
http://iubio.bio.indiana.edu/soft/molbio/readseq/java/ (java version)

regards
Marcos

From orbitz at ezabel.com Fri Jul 20 08:34:07 2007
From: orbitz at ezabel.com (orbitz at ezabel.com)
Date: Fri, 20 Jul 2007 08:34:07 -0400
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To:
References:
Message-ID:

Famous last words!
I don't know about in bioinformatics, but in other areas checksums are needed these days, and very important, but it's outside the scope of the data you deal with, external. On Jul 20, 2007, at 4:53 AM, Sterten at aol.com wrote: > > no checksum needed these days, data storage is reliable. > > Should be easy to write a short program to convert the formats... > > > > > _______________________________________________ > General Forum at Bioinformatics.Org - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From marchywka at hotmail.com Fri Jul 20 11:29:20 2007 From: marchywka at hotmail.com (Mike Marchywka) Date: Fri, 20 Jul 2007 11:29:20 -0400 Subject: [BiO BB] Genbank file conversion to GCG format In-Reply-To: Message-ID: No matter how simple the code it is important to determine its working right. It isn't the computer you are checking, its the human :) Dumb things like implementation specific string size limits, buffer flushing,etc can drop random stuff that would take forever to find. >From: orbitz at ezabel.com >Reply-To: "General Forum at Bioinformatics.Org" > >To: "General Forum at Bioinformatics.Org" > >Subject: Re: [BiO BB] Genbank file conversion to GCG format >Date: Fri, 20 Jul 2007 08:34:07 -0400 > >Famous last words! > >I don't know about in bioinformatics, but in other areas checksums are >needed these days, and very important, but it's outside the scope of the >data you deal with, external. > > >On Jul 20, 2007, at 4:53 AM, Sterten at aol.com wrote: > >> >>no checksum needed these days, data storage is reliable. >> >>Should be easy to write a short program to convert the formats... 
From pmr at ebi.ac.uk Fri Jul 20 12:22:32 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 20 Jul 2007 17:22:32 +0100
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To:
References:
Message-ID: <46A0E148.9090709@ebi.ac.uk>

> On Jul 20, 2007, at 4:53 AM, Sterten at aol.com wrote:
> > no checksum needed these days, data storage is reliable.

Hah! It isn't reliable in this case. GCG added the checksum to catch users who edited their sequence files and deliberately (or accidentally, while editing the annotation in the header) changed the sequence data :-)

If you edit a GCG file, you run the reformat program to calculate a new checksum line.

> Should be easy to write a short program to convert the formats...

Generating the checksum is ... ummm .... interesting. It helps if you have a friend with access to the GCG "reformat" program who can tell you if you got it right. Some years ago there was a thread in one of the bionet newsgroups (ah, those were the days) where it took 4 attempts before someone could reliably agree with GCG's calculation (upper and lower case, numeric characters, spaces, and other IUPAC "standard" sequence characters such as "=" all have to be considered).
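To illustrate why the details matter, here is a minimal sketch of a position-weighted checksum of the kind converters such as Readseq use for GCG output. This is an assumption on my part and not GCG's own code: each character is upper-cased, multiplied by a weight that cycles from 1 to 57, and the total is taken modulo 10000. The function name gcg_checksum is made up for illustration, and skipping whitespace is likewise an assumption. It does reproduce the "Check: 731" value of the 22-residue EMBOSS example posted in this thread.

```python
def gcg_checksum(sequence: str) -> int:
    """Position-weighted checksum in the style used for GCG files (sketch)."""
    total = 0
    position = 0
    for char in sequence:
        if char.isspace():
            # Assumption: layout whitespace is not part of the sequence.
            continue
        # Weights cycle 1, 2, ..., 57, then start again at 1.
        weight = position % 57 + 1
        # Upper-casing makes the result independent of letter case.
        total += weight * ord(char.upper())
        position += 1
    return total % 10000


if __name__ == "__main__":
    # The example sequence from this thread; prints 731.
    print(gcg_checksum("acdefghikl mnpqrstvwx yz"))
```

Case and spacing do not change the result here, but how digits and unusual symbols are treated is exactly the sort of thing that would need checking against the real "reformat" program.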
regards,

Peter

From pmr at ebi.ac.uk Fri Jul 20 12:30:37 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 20 Jul 2007 17:30:37 +0100
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To:
References:
Message-ID: <46A0E32D.2020201@ebi.ac.uk>

Mike Marchywka wrote:
> No matter how simple the code it is important to determine its working
> right.
> It isn't the computer you are checking, its the human :)
> Dumb things like implementation specific string size limits, buffer
> flushing,etc
> can drop random stuff that would take forever to find.

For those who have never seen one, here is a GCG format sequence as written by EMBOSS. It has enough essential details to be used by GCG.

!!AA_SEQUENCE 1.0

Example GCG sequence

This is all text up to the first line with two dots so we cannot have a normal EMBL/GenBank feature table here unless we first convert '.' to '_'

Checksum line coming up:

gcgseq  Length: 22  Type: P  Check: 731 ..

       1  acdefghikl mnpqrstvwx yz

From phoebe at deakin.edu.au Sat Jul 21 02:11:51 2007
From: phoebe at deakin.edu.au (Phoebe Chen)
Date: Sat, 21 Jul 2007 16:11:51 +1000
Subject: [BiO BB] APBC2008 (Extension of Paper Submission Deadline)
Message-ID: <20070721161151.v7feugym8wsgcwgg@mail.deakin.edu.au>

Due to many requests, the submission deadline for APBC2008 has been extended to 30 July 2007, 23:59 EST.

Sincerely,

Conference Chair
Tatsuya Akutsu

Program committee co-chairs
Alvis Brazma and Satoru Miyano

***********************************************************************
CALL FOR PAPERS - APBC 2008

The Sixth Asia-Pacific Bioinformatics Conference, APBC2008, will be held in Kyoto, Japan, during 14-17 January 2008.
See http://bic.kyoto-u.ac.jp/apbc2008/index.html
***********************************************************************

The Asia-Pacific Bioinformatics Conference series is an annual forum for exploring research, development, and novel applications of bioinformatics.
The aim of this conference is to bring together researchers, professionals, and industrial practitioners for interaction and exchange of knowledge and ideas. We invite submissions that address conceptual and practical issues of bioinformatics. Topics of Interest Papers are solicited on, but not limited to, the following topics: - Sequence Analysis - Motif Finding - Recognition of Genes - RNA Analysis - Physical and Genetic Mapping - Evolution and Phylogeny - Protein Structure Analysis - Microarray Design - Transcriptome, Gene Expression - Proteomics - Pathways, Networks and Systems - Ontologies - Databases and Data Integration - Text Mining - Population Genetics/SNP/Haplotyping - Comparative Genomics, Genome Rearrangements - Applications IMPORTANT DATES Submission of papers July 30, 2007 (23:59 Eastern Standard Time (GMT-4)) (** new deadline **) Notification of paper acceptance 17 September 2007 Submission of posters 30 September 2007 Camera-ready copy & Author registration 30 September 2007 Notification of poster acceptance 20 October 2007 Conference 14-17 January 2008 From a.gopee at utm.intnet.mu Mon Jul 23 06:46:41 2007 From: a.gopee at utm.intnet.mu (Ajit) Date: Mon, 23 Jul 2007 14:46:41 +0400 Subject: [BiO BB] Protein Datatypes for function prediction Message-ID: <003a01c7cd16$c0fbf1e0$083c10ac@Ajit> Subject: Protein function prediction based using different datatypes Hello Can anyone tell me what are the currently available data types for protein function prediction and their associated tools/web site links? I already have a small list but I want to make sure I cover all of them...aznd I want to get some preliminaries on each of these... 
(1) Amino acid sequences (2) Protein structure (3) Genome sequences (4) Phylogenetic data (5) Microarray expression data (6) Protein interaction networks and protein complexes (7) Biomedical literature (8) Gene Ontology Please do send some details on actually on how each of the above can be used in protein function prediction... Thanks a lot Rgds Ajit From idoerg at gmail.com Mon Jul 23 09:06:44 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 23 Jul 2007 15:06:44 +0200 Subject: [BiO BB] Protein Datatypes for function prediction In-Reply-To: <003a01c7cd16$c0fbf1e0$083c10ac@Ajit> References: <003a01c7cd16$c0fbf1e0$083c10ac@Ajit> Message-ID: Shameless plug: read my review: *Friedberg I.* *Automated Function Prediction: the Genomic Challenge* *Briefings in Bioinformatics* (2006) *7*(3):225-242 There is a "tunnel through" link on my page to BiB: http://iddo-friedberg.org/papers.html The second link after the citation takes you to the paper, in case your institute does not subscribe. best, Iddo On 7/23/07, Ajit wrote: > > > > Subject: Protein function prediction based using different datatypes > > > Hello > Can anyone tell me what are the currently available data types for protein > function prediction and their associated tools/web site links? > > I already have a small list but I want to make sure I cover all of > them...aznd I want to get some preliminaries on each of these... > > (1) Amino acid sequences > (2) Protein structure > > (3) Genome sequences > > (4) Phylogenetic data > > (5) Microarray expression data > > (6) Protein interaction networks and protein complexes > > (7) Biomedical literature > > (8) Gene Ontology > > > > Please do send some details on actually on how each of the above can be > used in protein function prediction... 
>
> Thanks a lot
>
> Rgds
>
> Ajit

-- 
I. Friedberg
"The only problem with troubleshooting is that sometimes trouble shoots back."

From marchywka at hotmail.com Tue Jul 24 07:50:50 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 24 Jul 2007 07:50:50 -0400
Subject: [BiO BB] Protein Datatypes for function prediction
In-Reply-To:
Message-ID:

Also, while I was going to suggest "ab initio QM" as computing power is increasing rapidly, don't dismiss even simple text processing. That is, while trying to find some simple ways to screen some AFFX probe sets, I thought I would try this:

If you wanted to ask, "what is a tyrosine phosphatase?"

First, you could go download all the sequences that mention the term (a few thousand, enough to cluster perhaps):

$ eutilsnew -protein -out ptp -v "protein tyrosine phosphatase"

Extract fasta files,

$ /cygdrive/c/mydocs/scripts/cc/affx/file_parsing -fastas ptp ptp_fastas

Use your favorite word finder,

$ /cygdrive/c/mydocs/scripts/cc/affx/string_correlator -motif ptp_fastas 10 >some_ptp_words

And sort them for the most common:

$ cat some_ptp_words | awk '{print $3}' | sort | uniq -c | sort -g -r | more
   2374 VCLGNICRSP
   1678 FVCLGNICRSP
   1182 LFVCLGNICRSP
    975 GNICRSPMAE
    904 GNICRSPTAE
    698 VLFVCLGNICRSP
[...]

I hadn't been able to check this until I got my blast script to find the cdd database, but it does seem to predict known stuff:

$ blastnew -out ptp_cdd -hits 50 -cdd -expect 1000 VCLGNICRSP
$ cat ptp_cdd | more
BLASTP 2.2.16 [Mar-25-2007]

                                                                 Score     E
Sequences producing significant alignments:                      (bits) Value

gnl|CDD|65262 pfam01451, LMWPc, Low molecular weight phosphotyro...    25   1.1
gnl|CDD|30743 COG0394, Wzb, Protein-tyrosine-phosphatase [Signal...    25   1.1
gnl|CDD|29014 cd00115, LMWPc, Low molecular weight phosphatase f...    25   1.1
gnl|CDD|47555 smart00226, LMWPc, Low molecular weight phosphatas...    24   1.9
gnl|CDD|68479 pfam04906, Tweety, Tweety. The tweety (tty) gene h...    18   105
gnl|CDD|70330 pfam06856, DUF1251, Protein of unknown function (D...    18   138
gnl|CDD|71435 pfam07999, RHSP, Retrotransposon hot spot protein....    17   306
gnl|CDD|70701 pfam07245, Phlebovirus_G2, Phlebovirus glycoprotei...    17   306
gnl|CDD|72414 pfam08996, zf-DNA_Pol, DNA Polymerase alpha zinc f...    17   400

It wouldn't be difficult to build up vocabulary lists that distinguish different types of proteins and handle variants the same way you handle plurals, capitalization, etc. in language processing. This wasn't my immediate interest but it is something to consider if you need a quick-and-easy approach.

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com

From gtzanis at csd.auth.gr Sat Jul 21 16:38:42 2007
From: gtzanis at csd.auth.gr (George Tzanis)
Date: Sat, 21 Jul 2007 23:38:42 +0300
Subject: [BiO BB] RefSeq mRNA sequences of barley
Message-ID: <002b01c7cbd7$22e74ee0$4001a8c0@taurus>

Dear All,

I would like to get all the non-redundant cDNA sequences of barley. For this reason I'm thinking about retrieving all RefSeq mRNAs of barley. I have searched the Core Nucleotide database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore) using the following query:

"barley[organism] AND biomol_mRNA[properties] AND srcdb_refseq[PROP]"

However, no sequences were returned.
I have also used the Entrez Limits settings, but the result was the same. Is there something wrong in my search? Are there any RefSeq mRNAs for barley? Is there another way to get a non-redundant set of cDNA (or mRNA) sequences of barley? I will appreciate any idea. Thank you in advance, George Tzanis Department of Informatics Aristotle University of Thessaloniki 54124 Thessaloniki Greece From ethan.strauss at promega.com Tue Jul 24 14:15:14 2007 From: ethan.strauss at promega.com (Ethan Strauss) Date: Tue, 24 Jul 2007 13:15:14 -0500 Subject: [BiO BB] RefSeq mRNA sequences of barley In-Reply-To: <002b01c7cbd7$22e74ee0$4001a8c0@taurus> References: <002b01c7cbd7$22e74ee0$4001a8c0@taurus> Message-ID: Hi, I think that there are two problems. The first is that there do not seem to be any barley sequences in refseq (with the exception of the chloroplast genome NC_008590), the second may be that you need to use the scientific name for the organism ("Hordeum vulgare"). I have had variable luck using common names as [organism]. I don't have any clue why there are no barley records in RefSeq... Hope this helps a little. Ethan Ethan Strauss Ph.D. Bioinformatics Scientist Promega Corporation 2800 Woods Hollow Rd. Madison, WI 53711 608-274-4330 800-356-9526 ethan.strauss at promega.com -----Original Message----- From: bio_bulletin_board-bounces+ethan.strauss=promega.com at bioinformatics.org [mailto:bio_bulletin_board-bounces+ethan.strauss=promega.com at bioinformat ics.org] On Behalf Of George Tzanis Sent: Saturday, July 21, 2007 3:39 PM To: 'General Forum at Bioinformatics.Org' Subject: [BiO BB] RefSeq mRNA sequences of barley Dear All, I would like to get all the non-redundant cDNA sequences of barley. For this reason I'm thinking about retrieving all RefSeq mRNAs of barley. 
I have made a search to the Core Nucleotide database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore) using the following query: "barley[organism] AND biomol_mRNA[properties] AND srcdb_refseq[PROP]" However, not any sequences were returned. I have also used the Entrez Limits settings, but the result was the same. Is there something wrong in my search? Are there any RefSeq mRNAs for barley? Is there another way to get a non-redundant set of cDNA (or mRNA) sequences of barley? I will appreciate any idea. Thank you in advance, George Tzanis Department of Informatics Aristotle University of Thessaloniki 54124 Thessaloniki Greece _______________________________________________ General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From marchywka at hotmail.com Tue Jul 24 14:19:06 2007 From: marchywka at hotmail.com (Mike Marchywka) Date: Tue, 24 Jul 2007 14:19:06 -0400 Subject: [BiO BB] RefSeq mRNA sequences of barley Message-ID: What have you got against downloading the whole genome? I didn't see it at ensembl but, as you probably know, ag plant genomics is a big deal economically. The USDA and ARS probably have more stuff like this ( first thing I found on google). http://harvest.ucr.edu/ Offhand, it looks like you have mutually exclusive [prop] criteria as prop and properties are synonyms ( but usually the query dump tells you howmany hits on each term ): http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.table.EntrezHelp.T7 Usually eutils provides parse info that you can sort out when query results don't make sense but I've only used this from pubmed and the response from nucleotide was unhelpful- you can probably get a dump from the web page but I've never used it. 
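For anyone who wants to poke at such a query outside the web page, here is a hedged sketch that only builds an NCBI E-utilities esearch URL (it does not contact NCBI). The helper name build_esearch_url is hypothetical, the field tags mirror the query quoted in this thread, and the scientific name is swapped in for the common one as suggested earlier. Whether the query returns any records is not something this sketch can tell you. The XML that esearch returns normally carries a QueryTranslation element, which helps show how each term was interpreted.

```python
from urllib.parse import urlencode

# Public esearch endpoint of the NCBI E-utilities.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(organism: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Build an esearch URL for RefSeq mRNA records of one organism (sketch)."""
    # Quote the organism name so a two-word binomial is treated as one phrase.
    term = f'"{organism}"[Organism] AND biomol_mrna[PROP] AND srcdb_refseq[PROP]'
    # urlencode percent-escapes the quotes, brackets and spaces in the term.
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


if __name__ == "__main__":
    print(build_esearch_url("Hordeum vulgare"))
```

Building the URL separately from fetching it makes it easy to paste the same query into a browser and compare the result with what the Entrez page reports.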
$ eutilsnew -v -out asdf -dump "barley AND smith[AU] "
98 1 0 1 09zlDsvf9Apq_WlB1MrLvNbMDVmdMFyE-0fh8eDgZhtQlZKxV5kTGmzUJpJ at 1FBF 437C6A641A70_0045SID 17609926 barley (("hordeum"[TIAB] NOT Medline[SB]) OR "hordeum"[MeSH Terms] OR barley[Text Word]) "hordeum"[TIAB] TIAB 2045 Y

The NCBI web interface is usually quite good but I ran into a similar problem with blast databases. I finally found their list for blasting,

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html

and added to my script,

$ blastnew -help db | grep -i barley
genomes barley barley

Before I found this, I complained that their example db names weren't right and I even had to reverse engineer a piece of their html. Try googling confined to their site, site:nih.gov

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com

>From: "George Tzanis"
>Reply-To: "General Forum at Bioinformatics.Org"
>To: "'General Forum at Bioinformatics.Org'"
>Subject: [BiO BB] RefSeq mRNA sequences of barley
>Date: Sat, 21 Jul 2007 23:38:42 +0300
>
>Dear All,
>
>I would like to get all the non-redundant cDNA sequences of barley.
>For this reason I'm thinking about retrieving all RefSeq mRNAs of
>barley. I have made a search to the Core Nucleotide database
>(http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore) using the
>following query:
>
>"barley[organism] AND biomol_mRNA[properties] AND srcdb_refseq[PROP]"
>
>However, not any sequences were returned. I have also used the Entrez
>Limits settings, but the result was the same.
>
>Is there something wrong in my search?
>Are there any RefSeq mRNAs for barley?
>Is there another way to get a non-redundant set of cDNA (or mRNA)
>sequences of barley? I will appreciate any idea.
>
>Thank you in advance,
>
>George Tzanis
>Department of Informatics
>Aristotle University of Thessaloniki
>54124 Thessaloniki
>Greece

From marchywka at hotmail.com Wed Jul 25 15:58:36 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 25 Jul 2007 15:58:36 -0400
Subject: [BiO BB] question on RNA and species signatures
Message-ID:

I've been generally trying to find a comprehensive way to analyze non-coding RNA with no luck. I've tried asking people in such areas as siRNA, riboswitches, etc. without much success. Any comments or discussion?

This came up most recently because I found a short sequence with an unusual species distribution and I was curious to know if this thing has a name. If I just type in some random junk, I get about what you could expect (this is my own blast script with most terms being self explanatory; "-summ" translates into "-v" to limit summary lines, and -db selects the wgs database):

$ blastnew -out control -nuc -hits 0 -summ 3000 -db wgs -expect 1e8 TCCTGGAGTCCCAGAGTTCAGCTAAACCGATCACATTGTAT
$ more control| sed -n '/producing signif/,/^>/p'| sed -n 's/.*|//p' | awk '{print $1" " $2}' | sort | uniq -c | sort -g -r | more
    304 Homo sapiens
    261 Bos taurus
    212 Pan troglodytes
    171 Microcebus murinus
    155 Equus caballus
    136 Spermophilus tridecemlineatus
    130 Canis familiaris
    125 Otolemur garnettii
    112 Ornithorhynchus anatinus
    111 Tupaia belangeri
     96 Myotis lucifugus
     93 Mus musculus
     86 Felis catus
     75 Rattus norvegicus
     74 Oryzias latipes
     71 Drosophila erecta
     68 Sorex araneus
     63 Loxodonta africana
     58 Anolis carolinensis
     56 Monodelphis domestica
     48 Macaca mulatta
     47 Gallus gallus
     34 Oryctolagus cuniculus
     30 Erinaceus europaeus
     27 Strongylocentrotus purpuratus
     27 Callorhinchus milii
     26 Dasypus novemcinctus
     22 Echinops telfairi
     22 Cavia porcellus
     17 Danio rerio
     14 Schmidtea mediterranea
     13 Ochotona princeps
     13 Aplysia californica
     10 Anopheles gambiae

This, on the other hand, has much better matches (note the expect limit)

$ blastnew -out dog_sign -nuc -hits 0 -summ 3000 -db wgs -expect .01 TCCTGGAGTCCCAGGATCCAGTCCCACGTCGGGCTCCCT

and it is confined to dogs:

$ more dog_sign| sed -n '/producing signif/,/^>/p'| sed -n 's/.*|//p' | awk '{print $1" " $2}' | sort | uniq -c | sort -g -r | more
   3000 Canis familiaris

And these all seem to be in different places (the most frequent location occurs once):

$ more dog_sign| sed -n '/producing signif/,/^>/p'| sed -n 's/.*|//p' |awk '{print $3}'| sort | uniq -c | sort -g -r | more
      1 ctg19866851899833,
      1 ctg19866851899815,
      1 ctg19866851899794,

Anyone care to comment on the significance of this sequence, or the reason it is just an uninteresting fluke? Thanks.

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com

From austin.tanney at almacgroup.com Thu Jul 26 08:51:42 2007
From: austin.tanney at almacgroup.com (Tanney, Austin)
Date: Thu, 26 Jul 2007 13:51:42 +0100
Subject: [BiO BB] question on RNA and species signatures
Message-ID:

Hi Mike,

Have you tried looking at Rfam (http://www.sanger.ac.uk/Software/Rfam/), miRBase (http://microrna.sanger.ac.uk/sequences/) or the Ensembl genome browser (http://www.ensembl.org/index.html)?

Thanks

Austin

Proprietary or confidential information belonging to Almac Group Limited or to one of its affiliated companies may be contained in this message. The e-mail and any files transmitted with it are confidential and privileged and intended solely for the use of the individual or entity to whom they are addressed.

Any unauthorised direct or indirect dissemination, distribution or copying of this message and any attachments is strictly prohibited. If you have received the e-mail in error please notify helpdesk at almacgroup.com and delete the e-mail from your system.

E-mail and other communications sent to this company may be reviewed or read by persons other than the intended recipient.

Viruses : although we have taken steps to ensure that this e-mail and any attachments are free from any virus, you should, in keeping with good practice, ensure that they are actually virus free.

From marchywka at hotmail.com Thu Jul 26 10:33:03 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Thu, 26 Jul 2007 10:33:03 -0400
Subject: [BiO BB] question on RNA and species signatures
In-Reply-To:
Message-ID:

Thanks. Nothing on the one site but ensembl has some ideas, not sure how to interpret yet.
fwiw, ncbi does have several repeats databases

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html

and I tried against a few of these but no low-e hits. At higher e, there were a few suggestions in human repeats:

$ blastnew -out non_dog -nuc -hits 10 -summ 3000 -db humrep -expect 100 TCCTGGAGTCCCAGGATCCAGTCCCACGTCGGGCTCCCT

>MER31-internal#LTR/MER4-group
          Length = 4936

 Score = 26.3 bits (13), Expect = 0.22
 Identities = 13/13 (100%)
 Strand = Plus / Minus

Query: 12   caggatccagtcc 24
            |||||||||||||
Sbjct: 4369 caggatccagtcc 4357

I also ran against some other wgs's and there are some lower e hits to cat, but it still seems to be largely dog specific (matches 38/39 IIRC).

Thanks

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com

From austin.tanney at almacgroup.com Thu Jul 26 10:59:05 2007
From: austin.tanney at almacgroup.com (Tanney, Austin)
Date: Thu, 26 Jul 2007 15:59:05 +0100
Subject: [BiO BB] question on RNA and species signatures
Message-ID:

Mike,

For short BLASTs, the e-value is generally not that useful. Often the recommended expect for short BLASTs is 1000. In this case it's % identity and coverage you should look for. Realistically, 100% coverage should be what you expect.
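To make the identity and coverage check concrete, here is a small sketch that computes both for a single ungapped hit. The function name hit_stats and its arguments are hypothetical, not part of any BLAST tool; the numbers are those of the MER31 hit shown earlier in this thread (13/13 identities over query positions 12 to 24 of a 39-base query).

```python
def hit_stats(identities: int, align_len: int,
              q_start: int, q_end: int, query_len: int) -> tuple:
    """Return (percent identity, percent query coverage) for one hit."""
    # Identity is measured over the aligned region only.
    pct_identity = 100.0 * identities / align_len
    # Query coordinates are 1-based and inclusive; abs() also handles
    # hits reported with start > end.
    covered = abs(q_end - q_start) + 1
    pct_coverage = 100.0 * covered / query_len
    return pct_identity, pct_coverage


if __name__ == "__main__":
    query = "TCCTGGAGTCCCAGGATCCAGTCCCACGTCGGGCTCCCT"
    ident, cov = hit_stats(13, 13, 12, 24, len(query))
    # Prints: identity 100%, coverage 33%
    print(f"identity {ident:.0f}%, coverage {cov:.0f}%")
```

Here the hit is 100% identical but covers only a third of the query, which is the kind of match that a coverage filter would discount.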
From marchywka at hotmail.com Thu Jul 26 11:23:32 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Thu, 26 Jul 2007 11:23:32 -0400
Subject: [BiO BB] question on RNA and species signatures
In-Reply-To:
Message-ID:

That's why I posted the alignment :) For a length of ca. 40 bases it isn't too bad but, sure, for some really short things it has been a problem. Normally I just collect a bunch of hits and sort the alignments with text processing, grep "[A-Z]\{10\}", once I have some idea what the background looks like. Since I'm guessing here, good matches to shorter sequences could help isolate important parts of the longer sequence.

>From: "Tanney, Austin"
>Reply-To: "General Forum at Bioinformatics.Org"
>To: "General Forum at Bioinformatics.Org"
>Subject: RE: [BiO BB] question on RNA and species signatures
>Date: Thu, 26 Jul 2007 15:59:05 +0100
>
>Mike,
>
>For short BLASTs, the e-value is generally not that useful.
>Often the recommended expect for short BLASTs is 1000.
>In this case it's % identity and coverage you should look for. Realistically
>100% coverage should be what you expect.

From wongls at comp.nus.edu.sg Thu Jul 26 05:36:08 2007
From: wongls at comp.nus.edu.sg (Limsoon Wong)
Date: Thu, 26 Jul 2007 17:36:08 +0800
Subject: [BiO BB] FW: Call for Papers RECOMB 2008 (Singapore) - Genome Research parallel submission option
Message-ID: <003201c7cf68$6596f990$b9b81aac@comp.nus.edu.sg>

Call for Papers for the 12th Annual International Conference on Research in Computational Molecular Biology

RECOMB 2008
Singapore, 30th March - 2nd April 2008
University Cultural Center
National University of Singapore

NEW: Parallel submission option to Genome Research special RECOMB issue

http://www.comp.nus.edu.sg/~recomb08/

Papers due 24th September 2007

Hosted by National University of Singapore, RECOMB 2008 will provide a general forum for disseminating the latest research in bioinformatics and computational biology. The multidisciplinary conference brings together academic and industrial scientists from molecular biology, genetics, medicine, computer science, mathematics, and statistics. Papers reporting on original research (both theoretical and experimental) in all areas of computational molecular biology are sought.

NEW NEW NEW NEW NEW NEW NEW NEW NEW NEW !!!
RECOMB Conference Series and Genome Research journal team up !!! Possibility of publication of RECOMB papers in Genome Research special issue. See web site for details: http://www.comp.nus.edu.sg/~recomb08/

NEW NEW NEW NEW NEW NEW NEW NEW NEW NEW

Important Deadlines:
September 24, 2007 - Deadline for paper submission
December 10, 2007 - Notification of paper acceptance

From marchywka at hotmail.com Sat Jul 28 11:08:04 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Sat, 28 Jul 2007 11:08:04 -0400
Subject: [BiO BB] Protein Datatypes for function prediction
In-Reply-To:
Message-ID:

>rapidly, don't dismiss even simple text processing. [...]
>( a few thousand, enough to cluster perhaps)

I actually tried this with osteoglycins. If you download them (there aren't that many), pick out repeated "words", and cluster by presence or absence of the most popular words, it turns out to do a decent automated job of separating by species. These are the vectors (presence/absence of the words) along with the members having that vector (names could be ambiguous, for illustration only). I was hoping it would separate by type, but that is a problem with using the most common words to discriminate. The zero vector amounts to a "miscellaneous" cluster.

$ for f in `cat osteo_groups | awk '{print $2}' ` ; do echo $f; g=`grep $f osteo_vectors|awk '{print $1}'| sed -e 's/>//'`; echo $g; h=`echo $g|sed -e 's/\..*//g' |sed -e 's/ */\\\|/g'`; grep -A 2 "$h" osteo_rdict| grep "DEFINITION"| sed -e 's/DEFINITION//' ; done |unix2dos >/dev/clipboard

1111111111111111111111111011111111110001
CAI16694 AAH95443 AAH37273 NP_148935 NP_054776 ABM85338 ABM82153 EAW62820 EAW62819 EAW62818 P20774 CAB53706
osteoglycin [Homo sapiens].
Osteoglycin [Homo sapiens].
Osteoglycin [Homo sapiens].
osteoglycin preproprotein isoform 2 [Homo sapiens].
osteoglycin preproprotein isoform 2 [Homo sapiens].
osteoglycin (osteoinductive factor, mimecan) [synthetic construct].
osteoglycin (osteoinductive factor, mimecan) [synthetic construct].
osteoglycin (osteoinductive factor, mimecan), isoform CRA_a [Homo
osteoglycin (osteoinductive factor, mimecan), isoform CRA_a [Homo
osteoglycin (osteoinductive factor, mimecan), isoform CRA_a [Homo
Mimecan precursor (Osteoglycin) (Osteoinductive factor) (OIF).
hypothetical protein [Homo sapiens].

1111111011111000100001011101001000001110
NP_032786 EDL41086 AAH21939 BAA06721 Q62000 BAE35995 BAC35462
osteoglycin [Mus musculus].
osteoglycin [Mus musculus].
Osteoglycin [Mus musculus].
osteoglycin precursor [Mus musculus].
Mimecan precursor (Osteoglycin).
unnamed protein product [Mus musculus].
unnamed protein product [Mus musculus].

0000000000000000000000000000000000000000
CAK03681 NP_002336 O42235 NP_032464 NP_989507 NP_033885
novel protein similar to vertebrate osteoglycin (osteoinductive
lumican precursor [Homo sapiens].
Keratocan precursor (KTN) (Keratan sulfate proteoglycan keratocan).
keratocan [Mus musculus].
keratocan [Gallus gallus].
bone morphogenetic protein 1 [Mus musculus].

1101111111100000001001011100001000001110
EDL98110 XP_001054654 XP_001054599 XP_001054725 XP_214441
osteoglycin (predicted) [Rattus norvegicus].
PREDICTED: similar to Mimecan precursor (Osteoglycin) isoform 2
PREDICTED: similar to Mimecan precursor (Osteoglycin) isoform 1
PREDICTED: similar to Mimecan precursor (Osteoglycin) isoform 3
PREDICTED: similar to Mimecan precursor (Osteoglycin) [Rattus

0000001000100110000100000000000000000000
NP_989540 AAD21085 Q9W6H0 Q9DE65
osteoglycin [Gallus gallus].
osteoglycin [Gallus gallus].
Mimecan precursor (Osteoglycin).
Mimecan precursor (Osteoglycin).

1111111111111111111111111011011111110001
AAP97142 Q5RBL2 CAH90848
osteoglycin OG [Homo sapiens].
Mimecan precursor (Osteoglycin).
hypothetical protein [Pongo pygmaeus].

1110101111110110010001011001011111111110
NP_001075585 AAM46865 Q8MJF1
osteoglycin [Oryctolagus cuniculus].
osteoglycin [Oryctolagus cuniculus].
Mimecan precursor (Osteoglycin).

1110111111100011110110111111011001100010
ABQ13007 P19879
osteoglycin preproprotein [Bos taurus].
Mimecan precursor (Osteoglycin) [Contains: Corneal keratan sulfate

1110011111100011110110111111001001100010
NP_776371 AAB70264
osteoglycin [Bos taurus].
mimecan [Bos taurus].

1111111111111111111111011011011111110000
NP_077727
osteoglycin preproprotein isoform 1 [Homo sapiens].

1111011111101111111110111011001111100001
XP_001103337
PREDICTED: osteoglycin isoform 2 [Macaca mulatta].

1111011111101111111110011011001111100000
XP_001103195
PREDICTED: osteoglycin isoform 1 [Macaca mulatta].

1110111111110011010001111110011001010000
ABL96619
osteoglycin [Capra hircus].

1101011111110010010111011100000001110110
XP_853340
PREDICTED: similar to Mimecan precursor (Osteoglycin)

1100000111011110110111000011000101110000
CAB61417
hypothetical protein [Homo sapiens].

1011111000100111011000011000011110000000
CAI16695
osteoglycin [Homo sapiens].

1011111000100111001000111000011010000001
AAX25979
SJCHGC07866 protein [Schistosoma japonicum].

0000001000100000000000000000000000000000
NP_001080164
osteoglycin [Xenopus laevis].

0000000110000000000000000000000000000000
CAJ57655
osteoglycin [Sus scrofa].

0000000000100100000000000000000000000000
XP_001512743
PREDICTED: similar to osteoglycin preproprotein [Ornithorhynchus

0000000000000001000000100000000000000001
AAD40453
mimecan [Homo sapiens].
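The grouping shown above can be sketched in a few lines of Python. This is not the pipeline that produced the listing: it assumes a "word" is a fixed-length k-mer, ranks words by how many sequences contain them, and uses made-up toy sequences. The function names, `k`, and `n_words` are all hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def top_words(seqs, k=3, n_words=4):
    """Return the n_words k-mers that occur in the most sequences."""
    counts = Counter()
    for seq in seqs.values():
        # Count each word once per sequence (presence, not frequency).
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n_words]]


def presence_vector(seq, words):
    """Build a 0/1 string with one position per word."""
    return "".join("1" if w in seq else "0" for w in words)


def group_by_vector(seqs, words):
    """Collect sequence names that share an identical presence vector."""
    groups = defaultdict(list)
    for name, seq in seqs.items():
        groups[presence_vector(seq, words)].append(name)
    return dict(groups)


# Toy input, not real osteoglycin sequences.
toy = {
    "seqA": "MKTLLAVE",
    "seqB": "MKTLLAVD",
    "seqC": "GGSPQRWY",
}
words = top_words(toy, k=3, n_words=4)
groups = group_by_vector(toy, words)
for vector, names in groups.items():
    print(vector, " ".join(names))
```

With the toy data the two similar sequences share the all-ones vector and the unrelated one lands in the all-zeros group, which mirrors the "miscellaneous" zero-vector cluster described in the message.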
Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com

>From: "Mike Marchywka"
>Reply-To: "General Forum at Bioinformatics.Org"
>To: bio_bulletin_board at bioinformatics.org
>Subject: Re: [BiO BB] Protein Datatypes for function prediction
>Date: Tue, 24 Jul 2007 07:50:50 -0400
>

From codeshepherd at gmail.com Tue Jul 31 11:48:24 2007
From: codeshepherd at gmail.com (=?ISO-8859-1?Q?Dee=FE=E0n_Chakravarth=FF?=)
Date: Tue, 31 Jul 2007 23:48:24 +0800
Subject: [BiO BB] MetaBase 1.0
Message-ID: <46AF59C8.3040900@gmail.com>

Announcing the release of MetaBase version 1.0.

http://biodatabase.org

This release documents over 850 databases and nearly 700 data-resources with a growing number of 'user-contributed' articles!

MetaBase is a user-contributed 'database database', designed to list and categorize all the biological databases and data-resources available on the internet! This first release owes much to the Nucleic Acids Research 'Database Summaries' database (2006) and the sister 'Web Server Summaries' database (2006). Permission to use the textual content of these two databases was kindly provided by Oxford University Press.

MetaBase is a 'user-contributed resource', allowing anyone to freely contribute, edit and update entries. Using the same 'MediaWiki' technology that runs the popular 'Wikipedia' website, MetaBase has the capacity to grow in content and scope and is entirely driven by its users. Additionally, many aspects of the database organization can be redesigned by users who understand the power of the MediaWiki templating system. For more information see: http://biodatabase.org/index.php/What_is_MetaBase%3F

There are many ways that *you* can contribute to MetaBase!
For some examples see: http://biodatabase.org/index.php/Contribute_to_MetaBase

Because MetaBase is a community project, the first 30 'significant' contributors will be added to the list of authors in the first MetaBase publication. See: http://biodatabase.org/index.php/MetaBase_publications and http://biodatabase.org/index.php/List_of_contributors

The idea behind this is to emphasise the community aspect of the project while encouraging people to contribute their expertise to the growing system. So if you or someone you know maintains a database, or if you are just interested in helping out, please take a look at the project.

The MetaBase people.

From rsachdev at usc.edu Mon Jul 30 16:48:09 2007
From: rsachdev at usc.edu (Rohan Sachdeva)
Date: Mon, 30 Jul 2007 13:48:09 -0700
Subject: [BiO BB] Automatic blast database maintenance/updating
Message-ID: <25b698b90707301348n2e655331j7884077dd7e26ef8@mail.gmail.com>

Hello

I've been charged with installing and maintaining a wwwblast server in my lab. I've got everything set up, but I am looking for an easy way to keep all the databases updated. I was hoping someone could point me toward a script that uses update_blastdb.pl to update the databases and then extracts them too.

Thanks

From barry.hardy at vtxmail.ch Tue Jul 31 07:48:44 2007
From: barry.hardy at vtxmail.ch (Barry Hardy)
Date: Tue, 31 Jul 2007 13:48:44 +0200
Subject: [BiO BB] eCheminfo Drug Discovery Workshop, Oxford, September 10-14
Message-ID: <46AF219C.3020903@vtxmail.ch>

We are holding an eCheminfo Drug Discovery Workshop week at Oxford University, UK, the week of 10-14 September 2007. The approach will be hands-on using leading drug discovery software packages, accompanied by practitioner-led lectures and discussions of the methods worked on by the group.
Topics to be covered: Virtual Screening & Docking; Pharmacophore Derivation, Elucidation and Searching, Applications of Filtering and Similarity in Virtual Screening, Focused Library Design, Analysing Chemical Databases using Advanced Structure Searching and Structure Based Predictions, Protein Modelling, Prediction of Pharmacological Properties and QSAR Analysis, Latest advances in ADME & Predictive Toxicology; Pharmacokinetics & Pharmacodynamics and Physiological-based Simulation.

More information:
Program (as pdf): http://www.douglasconnect.com/files/eChemProgramOxford07-Sept-v1web.PDF
Program & Schedule with Abstracts & Bios: http://www.echeminfo.com/COMTY_training/

If interested, please make your reservation soon as the size of the group is limited and we have a limited number of places remaining.

best regards
Barry Hardy
eCheminfo Community of Practice Manager

Barry Hardy, PhD
Douglas Connect
Zeiningen, CH-4314
Switzerland
Tel: +41 61 851 0170
Blog: http://barryhardy.blogs.com/cheminfostream/

From tsmith at darwin.bu.edu Tue Jul 31 11:38:36 2007
From: tsmith at darwin.bu.edu (Temple F. Smith)
Date: Tue, 31 Jul 2007 11:38:36 -0400
Subject: [BiO BB] Bioinformatic and the Smith-waterman
Message-ID: <46AF577C.2080008@darwin.bu.edu>

Jeff for the record:

Jean-Michel Claverie of France used the term "bio-informatik" in some email at the time of the Waterville Valley computational biology meetings in the mid to late 80's. However, in his book, Bioinformatics for Dummies, I do not recall that he discusses the origin of the term. Recall that the term Informatik is the French word for computer science and Jean-Michel was one of the early guys in this "field", but true computational biology goes back to Haldane (1908) and D'arcy Thompson (1942), Dayhoff (1966) etc ..... to say nothing about the x-ray crystal guys of the late 1950's!! Clearly the term was not used in 1980 to 1982 when Dr. Waterman and I were starting out!!
And no one "started the Field of Bioinformatics"; it grew out of molecular biology, with protein sequencing and then DNA/RNA sequencing's need for databases and computer analysis. The first such recognized early work was by people like Zuckerkandl and Pauling (1965) and Fitch and Margoliash (1967) and then Needleman and Wunsch (1977)! Thus unless Dr. Hwa A. Lim can claim to have been doing sequence comparative analysis in the late 1960's he is not a founder!

While I was the organizer of the three Waterville Valley Genes and Machines meetings which Jean-Michel attended, I did not use the term for at least another two years if I remember correctly. Also on my visits with Dayhoff and later Fitch they both agreed that it was likely Jean-Michel, then at the FRENCH Institut Pasteur, who surely used it first, particularly given his use in email as something he had been using at home in the Paris Institute. Thus unless Jean-Michel says otherwise all others making such claims should stop! This is an old discussion which I find not funny any more. In fact the term is a bit out of date these days and the better term is computational biology in any case.

Please pass this on to whoever is still asking this now unimportant question.

Temple F. Smith, PhD
BMERC
Boston University

From jeff at bioinformatics.org Tue Jul 31 19:34:23 2007
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Tue, 31 Jul 2007 19:34:23 -0400
Subject: [BiO BB] Re: Bioinformatic and the Smith-waterman
In-Reply-To: <46AF577C.2080008@darwin.bu.edu>
References: <46AF577C.2080008@darwin.bu.edu>
Message-ID: <46AFC6FF.9020206@bioinformatics.org>

Hi Temple!
(For those on the mailing list, the origin of the word "bioinformatics" was brought up in this thread: http://bioinformatics.org/pipermail/bio_bulletin_board/2002-April/000635.html and we have a short wiki page on the topic: http://wiki.bioinformatics.org/Origins_of_bioinformatics)

Thank you for clearing up some misconceptions about the origins of the field, including some of my own. I guess when we spoke about this around 1998, you actually said that you were *incorrectly* credited with having coined the word.

In any case, I agree that no one person or group started the field. And it seems there are as many different definitions as there are practitioners. To me, bioinformatics is a compound of "bio" and the English/common word "informatics" (http://en.wikipedia.org/wiki/Informatics), with the latter being a subdiscipline of computer science, whereas the French "informatique" translates to "computer science" in general.

A Wikipedia contributor wrote the following about "informatics," and I pretty much agree with it:

"Used as a compound, in conjunction with the name of a discipline, as in medical informatics, bioinformatics, etc., it denotes the specialization of informatics to the management and processing of data, information and knowledge in the named discipline, and the incorporation of informatic concepts and theories to enrich the other discipline."

So, I think of bioinformatics as a subdiscipline of computational biology, the same way that informatics is a subdiscipline of computer science. But I will concede that most people think that the terms are synonymous. And maybe it just doesn't matter.

I will integrate most of what you've written into our wiki, which will hopefully help clear up some of the confusion about where things started.

Cheers,
Jeff

--
J.W. Bizzaro
Bioinformatics Organization, Inc. (Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone: +1 508 890 8600
--