From codeshepherd at gmail.com  Mon Jul  2 07:10:53 2007
From: codeshepherd at gmail.com (Deeþàn Chakravarthÿ)
Date: Mon, 02 Jul 2007 19:10:53 +0800
Subject: [BiO BB] parallel clustalw
Message-ID: <4688DD3D.5020305@gmail.com>

Hi All,
  I am trying to compile the parallel version of ClustalW on CentOS 4. I
downloaded the package from

ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/ParClustal0.2.tar.gz


I have the NCBI toolkit installed on my machine. When I run make, I get the
following error.

: undefined reference to `seek_si_sj_to_calculate'
slaves.o(.text+0x687): In function `main_processes_slaves':
: undefined reference to `subpairalign'
collect2: ld returned 1 exit status
make: *** [clustalx] Error 1

I have attached the complete output of make, uname, and gcc -v below.


$ uname -a
Linux panther5.nus.edu.sg 2.6.9-42.0.10.ELsmp #1 SMP Tue Feb 27 10:11:19 EST 2007 i686 i686 i386 GNU/Linux


$ gcc -v
Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--disable-checking --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions --enable-java-awt=gtk 
--host=i386-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)


$ make
cc -c -O -I/usr/include/openmpi/ init.c
cc -c -O -I/usr/include/openmpi/ interface.c
cc -c -O -I/usr/include/openmpi/ readseq.c
cc -c -O -I/usr/include/openmpi/ writeseq.c
cc -c -O -I/usr/include/openmpi/ showpair.c
cc -c -O -I/usr/include/openmpi/ malign.c
cc -c -O -I/usr/include/openmpi/ util.c
cc -c -O -I/usr/include/openmpi/ trees.c
cc -c -O -I/usr/include/openmpi/ gcgcheck.c
cc -c -O -I/usr/include/openmpi/ prfalign.c
cc  -c -O -I/usr/include/openmpi/ pairalign.c
cc -c -O -I/usr/include/openmpi/ calcgapcoeff.c
cc -c -O -I/usr/include/openmpi/ calcprf1.c
cc -c -O -I/usr/include/openmpi/ calcprf2.c
cc -c -O -I/usr/include/openmpi/ readtree.c
cc -c -O -I/usr/include/openmpi/ seqweight.c
cc -c -O -I/usr/include/openmpi/ readmat.c
cc -c -O -I/usr/include/openmpi/ alnscore.c
cc -c -O -I/usr/include/openmpi/ random.c
cc -c -O -I/usr/include/openmpi/ motifs.c
cc -c -O -I/usr/include/openmpi/ bionj.c
cc  -c -O -I/usr/include/openmpi/ slaves.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF 
-I/home/griduser/clustalw/toolbox/ncbi/include  xutils.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF 
-I/home/griduser/clustalw/toolbox/ncbi/include  xmenu.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF 
-I/home/griduser/clustalw/toolbox/ncbi/include  xcolor.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF 
-I/home/griduser/clustalw/toolbox/ncbi/include  xdisplay.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF 
-I/home/griduser/clustalw/toolbox/ncbi/include  xscore.c
cc -c -O -I/usr/include/openmpi/ -DWIN_MOTIF 
-I/home/griduser/clustalw/toolbox/ncbi/include  clustalx.c
cc -o clustalx init.o interface.o readseq.o writeseq.o showpair.o 
malign.o util.o trees.o gcgcheck.o prfalign.o pairalign.o calcgapcoeff.o 
calcprf1.o calcprf2.o readtree.o seqweight.o readmat.o alnscore.o 
random.o motifs.o bionj.o slaves.o xutils.o xmenu.o xcolor.o xdisplay.o 
xscore.o clustalx.o -O -lm -lmpi -lrt -pthread 
-L/home/griduser/clustalw/toolbox/ncbi/lib -L/usr/lib/openmpi/ 
-L/usr/X11R6/lib  -lvibrant -lncbi -lpthread -lXm -lXmu -lXt -lX11 -lm
/home/griduser/clustalw/toolbox/ncbi/lib/libncbi.a(ncbifile.o)(.text+0x945): 
In function `Nlm_TmpNam':
: warning: the use of `tempnam' is dangerous, better use `mkstemp'
writeseq.o(.text+0x284): In function `open_output_file':
: warning: the `gets' function is dangerous and should not be used.
slaves.o(.text+0x5f0): In function `main_processes_slaves':
: undefined reference to `seek_si_sj_to_calculate'
slaves.o(.text+0x687): In function `main_processes_slaves':
: undefined reference to `subpairalign'
collect2: ld returned 1 exit status
make: *** [clustalx] Error 1


Thanks
Deepan


From wiiat at kis-lab.com  Sun Jul  1 23:40:16 2007
From: wiiat at kis-lab.com (WI-IAT07)
Date: Mon, 2 Jul 2007 12:40:16 +0900
Subject: [BiO BB] CFP: Workshop on Biomedicine Applications of Web
	technologies (BMWT 2007) <10 July>
Message-ID: <20070702034705.8FDEE368807@primary.bioinformatics.org>

================================================================================
                          Call for Papers
     International Workshop on BioMedicine applications of Web Technologies
                 in conjunction with WI/IAT 2007 
            Silicon Valley, CA, USA, November 2-5, 2007
           http://chunnan.iis.sinica.edu.tw/BMWT2007.html    
========================================================================

=========================
Topics of the workshop
=========================
BMWT 2007 will be one of the events of the WI-IAT conference and will be held
on a date during November 2-5. This workshop will cover applications of
intelligent Web technologies in bio-medicine, including Web Ontology, Semantic
Web, Web Usage Mining, Web Search and Intelligent Agents. Original
contributions are solicited in the following subjects (but are not necessarily
limited to them):

    * Web information extraction and wrapper generation
    * Applications of Web taxonomies and ontologies in bio-medicine
    * Web-based service-oriented architecture in bio-medicine
    * Integration and maintenance of bio-medicine taxonomies and ontologies
    * Web content and structure mining, Web information retrieval and filtering
    * Web-based data collection, curation and analysis
    * Text mining for metadata creation
    * Multimedia contents in bio-medicine on the Web
    * Web search engines, meta-search engines and inference engines
    * Semantic Web
    * Intelligent Web agents
    * Knowledge community formation and support

This workshop intends to bring together researchers and practitioners to
foster the exchange of ideas and the dissemination of emerging techniques on
intelligent Web technology in bio-medicine applications. The workshop will
capture current important developments of new models, new methodologies and
new tools for building a variety of embodiments of scalable, effective and
intelligent Web-based information systems for the ever-increasing needs of
bio-medicine applications.

==================
Type of workshop
==================
The workshop will run for half a day (approximately 5 hours), with 1-2 keynote
speeches and 8-12 paper presentations.

 
==================
Important dates
==================
   July 10, 2007:      Due date for full workshop papers submission
   August 1, 2007:     Final acceptance by Workshop Co-Chairs
   August 2, 2007:     Notification of paper acceptance to authors
   August 17, 2007:    Camera-ready of accepted papers
   November 2-5, 2007: Workshops

 
==========================
Paper submission guideline
==========================
(1)   All submitted papers will be reviewed by at least two reviewers on the
      basis of technical quality, relevance, significance, and clarity.

(2)   We will use the WI-IAT 2007 Cyberchair system for on-line paper
      submission and review. Details to be announced.

(3)   Accepted papers should NOT exceed 4 pages in IEEE-CS format; at most one
      extra page is available for an additional payment.

(4)   There is no separate workshop registration fee this year (i.e., a single
      conference registration covers everything).

 
=================
Program committee
=================
(Tentative)
Howard CT Ho, IBM Almaden, USA
Chun-Nan Hsu, Academia Sinica, Taiwan
Chung-Yen Lin, Academia Sinica, Taiwan
Wen-Hsiang Lu, NCKU, Taiwan
Louiqa Raschid, U of Maryland, USA
Shin-Mu Tseng, NCKU, Taiwan
Samson Tu, Stanford, USA
Qiang Yang, HKUST, Hong Kong, China
Ueng-Cheng Yang, NYMU, Taiwan

====================
Organizing committee
====================
Chun-Nan Hsu (Co-Chair)
Institute of Information Science
Academia Sinica, Taipei, Taiwan
chunnan at iis.sinica.edu.tw

Vincent Shin-Mu Tseng (Co-Chair)
Department of Computer Science and Information Engineering
National Cheng Kung University, Tainan, Taiwan
tsengsm at mail.ncku.edu.tw

Wen-Hsiang Lu (Co-Chair)
Department of Computer Science and Information Engineering
National Cheng Kung University, Tainan, Taiwan
whlu at mail.ncku.edu.tw


From wongls at comp.nus.edu.sg  Wed Jul 11 10:54:23 2007
From: wongls at comp.nus.edu.sg (Limsoon Wong)
Date: Wed, 11 Jul 2007 22:54:23 +0800
Subject: [BiO BB] GIW2007 --- Call for posters
Message-ID: <001201c7c3cb$5f3f2660$1e1015ac@comp.nus.edu.sg>

 
CALL FOR POSTERS

The 18th International Conference on Genome Informatics (GIW 2007) 

Biopolis, Singapore.

December 3-5 2007.

http://www.comp.nus.edu.sg/~giw2007 

 
The 18th International Conference on Genome Informatics (GIW 2007) will be
held at the Biopolis in Singapore on December 3-5, 2007. The GIW is the
longest running international bioinformatics conference, which has provided
unique opportunities that bridge theory and experiments, academia and
industry, and East and West. 

 
SUBMISSIONS: 

Poster or software demonstration abstracts are limited to 2 pages, including
title, figures, tables, text, and bibliography. Please see below for
Abstract Templates. All abstracts should be submitted at
http://www.easychair.org/GIWPoster2007.

 
Accepted posters and software demonstration abstracts will be compiled into
"The 18th International Conference on Genome Informatics, Posters and
Software Demonstrations". Additionally, a number of notable abstracts will
be selected for oral presentations.

 
IMPORTANT DATES:

Poster submission deadline: 16 September 2007

Poster decision: 14 October 2007 

Poster templates: http://www.comp.nus.edu.sg/~giw2007/poster.html

 
From orbitz at ezabel.com  Wed Jul 11 21:46:33 2007
From: orbitz at ezabel.com (orbitz at ezabel.com)
Date: Wed, 11 Jul 2007 21:46:33 -0400
Subject: [BiO BB] Masters Program Advice
Message-ID: <83FAF617-8E05-413A-B729-1846BFE7A307@ezabel.com>

Hello,
I am looking for some advice on bioinformatics master's programs. My
background is mostly as a computer scientist, and I currently work as a
programmer. I completed an undergraduate bioinformatics program and
have been out of school for about a year. I would like to get a
master's degree and have started by looking at programs in the US.
I have found three schools so far that look interesting. I'm not
sure how one chooses a grad school. I have been told that I should
decide on what aspect of bioinformatics I would like to research and
find a school with someone who specializes in that, although I
honestly don't know enough about any bioinformaticians or the schools
at the moment. I know I would like to work with infectious diseases
for now, but I'm not sure beyond that. I am not sure whether any
of the schools I have been looking at are good for this, or
whether I could get into them. The undergraduate program I was
part of also did not include all of the classes that are prerequisites
for some of the programs I have looked at. For instance, my program did
not involve Organic Chemistry, and at least one of the programs
requires two semesters of it as a prerequisite. I am not sure how this
situation is dealt with.

The three schools I have looked at so far are Johns Hopkins,
Stanford, and the University of Wisconsin.

Any advice would be appreciated.

Thanks,
M


From phoebe at deakin.edu.au  Fri Jul 13 06:49:15 2007
From: phoebe at deakin.edu.au (Phoebe Chen)
Date: Fri, 13 Jul 2007 20:49:15 +1000
Subject: [BiO BB] APBC2008 Deadline Approaching
Message-ID: <20070713204915.oyzifcv28w8ssowc@mail.deakin.edu.au>

APBC2008 Deadline Approaching -
Full Paper Submission on 20 July (7 more days)

CALL FOR PAPERS - APBC 2008
The Sixth Asia-Pacific Bioinformatics Conference, APBC2008,
will be held in Kyoto, Japan, during 14-17 January 2008.
See http://bic.kyoto-u.ac.jp/apbc2008/index.html

The Asia-Pacific Bioinformatics Conference series is an annual
forum for exploring research, development, and novel
applications of bioinformatics.

IMPORTANT DATES
Submission of papers 20 July 2007 ***
Notification of paper acceptance 17 September 2007
Submission of posters 30 September 2007
Camera-ready copy & Author registration 30 September 2007
Notification of poster acceptance 20 October 2007
Conference 14-17 January 2008


From swhwang10 at yahoo.com  Sun Jul 15 00:42:22 2007
From: swhwang10 at yahoo.com (Seungwoo Hwang)
Date: Sat, 14 Jul 2007 21:42:22 -0700 (PDT)
Subject: [BiO BB] [Announcement]The 2nd International BioWiki Contest
Message-ID: <57012.18896.qm@web43138.mail.sp1.yahoo.com>

The 2nd International BioWiki Contest

[Objective]
Wiki is a web technology designed to allow multiple authors to freely
add and edit website contents. Wiki is thus well suited for developing
collaborative online knowledge bases, whose best known example is
Wikipedia (http://wikipedia.org). The objective of this contest is to
apply the wiki paradigm to the collaborative development of biological
knowledge bases to which anyone can contribute.

In this contest, each participant will develop a wiki-based web site
that will serve as a useful knowledge source for a biological subject
of his/her choice. Web sites created from previous contests are shown
at http://biowiki.net/index.php/Biowiki_Site_List as examples.

[Schedule]
- Registration: 2007/07/21 - 2007/08/20
- Final Result Due: 2007/08/26
- Award Notification: After 2007/08/31

[How to register]
Obtain the registration form from the contest website and send the
completed form to swhwang at kribb.re.kr

[Venue]
This is an online contest. All contest activities (registration,
submission, and award notification) take place on the Internet.

In addition to the online processes, a presentation and an announcement
of the event will also be given during the 3rd ISCB Student Council
Symposium in Vienna, Austria on 2007/07/21. Evaluation will also be
done during the 6th International Conference on Bioinformatics (INCOB
2007) in Hong Kong on 2007/08/31.

[Awards]
The following prizes will be awarded:
- 1st place (1 team): $500 USD
- 2nd place (1 team): $300 USD
- 3rd place (3 teams): $200 USD per team
Prizes will be paid through wire transfer.

[Evaluation]
Each web site will be evaluated on the following criteria:
- Goal-orientedness: It should serve as a useful biological knowledge
base.
- Contents: It should contain high quality contents.
- Expandability: It should be well organized so that other users can
easily figure out how to contribute.

The evaluation will be done on-site at the 6th International Conference
on Bioinformatics (INCOB 2007) by a panel of experts from eight
countries in the Asia-Pacific region. The evaluation result will be
announced on-line.

[Sponsor]
This contest is organized by APBioNet (Asia Pacific Bioinformatics
Network) and KOBIC (Korean Bioinformation Center,
http://www.kobic.re.kr). KOBIC provides funds and computing
infrastructure for this contest.

[Copyright]
- Copyrighted contents should not be used without the permission of the
owners. Once put in the contest, all the contents will lose copyrights.
Participants should not copy and paste copyrighted text contents from
other web sites without the removal of copyrights.
- All the images and other non-text materials are under the same
principle as text.
- All the contents created and uploaded by participants will be
freely and openly shared under BioLicense (http://biolicense.org/).

[Contact]
Jong Bhak, Ph.D. jong at kribb.re.kr
Seungwoo Hwang, Ph.D. swhwang at kribb.re.kr

[Contest URL]
http://biowiki.net/index.php/Current_Contests 




From me.lixue at gmail.com  Tue Jul 17 14:33:59 2007
From: me.lixue at gmail.com (Xue Li)
Date: Tue, 17 Jul 2007 13:33:59 -0500
Subject: [BiO BB] hat kinds of data mining techniques have been using in
	drug discovery and drug delivery
Message-ID: <62ed16460707171133j7d2c79evcf8bb0b366d01776@mail.gmail.com>

Hello all,

I was wondering what kinds of data mining techniques have been used in drug
discovery and drug delivery. It would be much appreciated if you could point
me to some resources on this. Many thanks.

As far as I know, classification techniques are used in protein-protein
interface prediction, and in RNA- and DNA-interface prediction.
Are optimization techniques used? How about regression techniques?

-- 
Li Xue
Bioinformatics and Computational Biology program @ ISU
515-520-1676
Ames, IA 50010


From marchywka at hotmail.com  Tue Jul 17 19:08:03 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 17 Jul 2007 19:08:03 -0400
Subject: [BiO BB] hat kinds of data mining techniques have been using
	indrug discovery and drug delivery
In-Reply-To: <62ed16460707171133j7d2c79evcf8bb0b366d01776@mail.gmail.com>
Message-ID: <BAY108-F121E94486FB70862057FEABEF90@phx.gbl>

I don't really have anything insightful to say directly regarding your
question, but I will point out that there are good case studies on PubMed:
protease and polymerase targeting, or, if you prefer, kinases, should keep
you busy for a while :) As far as delivery goes, there is lots of nano stuff;
I'm not sure about tools.

I did want to mention that I have been amazed at the (apparent) lack of some
simple tools, however. While it is quite possible I missed some, I have had
to write a lot of scripts and C++ code to augment the Affymetrix annotations.
Much of this is just the novel idea of using a computer to automate data
processing rather than requiring a user to appreciate someone's nice web page
for every protein he wants to investigate. However, then there are things
like string correlators that execute in reasonable time, programmable
ribosomes, format converters, etc.

I suppose I could have looked more carefully at conserved domain servers or
bioperl packages to address various parts of the problem but so far I think
I've done better with my own approach.  I'm not sure I have exploited all
the online tools- I only really use blast and eutils to download batches
of proteins or nucleotides - but I do know that perl, at least under cygwin,
was simply too slow to do anything of any size. I ended up writing my own
c++ string manipulation stuff ( even here under cygwin the STL string
classes were just too slow and I created objects that manipulate c-style
strings ). Even using grep+sed to convert to fasta files was quite slow
( although I think this turned out to be mostly a problem using
cygwin to pipe results - the console buffering seems to be the problem).
I don't want to sound negative on cygwin- you just can't do this with
BAT files and it is hard to get reasonable performance on top of windoze-
but I'm not sure if that is creating some of the limitations.

Don't know if any of that helps but I am curious if anyone has similar or
contrasting observations.

Thanks.


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com




From bader at cbio.mskcc.org  Wed Jul 18 09:36:03 2007
From: bader at cbio.mskcc.org (Gary Bader)
Date: Wed, 18 Jul 2007 09:36:03 -0400
Subject: [BiO BB] First announcement: 5th Annual Cytoscape Public Symposium
 and Developers Retreat]
Message-ID: <469E1743.8060505@cbio.mskcc.org>

*5th Annual Cytoscape Public Symposium and Developers Retreat*
Amsterdam - Netherlands

*November 8, 2007: Public Symposium:*

*/Integrative Bioinformatics:
  At the cutting edge of network analysis and biological data integration/*

*Leroy Hood* - Institute of Systems Biology
*Ewan Birney* - European Bioinformatics Institute
*Peter Sorger* - Harvard University
*Trey Ideker* - University of California, San Diego
*Chris Sander* - Memorial-Sloan-Kettering Cancer Center
*Benno Schwikowski* - Institute Pasteur
*Andrew Hopkins* - Pfizer Global Research & Development

*November 6,7 and 9, 2007: Developers retreat *

*Department of Human Genetics, Academic Medical Center*
*University of Amsterdam, Netherlands*

  We are pleased to announce the forthcoming 2007 Cytoscape Symposium and
Retreat to be held at the Academic Medical Center, University of
Amsterdam , Netherlands on 6-9 Nov 2007. More information, updates and
registration will be available at:

http://www.cytoscape.org/retreat2007

This year's meeting is particularly exciting since it is the first time
it will be held in Europe, specifically in the vibrant historic city of
Amsterdam. 'Floating' on a web of canals, with its unique combination of
old-world charm and cosmopolitan culture, Amsterdam is one of the most
popular European cities for international visitors.


To reach Cytoscape's large European user base, a Public Symposium will be
held on November 8th, for which there is a formidable list of confirmed
speakers. Apart from the Symposium, the retreat consists of hands-on
demos, user-training sessions, and informal, technically focused
developer meetings:

     * *Tues 6th Nov:*      Cytoscape Developer's Discussions
     * *Wed 7th Nov:*       *Application showcase, hands-on sessions,
       tutorials *
     * *Thur 8th Nov:*       */Public Symposium/*
     * *Fri 9th Nov:*          Development of Cytoscape Roadmap for 2007,
       2008

The Symposium on the 8th is of general interest to both biologists and
informaticians. Current and future users of Cytoscape are invited to
visit the Application showcase on the 7th also. The developers days are
targeted at the core developers of Cytoscape and plugins.

We hope that you can join us!

The Organizing Committee, 5th Cytoscape Retreat 2007

Annette Adler   - Agilent Technologies
Piet Molenaar   - Human Genetics Department AMC
Guy Warner     - Unilever

/Contact/: cytoretreat at cytoscape.org

/The retreat is supported by/:

Unilever (www.unilever.com)
Netherlands Bioinformatics Center (NBIC) (www.nbic.nl)
Agilent (www.agilent.com)

/About Cytoscape:/
Cytoscape (www.cytoscape.org) is an open
source bioinformatics software platform for */visualizing/* molecular
interaction networks and */integrating /*these interactions with gene
expression profiles and other state data. The software architecture
enables adaptation of Cytoscape functionality to the specific needs of
biologists and bioinformaticians. It is jointly developed by the groups
of Benno Schwikowski (Pasteur Institute, Paris), Trey Ideker (University
of California San Diego), Chris Sander (Memorial Sloan-Kettering Cancer
Center), Lee Hood (Institute of Systems Biology, Seattle), Annette Adler
(Agilent Technologies, Santa Clara, CA) and Bruce Conklin
(Gladstone/UCSF, GenMAPP).

PS: Excuse us for cross-posting; we're trying to avoid this as much as
possible. However, to reach a large audience we decided to include
several relevant mailing lists.


From me.lixue at gmail.com  Wed Jul 18 13:52:29 2007
From: me.lixue at gmail.com (Xue Li)
Date: Wed, 18 Jul 2007 12:52:29 -0500
Subject: [BiO BB] hat kinds of data mining techniques have been using
	indrug discovery and drug delivery
In-Reply-To: <BAY108-F121E94486FB70862057FEABEF90@phx.gbl>
References: <62ed16460707171133j7d2c79evcf8bb0b366d01776@mail.gmail.com>
	<BAY108-F121E94486FB70862057FEABEF90@phx.gbl>
Message-ID: <62ed16460707181052h32656144j635dabbde8269b64@mail.gmail.com>

Thank you, Mike, for such a detailed reply :P

I am just a beginner in data mining and bioinformatics, so I am not sure about
the problems with cygwin and perl. I know that there is Bio++, where some C++
libraries for bioinformatics can be found:
http://162.38.181.25/BioPP/

Hope it will help.

Li




-- 
Li Xue
Bioinformatics and Computational Biology program @ ISU
515-520-1676
Ames, IA 50010


From veredcc at bgu.ac.il  Thu Jul 19 15:52:57 2007
From: veredcc at bgu.ac.il (Vered Caspi)
Date: Thu, 19 Jul 2007 19:52:57 GMT
Subject: [BiO BB] Genbank file conversion to GCG format
Message-ID: <f6d4ec6611529.469fc119@bgu.ac.il>

Hello,
I am looking for free software (Unix) for mass conversion of GenBank files to GCG format.
If anyone has experience with this, I would be happy to hear about it.
          Vered

===
Vered Caspi, Ph.D.
Bioinformatics Support Unit, Head
National Institute for Biotechnology in the Negev,
Building 39, room 214
Ben-Gurion University of the Negev
Beer-Sheva 84105, Israel
Email: veredcc at bgu.ac.il
Tel: 08-6479034 054-7915969


From pmr at ebi.ac.uk  Fri Jul 20 03:48:18 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 20 Jul 2007 08:48:18 +0100
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To: <f6d4ec6611529.469fc119@bgu.ac.il>
References: <f6d4ec6611529.469fc119@bgu.ac.il>
Message-ID: <46A068C2.9010306@ebi.ac.uk>

Vered Caspi wrote:
> Hello,
> I am looking for a free software (unix) for mass conversion of GenBank files to GCG format.
> If any one has experience with that, I will be happy to learn.

Assuming you mean GCG sequence files (not a GCG sequence database):

EMBOSS can convert many sequence formats, including Genbank and GCG, 
using the program "seqret".

Just one warning ... you need the -ossingle option on the command line 
to write each sequence to a separate file (or process one Genbank 
sequence at a time if they are already in separate files). EMBOSS can 
read GCG files with more than one sequence but other applications may 
assume only one sequence per file.
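
As a rough illustration of the advice above, the sketch below builds a seqret
command line from Python and runs it only if EMBOSS is installed. The input
file name `sequences.gb` is hypothetical. The qualifiers `-sformat`,
`-osformat`, `-ossingle` and `-auto` are standard EMBOSS qualifiers, but how
the output files are named is left to seqret's defaults here, which is an
assumption worth checking against your EMBOSS version.

```python
import shutil
import subprocess


def build_seqret_command(genbank_path, single_files=True):
    """Return the argument list for converting a GenBank file to GCG format."""
    cmd = [
        "seqret",
        "-sequence", genbank_path,  # input file (may hold many entries)
        "-sformat", "genbank",      # state the input format explicitly
        "-osformat", "gcg",         # write GCG single-sequence format
        "-auto",                    # do not prompt; accept default output names
    ]
    if single_files:
        # One output file per sequence, as recommended in the message above.
        cmd.append("-ossingle")
    return cmd


def convert(genbank_path):
    """Run seqret if it is on PATH; raise a clear error otherwise."""
    if shutil.which("seqret") is None:
        raise RuntimeError("EMBOSS seqret was not found on PATH")
    subprocess.run(build_seqret_command(genbank_path), check=True)


if __name__ == "__main__":
    # Hypothetical input file; print the command rather than running it.
    print(" ".join(build_seqret_command("sequences.gb")))
```

Printing the command first makes it easy to try it by hand on one file before
converting a whole directory.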

regards,

Peter Rice


From Sterten at aol.com  Fri Jul 20 04:19:55 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Fri, 20 Jul 2007 04:19:55 EDT
Subject: [BiO BB] Genbank file conversion to GCG format
Message-ID: <d35.e885afc.33d1ca2b@aol.com>

What is GCG?


From pmr at ebi.ac.uk  Fri Jul 20 04:44:27 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 20 Jul 2007 09:44:27 +0100
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To: <d35.e885afc.33d1ca2b@aol.com>
References: <d35.e885afc.33d1ca2b@aol.com>
Message-ID: <46A075EB.9040206@ebi.ac.uk>

Sterten at aol.com wrote:
> What is GCG?

"The Wisconsin Package" from Accelrys - originally academic software 
from the University of Wisconsin Genetics Computer Group, but a 
commercial package for many years.

Those of us who remember the old days still call it GCG ... partly 
because the package has changed name so many times. Its home page is now 
http://www.accelrys.com/products/gcg/ (so they do still use the old name 
too :-)

They invented their own file formats with a "checksum" line marked by 
".." at the end. Documentation is above this line, and only sequence below.

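The layout described above (documentation, then a line ending in "..", then
sequence only) can be sketched as a small reader. This is a minimal
illustration, not the GCG package's own parser: the function name `read_gcg`
and the sample entry are made up, and it assumes the divider is the first line
that ends with ".." and that the checksum appears on it as "Check: <number>".

```python
import re


def read_gcg(text):
    """Split a GCG single-sequence file into (documentation, checksum, sequence)."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.rstrip().endswith(".."):
            break
    else:
        raise ValueError("no divider line ending in '..' was found")

    documentation = "\n".join(lines[:index]).strip()

    # The checksum is read from the divider line, not recomputed here.
    match = re.search(r"Check:\s*(\d+)", lines[index])
    checksum = int(match.group(1)) if match else None

    # Below the divider there is only sequence: drop position numbers and spaces.
    sequence = re.sub(r"[\d\s]", "", "\n".join(lines[index + 1:]))
    return documentation, checksum, sequence


# Hypothetical sample entry, for illustration only.
SAMPLE = """Example entry; free-text documentation goes here.

  EXAMPLE  Length: 8  Type: N  Check: 2644  ..

       1  ACGTACGT
"""

if __name__ == "__main__":
    print(read_gcg(SAMPLE))
```
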
regards,

Peter Rice


From Sterten at aol.com  Fri Jul 20 04:53:26 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Fri, 20 Jul 2007 04:53:26 EDT
Subject: [BiO BB] Genbank file conversion to GCG format
Message-ID: <c42.15e5eb1e.33d1d206@aol.com>


No checksum is needed these days; data storage is reliable.

It should be easy to write a short program to convert the formats...


From skhadar at gmail.com  Fri Jul 20 08:40:35 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Fri, 20 Jul 2007 18:10:35 +0530
Subject: [BiO BB] How to rank active site(s) ?
Message-ID: <b6ff81950707200540x1ed64a9bt727a23602b68d248@mail.gmail.com>

Dear All,

Is there any computational tool or protocol available to rank the active
site(s) present in a protein? I am looking into a couple of proteins that
have multiple sites (say, active sites / binding sites), and I am looking
for a method to rank them. I have done an extended Google search but
couldn't find anything as such. I would appreciate it if anyone could point
me to any similar work / paper / tool etc.

Many thanks in advance,
Shameer Khadar
NCBS-TIFR


From christoph.gille at charite.de  Fri Jul 20 09:02:18 2007
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri, 20 Jul 2007 15:02:18 +0200 (CEST)
Subject: [BiO BB] How to rank active site(s) ?
In-Reply-To: <b6ff81950707200540x1ed64a9bt727a23602b68d248@mail.gmail.com>
References: <b6ff81950707200540x1ed64a9bt727a23602b68d248@mail.gmail.com>
Message-ID: <59689.141.42.56.114.1184936538.squirrel@webmail.charite.de>

The active site atlas of the Thornton group (the Catalytic Site Atlas) may
be useful. They collected data from publications and provide links to the
publications.
Christoph


From ykalidas at gmail.com  Fri Jul 20 09:28:12 2007
From: ykalidas at gmail.com (Kalidas Yeturu)
Date: Fri, 20 Jul 2007 18:58:12 +0530
Subject: [BiO BB] How to rank active site(s) ?
In-Reply-To: <b6ff81950707200540x1ed64a9bt727a23602b68d248@mail.gmail.com>
References: <b6ff81950707200540x1ed64a9bt727a23602b68d248@mail.gmail.com>
Message-ID: <5632703b0707200628q17d427d9lec99dcf00f0b0b1f@mail.gmail.com>

Hi Shameer,
 There are many ways of ranking binding sites:
 (1) Size
 (2) Cumulative electrostatic potential of probe positions in the site
 (3) Occurrence of the most probable binding site residues
 etc.
 Please refer to the CASTp, Q-SiteFinder and LIGSITEcsc algorithms.

With Regards
Kalidas. Y




-- 
Kalidas Y
http://ssl.serc.iisc.ernet.in/~kalidas


From operon at cbiot.ufrgs.br  Thu Jul 19 17:45:14 2007
From: operon at cbiot.ufrgs.br (Marcos de Carvalho)
Date: Thu, 19 Jul 2007 18:45:14 -0300
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To: <f6d4ec6611529.469fc119@bgu.ac.il>
References: <f6d4ec6611529.469fc119@bgu.ac.il>
Message-ID: <op.tvqepodolmdxxi@wolfgang>


Hi Vered,

Maybe you would like to try Readseq:

http://www.ebi.ac.uk/cgi-bin/readseq.cgi
(web based)

http://bioinformatics.org/~thomas/mol_lin/readseq/
(linux elf binary)

http://iubio.bio.indiana.edu/soft/molbio/readseq/java/
(java version)


regards

Marcos


On Thu, 19 Jul 2007 16:52:57 -0300, Vered Caspi <veredcc at bgu.ac.il> wrote:

> Hello,
> I am looking for a free software (unix) for mass conversion of GenBank  
> files to GCG format.
> If any one has experience with that, I will be happy to learn.
>           Vered
>
> ===
> Vered Caspi, Ph.D.
> Bioinformatics Support Unit, Head
> National Institute for Biotechnology in the Negev,
> Building 39, room 214
> Ben-Gurion University of the Negev
> Beer-Sheva 84105, Israel
> Email: veredcc at bgu.ac.il
> Tel: 08-6479034 054-7915969?
> _______________________________________________
> General Forum at Bioinformatics.Org -  
> BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board


From orbitz at ezabel.com  Fri Jul 20 08:34:07 2007
From: orbitz at ezabel.com (orbitz at ezabel.com)
Date: Fri, 20 Jul 2007 08:34:07 -0400
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To: <c42.15e5eb1e.33d1d206@aol.com>
References: <c42.15e5eb1e.33d1d206@aol.com>
Message-ID: <EBCA74E5-18E5-4263-A2CA-8BA61F92AE4C@ezabel.com>

Famous last words!

I don't know about in bioinformatics, but in other areas checksums  
are needed these days, and very important, but it's outside the scope  
of the data you deal with, external.




From marchywka at hotmail.com  Fri Jul 20 11:29:20 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Fri, 20 Jul 2007 11:29:20 -0400
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To: <EBCA74E5-18E5-4263-A2CA-8BA61F92AE4C@ezabel.com>
Message-ID: <BAY108-F183A0138C2B6AD7A21B0D9BEF40@phx.gbl>

No matter how simple the code, it is important to determine that it's
working right.
It isn't the computer you are checking, it's the human :)
Dumb things like implementation-specific string size limits, buffer
flushing, etc.
can drop random stuff that would take forever to find.





From pmr at ebi.ac.uk  Fri Jul 20 12:22:32 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 20 Jul 2007 17:22:32 +0100
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To: <EBCA74E5-18E5-4263-A2CA-8BA61F92AE4C@ezabel.com>
References: <c42.15e5eb1e.33d1d206@aol.com>
	<EBCA74E5-18E5-4263-A2CA-8BA61F92AE4C@ezabel.com>
Message-ID: <46A0E148.9090709@ebi.ac.uk>

> On Jul 20, 2007, at 4:53 AM, Sterten at aol.com wrote:
> 
> no checksum needed these days, data storage is reliable.

Hah! It isn't reliable in this case. GCG added the checksum to catch 
users who edited their sequence files and deliberately (or accidentally 
while editing the annotation in the header) changed the sequence data :-)

If you edit a GCG file, you run the reformat program to calculate a new 
checksum line.

> Should be easy to write a short program to convert the  formats...

Generating the checksum is ... ummm .... interesting. It helps if you 
have a friend with access to the GCG "reformat" program who can tell you 
if you got it right. Some years ago there was a thread in one of the 
bionet newsgroups (ah, those were the days) when it took 4 attempts 
before someone could reliably agree with GCG's calculation (upper and 
lower case, numeric characters, spaces, and other IUPAC "standard" 
sequence characters such as "=" all have to be considered).
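
For anyone who wants to try, here is a sketch of the checksum as it is commonly reimplemented outside GCG (readseq and Biopython use this formula, as far as I know). It is an assumption that this matches GCG's own "reformat" in every corner case; the function name `gcg_checksum` is mine. Each character is upper-cased, multiplied by a weight that cycles from 1 to 57, and the total is taken modulo 10000.

```python
def gcg_checksum(sequence):
    """Checksum as commonly reimplemented for GCG sequence files.

    Assumption: this is the widely copied formula, not code taken from GCG.
    """
    total = 0
    for position, char in enumerate(sequence):
        weight = position % 57 + 1          # weights run 1..57, then repeat
        total += weight * ord(char.upper())  # case-insensitive
    return total % 10000


# 22-residue test sequence; lower case on purpose.
print(gcg_checksum("acdefghiklmnpqrstvwxyz"))  # prints 731
```

Note that this sketch feeds every character it is given into the sum, so any spaces or position numbers have to be stripped (or deliberately kept) before calling it.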

regards,

Peter


From pmr at ebi.ac.uk  Fri Jul 20 12:30:37 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 20 Jul 2007 17:30:37 +0100
Subject: [BiO BB] Genbank file conversion to GCG format
In-Reply-To: <BAY108-F183A0138C2B6AD7A21B0D9BEF40@phx.gbl>
References: <BAY108-F183A0138C2B6AD7A21B0D9BEF40@phx.gbl>
Message-ID: <46A0E32D.2020201@ebi.ac.uk>

Mike Marchywka wrote:
> No matter how simple the code it is important to determine its working 
> right.
> It isn't the computer you are checking, its the human :)
> Dumb things like implementation specific string size limits, buffer 
> flushing,etc
> can drop random stuff that would take forever to find.

For those who have never seen one, here is a GCG format sequence as 
written by EMBOSS. It has enough essential details to be used by GCG.


!!AA_SEQUENCE 1.0

Example GCG sequence

This is all text up to the first line with two dots
so we cannot have a normal EMBL/GenBank feature table here
unless we first convert '.' to '_'

Checksum line coming up:

gcgseq  Length: 22  Type: P  Check:  731 ..

   1 acdefghikl mnpqrstvwx yz
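
As a rough illustration of the layout above, the following sketch splits such a file at the first line ending in two dots and pulls out the header fields and the sequence. The function name `parse_gcg` is hypothetical, and it assumes that digits and blanks below the checksum line are only position counters and spacing, which holds for the example but may not hold for every GCG file.

```python
import re

EXAMPLE = """!!AA_SEQUENCE 1.0

Example GCG sequence

This is all text up to the first line with two dots
so we cannot have a normal EMBL/GenBank feature table here
unless we first convert '.' to '_'

Checksum line coming up:

gcgseq  Length: 22  Type: P  Check:  731 ..

   1 acdefghikl mnpqrstvwx yz
"""


def parse_gcg(text):
    """Split a single-sequence GCG file into header fields and sequence."""
    lines = text.splitlines()
    # Everything up to the first line ending in ".." is documentation.
    for index, line in enumerate(lines):
        if line.rstrip().endswith(".."):
            break
    else:
        raise ValueError("no checksum line ending in '..' found")
    match = re.search(
        r"Length:\s*(\d+)\s+Type:\s*(\S+)\s+Check:\s*(\d+)", lines[index]
    )
    if match is None:
        raise ValueError("checksum line lacks Length/Type/Check fields")
    # Assumption: digits and whitespace below the line are not residues.
    sequence = "".join(re.sub(r"[\s\d]", "", row) for row in lines[index + 1:])
    return {
        "name": lines[index].split()[0],
        "length": int(match.group(1)),
        "type": match.group(2),
        "check": int(match.group(3)),
        "sequence": sequence,
    }


record = parse_gcg(EXAMPLE)
print(record["name"], record["length"], record["check"], record["sequence"])
```

Comparing `len(record["sequence"])` with the declared length is a cheap sanity check before trusting a converted file.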


From phoebe at deakin.edu.au  Sat Jul 21 02:11:51 2007
From: phoebe at deakin.edu.au (Phoebe Chen)
Date: Sat, 21 Jul 2007 16:11:51 +1000
Subject: [BiO BB] APBC2008 (Extension of Paper Submission Deadline)
Message-ID: <20070721161151.v7feugym8wsgcwgg@mail.deakin.edu.au>

Due to many requests, the submission deadline for APBC2008
has been extended to 30 July 2007, 23:59 EST.


Sincerely,
Conference Chair Tatsuya Akutsu
Program committee co-chairs Alvis Brazma and Satoru Miyano


***********************************************************************
CALL FOR PAPERS - APBC 2008
The Sixth Asia-Pacific Bioinformatics Conference, APBC2008,
will be held in Kyoto, Japan, during 14-17 January 2008.
See http://bic.kyoto-u.ac.jp/apbc2008/index.html
***********************************************************************
The Asia-Pacific Bioinformatics Conference series is an annual
forum for exploring research, development, and novel
applications of bioinformatics. The aim of this conference is to
bring together researchers, professionals, and industrial practitioners
for interaction and exchange of knowledge and ideas.
We invite submissions that address conceptual and practical issues
of bioinformatics.


Topics of Interest
Papers are solicited on, but not limited to, the following topics:
- Sequence Analysis
- Motif Finding
- Recognition of Genes
- RNA Analysis
- Physical and Genetic Mapping
- Evolution and Phylogeny
- Protein Structure Analysis
- Microarray Design
- Transcriptome, Gene Expression
- Proteomics
- Pathways, Networks and Systems
- Ontologies
- Databases and Data Integration
- Text Mining
- Population Genetics/SNP/Haplotyping
- Comparative Genomics, Genome Rearrangements
- Applications


IMPORTANT DATES
Submission of papers July 30, 2007  (23:59 Eastern Standard Time
(GMT-4)) (** new deadline **)
Notification of paper acceptance 17 September 2007
Submission of posters 30 September 2007
Camera-ready copy & Author registration 30 September 2007
Notification of poster acceptance 20 October 2007
Conference 14-17 January 2008


From a.gopee at utm.intnet.mu  Mon Jul 23 06:46:41 2007
From: a.gopee at utm.intnet.mu (Ajit)
Date: Mon, 23 Jul 2007 14:46:41 +0400
Subject: [BiO BB] Protein Datatypes for function prediction
Message-ID: <003a01c7cd16$c0fbf1e0$083c10ac@Ajit>


Subject: Protein function prediction based using different datatypes


Hello
Can anyone tell me which data types are currently available for protein
function prediction, and their associated tools/web site links?

I already have a small list, but I want to make sure I cover all of them,
and I want to get some preliminaries on each of these:

(1) Amino acid sequences
(2) Protein structure
(3) Genome sequences
(4) Phylogenetic data
(5) Microarray expression data
(6) Protein interaction networks and protein complexes
(7) Biomedical literature
(8) Gene Ontology


Please do send some details on how each of the above can actually be used
in protein function prediction.


Thanks a lot 


Rgds


Ajit


From idoerg at gmail.com  Mon Jul 23 09:06:44 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 23 Jul 2007 15:06:44 +0200
Subject: [BiO BB] Protein Datatypes for function prediction
In-Reply-To: <003a01c7cd16$c0fbf1e0$083c10ac@Ajit>
References: <003a01c7cd16$c0fbf1e0$083c10ac@Ajit>
Message-ID: <b5bbbc970707230606u23bce63jf871da6e00f78886@mail.gmail.com>

Shameless plug: read my review:


Friedberg I. Automated Function Prediction: the Genomic Challenge.
Briefings in Bioinformatics (2006) 7(3):225-242

There is a "tunnel through" link on my page to BiB:

http://iddo-friedberg.org/papers.html

The second link after the citation takes you to the paper, in case your
institute does not subscribe.

best,

Iddo



-- 

I. Friedberg

"The only problem with troubleshooting is that
sometimes trouble shoots back."


From marchywka at hotmail.com  Tue Jul 24 07:50:50 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 24 Jul 2007 07:50:50 -0400
Subject: [BiO BB] Protein Datatypes for function prediction
In-Reply-To: <b5bbbc970707230606u23bce63jf871da6e00f78886@mail.gmail.com>
Message-ID: <BAY108-F167829B9F5C993B3E055D8BEF00@phx.gbl>


Also, while I was going to suggest "ab initio QM" as computing power is
increasing rapidly, don't dismiss even simple text processing. That is, I
was trying to find some simple ways to screen some AFFX probe sets, and I
thought I would try this:

If you wanted to ask, "what is a tyrosine phosphatase?"
First, you could go download all the sequences that mention the term
(a few thousand, enough to cluster perhaps):
$ eutilsnew -protein -out ptp -v "protein tyrosine phosphatase"
Extract fasta files:
$ /cygdrive/c/mydocs/scripts/cc/affx/file_parsing -fastas ptp ptp_fastas
Use your favorite word finder:
$ /cygdrive/c/mydocs/scripts/cc/affx/string_correlator -motif ptp_fastas 10 >some_ptp_words
And sort them for the most common:
$ cat some_ptp_words | awk '{print $3}' | sort | uniq -c | sort -g -r | more

2374 VCLGNICRSP
1678 FVCLGNICRSP
1182 LFVCLGNICRSP
  975 GNICRSPMAE
  904 GNICRSPTAE
  698 VLFVCLGNICRSP
  [...]
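
The counting step at the end of that pipeline (awk, sort, uniq -c, sort -g -r) can be sketched in a few lines if you prefer a script. The layout of the word-finder output is not shown here, so the sample lines below are made up and only assume that the word sits in the third whitespace-separated column; `count_field` is a hypothetical name.

```python
from collections import Counter


def count_field(lines, field=2):
    """Count values in one whitespace-separated column (0-based index).

    Unlike awk, lines with too few columns are skipped rather than
    counted as an empty string.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) > field:
            counts[parts[field]] += 1
    return counts.most_common()  # most frequent first


# Made-up lines in an assumed "id offset word" layout.
sample = [
    "seq1 10 VCLGNICRSP",
    "seq2 12 VCLGNICRSP",
    "seq3 11 GNICRSPMAE",
    "seq4 10 VCLGNICRSP",
    "seq5 13 GNICRSPMAE",
    "seq6 14 GNICRSPTAE",
]
for word, count in count_field(sample):
    print(count, word)
```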

I hadn't been able to check this until I got my blast script to find the
cdd database, but it does seem to predict known stuff:

blastnew -out ptp_cdd -hits 50 -cdd -expect 1000 VCLGNICRSP

$ cat ptp_cdd | more
BLASTP 2.2.16 [Mar-25-2007]


                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value
gnl|CDD|65262 pfam01451, LMWPc, Low molecular weight phosphotyro...    25   1.1
gnl|CDD|30743 COG0394, Wzb, Protein-tyrosine-phosphatase [Signal...    25   1.1
gnl|CDD|29014 cd00115, LMWPc, Low molecular weight phosphatase f...    25   1.1
gnl|CDD|47555 smart00226, LMWPc, Low molecular weight phosphatas...    24   1.9
gnl|CDD|68479 pfam04906, Tweety, Tweety. The tweety (tty) gene h...    18   105
gnl|CDD|70330 pfam06856, DUF1251, Protein of unknown function (D...    18   138
gnl|CDD|71435 pfam07999, RHSP, Retrotransposon hot spot protein....    17   306
gnl|CDD|70701 pfam07245, Phlebovirus_G2, Phlebovirus glycoprotei...    17   306
gnl|CDD|72414 pfam08996, zf-DNA_Pol, DNA Polymerase alpha zinc f...    17   400


It wouldn't be difficult to build up vocabulary lists that distinguish
different types of proteins and handle variants the same way you handle
plurals, capitalization, etc. in language processing. This wasn't my
immediate interest but it is something to consider if you need a
quick-and-easy approach.


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com




From gtzanis at csd.auth.gr  Sat Jul 21 16:38:42 2007
From: gtzanis at csd.auth.gr (George Tzanis)
Date: Sat, 21 Jul 2007 23:38:42 +0300
Subject: [BiO BB] RefSeq mRNA sequences of barley
Message-ID: <002b01c7cbd7$22e74ee0$4001a8c0@taurus>

Dear All,
 
I would like to get all the non-redundant cDNA sequences of barley.
For this reason I'm thinking about retrieving all RefSeq mRNAs of 
barley. I have made a search to the Core Nucleotide database
(http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore) using the 
following query:
 
"barley[organism] AND biomol_mRNA[properties] AND srcdb_refseq[PROP]"
 
However, no sequences were returned. I have also used the Entrez 
Limits settings, but the result was the same. 
 
Is there something wrong in my search?
Are there any RefSeq mRNAs for barley?
Is there another way to get a non-redundant set of cDNA (or mRNA) 
sequences of barley? I will appreciate any idea.
 
Thank you in advance,
 
George Tzanis
Department of Informatics
Aristotle University of Thessaloniki
54124 Thessaloniki
Greece


From ethan.strauss at promega.com  Tue Jul 24 14:15:14 2007
From: ethan.strauss at promega.com (Ethan Strauss)
Date: Tue, 24 Jul 2007 13:15:14 -0500
Subject: [BiO BB] RefSeq mRNA sequences of barley
In-Reply-To: <002b01c7cbd7$22e74ee0$4001a8c0@taurus>
References: <002b01c7cbd7$22e74ee0$4001a8c0@taurus>
Message-ID: <D8D8119118899D4A8EB5AD9BD24C1932034DD209@MADMSG003.promega.com>

Hi,
	I think that there are two problems. The first is that there do
not seem to be any barley sequences in refseq (with the exception of the
chloroplast genome NC_008590), the second may be that you need to use
the scientific name for the organism ("Hordeum vulgare"). I have had
variable luck using common names as [organism]. 
	I don't have any clue why there are no barley records in
RefSeq...
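
For what it is worth, here is a small sketch of building the same query with the scientific name, so it can be pasted into a browser or a script. It only constructs the URL and does not contact NCBI. The helper name `esearch_url` is hypothetical; the endpoint is the public E-utilities esearch address, and the two filters are the ones from the original query, written with the [PROP] tag.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(organism, filters, db="nuccore", retmax=20):
    """Build an esearch URL for an organism plus [PROP] filters."""
    terms = ['"%s"[Organism]' % organism]
    terms += ["%s[PROP]" % name for name in filters]
    query = {"db": db, "term": " AND ".join(terms), "retmax": retmax}
    return ESEARCH + "?" + urlencode(query)


url = esearch_url("Hordeum vulgare", ["biomol_mrna", "srcdb_refseq"])
print(url)
```

Dropping the `srcdb_refseq` filter from the list gives the count of all barley mRNA records, which helps show which term is responsible for an empty result.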
Hope this helps a little. 
Ethan
Ethan Strauss Ph.D.
Bioinformatics Scientist
Promega Corporation
2800 Woods Hollow Rd.
Madison, WI 53711
608-274-4330
800-356-9526
ethan.strauss at promega.com



From marchywka at hotmail.com  Tue Jul 24 14:19:06 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 24 Jul 2007 14:19:06 -0400
Subject: [BiO BB] RefSeq mRNA sequences of barley
Message-ID: <BAY108-F18BE33D58858F9152FB5FABEF00@phx.gbl>

What have you got against downloading the whole genome? I didn't see
it at ensembl but, as you probably know, ag plant genomics is
a big deal economically. The USDA and ARS probably have more stuff like
this ( first thing I found on google).

http://harvest.ucr.edu/

Offhand, it looks like you have mutually exclusive [prop] criteria, as prop
and properties are synonyms (but usually the query dump tells you how many
hits there are on each term):
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpentrez.table.EntrezHelp.T7


Usually eutils provides parse info that you can sort out when query results
don't make sense, but I've only used this from pubmed, and the response from
nucleotide was unhelpful. You can probably get a dump from the web page,
but I've never used it.

$ eutilsnew -v -out asdf -dump "barley AND smith[AU] "
<?xml version="1.0"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">
<eSearchResult>
        <Count>98</Count>
        <RetMax>1</RetMax>
        <RetStart>0</RetStart>
        <QueryKey>1</QueryKey>
        <WebEnv>09zlDsvf9Apq_WlB1MrLvNbMDVmdMFyE-0fh8eDgZhtQlZKxV5kTGmzUJpJ@1FBF437C6A641A70_0045SID</WebEnv>
        <IdList>
                <Id>17609926</Id>
        </IdList>
        <TranslationSet>
                <Translation>
                        <From>barley</From>
                        <To>((&quot;hordeum&quot;[TIAB] NOT Medline[SB]) OR &quot;hordeum&quot;[MeSH Terms] OR barley[Text Word])</To>
                </Translation>
        </TranslationSet>
        <TranslationStack>
                <TermSet>
                        <Term>&quot;hordeum&quot;[TIAB]</Term>
                        <Field>TIAB</Field>
                        <Count>2045</Count>
                        <Explode>Y</Explode>
                </TermSet>
                <TermSet>
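
A sketch of how the per-term counts in such a dump could be pulled out with a script rather than read by eye. The sample XML is a shortened, closed-off version of the dump above (the real response continues past where it is cut off here), and `term_counts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the dump above, with the open tags closed.
SAMPLE = """<eSearchResult>
  <Count>98</Count>
  <IdList><Id>17609926</Id></IdList>
  <TranslationStack>
    <TermSet>
      <Term>&quot;hordeum&quot;[TIAB]</Term>
      <Field>TIAB</Field>
      <Count>2045</Count>
      <Explode>Y</Explode>
    </TermSet>
  </TranslationStack>
</eSearchResult>"""


def term_counts(xml_text):
    """Return (total hits, [(term, hits for that term), ...])."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("Count"))  # top-level Count only
    per_term = [
        (term_set.findtext("Term"), int(term_set.findtext("Count")))
        for term_set in root.iter("TermSet")
    ]
    return total, per_term


total, per_term = term_counts(SAMPLE)
print("total:", total)
for term, hits in per_term:
    print(hits, term)
```

A term whose own count is zero is usually the one that empties the whole AND query.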


The NCBI web interface is usually quite good, but I ran into a similar
problem with blast databases. I finally found their list for blasting,

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html

and added to my script,

$ blastnew -help db | grep -i barley
   genomes  barley          barley

Before I found this, I complained that their example db names weren't right
and I even had to reverse engineer a piece of their html. Try googling
confined to their site, site:nih.gov


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com





From marchywka at hotmail.com  Wed Jul 25 15:58:36 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 25 Jul 2007 15:58:36 -0400
Subject: [BiO BB] question on RNA and species signatures
Message-ID: <BAY108-F20FC0ED3E02E67CC0CE325BEF10@phx.gbl>

I've been generally trying to find a comprehensive way to analyze non-coding
RNA with no luck. I've tried asking people in such areas as siRNA,
riboswitch, etc. without much success. Any comments or discussion?

This came up most recently because I found a short sequence with unusual
species distribution and I was curious to know if this thing has a name.

If I just type in some random junk, I get about what you could expect
(this is my own blast script with most terms being self-explanatory;
"-summ" translates into "-v" to limit summary lines, and -db selects the
wgs database):

$ blastnew -out control -nuc -hits 0 -summ 3000 -db wgs -expect 1e8 TCCTGGAGTCCCAGAGTTCAGCTAAACCGATCACATTGTAT

$ more control | sed -n '/producing signif/,/^>/p' | sed -n 's/.*|//p' | awk '{print $1" " $2}' | sort | uniq -c | sort -g -r | more

304 Homo sapiens
261 Bos taurus
212 Pan troglodytes
171 Microcebus murinus
155 Equus caballus
136 Spermophilus tridecemlineatus
130 Canis familiaris
125 Otolemur garnettii
112 Ornithorhynchus anatinus
111 Tupaia belangeri
96 Myotis lucifugus
93 Mus musculus
86 Felis catus
75 Rattus norvegicus
74 Oryzias latipes
71 Drosophila erecta
68 Sorex araneus
63 Loxodonta africana
58 Anolis carolinensis
56 Monodelphis domestica
48 Macaca mulatta
47 Gallus gallus
34 Oryctolagus cuniculus
30 Erinaceus europaeus
27 Strongylocentrotus purpuratus
27 Callorhinchus milii
26 Dasypus novemcinctus
22 Echinops telfairi
22 Cavia porcellus
17 Danio rerio
14 Schmidtea mediterranea
13 Ochotona princeps
13 Aplysia californica
10 Anopheles gambiae

This, on the other hand, has much better matches (note the expect limit):

$ blastnew -out dog_sign -nuc -hits 0 -summ 3000 -db wgs -expect .01 TCCTGGAGTCCCAGGATCCAGTCCCACGTCGGGCTCCCT

and it is confined to dogs:

$ more dog_sign | sed -n '/producing signif/,/^>/p' | sed -n 's/.*|//p' | awk '{print $1" " $2}' | sort | uniq -c | sort -g -r | more
   3000 Canis familiaris

And these all seem to be in different places (the most frequent location
occurs once):

$ more dog_sign | sed -n '/producing signif/,/^>/p' | sed -n 's/.*|//p' | awk '{print $3}' | sort | uniq -c | sort -g -r | more
      1 ctg19866851899833,
      1 ctg19866851899815,
      1 ctg19866851899794,


Anyone care to comment on the significance of this sequence, or the reason
it is just an uninteresting fluke?

Thanks.


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com



From austin.tanney at almacgroup.com  Thu Jul 26 08:51:42 2007
From: austin.tanney at almacgroup.com (Tanney, Austin)
Date: Thu, 26 Jul 2007 13:51:42 +0100
Subject: [BiO BB] question on RNA and species signatures
Message-ID: <EBEA7CF8A45DD84797EADA3120927993049D5A@ni-cr-svc-ex1.pharms-services.com>

Hi Mike,

Have you tried looking at Rfam (http://www.sanger.ac.uk/Software/Rfam/),
miRBase (http://microrna.sanger.ac.uk/sequences/) or the Ensembl genome browser (http://www.ensembl.org/index.html)?

Thanks

Austin




From marchywka at hotmail.com  Thu Jul 26 10:33:03 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Thu, 26 Jul 2007 10:33:03 -0400
Subject: [BiO BB] question on RNA and species signatures
In-Reply-To: <EBEA7CF8A45DD84797EADA3120927993049D5A@ni-cr-svc-ex1.pharms-services.com>
Message-ID: <BAY108-F1517285ADD97AF2E597DB5BEF20@phx.gbl>


Thanks. Nothing on the one site, but Ensembl has some ideas; I'm not sure
how to interpret them yet. FWIW, NCBI does have several repeats databases:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html

and I tried a few of these, but got no low-e hits.
At higher e, there were a few suggestions in human repeats:

$ blastnew -out non_dog -nuc -hits 10 -summ 3000 -db humrep -expect 100 \
  TCCTGGAGTCCCAGGATCCAGTCCCACGTCGGGCTCCCT

>MER31-internal#LTR/MER4-group
          Length = 4936

Score = 26.3 bits (13), Expect = 0.22
Identities = 13/13 (100%)
Strand = Plus / Minus


Query: 12   caggatccagtcc 24
            |||||||||||||
Sbjct: 4369 caggatccagtcc 4357

I also ran it against some other WGS databases, and there are some lower-e
hits to cat, but it still seems to be largely dog-specific (matches 38/39,
IIRC).

Thanks


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com


>From: "Tanney, Austin" <austin.tanney at almacgroup.com>
>Reply-To: "General Forum at Bioinformatics.Org" 
><bio_bulletin_board at bioinformatics.org>
>To: "General Forum at Bioinformatics.Org" 
><bio_bulletin_board at bioinformatics.org>
>Subject: RE: [BiO BB] question on RNA and species signatures
>Date: Thu, 26 Jul 2007 13:51:42 +0100
>
>Hi Mike,
>
>Have you tried looking at Rfam (http://www.sanger.ac.uk/Software/Rfam/)
>miRBase (http://microrna.sanger.ac.uk/sequences/) or the ensembl genome 
>browser (http://www.ensembl.org/index.html)
>
>Thanks
>
>Austin
>
>



From austin.tanney at almacgroup.com  Thu Jul 26 10:59:05 2007
From: austin.tanney at almacgroup.com (Tanney, Austin)
Date: Thu, 26 Jul 2007 15:59:05 +0100
Subject: [BiO BB] question on RNA and species signatures
Message-ID: <EBEA7CF8A45DD84797EADA31209279935DAB42@ni-cr-svc-ex1.pharms-services.com>

Mike,

For short BLAST queries, the e-value is generally not that useful.
Often the recommended expect threshold for short queries is 1000.
In this case it's % identity and coverage you should look at. Realistically, 100% coverage should be what you expect.
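
As a rough illustration of the identity/coverage check suggested above, here is a minimal sketch that computes both figures for the MER31 hit quoted earlier in this thread. The helper name hit_stats and its argument order are made up for this example; the numbers (13/13 identities, query positions 12 to 24, a 39-base query) are read off the posted alignment.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST): percent identity and query
# coverage for one alignment, using values read off the BLAST report.
#   $1 = identical positions   $2 = alignment length
#   $3 = query start           $4 = query end
#   $5 = full query length
hit_stats() {
    awk -v id="$1" -v alen="$2" -v qs="$3" -v qe="$4" -v qlen="$5" 'BEGIN {
        printf "identity=%.1f%% coverage=%.1f%%\n",
               100 * id / alen, 100 * (qe - qs + 1) / qlen
    }'
}

query=TCCTGGAGTCCCAGGATCCAGTCCCACGTCGGGCTCCCT   # 39 bases
# MER31 hit from the humrep search: Identities = 13/13, Query 12..24
hit_stats 13 13 12 24 "${#query}"
```

For that hit the identity is 100% but only a third of the query is covered, which is why a perfect short match like this one says little on its own.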



From marchywka at hotmail.com  Thu Jul 26 11:23:32 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Thu, 26 Jul 2007 11:23:32 -0400
Subject: [BiO BB] question on RNA and species signatures
In-Reply-To: <EBEA7CF8A45DD84797EADA31209279935DAB42@ni-cr-svc-ex1.pharms-services.com>
Message-ID: <BAY108-F40CD7EF09CCF7BB3872ACCBEF20@phx.gbl>

That's why I posted the alignment :)
For a length of ca. 40 bases the e-value isn't too bad, but sure, for
some really short things it has been a problem. Normally I just
collect a bunch of hits and sort the alignments with
text processing, e.g. grep "[A-Z]\{10\}", once I have
some idea what the background looks like.

Since I'm guessing here, good matches to shorter sequences could help
isolate important parts of the longer sequence.
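
One way to "collect a bunch of hits and sort the alignments" with plain text tools is sketched below. It assumes the report contains "Identities = a/b" lines like the one posted earlier in this thread; the function name rank_hits is hypothetical, and this is not necessarily the pipeline actually used.

```shell
#!/bin/sh
# Hypothetical helper: read a plain-text BLAST report on stdin and print
# "matches alignment_length" for every hit, longest match first.
rank_hits() {
    # Pull the two numbers out of each "Identities = a/b (..%)" line,
    # then sort by matches (descending) and by length (ascending).
    sed -n 's|.*Identities = \([0-9]*\)/\([0-9]*\).*|\1 \2|p' |
        sort -k1,1nr -k2,2n
}

# Example use on the report written by the search above:
#   rank_hits < non_dog | head
```

Sorting on the raw match count rather than on the e-value keeps short but perfect matches visible, which fits the idea of using them to isolate the important parts of the longer sequence.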




From wongls at comp.nus.edu.sg  Thu Jul 26 05:36:08 2007
From: wongls at comp.nus.edu.sg (Limsoon Wong)
Date: Thu, 26 Jul 2007 17:36:08 +0800
Subject: [BiO BB] FW: Call for Papers RECOMB 2008 (Singapore) - Genome
	Research parallel submission option
Message-ID: <003201c7cf68$6596f990$b9b81aac@comp.nus.edu.sg>


              Call for Papers for the 
       12th Annual International Conference on 
      Research in Computational Molecular Biology

                   RECOMB 2008

       Singapore, 30th March - 2nd April 2008
            University Cultural Center
          National University of Singapore

NEW:  Parallel submission option to Genome Research special RECOMB issue

       http://www.comp.nus.edu.sg/~recomb08/

Papers due 24th September 2007

Hosted by National University of Singapore, RECOMB 2008 
will provide a general forum for disseminating the latest 
research in bioinformatics and computational biology. The 
multidisciplinary conference brings together academic and 
industrial scientists from molecular biology, genetics, 
medicine, computer science, mathematics, and statistics.

Papers reporting on original research (both theoretical and 
experimental) in all areas of computational molecular biology 
are sought.

NEW   NEW   NEW   NEW   NEW   NEW   NEW   NEW   NEW   NEW

!!! Recomb Conference Series and Genome Research journal team up !!!

Possibility of publication of RECOMB papers in Genome Research special
issue.
See web site for details:  http://www.comp.nus.edu.sg/~recomb08/

NEW   NEW   NEW   NEW   NEW   NEW   NEW   NEW   NEW   NEW

Important Deadlines:
September 24, 2007 - Deadline for paper submission
December 10, 2007 - Notification of paper acceptance


From marchywka at hotmail.com  Sat Jul 28 11:08:04 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Sat, 28 Jul 2007 11:08:04 -0400
Subject: [BiO BB] Protein Datatypes for function prediction
In-Reply-To: <BAY108-F167829B9F5C993B3E055D8BEF00@phx.gbl>
Message-ID: <BAY108-F23F895783710C5AEDFA68BBEEC0@phx.gbl>


>rapidly, don't dismiss even simple text processing.
[...]
>( a few thousand, enough to cluster perhaps)


I actually tried this with osteoglycins. If you download them (there aren't
that many), pick out repeated "words", and cluster by presence or absence of
the most popular words, it turns out to do a decent automated job of
separating by species.
These are the vectors (presence/absence of the words) along with the members
having that vector (names could be ambiguous, for illustration only). I was
hoping it would separate by type, but that is a problem with using the most
common words to discriminate.
The zero vector amounts to a "miscellaneous" cluster.
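
The presence/absence idea can be sketched in a few lines of awk. This is only an illustration under stated assumptions: the input layout (one word per line in one file, "id sequence" per line in another) and the function name word_vectors are invented here, and how the "popular words" were originally chosen is not shown.

```shell
#!/bin/sh
# Hypothetical sketch: word_vectors WORDS_FILE SEQ_FILE
#   WORDS_FILE: one "word" (substring) per line      (assumed layout)
#   SEQ_FILE:   "id sequence" per line               (assumed layout)
# Prints each distinct presence/absence vector followed by the ids
# of the sequences that share it.
word_vectors() {
    awk '
        NR == FNR { words[++n] = $1; next }      # first file: the word list
        {
            v = ""
            for (i = 1; i <= n; i++)             # 1 if the word occurs, else 0
                v = v (index($2, words[i]) ? "1" : "0")
            if (!(v in members)) order[++m] = v  # remember first-seen order
            members[v] = members[v] " " $1
        }
        END { for (j = 1; j <= m; j++) print order[j] members[order[j]] }
    ' "$1" "$2"
}
```

Sequences containing none of the words all share the all-zero vector, which is the "miscellaneous" cluster described above.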

$ for f in `cat osteo_groups | awk '{print $2}'`; do
    echo $f
    g=`grep $f osteo_vectors | awk '{print $1}' | sed -e 's/>//'`
    echo $g
    h=`echo $g | sed -e 's/\..*//g' | sed -e 's/  */\\\|/g'`
    grep -A 2 "$h" osteo_rdict | grep "DEFINITION" | sed -e 's/DEFINITION//'
  done | unix2dos >/dev/clipboard

1111111111111111111111111011111111110001
CAI16694 AAH95443 AAH37273 NP_148935 NP_054776 ABM85338 ABM82153 EAW62820 
EAW62819 EAW62818 P20774 CAB53706
  osteoglycin [Homo sapiens].
  Osteoglycin [Homo sapiens].
  Osteoglycin [Homo sapiens].
  osteoglycin preproprotein isoform 2 [Homo sapiens].
  osteoglycin preproprotein isoform 2 [Homo sapiens].
  osteoglycin (osteoinductive factor, mimecan) [synthetic construct].
  osteoglycin (osteoinductive factor, mimecan) [synthetic construct].
  osteoglycin (osteoinductive factor, mimecan), isoform CRA_a [Homo
  osteoglycin (osteoinductive factor, mimecan), isoform CRA_a [Homo
  osteoglycin (osteoinductive factor, mimecan), isoform CRA_a [Homo
  Mimecan precursor (Osteoglycin) (Osteoinductive factor) (OIF).
  hypothetical protein [Homo sapiens].
1111111011111000100001011101001000001110
NP_032786 EDL41086 AAH21939 BAA06721 Q62000 BAE35995 BAC35462
  osteoglycin [Mus musculus].
  osteoglycin [Mus musculus].
  Osteoglycin [Mus musculus].
  osteoglycin precursor [Mus musculus].
  Mimecan precursor (Osteoglycin).
  unnamed protein product [Mus musculus].
  unnamed protein product [Mus musculus].
0000000000000000000000000000000000000000
CAK03681 NP_002336 O42235 NP_032464 NP_989507 NP_033885
  novel protein similar to vertebrate osteoglycin (osteoinductive
  lumican precursor [Homo sapiens].
  Keratocan precursor (KTN) (Keratan sulfate proteoglycan keratocan).
  keratocan [Mus musculus].
  keratocan [Gallus gallus].
  bone morphogenetic protein 1 [Mus musculus].
1101111111100000001001011100001000001110
EDL98110 XP_001054654 XP_001054599 XP_001054725 XP_214441
  osteoglycin (predicted) [Rattus norvegicus].
  PREDICTED: similar to Mimecan precursor (Osteoglycin) isoform 2
  PREDICTED: similar to Mimecan precursor (Osteoglycin) isoform 1
  PREDICTED: similar to Mimecan precursor (Osteoglycin) isoform 3
  PREDICTED: similar to Mimecan precursor (Osteoglycin) [Rattus
0000001000100110000100000000000000000000
NP_989540 AAD21085 Q9W6H0 Q9DE65
  osteoglycin [Gallus gallus].
  osteoglycin [Gallus gallus].
  Mimecan precursor (Osteoglycin).
  Mimecan precursor (Osteoglycin).
1111111111111111111111111011011111110001
AAP97142 Q5RBL2 CAH90848
  osteoglycin OG [Homo sapiens].
  Mimecan precursor (Osteoglycin).
  hypothetical protein [Pongo pygmaeus].
1110101111110110010001011001011111111110
NP_001075585 AAM46865 Q8MJF1
  osteoglycin [Oryctolagus cuniculus].
  osteoglycin [Oryctolagus cuniculus].
  Mimecan precursor (Osteoglycin).
1110111111100011110110111111011001100010
ABQ13007 P19879
  osteoglycin preproprotein [Bos taurus].
  Mimecan precursor (Osteoglycin) [Contains: Corneal keratan sulfate
1110011111100011110110111111001001100010
NP_776371 AAB70264
  osteoglycin [Bos taurus].
  mimecan [Bos taurus].
1111111111111111111111011011011111110000
NP_077727
  osteoglycin preproprotein isoform 1 [Homo sapiens].
1111011111101111111110111011001111100001
XP_001103337
  PREDICTED: osteoglycin isoform 2 [Macaca mulatta].
1111011111101111111110011011001111100000
XP_001103195
  PREDICTED: osteoglycin isoform 1 [Macaca mulatta].
1110111111110011010001111110011001010000
ABL96619
  osteoglycin [Capra hircus].
1101011111110010010111011100000001110110
XP_853340
  PREDICTED: similar to Mimecan precursor (Osteoglycin)
1100000111011110110111000011000101110000
CAB61417
  hypothetical protein [Homo sapiens].
1011111000100111011000011000011110000000
CAI16695
  osteoglycin [Homo sapiens].
1011111000100111001000111000011010000001
AAX25979
  SJCHGC07866 protein [Schistosoma japonicum].
0000001000100000000000000000000000000000
NP_001080164
  osteoglycin [Xenopus laevis].
0000000110000000000000000000000000000000
CAJ57655
  osteoglycin [Sus scrofa].
0000000000100100000000000000000000000000
XP_001512743
  PREDICTED: similar to osteoglycin preproprotein [Ornithorhynchus
0000000000000001000000100000000000000001
AAD40453
  mimecan [Homo sapiens].


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com




From codeshepherd at gmail.com  Tue Jul 31 11:48:24 2007
From: codeshepherd at gmail.com (=?ISO-8859-1?Q?Dee=FE=E0n_Chakravarth=FF?=)
Date: Tue, 31 Jul 2007 23:48:24 +0800
Subject: [BiO BB] MetaBase 1.0
Message-ID: <46AF59C8.3040900@gmail.com>

Announcing the release of MetaBase version 1.0.

http://biodatabase.org


This release documents over 850 databases and nearly 700 data-resources 
with a growing number of 'user-contributed' articles!

MetaBase is a user-contributed 'database database', designed to list and 
categorize all the biological databases and data-resources available on 
the internet!

This first release owes much to the Nucleic Acids Research 'Database 
Summaries' database (2006) and the sister 'Web Server Summaries' 
database (2006). Permission to use the textual content of these two 
databases was kindly provided by Oxford University Press.

MetaBase is a 'user-contributed resource', allowing anyone to freely 
contribute, edit and update entries. Using the same 'MediaWiki' 
technology that runs the popular 'Wikipedia' website, MetaBase has the 
capacity to grow in content and scope and is entirely driven by its 
users. Additionally, many aspects of the database organization can be 
redesigned by users who understand the power of the MediaWiki templating 
system. For more information, see:

http://biodatabase.org/index.php/What_is_MetaBase%3F


There are many ways that *you* can contribute to MetaBase! For some 
examples, see:

http://biodatabase.org/index.php/Contribute_to_MetaBase


Because MetaBase is a community project, the first 30 'significant' 
contributors will be added to the list of authors in the first MetaBase 
publication. See:

http://biodatabase.org/index.php/MetaBase_publications and 
http://biodatabase.org/index.php/List_of_contributors


The idea behind this is to emphasise the community aspect of the project 
while encouraging people to contribute their expertise to the growing 
system. So if you or someone you know maintains a database, or if you 
are just interested in helping out, please take a look at the project.


The MetaBase people.


From rsachdev at usc.edu  Mon Jul 30 16:48:09 2007
From: rsachdev at usc.edu (Rohan Sachdeva)
Date: Mon, 30 Jul 2007 13:48:09 -0700
Subject: [BiO BB] Automatic blast database maintenance/updating
Message-ID: <25b698b90707301348n2e655331j7884077dd7e26ef8@mail.gmail.com>

Hello, I've been charged with installing and maintaining a wwwblast server in
my lab. I've got everything set up, but I am looking for an easy way to keep
all the databases updated. I was hoping someone could point me toward a
script that uses update_blastdb.pl to update the chosen databases and then
extracts them too.
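
A minimal sketch of what such a wrapper might look like follows. It assumes that update_blastdb.pl is on the PATH and that it leaves .tar.gz archives in the current directory; the function names, the directory layout and the database names in the usage note are all hypothetical, and the script's exit status is deliberately not interpreted.

```shell
#!/bin/sh
# Hypothetical helper: unpack every .tar.gz found in directory $1 in place.
# The archives are left where they are, in case the updater uses them to
# decide what needs downloading again (an assumption worth checking).
extract_archives() {
    for f in "$1"/*.tar.gz; do
        [ -e "$f" ] || continue          # the glob matched nothing
        tar -xzf "$f" -C "$1"
    done
}

# Hypothetical helper: update_dbs DEST_DIR DB...
# Runs update_blastdb.pl once per database inside DEST_DIR, then unpacks.
update_dbs() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    (
        cd "$dest" || exit 1
        for db in "$@"; do
            # See the script's own documentation for its exit codes.
            update_blastdb.pl "$db"
        done
    )
    extract_archives "$dest"
}
```

It could then be called from a nightly cron job as, for example, `update_dbs /data/blastdb nt nr`, where the path and the database names are placeholders.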

Thanks


From barry.hardy at vtxmail.ch  Tue Jul 31 07:48:44 2007
From: barry.hardy at vtxmail.ch (Barry Hardy)
Date: Tue, 31 Jul 2007 13:48:44 +0200
Subject: [BiO BB] eCheminfo Drug Discovery Workshop, Oxford, September 10-14
Message-ID: <46AF219C.3020903@vtxmail.ch>

We are holding an eCheminfo Drug Discovery Workshop week at Oxford
University, UK the week of 10-14 September 2007.

The approach will be hands-on using leading drug discovery software
packages, accompanied by practitioner-led lectures and discussions of
the methods worked on by the group.

Topics to be covered: Virtual Screening & Docking; Pharmacophore
Derivation, Elucidation and Searching; Applications of Filtering and
Similarity in Virtual Screening; Focused Library Design; Analysing
Chemical Databases using Advanced Structure Searching and Structure
Based Predictions; Protein Modelling; Prediction of Pharmacological
Properties and QSAR Analysis; Latest advances in ADME & Predictive
Toxicology; Pharmacokinetics & Pharmacodynamics and Physiological-based
Simulation.

More information:

Program (as pdf):
http://www.douglasconnect.com/files/eChemProgramOxford07-Sept-v1web.PDF
Program & Schedule with Abstracts & Bios:
http://www.echeminfo.com/COMTY_training/

If interested, please make your reservation soon as the size of the
group is limited and we have a limited number of places remaining.

best regards
Barry Hardy
eCheminfo Community of Practice Manager


Barry Hardy, PhD
Douglas Connect
Zeiningen, CH-4314
Switzerland
Tel: +41 61 851 0170
Blog: http://barryhardy.blogs.com/cheminfostream/


From tsmith at darwin.bu.edu  Tue Jul 31 11:38:36 2007
From: tsmith at darwin.bu.edu (Temple F. Smith)
Date: Tue, 31 Jul 2007 11:38:36 -0400
Subject: [BiO BB] Bioinformatic and the Smith-waterman
Message-ID: <46AF577C.2080008@darwin.bu.edu>

Jeff, for the record:

  Jean-Michel Claverie of France used the term "bio-informatik" in some email
at the time of the Waterville Valley computational biology meetings in the
mid to late 80's.  However, in his book, Bioinformatics for Dummies, I do not
recall that he discusses the origin of the term.  Recall that the term
Informatik is the French word for computer science and Jean-Michel was one of
the early guys in this "field", but true computational biology goes back to
Haldane (1908), D'Arcy Thompson (1942), Dayhoff (1966), etc., to say nothing
of the x-ray crystal guys of the late 1950's!!  Clearly the term was not used
in 1980 to 1982 when Dr. Waterman and I were starting out!!  And no one
"started the Field of Bioinformatics"; it grew out of molecular biology, with
protein sequencing and then DNA/RNA sequencing's need for databases and
computer analysis.  The first such recognized early work was by people like
Zuckerkandl and Pauling (1965), Fitch and Margoliash (1967), and then
Needleman and Wunsch (1970)!  Thus unless Dr. Hwa A. Lim can claim to have
been doing comparative sequence analysis in the late 1960's, he is not a
founder!
       While I was the organizer of the three Waterville Valley Genes and
Machines meetings which Jean-Michel attended, I did not use the term for at
least another two years, if I remember correctly.  Also, on my visits with
Dayhoff and later Fitch, they both agreed that it was likely Jean-Michel,
then at the FRENCH Institut Pasteur, who surely used it first, particularly
given his use of it in email as something he had been using at home in the
Paris Institute.  Thus unless Jean-Michel says otherwise, all others making
such claims should stop!  This is an old discussion which I find not funny
any more.  In fact the term is a bit out of date these days, and the better
term is computational biology in any case.

Please pass this on to whoever is still asking this now unimportant
question.

Temple F. Smith, PhD
BMERC
Boston University


From jeff at bioinformatics.org  Tue Jul 31 19:34:23 2007
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Tue, 31 Jul 2007 19:34:23 -0400
Subject: [BiO BB] Re: Bioinformatic and the Smith-waterman
In-Reply-To: <46AF577C.2080008@darwin.bu.edu>
References: <46AF577C.2080008@darwin.bu.edu>
Message-ID: <46AFC6FF.9020206@bioinformatics.org>

Hi Temple!

(For those on the mailing list, the origin of the word "bioinformatics" was brought up in this thread: http://bioinformatics.org/pipermail/bio_bulletin_board/2002-April/000635.html and we have a short wiki page on the topic: http://wiki.bioinformatics.org/Origins_of_bioinformatics)

Thank you for clearing up some misconceptions about the origins of the field, including some of my own.  I guess when we spoke about this around 1998, you actually said that you were *incorrectly* credited with having coined the word.  In any case, I agree that no one person or group started the field.

And it seems there are as many different definitions as there are practitioners.  To me, bioinformatics is a compound of "bio" and the common English word "informatics" (http://en.wikipedia.org/wiki/Informatics), with the latter being a subdiscipline of computer science, whereas the French "informatique" translates to "computer science" in general.

A Wikipedia contributor wrote the following about "informatics," and I pretty much agree with it: "Used as a compound, in conjunction with the name of a discipline, as in medical informatics, bioinformatics, etc., it denotes the specialization of informatics to the management and processing of data, information and knowledge in the named discipline, and the incorporation of informatic concepts and theories to enrich the other discipline."

So, I think of bioinformatics as a subdiscipline of computational biology, the same way that informatics is a subdiscipline of computer science.  But, I will cede that most people think that the terms are synonymous.  And maybe it just doesn't matter.

I will integrate most of what you've written into our wiki, which will hopefully help clear up some of the confusion about where things started.

Cheers,
Jeff


-- 
J.W. Bizzaro
Bioinformatics Organization, Inc. (Bioinformatics.Org)
E-mail: jeff at bioinformatics.org
Phone:  +1 508 890 8600
--