From A.Bossers at id.dlo.nl  Tue Apr  1 06:29:37 2003
From: A.Bossers at id.dlo.nl (Bossers, A.)
Date: Tue, 01 Apr 2003 13:29:37 +0200
Subject: [BiO BB] Clustering EST sequences
Message-ID: <EA56C399C5A6D5118B5400508BF98376021DC274@id008s.id.dlo.nl>

Dear All,

I have a very basic problem and wonder how others have solved it.

I want to make a unigene collection of a large EST database. We have chromat
files in ABI format and I use Linux on the intel platform.
I have phred and phrap running but since phrap was originally designed for
genomic sequences, we get lots of misassemblies on poly-A or poly-T
stretches.

Therefore I installed the TIGR TGICL package, which is designed for EST
databases and also runs very well on multi node machines.
However, it uses multi fasta files (and corresponding (optional) quality
files) as input.
I wanted to use the phred package to generate the required fasta and qual
files. This runs fine but the fasta file has in the >name line additional
info separated with spaces. These files are not accepted by TGICL.

Is there an easy Unix (Linux) utility to convert these multi-FASTA files and
quality files into simple >name {CRT} seq files so they can be used as
input for TGICL? Or is a conversion utility available to convert/extract
phred's phd files into fasta-seq and fasta-qual?

Any help would be appreciated,

	Alex
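
On the second question, the phd files that phred writes are plain text, so
pulling the sequence and qualities out of them directly is a small job. Below
is a minimal Python sketch, not an existing utility. It assumes the usual phd
layout (a BEGIN_SEQUENCE line carrying the read name, then one
"base quality position" line per call between BEGIN_DNA and END_DNA). The
function names and the line width of 60 bases are illustrative choices.

```python
def parse_phd(lines):
    """Return (name, bases, quals) from the lines of one phred phd file.

    Assumed layout: 'BEGIN_SEQUENCE <name>', then one
    'base quality position' line per call between BEGIN_DNA and END_DNA.
    """
    name, bases, quals = None, [], []
    in_dna = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "BEGIN_SEQUENCE" and len(fields) > 1:
            name = fields[1]
        elif fields[0] == "BEGIN_DNA":
            in_dna = True
        elif fields[0] == "END_DNA":
            in_dna = False
        elif in_dna and len(fields) >= 2:
            bases.append(fields[0])
            quals.append(fields[1])
    return name, "".join(bases), quals


def format_fasta(name, bases, width=60):
    """FASTA record whose header holds only the read name."""
    rows = [bases[i:i + width] for i in range(0, len(bases), width)]
    return ">%s\n%s\n" % (name, "\n".join(rows))


def format_qual(name, quals):
    """Quality record: same bare header, space-separated scores."""
    return ">%s\n%s\n" % (name, " ".join(quals))
```

Opening each phd file, passing the file handle to parse_phd, and appending the
two records to a pair of output files (say reads.fasta and reads.fasta.qual,
both hypothetical names) would give headers with nothing after the read name.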



From shalpine at ecomplexsystems.com  Tue Apr  1 15:13:37 2003
From: shalpine at ecomplexsystems.com (Scott A. Halpine)
Date: Tue, 1 Apr 2003 15:13:37 -0500
Subject: [BiO BB] Clustering EST sequences
References: <EA56C399C5A6D5118B5400508BF98376021DC274@id008s.id.dlo.nl>
Message-ID: <002601c2f88b$2e56d3a0$1308a8c0@scott01>

I don't know of any conversion utilities, but you can certainly write a quick conversion in Perl. I'm not familiar with the specific layouts, but it sounds like you simply need to truncate each row of data properly. There shouldn't be a problem if your field separator is white space (or any other specific delimiter, for that matter).
If you don't get a better offer, send me a small data file of what you need converted, the field delimiter used, and an example of what it needs converted into. I should be able to write you a Perl routine and send it back to you. 
Scott A. Halpine
Ecologic Complex Systems, LLC
4640 Forbes Blvd, Suite 200
Lanham, MD 20706-4885
Phone: 301-918-3283
Fax: 301-429-8762


From mgollery at unr.edu  Tue Apr  1 17:58:43 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Tue,  1 Apr 2003 14:58:43 -0800
Subject: [BiO BB] Clustering EST sequences
In-Reply-To: <002601c2f88b$2e56d3a0$1308a8c0@scott01>
References: <EA56C399C5A6D5118B5400508BF98376021DC274@id008s.id.dlo.nl> <002601c2f88b$2e56d3a0$1308a8c0@scott01>
Message-ID: <1049237923.3e8a19a31ee27@webmail.unr.edu>

This is very strange: spaces are allowed in FASTA, at least in the description 
section. In the first part (the identifier) you may need to replace the spaces 
with | symbols, as follows:
Change
>gi 29125973 emb AJ550374.1 USO550374 Uncultured soil bacterium partial nosZ 
gene for putative nitrous oxide reductase, clone T8C23
GGCTGGGG...


to

>gi|29125973|emb|AJ550374.1|USO550374 Uncultured soil bacterium partial nosZ 
gene for putative nitrous oxide reductase, clone T8C23
GGCTGGGG...



Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS200
(775)784-6048


-------------------------------------------------
This mail sent through https://webmail.unr.edu


From gianluca.dellavedova at unimib.it  Tue Apr  1 11:57:25 2003
From: gianluca.dellavedova at unimib.it (Gianluca Della Vedova)
Date: 01 Apr 2003 18:57:25 +0200
Subject: [BiO BB] JCST  Special Issue on Bioinformatics
Message-ID: <1049216244.1174.21.camel@localhost>

****** Journal of Computer Science and Technology ******
****** Special Issue on Bioinformatics ******
****** Call for Papers ******

The Journal of Computer Science and Technology is inviting papers for a
special issue devoted to Bioinformatics scheduled to appear in late
2003.

The aim: The special issue aims at giving an up-to-date snapshot of the
current trends of research in Bioinformatics. Papers reporting on
original research in all areas of bioinformatics and computational
molecular biology are preferred, though surveys of particular
relevance are also good candidates.

Topics: All aspects of bioinformatics, both theoretical and practical.
Topics of interest include, but are not limited to: Genomics, Recognition of
genes and regulatory elements, Molecular evolution, Phylogenetic
inference, Protein structure, Gene expression, Gene networks, Genetic
variation (SNPs, haplotypes, etc.), Metabolic Pathways, Combinatorial
libraries and drug design, Computational proteomics, Data management
methods and systems, Sequence analysis, motifs, and pattern matching,
Comparative genomics and annotation.

Paper submission: Authors are invited to send one copy of a full-length
paper to the email address jcst at bioinformatics.bio.disco.unimib.it.
Electronic submissions via email, in the form of a PostScript or PDF
file, are encouraged; alternatively, sending a hard copy to the contact
person is acceptable. Successful printing or receipt of the paper will
be acknowledged via email. Submissions must be received by May 26, 2003.
Authors will be notified of acceptance by July 14, 2003.

Final version: The usual authors' instructions of Journal of Computer
Science and Technology apply (available at
http://www.ict.ac.cn/jcst/efile3.html).

Special Issue Editors:

Paola Bonizzoni      
Dipartimento di Informatica, Sistemistica e Comunicazione
Università degli Studi di Milano-Bicocca
bonizzoni at disco.unimib.it 

Gianluca Della Vedova
Dipartimento di Statistica
Università degli Studi di Milano-Bicocca
gianluca.dellavedova at unimib.it

Tao Jiang
Department of Computer Science
University of California at Riverside
jiang at cs.ucr.edu

Contact Person:
Paola Bonizzoni
DISCo
Università degli Studi di Milano-Bicocca
via Bicocca degli Arcimboldi 8
20126 - Milano (Italy)
bonizzoni at disco.unimib.it
tel: ++39-0264487814
fax: ++39-0264487839

Important Dates:
Submission deadline: May 26, 2003
Notification of acceptance: July 14, 2003
Final version: August 4, 2003

At the URL http://www.bio.disco.unimib.it/jcst it is possible to
download a printable version of the call for papers.

-- 
Gianluca Della Vedova
Dip. Statistica, Univ. Milano-Bicocca
http://www.statistica.unimib.it/utenti/dellavedova/


From A.Bossers at id.dlo.nl  Wed Apr  2 01:31:52 2003
From: A.Bossers at id.dlo.nl (Bossers, A.)
Date: Wed, 02 Apr 2003 08:31:52 +0200
Subject: [BiO BB] FW/Re: Fasta convertion in large EST assemblies
Message-ID: <EA56C399C5A6D5118B5400508BF98376021DC27A@id008s.id.dlo.nl>

Dear all,
 
thanks for the quick replies and help with the FASTA conversion problem. I
had already started fiddling around in Perl to convert the FASTA files into
files acceptable to TGICL for EST assembly, but Eitan provided the
simplest solution: a one-line Perl 'script' that did exactly what I
needed. BIG THANKS. The script just gets rid of everything after the filename
(as long as there are no spaces in the filename) and preserves all the
sequence or quality info that follows it. His solution is below.
 
I still don't get why TGICL doesn't accept files in valid FASTA format, but
it no longer bothers me. My EST assembly is one step further.
 
Thanks again to all people sending me perl solutions!
 
Alex
 
 
-----Original Message-----
From: Eitan Rubin [mailto:Eitan.Rubin at weizmann.ac.il] 
Sent: Tuesday, April 1, 2003 20:28
To: A.Bossers at ID.DLO.NL
Subject: Fasta convertion


Hi,
 
  If I am not mistaken, your question is "how do I convert format A below to
format B". If this is indeed what you need, the following should do the trick:
 
perl -pe 's/^>(\S+).*/>$1/;' old_format_file > new_format_file
 
Format A:
==========
>seqname1 some text with spaces
ACGTAGACTGACT..
>seqname2 some other text etc
ACGATCGATAGCT
 
Format B:
========
>seqname1
ACGTAGACTGACT..
>seqname2
ACGATCGATAGCT
 
  Eitan
 
------------------------------------------------------------------------
Eitan Rubin, PhD
Head of Bioinformatics and Biological Computing
Dept. Biological Services
Weizmann Institute of Science
Tel: +972-8-9343456
Fax: +972-8-9346006
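
For anyone who would rather not use Perl, the same header truncation can be
sketched in Python. This is only an illustration of the one-liner above, with
an assumed function name; it keeps the first whitespace-free token after ">"
and leaves sequence and quality lines untouched, so it applies equally to the
FASTA and the quality file.

```python
import re

# Same pattern as the Perl one-liner: '>' followed by the first run of
# non-space characters, then whatever else is on the header line.
HEADER = re.compile(r"^>(\S+).*")


def strip_descriptions(lines):
    """Yield each line, with '>name extra text' headers cut down to '>name'."""
    for line in lines:
        # '.' does not match the newline, so line endings are preserved.
        yield HEADER.sub(r">\1", line)
```

Used on a file, this would be something like
fout.writelines(strip_descriptions(fin)) with fin and fout opened on the old
and new format files.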

From jeff at bioinformatics.org  Wed Apr  2 10:27:44 2003
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Wed, 02 Apr 2003 10:27:44 -0500
Subject: [BiO BB] BioDarwin mailing list
Message-ID: <3E8B0170.7060300@bioinformatics.org>

Greetings.

With the help of Apple Computer, which is an organizational member of 
Bioinformatics.Org, we have set up a forum for open source bioinformatic 
developers to discuss development on the Mac OSX platform.  For 
starters, there is a mailing list on which Apple Computer engineers will 
be subscribed and available for help.

To subscribe to the mailing list yourself, please go to the following URL:
https://bioinformatics.org/lists/biodarwin

"BioDarwin" will likely grow into an "open lab" (special interest group) 
at Bioinformatics.Org and include many resources for the developer and 
Mac user.

As a reminder of the resources available, Bioinformatics.Org has a G4 
Server which can be used by Bioinformatics.Org members for software 
development and other projects related to bioinformatics.  There's also 
Apple's treasure trove of developer resources, documentation and help:
http://developer.apple.com/

Cheers.
Jeff
-- 
J.W. Bizzaro                                jeff at bioinformatics.org
President, Bioinformatics.Org       http://bioinformatics.org/~jeff
"As we enjoy great advantages from the inventions of others, we
should be glad of an opportunity to serve others by any invention
of ours; and this we should do freely and generously."
                    -- Benjamin Franklin
--


From crasmen at magic.fr  Wed Apr  2 17:25:17 2003
From: crasmen at magic.fr (Corentin Cras-Méneur)
Date: Wed, 2 Apr 2003 16:25:17 -0600
Subject: [BiO BB] BioDarwin mailing list
In-Reply-To: <3E8B0170.7060300@bioinformatics.org>
References: <3E8B0170.7060300@bioinformatics.org>
Message-ID: <p05210611bab1132da87c@[128.252.140.4]>

At 10:27 -0500 on 2/04/03, you wrote:


>Greetings.
>
>With the help of Apple Computer, which is an organizational member 
>of Bioinformatics.Org, we have set up a forum for open source 
>bioinformatic developers to discuss development on the Mac OSX 
>platform.  For starters, there is a mailing list on which Apple 
>Computer engineers will be subscribed and available for help.
>
>To subscribe to the mailing list yourself, please go to the following URL:
>https://bioinformatics.org/lists/biodarwin


  Thanks a lot for the information. I thought people on the Apple Scitech 
list could be interested as well, so I sent a copy of your 
announcement there. People in the MicroArray Yahoo Group could be 
interested too (but I haven't forwarded the message there yet).
  Sincerely,


		Corentin Cras-Méneur


From Ttlusa at aol.com  Fri Apr  4 08:58:09 2003
From: Ttlusa at aol.com (Ttlusa at aol.com)
Date: Fri, 04 Apr 2003 08:58:09 -0500
Subject: [BiO BB] RE: Confirmation
Message-ID: <1B867DFC.1701E96D.00055F61@aol.com>

Gentlemen,

I would like to confirm my subscription to the mailing list.
Regards,
ttlclinical


From derek at biotechrecruiter.org  Fri Apr  4 09:03:09 2003
From: derek at biotechrecruiter.org (Derek Pyper)
Date: Fri, 4 Apr 2003 06:03:09 -0800
Subject: [BiO BB] Dynamics Engineer Position
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAtLCII0a1PEWB9xUFtGpvPsKAAAAQAAAA2xL2wWmsxEGxjqG1l97wVQEAAAAA@biotechrecruiter.org>

Hi All,

 
I am working to fill the following position, if you know of anyone who
may be qualified or where I may look to find people with the skill set
listed, please send me an e-mail.

 
Dynamics Engineer 

The company, a growing, privately held firm headquartered in
California, is the commercial leader in predictive biosimulation for in
silico drug discovery and development. Employing its patented
technology, company develops dynamic, large-scale, mathematical models
of human disease, called "confidential" platforms, and utilizes them in
all stages of drug discovery and development. We currently engage in in
silico research in the areas of asthma, rheumatoid arthritis, obesity,
and diabetes. Company collaborates with pharmaceutical and biotech
companies to develop effective new treatments for disease and
dramatically reduce the time and cost needed to develop them. 

Company is seeking engineers and applied mathematicians for positions on
our In Silico Research and Development staff. The Dynamics Engineer
works with scientists and other engineers to perform in silico research
and development. He or she provides mathematical modeling expertise,
helps develop complex mathematical models that meet the needs of
pharmaceutical researchers, uses these models in research projects,
identifies novel experiments that can be done to address major knowledge
gaps, writes project proposals and reports, and makes technical
presentations of their work.

The ideal candidate would have:

*	A Ph.D. in Chemical, Mechanical, or Aerospace Engineering,
Applied Mathematics, Physics, or closely related field with a strong
background in nonlinear dynamics and control theory. 
*	The ability to translate complex, real world problems into
compelling mathematical models. 
*	Experience with collaborative, multidisciplinary research
projects. 
*	A strong interest in biology. 
*	Very strong communication skills, both oral and written,
including technical writing and presentation. 
*	The ability to work independently in a diverse, cross-functional
team environment. 

 
Derek Pyper
Biotech Recruiters International
2640 Castro Way
Sacramento, Ca
Work  916-455-7091
Fax    916-455-7082
www.biotech-recruiters.com
Derek at biotechrecruiter.org

 

 

From Ttlusa at aol.com  Fri Apr  4 11:48:37 2003
From: Ttlusa at aol.com (Ttlusa at aol.com)
Date: Fri, 04 Apr 2003 11:48:37 -0500
Subject: [BiO BB] RE: Confirmation
Message-ID: <63F99B6E.5D91DEF7.00055F61@aol.com>

Gentlemen,

I would like to confirm my subscription to the mailing list.
Regards,
ttlclinical


From jeff at bioinformatics.org  Fri Apr  4 12:02:35 2003
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Fri, 04 Apr 2003 12:02:35 -0500
Subject: [BiO BB] RE: Confirmation
References: <63F99B6E.5D91DEF7.00055F61@aol.com>
Message-ID: <3E8DBAAB.5020207@bioinformatics.org>

If you're able to post and you're reading this message, then you are 
subscribed :-)

Jeff

Ttlusa at aol.com wrote:
> Gentlemen,
> 
> I would like to confirm my subscription to the mailing list.
> Regards,
> ttlclinical


-- 
J.W. Bizzaro                                jeff at bioinformatics.org
President, Bioinformatics.Org       http://bioinformatics.org/~jeff
"As we enjoy great advantages from the inventions of others, we
should be glad of an opportunity to serve others by any invention
of ours; and this we should do freely and generously."
                    -- Benjamin Franklin
--


From KarenD721 at aol.com  Fri Apr  4 14:17:56 2003
From: KarenD721 at aol.com (KarenD721 at aol.com)
Date: Fri, 4 Apr 2003 14:17:56 EST
Subject: [BiO BB] (no subject)
Message-ID: <3a.36fda4e7.2bbf3464@aol.com>

 
From chrisg at sbs.bangor.ac.uk  Tue Apr  8 07:13:29 2003
From: chrisg at sbs.bangor.ac.uk (Chris Gliddon)
Date: Tue, 08 Apr 2003 12:13:29 +0100
Subject: [BiO BB] Mascot alternatives
Message-ID: <3E92AED9.2080501@sbs.bangor.ac.uk>

Hi All,

I'm looking for alternatives to Mascot from Matrix science which uses 
mass spectrometry data to identify proteins from primary sequence 
databases.  I would prefer open source software running on Linux.

Thanks for your help.

Chris
-- 
____________________________________________________
Dr. Chris Gliddon
School of Biological Sciences
University of Wales, Bangor
LL57 2UW  United Kingdom

Tel: +44 (0)1248 382533
FAX: +44 (0)1248 382569
Mobile: +44 (0)7941 060423
____________________________________________________


From marchal at mediagen.fr  Tue Apr  8 12:18:46 2003
From: marchal at mediagen.fr (Ingrid Marchal)
Date: Tue, 08 Apr 2003 18:18:46 +0200
Subject: [BiO BB] Re: Mascot alternatives
In-Reply-To: <20030408160103.8917BD2830@www.bioinformatics.org>
Message-ID: <5.2.0.9.0.20030408180926.00a92478@pop.mediagen.fr>

Hi,

emowse (http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/emowse.html) is a 
program from the EMBOSS package that does the same job. It is free and 
runs from the command line under Linux.
I am not sure, but I think that Mascot was originally derived from mowse, 
which has now been replaced by emowse.
Regards,
Ingrid


Ingrid Marchal, Ph.D.
MEDIAGEN - bioinformatic solutions
www.mediagen.fr


From sasa at muh.biglobe.ne.jp  Wed Apr  9 08:22:00 2003
From: sasa at muh.biglobe.ne.jp (Takeshi Sasayama)
Date: Wed, 9 Apr 2003 21:22:00 +0900
Subject: [BiO BB] Audio archives
Message-ID: <MBBBIHNJMFPODHKPKOMEMEABCDAA.sasa@muh.biglobe.ne.jp>

Hi all,

I am looking for a web site which has audio archives of
biological or medical presentations or news in English.
Ideally the archives would be downloadable MP3 files.
Does anyone know a good site?

Takeshi Sasayama


From mkaut at bu.edu  Wed Apr  9 11:46:23 2003
From: mkaut at bu.edu (Maurya Kaut)
Date: Wed, 09 Apr 2003 11:46:23 -0400
Subject: [BiO BB] multinomial distribution and Ig light chains
Message-ID: <3E94404F.5020507@bu.edu>

  Hello,

I'm attempting to replicate the analysis of immunoglobulin genes as in 
this article:
Lossos, et al.  The Inference of Antigen Selection on Ig Genes.  The 
Journal of Immunology 165(9): 5122-5126 (2000)
The article is available here:
http://www.jimmunol.org/cgi/content/full/165/9/5122
The Java applet mentioned in the paper is here:
http://www-stat.stanford.edu/immunoglobin/
The contact information in the paper no longer appears to be valid. 
 Basically, I'd like to understand the multinomial tail probability as 
they've applied it to heavy chains, so that I might apply it to light 
chains.  The information I've gathered on statistics thus far only 
explains bits and pieces of the equations, but I'm having trouble 
putting it all together.  I've written a Perl script that calculates 
expected replacement frequency for Ig light chain germline genes with 
some success, but the numbers I get for P values are two or three orders 
of magnitude off.  Firstly, I would just like to know if there is anyone 
who is familiar with this type of statistical work.  Also, I've heard of 
the "S" engine, and its cousin "R", but I'm not quite sure if they are 
applicable here.  Has anyone used them in conjunction with Perl/CGI?  Any 
advice is appreciated.

Maurya Kaut

-- 
<><><><><><><><><><><><><><><><><><>
Maurya G. Kaut
Gerry Amyloid Research Laboratory
Boston University School of Medicine
715 Albany St. K-508
Boston, MA 02118
(617) 638-5389 - Voice
<><><><><><><><><><><><><><><><><><>


From cjh02 at liverpool.ac.uk  Wed Apr  9 09:26:50 2003
From: cjh02 at liverpool.ac.uk (Chris Houseman)
Date: Wed, 09 Apr 2003 14:26:50 +0100
Subject: [BiO BB] Audio archives
In-Reply-To: <MBBBIHNJMFPODHKPKOMEMEABCDAA.sasa@muh.biglobe.ne.jp>
Message-ID: <200304091326.h39DQpX14917@webmail2.liv.ac.uk>

Don't know of any MP3s, but http://www.s-star.org has ~600 MB worth of
video bioinformatics lectures.
regards,

CJH

--------------- reply ----------------
> Hi all,
> 
> I am looking for a web site which has audio archives of
> biological or medical presentations or news in English. I
> hope the archives should be mp3 files and can be
> downloaded. Does anyone know a good site?
> 
> Takeshi Sasayama
> 
Chris Houseman
Research Fellow
University of Liverpool
University Department of Pathology
Duncan Building
Daulby Street
Liverpool
L69 3GA

0151 7064965


From lhewei at TripathImaging.com  Wed Apr  9 12:04:59 2003
From: lhewei at TripathImaging.com (Li, Hewei)
Date: Wed, 9 Apr 2003 12:04:59 -0400 
Subject: [BiO BB] Tools to search AG rich fragments around a long DNA sequence?
Message-ID: <235AEBD0012949478E0E2FB3449D7A7D014FF453@tpe-exch.tripathimaging.com>

Dear All,

I am looking for DNA sequence analysis tools that can locate DNA
fragments of around 20 bases that are rich in purines (i.e., A or G) within a
given DNA sequence. Could someone help me out? Many thanks!

Hewei


From rossini at blindglobe.net  Wed Apr  9 12:25:20 2003
From: rossini at blindglobe.net (A.J. Rossini)
Date: Wed, 09 Apr 2003 09:25:20 -0700
Subject: [BiO BB] multinomial distribution and Ig light chains
In-Reply-To: <3E94404F.5020507@bu.edu> (Maurya Kaut's message of "Wed, 09
 Apr 2003 11:46:23 -0400")
References: <3E94404F.5020507@bu.edu>
Message-ID: <87of3fvdsf.fsf@jeeves.blindglobe.net>

Maurya Kaut <mkaut at bu.edu> writes:

> I'm attempting to replicate the analysis of immunoglobulin genes as in
> this article:
> Lossos, et al.  The Inference of Antigen Selection on Ig Genes.  The
> Journal of Immunology 165(9): 5122-5126 (2000)
> The article is available here:
> http://www.jimmunol.org/cgi/content/full/165/9/5122
> The Java applet mentioned in the paper is here:
> http://www-stat.stanford.edu/immunoglobin/
> The contact information in the paper no longer appears to be
> valid. Basically, I'd like to understand the multinomial tail

Both Rob and Naras are still at Stanford Stat.

> having trouble putting it all together.  I've written a Perl script
> that calculates expected replacement frequency for Ig light chain
> germline genes with some success, but the numbers I get for P values
> are two or three orders of magnitude off.  

Sounds like an implementation error -- I doubt if the distribution is
pathological enough to admit to round-off problems on that order of
magnitude.

> Firstly, I would just like to know if there is anyone who is
> familiar with this type of statistical work.  Also, I've heard of
> the "S" engine, and its cousin "R", but I'm not quite sure if they
> are applicable here.  Has anyone used them in conjunction with
> Perl/CGI?  Any advice is appreciated.

The S statistical programming language, as implemented by S (not
generally available), S-PLUS (commercially available) and R
(open-source), is a full featured language for programming, not unlike
a functional, white-space agnostic version of Python (in a sense). 

I would (and generally only) use R for data analysis, and it'll make
programming this problem up much simpler (assuming that you know both
how to program as well as have intuition for statistical data
analysis).

best,
-tony
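
One way to hunt for an implementation error of that size is to check the
basic building block against cases with known answers. The sketch below is
Python rather than R or Perl, and it is NOT the tail probability from the
Lossos et al. paper: it only computes a plain multinomial probability in log
space (to avoid overflow in the factorials), which a Perl version can be
compared against. The function names are illustrative.

```python
import math


def multinomial_log_pmf(counts, probs):
    """Log of n!/(k1!...km!) * p1^k1 * ... * pm^km for the given counts."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)  # log(n!)
    for k, p in zip(counts, probs):
        if k == 0:
            continue  # 0! = 1 and p**0 = 1, so nothing to add
        if p == 0.0:
            return float("-inf")  # impossible outcome
        log_p += k * math.log(p) - math.lgamma(k + 1)
    return log_p


def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` given category `probs`."""
    return math.exp(multinomial_log_pmf(counts, probs))
```

A quick hand check: with two equally likely categories and one observation in
each, the probability is 2!/(1!1!) * 0.5 * 0.5 = 0.5. A script that disagrees
with such small cases by orders of magnitude most likely has a dropped or
doubled coefficient.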

-- 
A.J. Rossini rossini at u.washington.edu http://software.biostat.washington.edu/ 
Biostatistics, U Washington and Fred Hutchinson Cancer Research Center 
FHCRC:Tu: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email 
UW  : Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX 


From jrg at world.std.com  Wed Apr  9 12:59:18 2003
From: jrg at world.std.com (James Graham)
Date: Wed, 9 Apr 2003 12:59:18 -0400
Subject: [BiO BB] Tools to search AG rich fragments around a long DNA sequence?
In-Reply-To: <235AEBD0012949478E0E2FB3449D7A7D014FF453@tpe-exch.tripathimaging.com>
Message-ID: <99FF337A-6AAC-11D7-8169-000A956A62E4@world.std.com>

On Wednesday, April 9, 2003, at 12:04 PM, Li, Hewei wrote:

> I am looking for DNA sequence analysis tools which could locate DNA
> fragments around 20 bases in length with rich purines (i.e., A or G)
> over a given DNA sequence. Could someone help me out? Many thanks!

Can you define purine richness as a percentage? Writing such a tool 
as a Perl script should be quick and easy. 
Do you have access to a Unix/Linux machine, or are you looking more for 
an app with a GUI?

james
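
To make the percentage idea concrete, here is a minimal sliding-window sketch
in Python (a Perl version would follow the same steps). The function name, the
20-base window, and the 80% threshold are assumptions chosen for illustration,
not values from the original question.

```python
def purine_rich_windows(seq, window=20, min_fraction=0.8):
    """Return (start, end, fraction) for every window rich enough in A/G.

    start is 0-based and end is exclusive. Overlapping hits are reported
    separately rather than merged.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_purine = [base in "AG" for base in seq]
    count = sum(is_purine[:window])  # purines in the first window
    hits = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one base: add the base entering, drop the one leaving.
            count += is_purine[start + window - 1] - is_purine[start - 1]
        fraction = count / window
        if fraction >= min_fraction:
            hits.append((start, start + window, fraction))
    return hits
```

Keeping a running count means each base is examined once, so this stays fast
even on long sequences.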


From cmdobson at ucalgary.ca  Wed Apr  9 16:39:01 2003
From: cmdobson at ucalgary.ca (C. Melissa Dobson)
Date: Wed, 09 Apr 2003 14:39:01 -0600
Subject: [BiO BB] Transcription factor
Message-ID: <3E9484E5.4090802@ucalgary.ca>

Hello,
Can anyone suggest a good promoter binding site locator program for 
mouse genomic DNA sequences?  I am aware of TRANSFAC; are there any 
others that can be recommended?

Melissa Dobson


From mgollery at unr.edu  Wed Apr  9 17:07:07 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Wed,  9 Apr 2003 14:07:07 -0700
Subject: [BiO BB] Transcription factor
In-Reply-To: <3E9484E5.4090802@ucalgary.ca>
References: <3E9484E5.4090802@ucalgary.ca>
Message-ID: <1049922427.3e948b7b09b23@webmail.unr.edu>

Hi Melissa,

Try the Genomatix suite: the PromoterInspector program predicts mammalian 
promoter regions, and MatInspector looks for transcription factor binding sites.

Marty

Quoting "C. Melissa Dobson" <cmdobson at ucalgary.ca>:

> 
> Hello,
> Can anyone suggest a good promoter binding site locator program for 
> mouse genomic DNA sequences?  I am aware of Transfac are there any 
> others that can be recommended?
> 
> Melissa Dobson


Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS200
(775)784-6048




From mkgovindis at yahoo.com  Wed Apr  9 22:43:47 2003
From: mkgovindis at yahoo.com (govind mk)
Date: Wed, 9 Apr 2003 19:43:47 -0700 (PDT)
Subject: [BiO BB] Re: Extracting location from a genbank flatfile
In-Reply-To: <1049922427.3e948b7b09b23@webmail.unr.edu>
Message-ID: <20030410024347.78989.qmail@web40109.mail.yahoo.com>

Hi all

I am stuck with a rather simple problem.
I would like to extract the locations of specific features
(e.g. CDS) from a Genbank flat file.

I tried using Bioperl but couldn't manage to get the
exact locations for complicated representations of
locations such as
complement(join(295405..295443,295492..295529))
as the Bioperl modules return only the minimum start and
maximum stop.

Any suggestions?

-Govind


From daniel.ducat at metalife.de  Thu Apr 10 03:44:38 2003
From: daniel.ducat at metalife.de (Daniel Ducat)
Date: Thu, 10 Apr 2003 10:44:38 +0300
Subject: [BiO BB] Re: Extracting location from a genbank flatfile
In-Reply-To: <20030410024347.78989.qmail@web40109.mail.yahoo.com>
Message-ID: <00d601c2ff35$0aab1910$3c01a8c0@metalife.bg>

Hello Govind

We had the same problem with Genbank locations.
What we did here was to write a C++ program that parses
a Genbank entry file into flat files, ready to be imported into a relational
DB.
In the database (MSSQL) we wrote a stored procedure that parses every
location, however complicated it is, and breaks it into a set
of smaller, simple ones. Note that a location can contain a link to another
entry, for example (100..130, A01234.12..15).

It looks complicated, but this way we get everything we need.

For a simpler solution, write a program or Perl script (or bash script)
that extracts the location of the feature and parses it. This task is not
as difficult as it seems, since there are clear rules for feature table
locations in Genbank entry files.
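
The simpler route can be sketched roughly as follows. This is a
minimal Python example, not the C++/stored-procedure code described
above: it breaks a location string into simple (start, end, strand,
accession) segments. The function names are invented for the example.
It assumes only the common operators complement(), join() and order(),
plain ranges, single bases, the < and > partial markers, and remote
references written as accession.version:start..end as in the public
feature table definition; rarer or deprecated forms are not handled.

```python
import re

# Optional "ACCESSION.VERSION:" prefix, then start, then optional end.
_SIMPLE = re.compile(
    r"^(?:([A-Za-z][A-Za-z0-9_]*(?:\.\d+)?):)?"
    r"<?(\d+)(?:(?:\.\.|\^)>?(\d+))?$"
)


def split_top_level(text):
    """Split text on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_location(loc, strand=1):
    """Return a list of (start, end, strand, accession) tuples.

    accession is None for segments on the entry itself. Segments of a
    complemented join are returned in reverse (biological) order.
    """
    loc = "".join(loc.split())  # drop spaces and wrapped-line breaks
    for op in ("complement", "join", "order"):
        if loc.startswith(op + "(") and loc.endswith(")"):
            inner = loc[len(op) + 1:-1]
            if op == "complement":
                return parse_location(inner, -strand)[::-1]
            segments = []
            for part in split_top_level(inner):
                segments.extend(parse_location(part, strand))
            return segments
    match = _SIMPLE.match(loc)
    if match is None:
        raise ValueError("unsupported location: " + loc)
    accession, start, end = match.groups()
    start = int(start)
    end = int(end) if end is not None else start
    return [(start, end, strand, accession)]


if __name__ == "__main__":
    print(parse_location("complement(join(295405..295443,295492..295529))"))
    # [(295492, 295529, -1, None), (295405, 295443, -1, None)]
```

Each exon of the example location comes back as its own segment
instead of being collapsed into a single minimum start and maximum
stop.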

Regards

Daniel Ducat
Senior Database Developer Metalife AG
e-mail: daniel.ducat at metalife.de
Phone: +359 (02) 950-18-04
URL: http://www.metalife.de


From pmr at ebi.ac.uk  Thu Apr 10 05:50:04 2003
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 10 Apr 2003 10:50:04 +0100
Subject: [BiO BB] Re: Extracting location from a genbank flatfile
References: <20030410024347.78989.qmail@web40109.mail.yahoo.com>
Message-ID: <3E953E4C.9060800@ebi.ac.uk>

govind mk wrote:
> I am stuck with a rather simple problem.
> I would like to extract the locations of specific features
> (e.g. CDS) from a Genbank flat file.
> 
> I tried using Bioperl but couldn't manage to get the 
> exact locations for complicated representations of
> locations such as
> complement(join(295405..295443,295492..295529))
> as Bioperl modules return the minimum start and
> maximum stop.

You can use EMBOSS (the European Molecular Biology Open Software Suite) 
http://www.uk.embnet.org/Software/EMBOSS/

EMBOSS is an open source (GPL/LGPL) package of sequence analysis 
libraries and programs.

Among other features, EMBOSS can read EMBL/Genbank, SwissProt and PIR 
feature tables and convert to/from GFF without losing information 
(although this does require adding some extra GFF tags to retain 
information about complex feature locations). The internals are similar 
to the ARTEMIS feature table editor from the Sanger Institute.

I am currently extending the feature table internals of EMBOSS for the 
next release, to allow deletion/insertion of sequence ranges, and would 
be interested in any feedback - especially things that are hard to do 
with existing tools.

regards,

Peter Rice
European Bioinformatics Institute.


From sasa at muh.biglobe.ne.jp  Thu Apr 10 16:19:56 2003
From: sasa at muh.biglobe.ne.jp (Takeshi Sasayama)
Date: Fri, 11 Apr 2003 05:19:56 +0900
Subject: [BiO BB] Transcription factor
In-Reply-To: <3E9484E5.4090802@ucalgary.ca>
Message-ID: <MBBBIHNJMFPODHKPKOMEOEAGCDAA.sasa@muh.biglobe.ne.jp>

Hello,

There is a program called tfscan in the EMBOSS package, which scans
DNA sequences for transcription factors.

See this page
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/tfscan.html

Takeshi Sasayama


From a.menze at kws.de  Fri Apr 11 12:23:49 2003
From: a.menze at kws.de (a.menze at kws.de)
Date: Fri, 11 Apr 2003 18:23:49 +0200
Subject: [BiO BB] (removed)
Message-ID: <OFD4018853.366D89BE-ONC1256D05.005A1251-C1256D05.005A1253@kws.de>

(removed)


From sasa at muh.biglobe.ne.jp  Fri Apr 11 12:43:51 2003
From: sasa at muh.biglobe.ne.jp (Takeshi Sasayama)
Date: Sat, 12 Apr 2003 01:43:51 +0900
Subject: [BiO BB] Audio archives
In-Reply-To: <200304091326.h39DQpX14917@webmail2.liv.ac.uk>
Message-ID: <MBBBIHNJMFPODHKPKOMEMEAICDAA.sasa@muh.biglobe.ne.jp>

Hi Chris,

I was able to download the lecture videos and watched some of them. They
were better than I expected!
Thanks for your valuable information.

Takeshi Sasayama


> -----Original Message-----
> From: bio_bulletin_board-admin at bioinformatics.org
> [mailto:bio_bulletin_board-admin at bioinformatics.org]On Behalf Of Chris
> Houseman
> Sent: Wednesday, April 09, 2003 10:27 PM
> To: bio_bulletin_board at bioinformatics.org
> Subject: Re: [BiO BB] Audio archives
>
>
> Don't know of mp3's, but http://www.s-star.org has ~600mb worth of
> video bioinformatics lectures
> regards,
>
> CJH
>
> --------------- reply ----------------
> > Hi all,
> >
> > I am looking for a web site which has audio archives of
> > biological or medical presentations or news in English. I
> > hope the archives should be mp3 files and can be
> > downloaded. Does anyone know a good site?
> >
> > Takeshi Sasayama
> >
> Chris Houseman
> Research Fellow
> University of Liverpool
> University Department of Pathology
> Duncan Building
> Daulby Street
> Liverpool
> L69 3GA
>
> 0151 7064965


From a.menze at kws.de  Sat Apr 12 12:19:49 2003
From: a.menze at kws.de (a.menze at kws.de)
Date: Sat, 12 Apr 2003 18:19:49 +0200
Subject: [BiO BB] (removed)
Message-ID: <OFBB3A5BCC.C43E9DF1-ONC1256D06.0059B4B6-C1256D06.0059B4B6@kws.de>

(removed)


From dfield at ceh.ac.uk  Sun Apr 13 18:18:38 2003
From: dfield at ceh.ac.uk (Dawn Field)
Date: Sun, 13 Apr 2003 23:18:38 +0100
Subject: [BiO BB] PhD available in comparative genomics, Oxford, UK
Message-ID: <se99f058.069@wpo.nerc.ac.uk>

A NERC funded PhD position is available entitled:

"Comparative Genomics, Phylogeny, Ecology and the study of Collections
of Genomes"

Supervisors: Dr Dawn Field, Prof Mark Bailey, Ed Feil (University
Advisor, University of Bath)
Primary Location: CEH Oxford, Oxford UK
Approaches:  Molecular Evolution and Bioinformatics, Large-scale
comparative genomics
Application Deadline: May 2 (Eligibility extends to UK residents. EU
residents can apply but only for fee-based support).

For more information please contact Dawn Field (dfield at ceh.ac.uk) or Ed
Feil (e.feil at bath.ac.uk).  To apply, please send a CV, brief statement
of past research experience and future research goals, and the names of
two academic referees to Angela Morrison (asmor at ceh.ac.uk, 0865-281630).


Summary

Whole genome sequencing is fuelling an information revolution that is
changing the face of biology. The ability to determine the complete
complement of DNA of a wide range of organisms is allowing us to ask in
unprecedented detail fundamental questions about the molecular basis of
life. There are now more than 1900 genomes from bacteria (Eubacteria and
Archaea), plasmids, phage, viruses and organelles in public databases. 
These genomes are so numerous that they constitute 'collections' of
genomes instead of small sets.  These collections are rapidly growing
and their evolutionary and ecological richness provides an unparalleled
opportunity to investigate the molecular basis of ecological adaptation
using computational approaches.  We are in the process of establishing a
database that combines complete genomes with evolutionary and ecological
meta-data.  Specific tasks to be carried out in this PhD include 1)
writing programming code to detect and characterise core genomic
features, 2) collecting a wider range of descriptive meta-data, and most
importantly, 3) using this resource to test a range of hypotheses about
the evolution and biological significance of shared and unshared genomic
features.  

Key areas of research to be addressed using data primarily from
bacterial genome sequences include, but are not limited to: 1) testing
for relationships between ecological features and genomic features, 2)
examining the rate of evolution of features like G+C content and genome
size by mapping traits onto 16S rRNA phylogenies and trees based on whole
proteome comparisons, 3) extracting 16S rRNA operons from all bacterial
genomes to study the evolution of 16S rRNA operons and their flanking
sequences, 4) detecting and studying the numbers and distributions of
orphan genes (putative proteins with no known homologues), and 5)
detecting and quantifying the number of conserved hypothetical
proteins.


From jeff at bioinformatics.org  Fri Apr 18 18:57:36 2003
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Fri, 18 Apr 2003 18:57:36 -0400
Subject: [BiO BB] Volunteers for a quarterly newsletter
Message-ID: <3EA082E0.3060306@bioinformatics.org>

Greetings.

We're looking for some volunteers to help produce a quarterly newsletter 
for Bioinformatics.Org.  Some of the topics that could be covered in the 
newsletter are as follows:

- Hosted project spotlights: a short review of a project hosted at 
Bioinformatics.Org (we have 105!)

- Articles about free/open source/access in bioinformatics, including 
projects outside of Bioinformatics.Org

- Articles about events in which Bioinformatics.Org is involved (e.g., 
the Annual Meeting)

- Notes about changes and developments in the Organization itself

- Reviews of books related to bioinformatics

- Listings of repository submissions (note that we are slowly developing 
a repository mechanism, which will include software, algorithms, data, 
publications/literature, and educational material--more on this later)

We will need several people to be involved.  Some of the work would 
require good English, writing, and desktop publishing skills, but other 
work wouldn't.  (It would be nice to produce the newsletter in LaTeX, 
but other programs can be used.)  You can even contribute cartoons if 
you'd like ;-)

If you're interested (and note that this is a *volunteer* position, like 
everything else at Bioinformatics.Org for now), please contact me 
off-list <jeff at bioinformatics.org>.

Cheers.
Jeff
-- 
J.W. Bizzaro                                jeff at bioinformatics.org
President, Bioinformatics.Org       http://bioinformatics.org/~jeff
"As we enjoy great advantages from the inventions of others, we
should be glad of an opportunity to serve others by any invention
of ours; and this we should do freely and generously."
                    -- Benjamin Franklin
--


From yhkhoo at wspc.com  Tue Apr 29 00:01:26 2003
From: yhkhoo at wspc.com (Khoo Yee Hong)
Date: Tue, 29 Apr 2003 12:01:26 +0800
Subject: [BiO BB] Journal of Bioinformatics and Computational Biology (JBCB)
Message-ID: <3EADF916.4090901@wspc.com>

Journal of Bioinformatics and Computational Biology (JBCB)

The Journal of Bioinformatics and Computational Biology aims to publish 
high quality, original research articles, expository tutorial papers and 
review papers as well as short, critical comments on technical issues 
associated with the analysis of cellular information and the use of such 
information in biomedicine.

The research papers will be technical presentations of new assertions, 
discoveries and tools, intended for a narrower specialist community. The 
tutorials, reviews and critical commentary will be targeted at a broader 
readership of biologists who are interested in using computers but are 
not knowledgeable about scientific computing, and equally, computer 
scientists who have an interest in biology but are not familiar with 
the current thrusts or the language of biology. Such carefully chosen 
tutorials and articles should greatly accelerate the rate of entry of 
these new creative scientists into the field.

Researchers in Computer Science and Bioinformatics will find the journal 
a useful resource.  Please go to 
http://www.worldscinet.com/journals/jbcb/jbcb.shtml to find out how to 
submit papers, to request a complimentary copy, or to read a detailed 
description of the journal.

The free electronic version of JBCB's inaugural issue can also be found 
on this web site.

Warm Regards,
WorldSciNet


From landman at scalableinformatics.com  Wed Apr 30 01:27:26 2003
From: landman at scalableinformatics.com (Joseph Landman)
Date: 30 Apr 2003 01:27:26 -0400
Subject: [BiO BB] Script available for running mpiBLAST using SGE mpich parallel
 environment
Message-ID: <1051680446.16458.61.camel@protein.scalableinformatics.com>

Hi folks:

  I had been working on integrating mpiBLAST into various environments
for a number of customers recently.  After completing this work, I began
playing with the SGE mpich environment for another customer code.  Some
things clicked, and I developed a simple shell script to run mpiBLAST
through the SGE mpich environment.  

  You can find it at
http://scalableinformatics.com/downloads/sge_mpiblast, and the writeup
is at http://scalableinformatics.com/sge_mpiblast.html .  This isn't
terribly fancy, but it worked well for me.  I have since integrated the
basic ideas into other customer environments with success.


Joe

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615