From budhi19 at bi.itb.ac.id  Mon Sep  1 02:25:10 2003
From: budhi19 at bi.itb.ac.id (Narumi ayumu)
Date: Mon, 01 Sep 2003 13:25:10 +0700
Subject: [BiO BB] Bio modelling
In-Reply-To: <003901c36e49$29e09d40$1efea8c0@raptor>
Message-ID: <MDAEMON-F200309011325.AA251159md50003745958@bi.itb.ac.id>

Dear All,

I want to do some biomodelling for biological applications,
such as the growth of an ecosystem or the growth of rotifers.
Is there any open-source software that can do that?
And where can I find papers or publications
about biomodelling?

Sincerely yours,

Budhi,


From dmb at mrc-dunn.cam.ac.uk  Mon Sep  1 05:45:30 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Mon, 1 Sep 2003 10:45:30 +0100 (BST)
Subject: [BiO BB] Bio modelling
In-Reply-To: <MDAEMON-F200309011325.AA251159md50003745958@bi.itb.ac.id>
Message-ID: <Pine.LNX.4.21.0309011042480.21926-100000@mail.mrc-dunn.cam.ac.uk>

Have you heard of swarm?

I think that is OS.

http://www.linuxselfhelp.com/HOWTO/AI-Alife-HOWTO-5.html

Looks relevant.

Best of luck !

Dan.

On Mon, 1 Sep 2003, Narumi ayumu wrote:

> Dear All,
> 
> I want to do some biomodelling for biological applications,
> such as the growth of an ecosystem or the growth of rotifers.
> Is there any open-source software that can do that?
> And where can I find papers or publications
> about biomodelling?
> 
> Sincerely yours,
> 
> Budhi,
> 
> 
> _______________________________________________
> BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
> 


From rls at cin.ufpe.br  Mon Sep  1 08:40:07 2003
From: rls at cin.ufpe.br (Rafael Luiz da Silva)
Date: Mon, 1 Sep 2003 09:40:07 -0300 (BRT)
Subject: [BiO BB] Re: Help
In-Reply-To: <Pine.LNX.4.44.0308311454210.25614-100000@jurema.cin.ufpe.br>
Message-ID: <Pine.LNX.4.44.0309010940050.29654-100000@jurema.cin.ufpe.br>


[]s Rafael Luiz
(www.cin.ufpe.br/~rls)


From rls at cin.ufpe.br  Mon Sep  1 09:15:47 2003
From: rls at cin.ufpe.br (Rafael Luiz da Silva)
Date: Mon, 1 Sep 2003 10:15:47 -0300 (BRT)
Subject: [BiO BB] Re: Help
In-Reply-To: <Pine.LNX.4.44.0308311454210.25614-100000@jurema.cin.ufpe.br>
Message-ID: <Pine.LNX.4.44.0309011015310.29654-100000@jurema.cin.ufpe.br>

  I need the complete protein sequences of Arabidopsis thaliana and
 Plasmodium falciparum.

  Can anyone help me (or tell me where I can get them easily)?

  Thanks :)

 []s Rafael Luiz
 (www.cin.ufpe.br/~rls)


From idoerg at burnham.org  Mon Sep  1 16:03:18 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 01 Sep 2003 13:03:18 -0700
Subject: [BiO BB] Re: Help
In-Reply-To: <Pine.LNX.4.44.0309011015310.29654-100000@jurema.cin.ufpe.br>
References: <Pine.LNX.4.44.0309011015310.29654-100000@jurema.cin.ufpe.br>
Message-ID: <3F53A606.9080503@burnham.org>

NCBI:

http://www.ncbi.nlm.nih.gov/genomes/static/euk_g.html

./I


Rafael Luiz da Silva wrote:
>   I need the complete protein sequences of Arabidopsis thaliana and
>  Plasmodium falciparum.
> 
>   Can anyone help me (or tell me where I can get them easily)?
> 
>   Thanks :)
> 
>  []s Rafael Luiz
>  (www.cin.ufpe.br/~rls)
> 
> 
> 
> 

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo


From zfu at cs.ucr.edu  Tue Sep  2 16:03:53 2003
From: zfu at cs.ucr.edu (Zheng Fu)
Date: Tue, 2 Sep 2003 13:03:53 -0700 (PDT)
Subject: [BiO BB] Bio modelling
In-Reply-To: <Pine.LNX.4.21.0309011042480.21926-100000@mail.mrc-dunn.cam.ac.uk>
Message-ID: <Pine.LNX.4.44.0309021302040.2376-100000@hill.cs.ucr.edu>

I have used Swarm a little. It is not an OS. It is a Java/Objective-C
package. It provides a very useful library for complex-system simulation.

On Mon, 1 Sep 2003, Dan Bolser wrote:

>
> Have you heard of swarm?
>
> I think that is OS.
>
> http://www.linuxselfhelp.com/HOWTO/AI-Alife-HOWTO-5.html
>
> Looks relevant.
>
> Best of luck !
>
> Dan.
>
> On Mon, 1 Sep 2003, Narumi ayumu wrote:
>
> > Dear All,
> >
> > I want to do some biomodelling for biological applications,
> > such as the growth of an ecosystem or the growth of rotifers.
> > Is there any open-source software that can do that?
> > And where can I find papers or publications
> > about biomodelling?
> >
> > Sincerely yours,
> >
> > Budhi,
> >
> >

-- 
Love & Peace


From lucifer at chiark.greenend.org.uk  Tue Sep  2 07:03:36 2003
From: lucifer at chiark.greenend.org.uk (Lucy McWilliam)
Date: Tue, 2 Sep 2003 12:03:36 +0100 (BST)
Subject: [BiO BB] Re: Help
In-Reply-To: <20030901160112.BB4BBD2865@www.bioinformatics.org>
Message-ID: <Pine.LNX.4.21.0309021155170.4689-100000@chiark.greenend.org.uk>

Rafael Luiz da Silva wrote:

> I need the complete protein sequences of Arabidopsis thaliana and
> Plasmodium falciparum.

ftp://ftp.arabidopsis.org/home/tair/ (currently refusing connections)
http://plasmodb.org/restricted/GridddPf.shtml


Lucy McWilliam
http://www.flychip.org.uk/


From dmb at mrc-dunn.cam.ac.uk  Wed Sep  3 04:32:09 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Wed, 3 Sep 2003 09:32:09 +0100 (BST)
Subject: [BiO BB] Bio modelling
In-Reply-To: <Pine.LNX.4.44.0309021302040.2376-100000@hill.cs.ucr.edu>
Message-ID: <Pine.LNX.4.21.0309030930380.14479-100000@mail.mrc-dunn.cam.ac.uk>

Sorry, bad abbreviation (Open Source).

It looks like there is a lot of work in the field
of population dynamics, though. Can you apply
that to growth?

I know growth processes have been studied a lot
in fractal geometry.
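
As one illustration of applying population dynamics to growth, here is a minimal Python sketch of the discrete-time logistic growth model. The function name, the parameter values (growth rate r, carrying capacity k) and the rotifer example are assumptions made for illustration; none of this comes from Swarm or from the posts in this thread.

```python
def logistic_growth(n0, r, k, steps):
    """Return population sizes for a discrete-time logistic model.

    n0: initial population, r: per-step growth rate,
    k: carrying capacity, steps: number of time steps.
    """
    sizes = [float(n0)]
    for _ in range(steps):
        n = sizes[-1]
        # Growth slows as the population approaches the carrying capacity.
        sizes.append(n + r * n * (1.0 - n / k))
    return sizes


if __name__ == "__main__":
    # Hypothetical rotifer culture: 10 animals, 30% growth per day, capacity 500.
    for day, size in enumerate(logistic_growth(10, 0.3, 500, 30)):
        print(day, round(size, 1))
```

With a small growth rate such as this one, the population rises smoothly towards the carrying capacity; larger values of r can make the discrete model oscillate, which may or may not be what a given organism does.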

Cheers, 


On Tue, 2 Sep 2003, Zheng Fu wrote:

> I have used Swarm a little. It is not an OS. It is a Java/Objective-C
> package. It provides a very useful library for complex-system simulation.
> 
> On Mon, 1 Sep 2003, Dan Bolser wrote:
> 
> >
> > Have you heard of swarm?
> >
> > I think that is OS.
> >
> > http://www.linuxselfhelp.com/HOWTO/AI-Alife-HOWTO-5.html
> >
> > Looks relevant.
> >
> > Best of luck !
> >
> > Dan.
> >
> > On Mon, 1 Sep 2003, Narumi ayumu wrote:
> >
> > > Dear All,
> > >
> > > I want to do some biomodelling for biological applications,
> > > such as the growth of an ecosystem or the growth of rotifers.
> > > Is there any open-source software that can do that?
> > > And where can I find papers or publications
> > > about biomodelling?
> > >
> > > Sincerely yours,
> > >
> > > Budhi,
> > >
> > >


From dmb at mrc-dunn.cam.ac.uk  Wed Sep  3 05:37:29 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Wed, 3 Sep 2003 10:37:29 +0100 (BST)
Subject: [BiO BB] Clustering
In-Reply-To: <003901c36e49$29e09d40$1efea8c0@raptor>
Message-ID: <Pine.LNX.4.21.0309031035230.15608-100000@mail.mrc-dunn.cam.ac.uk>

Hi, 

What packages support clustering of points
with a similarity matrix?

How can I derive the similarity of two matrices?

Cheers, 
Dan.


From a.bashir at bioc.cam.ac.uk  Wed Sep  3 10:59:03 2003
From: a.bashir at bioc.cam.ac.uk (Asam Bashir)
Date: Wed, 3 Sep 2003 17:59:03 +0300
Subject: [BiO BB] Re: Looking for suggestions on where to hold next
 meeting
In-Reply-To: <1f1.efe98a4.2c8417d6@aol.com>
References: <1f1.efe98a4.2c8417d6@aol.com>
Message-ID: <p06001201bb7bb1e82012@[213.16.149.254]>

>Me too (Boston)
>Mark

Fancy Athens, Greece?

http://bioinformatics.biol.uoa.gr/


From idoerg at burnham.org  Wed Sep  3 12:19:08 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 03 Sep 2003 09:19:08 -0700
Subject: [BiO BB] Clustering
In-Reply-To: <Pine.LNX.4.21.0309031035230.15608-100000@mail.mrc-dunn.cam.ac.uk>
References: <Pine.LNX.4.21.0309031035230.15608-100000@mail.mrc-dunn.cam.ac.uk>
Message-ID: <3F56147C.4070600@burnham.org>


Dan Bolser wrote:
> Hi, 
> 
> What packages support clustering of points
> with a similarity matrix?

I don't think I quite understand the question, can you elaborate on that?

> 
> How can I derive the similarity of two matrices?
> 

If you mean that you would like to check how "close" two similarity 
matrices (e.g. BLOSUM, PAM) are to each other, then one method is to 
compare the amino-acid pair frequency distributions used to construct 
these matrices. Look to the following paper (fig 4, and the last 
paragraph in the "methods" section) for one example on how to do this, 
although other methods of comparing distributions may be used just as 
effectively:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=retrieve&db=pubmed&list_uids=11790845&dopt=Abstract
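
One way to compare two pair-frequency distributions can be sketched in Python as follows. The Jensen-Shannon divergence is used here only as an example of the "other methods of comparing distributions"; it is not necessarily the measure used in the paper linked above, and the toy frequencies are made up.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (in bits) between two distributions.

    p and q map outcomes (e.g. amino-acid pairs) to probabilities.
    Returns 0.0 for identical distributions and at most 1.0.
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m;
        # terms with zero probability contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0.0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Toy pair frequencies (hypothetical numbers, each set sums to 1).
matrix_a = {("A", "A"): 0.5, ("A", "G"): 0.3, ("G", "G"): 0.2}
matrix_b = {("A", "A"): 0.4, ("A", "G"): 0.4, ("G", "G"): 0.2}
print(jensen_shannon(matrix_a, matrix_b))
```

A value near 0 means the two frequency tables are nearly the same; a value of 1 means they share no outcomes at all.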

./I


-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo


From derek at biotechrecruiter.org  Wed Sep  3 12:41:58 2003
From: derek at biotechrecruiter.org (Derek Pyper)
Date: Wed, 03 Sep 2003 09:41:58 -0700
Subject: [BiO BB] Computational Chemist/Biologist Positions
Message-ID:  <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAtLCII0a1PEWB9xUFtGpvPsKAAAAQAAAAoZq3M1c/KUq1bgDzuuhNBQEAAAAA@biotechrecruiter.org>

Hi All,

 
I am working on positions for extraordinarily gifted computational
chemists and other computational scientists that are sought to join a
rapidly growing New York-based research group that is pursuing an
ambitious, long-term strategy aimed at fundamentally transforming the
process of drug discovery.

 
Candidates should have world-class credentials in computational
chemistry, biology, or physics, or in a relevant area of computer
science or applied mathematics, and must have unusually strong research
and software engineering skills.  Relevant areas of experience might
include the computation of protein-ligand binding free energies,
molecular dynamics and/or Monte Carlo simulations of biomolecular
systems, application of statistical mechanics to biomolecular systems,
free energy perturbation methods, and methods for speeding up evaluation
of electrostatic energies -- but specific knowledge of any of these
areas is less critical than exceptional intellectual ability and a
demonstrated track record of achievement.  Current areas of interest
within the group include the prediction of protein structures and
binding free energies, structure- and ligand-based drug design, de novo
ligand design algorithms, and the development of special-purpose
hardware to accelerate computational chemistry simulations.

 
This research effort is being financed by a confidential investment and
technology development firm with approximately $5 billion in aggregate
capital.  The project was initiated by the firm's founder, and operates
under his direct scientific leadership.  

 
We are eager to add both senior- and junior-level members to our
world-class team, and are prepared to offer above-market compensation to
candidates of truly exceptional ability. Please send your CV to
derek at biotech-recruiters.com

 
Please send:

-your resume (including list of publications, thesis topic, and advisor,
if applicable),
-history of academic performance (including GPAs as well as SAT, GRE,
and other standardized test scores),
-salary requirement
-relocation considerations (if any)
-your work authorization (h1B, etc)

Warm Regards, 

 
Derek Pyper

Biotech Recruiters International, Inc.

Principal

Office       916-652-2186

Fax          916-652-2178

Email:      Derek at biotech-recruiters.com

URL:        www.biotech-recruiters.com

 
CONFIDENTIALITY STATEMENT: This electronic message contains privileged
and confidential information from  Biotech Recruiters International,
Inc.  This information is intended solely for the use of the
individual(s) or entity(ies) named above. If you are not the intended
recipient, be aware that any disclosure, copying, distribution, or use
of the contents of this message is prohibited. If you have received this
email in error, please notify us immediately by telephone at
916-652-2186  or by email reply. Thank you.

 

From dmb at mrc-dunn.cam.ac.uk  Wed Sep  3 12:46:56 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Wed, 3 Sep 2003 17:46:56 +0100 (BST)
Subject: [BiO BB] Clustering
In-Reply-To: <3F56147C.4070600@burnham.org>
Message-ID: <Pine.LNX.4.21.0309031735100.21608-100000@mail.mrc-dunn.cam.ac.uk>

> > What packages support clustering of points
> > with a similarity matrix?
> 
> I don't think I quite understand the question, can you elaborate on that?

Yup... I keep finding that I have some similarities between things,
and I would like to be able to do a simple clustering of the points,
but I am not familiar with the algorithms, so I would just like to play
around a bit.

I know you can do phylogenetic analysis on any similarity matrix, but
I don't need the high resolution (many similar points closely linked to
one short branch). I would like to see, in general, what 'blobs' of data
I have without investing too much time in the analysis (or the
computation!).

For example, I might have the AA composition of 1000 sequences, and we
may suspect that the composition is biased across these sequences (not
uniform). So we think: maybe I should break them up by secondary structure,
maybe into families, maybe I should perform chi-squared tests between every
possible combination of groups of the 1000 to find subpopulations within
which the composition isn't biased...

If I take each protein and compare its composition to every other, I have
an N**2/2 similarity matrix, which I would like to cluster, just to see
if any protein families, structural classes or taxonomic groups have a
particular bias in terms of AA composition, but this is a long, complicated
analysis (I think to myself), so I don't bother.

Now that I ask, I am sure there are thousands of clustering toolkits out
there; I should just google. Does anyone have any recommendations?
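
For a quick look at the 'blobs' described above, here is a minimal Python sketch: compute each sequence's amino-acid composition, then group sequences whose compositions are closer than a threshold (single linkage, i.e. connected components). The distance measure, the threshold and the toy sequences are assumptions for illustration; this is not a substitute for a proper clustering package.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq)
    total = sum(counts[aa] for aa in AMINO_ACIDS) or 1
    return [counts[aa] / total for aa in AMINO_ACIDS]


def distance(a, b):
    """Half the L1 distance between two compositions (0 = same, 1 = disjoint)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def single_linkage(items, threshold):
    """Group item indices so that linked items are within `threshold`.

    Compares all N*(N-1)/2 pairs and merges clusters with union-find.
    """
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy sequences (hypothetical): two alanine-rich, two lysine-rich.
seqs = ["AAAAAG", "AAAAGG", "KKKKKE", "KKKKEE"]
print(single_linkage([composition(s) for s in seqs], threshold=0.3))
# prints [[0, 1], [2, 3]]
```

For 1000 sequences this means roughly half a million pairwise comparisons, which is still cheap; the threshold is the one knob to play with when looking for coarse groups.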


> > How can I derive the similarity of two matrices?
> > 
> 
> If you mean that you would like to check how "close" two similarity 
> matrices (e.g. BLOSUM, PAM) are to each other, then one method is to 
> compare the amino-acid pair frequency distributions used to construct 
> these matrices. 

You mean the similarity of two distributions? Sounds interesting...

> Look to the following paper (fig 4, and the last 
> paragraph in the "methods" section) for one example on how to do this, 
> although other methods of comparing distributions may be used just as 
> effectively:
> 
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=retrieve&db=pubmed&list_uids=11790845&dopt=Abstract

Thanks very much,
Dan.


> ./I
> 
> 
> 
> 
> 


From mgollery at unr.edu  Wed Sep  3 13:00:03 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Wed,  3 Sep 2003 10:00:03 -0700
Subject: [BiO BB] InterProScan on SGE
Message-ID: <1062608403.3f561e13ab43c@webmail.unr.edu>

Has anyone gotten InterProScan to work on a cluster using SGE? I have 
substituted qsub for bsub in the queueing configuration, to no avail. Anybody 
who has gotten this to work, please contact me.

Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
New phone number! 775-784-7042


-------------------------------------------------
This mail sent through https://webmail.unr.edu


From derek at biotechrecruiter.org  Wed Sep  3 13:34:49 2003
From: derek at biotechrecruiter.org (Derek Pyper)
Date: Wed, 03 Sep 2003 10:34:49 -0700
Subject: [BiO BB] Applications Scientist Position
Message-ID:  <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAtLCII0a1PEWB9xUFtGpvPsKAAAAQAAAA5jvizldpQkiGkrqmd43IWQEAAAAA@biotechrecruiter.org>

Hi,

 
Application Scientist/ Collaborative eR&D Senior Consultant


1.    Mission


*        To deliver to our customers a solution which exceeds their
expectations while remaining on time and within budget

*        To ensure that each deployment of our software is completed in
a professional, consistent and successful manner. 

*        To maximize customer retention and penetration by offering a
world class client support service 


2.    Key Tasks


2.1.   Responsible for the successful deployment of our product within
client organisations

2.2.   Working with researchers within the client company to analyse the
current research procedures and processes and to configure our product
to replicate these

2.3.   Write requirements specifications based on customer feedback and
liaise with the product development group to ensure these customisations
are delivered on time and to spec

2.4.   The training of client staff in the use of the product

2.5.   Working closely with sales and business development to ensure
that each client's sales potential is fully exploited

2.6.   To liaise with the client's IT dept to ensure that our product is
installed and configured correctly

2.7.   Once deployment has been completed, be the primary point of
contact for client support issues

2.8.   Be involved in designing customer-training materials and handling
change management issues within our client sites.

2.9.   Liaise with other departments within the client's organization to
ensure that the usage of our product is as widespread as possible


3.    Key Relationships


3.1.   Reporting relationship - reporting to the VP of Products and
Services

3.2.   Liaise closely with the Product Development manager


4.    Key Values


We are a global company, headquartered in Cork, dedicated to a clear
mission. We are also a knowledge-based company that aims to harness
fully the knowledge and skills of our staff. The Collaborative
eR&D Senior Consultant is expected to adopt a professional style which
embodies in practice certain key organisational values:

4.1.   Skill in dealing with people issues.

4.2.   Solidarity with other members of the team

4.3.   Fairness in dealing with staff regardless of their background and
position and in handling all personnel issues and situations.

4.4.   Harnessing the talents, ideas and abilities of staff.

4.5.   Proactive development of the attitudes, knowledge, and skill of
people.

4.6.   The creation and maintenance of standards of performance and work
discipline.


5.    Key Expectations


5.1.   Mainly based on the west coast or east coast, depending on the
individual's preference.

5.2.   Expected to travel.

5.3.   In exceptional circumstances people will be expected to work
outside of normal business hours.           

5.4.   Expected to remain on site for the duration of a deployment (can
be up to 2 months)


6.    Key Technical Skills


6.1.   Worked as a researcher in an R&D organization 

6.2.   Familiar with general R&D processes and procedures e.g. GLP

6.3.   Knowledge of statistics e.g. Design of Experiments

6.4.   Intermediate IT knowledge

6.5.   Requirements gathering, systems analysis experience (ideally)

6.6.   Have been involved in a support role before (ideally)


7.    Key People Skills


7.1.   Excellent communication skills both spoken and written.

7.2.   Excellent Interpersonal skills and a strong team player.

7.3.   Negotiation, persuasion and presentation skills.

7.4.   Ability to communicate to a wide variety of audiences -
executive, management and researcher.


8.    Key Business Skills


8.1.   Managed the deployment of an enterprise IT system.

8.2.   Project management 

8.3.   Account Management/Sales (Ideally)


9.    Educational/Professional Requirements


9.1.   Science qualification (preferably to PhD level), with a strong
statistical content.

9.2.   Ideally would have a foreign language

9.3.   You will come from a Proteomics or Biochemistry background

9.4.   Superb interpersonal skills

Please email a copy of your CV to Derek at biotech-recruiters.com for
further consideration.

 
Derek Pyper

Biotech Recruiters International, Inc.

Principal

Office       916-652-2186

Fax          916-652-2178

Email:      Derek at biotech-recruiters.com

URL:        www.biotech-recruiters.com

 
CONFIDENTIALITY STATEMENT: This electronic message contains privileged
and confidential information from  Biotech Recruiters International,
Inc.  This information is intended solely for the use of the
individual(s) or entity(ies) named above. If you are not the intended
recipient, be aware that any disclosure, copying, distribution, or use
of the contents of this message is prohibited. If you have received this
email in error, please notify us immediately by telephone at
916-652-2186  or by email reply. Thank you.

 

From thompson at wadsworth.org  Wed Sep  3 13:40:41 2003
From: thompson at wadsworth.org (William Thompson)
Date: Wed, 3 Sep 2003 13:40:41 -0400 (EDT)
Subject: [BiO BB] Clustering
Message-ID: <200309031740.h83Hefk13720@csserv.wadsworth.org>

Dan 
If all you are looking for is simple clustering, check out R
(http://www.r-project.org/). It has an extensive clustering package.

Bill

Bill Thompson, PhD
Center for Bioinformatics
Wadsworth Center
NY State Dept of Health
ESP C-644
P.O. Box 509
Albany, NY  12201-0509
phone: (518) 486-7882


> Date: Wed, 3 Sep 2003 17:46:56 +0100 (BST)
> From: Dan Bolser <dmb at mrc-dunn.cam.ac.uk>
> To: bio_bulletin_board at bioinformatics.org
> Subject: Re: [BiO BB] Clustering
> Reply-To: bio_bulletin_board at bioinformatics.org
> 
> 
> > > What packages support clustering of points
> > > with a similarity matrix?
> > 
> > I don't think I quite understand the question, can you elaborate on that?
> 
> Yup... I keep finding that I have some similarities between things,
> and I would like to be able to do a simple clustering of the points,
> but I am not familiar with the algorithms, so I would just like to play
> around a bit.
> 
> I know you can do phylogenetic analysis on any similarity matrix, but
> I don't need the high resolution (many similar points closely linked to
> one short branch). I would like to see, in general, what 'blobs' of data
> I have without investing too much time in the analysis (or the
> computation!).
> 
> For example, I might have the AA composition of 1000 sequences, and we
> may suspect that the composition is biased across these sequences (not
> uniform). So we think: maybe I should break them up by secondary structure,
> maybe into families, maybe I should perform chi-squared tests between every
> possible combination of groups of the 1000 to find subpopulations within
> which the composition isn't biased...
> 
> If I take each protein and compare its composition to every other, I have
> an N**2/2 similarity matrix, which I would like to cluster, just to see
> if any protein families, structural classes or taxonomic groups have a
> particular bias in terms of AA composition, but this is a long, complicated
> analysis (I think to myself), so I don't bother.
> 
> Now that I ask, I am sure there are thousands of clustering toolkits out
> there; I should just google. Does anyone have any recommendations?
> 


From tfiedler at rsmas.miami.edu  Wed Sep  3 15:09:34 2003
From: tfiedler at rsmas.miami.edu (Tristan Fiedler)
Date: Wed, 3 Sep 2003 15:09:34 -0400 (EDT)
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <20030903160114.A85CDD2853@www.bioinformatics.org>
References: <20030903160114.A85CDD2853@www.bioinformatics.org>
Message-ID: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>

Dear Bio Gurus!

Two quick questions :

1.  Could someone please assist me in writing a shell script (awk, sed,
etc.) which would use a loop to run through about 1000 files (filenames all
end in '.seq') and remove all occurrences of control-M, resulting in a file
containing the sequence on a single line?

Currently each file looks similar to :

% cat -v seq_018_G05.seq
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^M
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGG^M
TTTTTTTTTTTTTTTTCCCAAAAAAAAAAAAA^M


2.  We are planning to buy a workstation for our local (~3 labs producing
sequences from an ABI sequencer) genomics needs (lots of blast runs,
database management, standard bioinformatics software), and were planning
on getting something like :

4 GB RAM  (is this enough for doing local blast searches against genbank?)
2 x 3 GHz Xeon processors (how about Mac OSX?)
400 GB storage


Thank you - and feel free to reply directly to me (not waste bb resources).

Cheers!


-- 
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami

tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626


From idoerg at burnham.org  Wed Sep  3 17:00:18 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 03 Sep 2003 14:00:18 -0700
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
References: <20030903160114.A85CDD2853@www.bioinformatics.org> <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
Message-ID: <3F565662.4020405@burnham.org>


Tristan Fiedler wrote:
> Dear Bio Gurus!
> 
> Two quick questions :
> 
> 1.  Could someone please assist me in writing a shell script (awk, sed,
> etc.) which would use a loop to run through about 1000 files (filenames all
> end in '.seq') and remove all occurrences of control-M, resulting in a file
> containing the sequence on a single line?
> 
> Currently each file looks similar to :
> 
> % cat -v seq_018_G05.seq
> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^M
> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGGGGG^M
> TTTTTTTTTTTTTTTTCCCAAAAAAAAAAAAA^M
> 

Sounds like you need the dos2unix utility. It comes bundled with many
Linux distributions; if you are working on another OS, you can download
it for free. Use Google to find it.


> 
> 2.  We are planning to buy a workstation for our local (~3 labs producing
> sequences from an ABI sequencer) genomics needs (lots of blast runs,
> database management, standard bioinformatics software), and were planning
> on getting something like :
> 
> 4 GB RAM  (is this enough for doing local blast searches against genbank?)

Definitely, that's what I have, haven't had any issues. BLAST/PSI-BLAST 
is not that memory-intensive actually.

> 2 x 3 GHz Xeon processors (how about Mac OSX?)

The more processors, the merrier. BLAST parallelizes nicely. Regarding 
OS: I'm partial to Linux, but that's me.

> 400 GB storage
> 

You can always add more, and 400 is ample for starters.

> 
> Thank you - and feel free to reply directly to me (not waste bb resources).
> 
> Cheers!
> 
> 
> 

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo


From p.pagel at gsf.de  Thu Sep  4 02:51:50 2003
From: p.pagel at gsf.de (Philipp Pagel)
Date: Thu, 4 Sep 2003 08:51:50 +0200
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
References: <20030903160114.A85CDD2853@www.bioinformatics.org> <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
Message-ID: <20030904065149.GB1960@porcupine.gsf.de>

> 1.  could someone please assist me in writing a shell script (awk, sed,
> etc.) which would use a loop to run through about 1000 files (filenames all
> end in '.seq') and remove all occurrences of control-M, resulting in a file
> containing the sequence on a single line.

No script required. Just a one-liner...
cd into the folder with your sequences and do this:

for f in *.seq; do tr -d '\r\n' < "$f" > tmp_sequence && mv tmp_sequence "$f"; done
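
For anyone without a Unix shell at hand, the same clean-up can be sketched in
Python. This is only a minimal sketch, assuming each file fits comfortably in
memory; the names flatten_sequence and flatten_seq_files are made up for this
example and are not part of any existing tool.

```python
from pathlib import Path


def flatten_sequence(data: bytes) -> bytes:
    """Drop every carriage return (control-M) and newline byte."""
    return data.replace(b"\r", b"").replace(b"\n", b"")


def flatten_seq_files(directory: str = ".") -> int:
    """Rewrite each *.seq file in `directory` as a single line.

    Returns the number of files rewritten (hypothetical helper).
    """
    count = 0
    for path in sorted(Path(directory).glob("*.seq")):
        flat = flatten_sequence(path.read_bytes())
        # One trailing newline keeps the file friendly to Unix tools.
        path.write_bytes(flat + b"\n")
        count += 1
    return count
```

Calling flatten_seq_files("path/to/sequences") would process the whole folder;
try it on a copy of the data first, since the files are overwritten in place.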


> 2.  We are planning to buy a workstation for our local (~3 labs producing
> sequences from an ABI sequencer) genomics needs (lots of blast runs,
> database management, standard bioinformatics software), and were planning
> on getting something like :
> 
> 4 GB RAM  (is this enough for doing local blast searches against genbank?)
> 2 x 3 GHz Xeon processors (how about Mac OSX?)
> 400 GB storage

Sounds like a nice machine. Certainly big enough for BLAST.

cu
	Philipp

-- 
Dr. Philipp Pagel                                Tel.  +49-89-3187-3675
Institute for Bioinformatics / MIPS              Fax.  +49-89-3187-3585
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1
85764 Neuherberg, Germany


From mkgovindis at yahoo.com  Thu Sep  4 05:07:08 2003
From: mkgovindis at yahoo.com (govind mk)
Date: Thu, 4 Sep 2003 02:07:08 -0700 (PDT)
Subject: [BiO BB] Swissprot to Refseq entry mapping
In-Reply-To: <20030904065149.GB1960@porcupine.gsf.de>
Message-ID: <20030904090708.1449.qmail@web40104.mail.yahoo.com>

hi all 

I would like to know whether there is any file that maps
human Swiss-Prot proteins to LocusLink or RefSeq entries
at NCBI, or whether there is any roundabout way to
achieve that.

Thank you

Regards
M.K.Govind

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com


From dmb at mrc-dunn.cam.ac.uk  Thu Sep  4 06:23:54 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Thu, 4 Sep 2003 11:23:54 +0100 (BST)
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
Message-ID: <Pine.LNX.4.21.0309041123070.22563-100000@mail.mrc-dunn.cam.ac.uk>

> 2.  We are planning to buy a workstation for our local (~3 labs producing
> sequences from an ABI sequencer) genomics needs (lots of blast runs,
> database management, standard bioinformatics software), and were planning
> on getting something like :
> 
> 4 GB RAM  (is this enough for doing local blast searches against genbank?)
> 2 x 3 GHz Xeon processors (how about Mac OSX?)
> 400 GB storage

The new dual-processor Xeons with hyperthreading (which shows up as 4 CPUs!) are great!

Get as much storage as you can!


From idh at poulet.org  Thu Sep  4 08:03:04 2003
From: idh at poulet.org (Yannick Wurm)
Date: Thu, 4 Sep 2003 14:03:04 +0200
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu>
Message-ID: <BD10D92F-DECF-11D7-8607-000393BE23AE@poulet.org>

Hi,
I don't have any hands-on experience with large-volume blasting, but 
you might want to have a look at Apple's new G5 computers.
The numbers shown in the "Performance Whitepaper" of June 2003 (you'll 
find a link at http://www.apple.com/lae/g5/ ) are quite impressive. 
Apple compared the performance of the dual 2GHz Power Mac G5 running 
Apple/Genentech BLAST with a 3GHz Pentium 4-based system and a dual 
3.06GHz Xeon-based system, both running Red Hat Linux 9.0 and NCBI 
BLAST.

A/G BLAST is an optimized version of NCBI
BLAST developed by Apple in collaboration
with Genentech. Optimized for dual PowerPC
G5 processors, the Velocity Engine, and the
symmetric multiprocessing capabilities of
Mac OS X, A/G BLAST makes a wide variety
of searches available at higher speeds.

According to the graph they show, using a word length of 40, the dual 
G5 processed about 3 million nucleotides per second, whereas the Linux 
boxes managed only about 0.75 million.

The same paper also states that HMMer runs 4 times faster on the dual 
G5 than on the dual Xeon.

And Microsoft Word runs on it too :)

Are the results shown surprising? The code for A/G BLAST and HMMer seems 
to be optimized for the G5, whereas the standard vanilla versions were 
used on the Linux boxes. Could optimization reduce the bias against the 
Xeon?

Yannick Wurm

\\\\\\\\\\\\\\\\\\\
\\  http://yannick.poulet.org icq: 22044361
\\  idh at poulet.org  tel: ++33.6.16.41.71.92


From dmb at mrc-dunn.cam.ac.uk  Thu Sep  4 08:13:32 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Thu, 4 Sep 2003 13:13:32 +0100 (BST)
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <BD10D92F-DECF-11D7-8607-000393BE23AE@poulet.org>
Message-ID: <Pine.LNX.4.21.0309041308200.30496-100000@mail.mrc-dunn.cam.ac.uk>

On Thu, 4 Sep 2003, Yannick Wurm wrote:
> 
> And Microsoft Word runs on it too :)

;)

> 
> Are the results shown surprising? The code for A/G BLAST and HMMer seems 
> to be optimized for the G5, whereas the standard vanilla versions were 
> used on the Linux boxes. Could optimization reduce the bias against the 
> Xeon?

I hope so!

Did they run BLAST with the number-of-processors option (-a in blastall)
set to 2 or 4?

The kernel-level hyperthreading on the Xeon presents 2n logical CPUs, but
I am not sure if this 'really' works (i.e. 4*2GHz from a 2*2GHz board).

Also, I found the number-of-processors option to be suboptimal in terms of
CPU usage. I prefer to simply start 4 jobs at the same time, so maybe
this would improve the benchmark considerably (I went from ~50%
average usage on each CPU to ~100%).

Did they write the output to /dev/null?

If not, the I/O could be a (hidden?) factor.
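
Starting 4 jobs at the same time needs the query set cut into pieces first.
Here is a minimal sketch of that splitting step in Python, assuming the
queries sit in one FASTA file that fits in memory. The function names are
made up for illustration, and launching the BLAST jobs themselves is left
out.

```python
def read_fasta_records(text):
    """Split FASTA text into records, each beginning with a '>' header line."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():          # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def split_round_robin(records, n_jobs):
    """Deal the records out into n_jobs chunks, one chunk per job."""
    chunks = [[] for _ in range(n_jobs)]
    for i, record in enumerate(records):
        chunks[i % n_jobs].append(record)
    return ["".join(chunk) for chunk in chunks]
```

Each chunk would be written to its own file and handed to a separate
single-CPU job. Dealing the records out round-robin keeps the chunks about
the same size when the sequences are of similar length.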

Cheers,


> 
> Yannick Wurm
> 
> \\\\\\\\\\\\\\\\\\\
> \\  http://yannick.poulet.org icq: 22044361
> \\  idh at poulet.org  tel: ++33.6.16.41.71.92
> 
> _______________________________________________
> BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
> 


From steletch at biomedicale.univ-paris5.fr  Thu Sep  4 09:01:48 2003
From: steletch at biomedicale.univ-paris5.fr (Teletchéa Stéphane)
Date: 04 Sep 2003 15:01:48 +0200
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <BD10D92F-DECF-11D7-8607-000393BE23AE@poulet.org>
References: <BD10D92F-DECF-11D7-8607-000393BE23AE@poulet.org>
Message-ID: <1062680508.11157.8.camel@pcumr70.biomedicale.univ-paris5.fr>

On Thu 04/09/2003 at 14:03, Yannick Wurm wrote:
> Hi,
> I don't have any hands-on experience with large-volume blasting, but 
> you might want to have a look at Apple's new G5 computers.
> The numbers shown in the "Performance Whitepaper" of June 2003 (you'll 
> find a link at http://www.apple.com/lae/g5/ ) are quite impressive. 
> Apple compared the performance of the dual 2GHz Power Mac G5 running 
> Apple/Genentech BLAST with a 3GHz Pentium 4-based system and a dual 
> 3.06GHz Xeon-based system, both running Red Hat Linux 9.0 and NCBI 
> BLAST.
> 
> A/G BLAST is an optimized version of NCBI
> BLAST developed by Apple in collaboration
> with Genentech. Optimized for dual PowerPC
> G5 processors, the Velocity Engine, and the
> symmetric multiprocessing capabilities of
> Mac OS X, A/G BLAST makes a wide variety
> of searches available at higher speeds.
> 
> According to the graph they show, using a word length of 40, the Dual 
> G5 ran 3 million nucleotides per second whereas the Linux boxes did 
> only about 0.75.
> 
> The same paper also states that HMMer runs 4 times faster on the dual 
> g5 than on the dual xeon.
> 
> And Microsoft Word runs on it too :)
> 
> Are the results shown surprising? The code for A/G BLAST and HMMer seems 
> to be optimized for the G5, whereas the standard vanilla versions were 
> used on the Linux boxes. Could optimization reduce the bias against the 
> Xeon?
> 
> Yannick Wurm
> 

Personally speaking, I would not accept a benchmark that easily without
running it myself first...

Second, why didn't they compare the dual G5 to its real counterparts
(what you would get for the same price): Opteron or Itanium?

I'll bet the figures would be dramatically different. The dual G5 is not
a workstation but a desktop server, so let's consider the server
solutions!

See some comparisons at :
http://www.alineos.com/Benchs/bench1.html

Stef


From mgollery at unr.edu  Thu Sep  4 12:17:41 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Thu,  4 Sep 2003 09:17:41 -0700
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <BD10D92F-DECF-11D7-8607-000393BE23AE@poulet.org>
References: <BD10D92F-DECF-11D7-8607-000393BE23AE@poulet.org>
Message-ID: <1062692261.3f5765a546a62@webmail.unr.edu>

The real key in this performance whitepaper may not be the processor or 
operating system. Note that the word size used here was 40, which will lead to 
dramatically fewer HSP extensions, and thus a much higher speed on any system. 
Of course, you might miss a lot of hits, depending on the data. Due to the 
larger word size, one might more accurately compare A/G blast with NCBI's 
megablast. On the other hand, if you would like speed with sensitivity, check 
out PatternHunter from Bioinformatics Solutions or, if you have the money, 
TeraBlast from TimeLogic.
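
To see why a larger word size cuts the work so sharply, here is a toy sketch
in Python that counts exact word matches (the seeds from which extensions
start) between two sequences. It is not BLAST's actual implementation;
count_seed_hits and the example sequences are made up for illustration.

```python
def count_seed_hits(query, subject, word_size):
    """Count (query position, subject position) pairs sharing an exact word."""
    # Index every word of the subject by its start positions.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    # Look up every word of the query in that index.
    hits = 0
    for i in range(len(query) - word_size + 1):
        hits += len(index.get(query[i:i + word_size], ()))
    return hits


identical = count_seed_hits("AACCGGTT", "AACCGGTT", 4)   # 5 seeds
diverged = count_seed_hits("AACCGGTT", "AACCTGTT", 4)    # 1 seed survives
missed = count_seed_hits("AACCGGTT", "AACCTGTT", 8)      # 0 seeds
```

A single mismatch leaves one short seed intact but wipes out the only long
one, so a long word means fewer extensions and also the chance of missing
diverged hits.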

Still, I think that Apple is becoming an attractive option for bioinformatics 
thanks to the AltiVec optimizations. The HMMpfam numbers look promising, 
although I have not tested them myself.

Marty

Quoting Yannick Wurm <idh at poulet.org>:

> Hi,
> I don't have any hands-on experience with large-volume blasting, but 
> you might want to have a look at Apple's new G5 computers.
> The numbers shown in the "Performance Whitepaper" of June 2003 (you'll 
> find a link at http://www.apple.com/lae/g5/ ) are quite impressive. 
> Apple compared the performance of the dual 2GHz Power Mac G5 running 
> Apple/Genentech BLAST with a 3GHz Pentium 4-based system and a dual 
> 3.06GHz Xeon-based system, both running Red Hat Linux 9.0 and NCBI 
> BLAST.
> 
> A/G BLAST is an optimized version of NCBI
> BLAST developed by Apple in collaboration
> with Genentech. Optimized for dual PowerPC
> G5 processors, the Velocity Engine, and the
> symmetric multiprocessing capabilities of
> Mac OS X, A/G BLAST makes a wide variety
> of searches available at higher speeds.
> 
> According to the graph they show, using a word length of 40, the Dual 
> G5 ran 3 million nucleotides per second whereas the Linux boxes did 
> only about 0.75.
> 
> The same paper also states that HMMer runs 4 times faster on the dual 
> g5 than on the dual xeon.
> 
> And Microsoft Word runs on it too :)
> 
> Are the results shown surprising? The code for A/G BLAST and HMMer seems 
> to be optimized for the G5, whereas the standard vanilla versions were 
> used on the Linux boxes. Could optimization reduce the bias against the 
> Xeon?
> 
> Yannick Wurm
> 
> \\\\\\\\\\\\\\\\\\\
> \\  http://yannick.poulet.org icq: 22044361
> \\  idh at poulet.org  tel: ++33.6.16.41.71.92
> 


Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
New phone number! 775-784-7042


From luo_2005 at yahoo.com  Thu Sep  4 19:20:29 2003
From: luo_2005 at yahoo.com (Phil Luo)
Date: Thu, 4 Sep 2003 16:20:29 -0700 (PDT)
Subject: [BiO BB] orthologs vs in-paralogs
Message-ID: <20030904232029.18724.qmail@web20704.mail.yahoo.com>

Dear all,
 
As we know, there are two kinds of homologs: orthologs and paralogs. Genes in two species that have evolved directly from a single gene in the last common ancestor are called orthologs. A set of homologous genes that have diverged from each other as a consequence of gene duplication are called paralogs. Paralogs that arose from a duplication after the speciation event are sometimes called in-paralogs.
 
My question is how to distinguish the in-paralogs from orthologs. Which one is supposed to be more similar, in-paralogs or orthologs?
 
Best regards,
Phil


From dmb at mrc-dunn.cam.ac.uk  Fri Sep  5 02:38:56 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Fri, 5 Sep 2003 07:38:56 +0100 (BST)
Subject: [BiO BB] orthologs vs in-paralogs
In-Reply-To: <20030904232029.18724.qmail@web20704.mail.yahoo.com>
References: <20030904232029.18724.qmail@web20704.mail.yahoo.com>
Message-ID: <33048.80.4.6.223.1062743936.squirrel@www.mrc-dunn.cam.ac.uk>

Phil Luo said:
> Dear all,
>
> As we know, there are two kinds of homologs: orthologs and paralogs. Genes in two
> species that have evolved directly from a single gene in the last common ancestor
> are called orthologs. A set of homologous genes that have diverged from each other
> as a consequence of gene duplication are called paralogs. Paralogs that arose from
> a duplication after the speciation event are sometimes called in-paralogs.
>
> My question is how to distinguish the in-paralogs from orthologs. Which one is
> supposed to be more similar, in-paralogs or orthologs?

Hi,
Good question! Maybe someone on the sequence searching mailing list can help answer,

http://bioinformatics.org/mailman/listinfo/ssml-general


I know of some work trying to uncover 'lineage specific gene expansion' by Eugene
Koonin at the NCBI. That sounds a bit like the in-paralogs you describe.
Also, he and coworkers define an algorithm for predicting orthologous pairs: simply
'best hits' between genome 1 and genome 2.
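
The 'best hits' idea can be sketched in a few lines of Python. This is a
minimal sketch of the common reciprocal-best-hit variant, which is an
assumption here and not necessarily the exact published procedure. The
score tables are made-up numbers standing in for real all-against-all
search results.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: genome 1 genes a1, a2 against genome 2 genes b1, b2.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
          ("a2", "b1"): 180, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 200, ("b1", "a2"): 180,
          ("b2", "a1"): 150, ("b2", "a2"): 90}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)   # [("a1", "b1")]
```

Here a2 and b2 find no reciprocal partner, which is the kind of case where a
duplication on one side may be involved.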

Although I understand the definition of orthology and paralogy, I find the concepts
a bit confusing. I don't know what information you lose by simply talking about
gene families, and ignoring the within / between genome distinction.

At some level doesn't ortholog mean 'same gene', and paralog mean 'copy'?

Cheers,

> Best regards,
> Phil
>
>
> ---------------------------------
> Do you Yahoo!?
> Yahoo! SiteBuilder - Free, easy-to-use web site design software


From hz5 at njit.edu  Fri Sep  5 08:31:05 2003
From: hz5 at njit.edu (hz5 at njit.edu)
Date: Fri, 05 Sep 2003 08:31:05 -0400 (EDT)
Subject: [BiO BB] orthologs vs in-paralogs
In-Reply-To: <33048.80.4.6.223.1062743936.squirrel@www.mrc-dunn.cam.ac.uk>
References: <20030904232029.18724.qmail@web20704.mail.yahoo.com> <33048.80.4.6.223.1062743936.squirrel@www.mrc-dunn.cam.ac.uk>
Message-ID: <1062765065.3f5882091fc3a@webmail.njit.edu>

This is a good question, please drop me a note if you guys get some answers.

As far as I know, besides both being homologs, the key difference is that an 
ortholog has the same function in a different species, while a paralog has a 
different function in the same species.

Please keep me posted!
Thanks!
haibo
//cheers

Quoting Dan Bolser <dmb at mrc-dunn.cam.ac.uk>:

> Phil Luo said:
> > Dear all,
> >
> > As we know, there are two kinds of homologs: orthologs and paralogs.
> > Genes in two species that have evolved directly from a single gene in
> > the last common ancestor are called orthologs. A set of homologous
> > genes that have diverged from each other as a consequence of gene
> > duplication are called paralogs. Paralogs that arose from a duplication
> > after the speciation event are sometimes called in-paralogs.
> >
> > My question is how to distinguish the in-paralogs from orthologs.
> > Which one is supposed to be more similar, in-paralogs or orthologs?
> 
> Hi,
> Good question! Maybe someone on the sequence searching mailing list can
> help answer,
> 
> http://bioinformatics.org/mailman/listinfo/ssml-general
> 
> 
> I know of some work trying to uncover 'lineage specific gene expansion'
> by Eugene Koonin at the NCBI. That sounds a bit like the in-paralogs you
> describe. Also, he and coworkers define an algorithm for predicting
> orthologous pairs: simply 'best hits' between genome 1 and genome 2.
> 
> Although I understand the definition of orthology and paralogy, I find
> the concepts a bit confusing. I don't know what information you lose by
> simply talking about gene families, and ignoring the within / between
> genome distinction.
> 
> At some level doesn't ortholog mean 'same gene', and paralog mean
> 'copy'?
> 
> Cheers,
> 
> > Best regards,
> > Phil
> >
> >


=========================================================
Haibo Zhang, PhD student
Computational Biology, NJIT & Rutgers University
Center for Applied Genomics, PHRI
http://afs13.njit.edu/~hz5


From boris.steipe at utoronto.ca  Fri Sep  5 08:48:15 2003
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Fri, 05 Sep 2003 08:48:15 -0400
Subject: [BiO BB] orthologs vs in-paralogs
References: <20030904232029.18724.qmail@web20704.mail.yahoo.com>
Message-ID: <3F58860E.7E591024@utoronto.ca>

Phil Luo wrote:
> 
> Dear all,
> 
> As we know ,there are two kinds of homolog, ortholog and paralog. Genes in two
> species that have directly evolved from a single gene in the last common
> ancestor are called orthologs. A set of homologous genes that have diverged
> from each other as a consequence of genetic duplication are called paralogs.
> Sometime those paralogs which arose from a duplication after the speciation
> event are called in-paralogs.
> 
> My question is how to distinguish the in-paralogs from orthologs.

Paralogs have different functions. That is a question for biochemistry, not bioinformatics.

> Which one is
> supposed to be more similar, in-paralogs or orthologs?


To the degree that the evolutionary rates remain the same, the difference will
be proportional to the time of separation from the common ancestor. For
orthologs this is the speciation event. For in-paralogs this is the duplication
event. Accordingly you would expect a situation in which, paradoxically, a protein
and its paralog would be more similar than that protein and its ortholog in
another species.

That need not be universally true however, since the divergence of an in-paralog
(post-duplication) may occur under reduced selective pressure since the original
protein still fulfills its function. Thus the evolutionary rates need not be the same.

Accordingly: if you find paradoxical similarities as above, your best
explanation will be "in-paralogs". But if you have a duplication event in a
species (possible evidence could come from comparative genomics) you cannot
necessarily conclude that the proteins must be unusually similar. In fact,
looking for such events systematically and analysing divergence rates, would
make an interesting project to quantify the evolutionary pressure on genes after
duplication events.
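
The argument above can be put into numbers with a small Python sketch. It
assumes a plain linear model (distance grows as rate times time, multiple
hits ignored), and every figure below is an illustrative assumption, not a
measurement.

```python
def expected_divergence(years_since_split, rate_copy_a, rate_copy_b):
    """Expected substitutions per site separating two gene copies.

    Each copy accumulates changes at its own constant rate (substitutions
    per site per year) since the split, so the two contributions add up.
    """
    return (rate_copy_a + rate_copy_b) * years_since_split


# Illustrative numbers only (assumptions).
BASE_RATE = 1e-9           # substitutions per site per year
SPECIATION_AGE = 100e6     # years since the two species split
DUPLICATION_AGE = 40e6     # years since the duplication (after speciation)

ortholog_distance = expected_divergence(SPECIATION_AGE, BASE_RATE, BASE_RATE)
in_paralog_distance = expected_divergence(DUPLICATION_AGE, BASE_RATE, BASE_RATE)
# Relaxed selection: the new copy evolves five times faster (assumed factor).
relaxed_distance = expected_divergence(DUPLICATION_AGE, BASE_RATE, 5 * BASE_RATE)
```

With equal rates the in-paralog pair comes out closer (about 0.08 versus 0.2
substitutions per site), but with the faster-evolving copy it overtakes the
ortholog pair (about 0.24), which is why similarity alone cannot settle the
question.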


Best regards,


Boris

---
Boris Steipe
University of Toronto
Program in Proteomics & Bioinformatics
Departments of Biochemistry & Molecular and Medical Genetics
http://biochemistry.utoronto.ca/steipe


From dmb at mrc-dunn.cam.ac.uk  Fri Sep  5 08:07:11 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Fri, 5 Sep 2003 13:07:11 +0100 (BST)
Subject: [BiO BB] orthologs vs in-paralogs
In-Reply-To: <1062765065.3f5882091fc3a@webmail.njit.edu>
References: <20030904232029.18724.qmail@web20704.mail.yahoo.com>
        <33048.80.4.6.223.1062743936.squirrel@www.mrc-dunn.cam.ac.uk>
        <1062765065.3f5882091fc3a@webmail.njit.edu>
Message-ID: <37350.80.4.6.223.1062763631.squirrel@www.mrc-dunn.cam.ac.uk>

> This is a good question, please drop me a note if you guys get some answers.
>
> For all I know, besides the homolog part, the key difference is that ortholog  is
> same function in different species, while paralog is different function in  same
> species.

Yup, I know this definition, but it doesn't address the idea of redundancy within a
genome. This is (apparently) a big issue for functional genomics.

Cheers


> Please keep me posted!
> Thanks!
> haibo
> //cheers
>
> Quoting Dan Bolser <dmb at mrc-dunn.cam.ac.uk>:
>
>> Phil Luo said:
>> > Dear all,
>> >
>> > As we know, there are two kinds of homologs: orthologs and paralogs.
>> > Genes in two species that have evolved directly from a single gene in
>> > the last common ancestor are called orthologs. A set of homologous
>> > genes that have diverged from each other as a consequence of gene
>> > duplication are called paralogs. Paralogs that arose from a duplication
>> > after the speciation event are sometimes called in-paralogs.
>> >
>> > My question is how to distinguish the in-paralogs from orthologs.
>> > Which one is supposed to be more similar, in-paralogs or orthologs?
>>
>> Hi,
>> Good question! Maybe someone on the sequence searching mailing list can
>> help answer,
>>
>> http://bioinformatics.org/mailman/listinfo/ssml-general
>>
>>
>> I know of some work trying to uncover 'lineage specific gene expansion'
>> by Eugene Koonin at the NCBI. That sounds a bit like the in-paralogs you
>> describe. Also, he and coworkers define an algorithm for predicting
>> orthologous pairs: simply 'best hits' between genome 1 and genome 2.
>>
>> Although I understand the definition of orthology and paralogy, I find
>> the concepts a bit confusing. I don't know what information you lose by
>> simply talking about gene families, and ignoring the within / between
>> genome distinction.
>>
>> At some level doesn't ortholog mean 'same gene', and paralog mean 'copy'?
>>
>> Cheers,
>>
>> > Best regards,
>> > Phil
>> >
>> >
>
>
>
> =========================================================
> Haibo Zhang, PhD student
> Computational Biology, NJIT & Rutgers University
> Center for Applied Genomics, PHRI
> http://afs13.njit.edu/~hz5


From dmb at mrc-dunn.cam.ac.uk  Fri Sep  5 08:15:18 2003
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Fri, 5 Sep 2003 13:15:18 +0100 (BST)
Subject: [BiO BB] orthologs vs in-paralogs
In-Reply-To: <3F58860E.7E591024@utoronto.ca>
References: <20030904232029.18724.qmail@web20704.mail.yahoo.com>
        <3F58860E.7E591024@utoronto.ca>
Message-ID: <37453.80.4.6.223.1062764118.squirrel@www.mrc-dunn.cam.ac.uk>

Boris Steipe said:
> Phil Luo wrote:
>>
>> Dear all,
>>
>> As we know, there are two kinds of homologs: orthologs and paralogs. Genes in two
>> species that have evolved directly from a single gene in the last common
>> ancestor are called orthologs. A set of homologous genes that have diverged from
>> each other as a consequence of gene duplication are called paralogs. Paralogs
>> that arose from a duplication after the speciation event are sometimes
>> called in-paralogs.
>>
>> My question is how to distinguish the in-paralogs from orthologs.
>
> Paralogs have different functions. biochemistry, not bioinformatics.

How about redundant functions?

>> Which one is
>> supposed to be more similar, in-paralogs or orthologs?
>
>
> To the degree that the evolutionary rates remain the same, the difference will be
> proportional to the time of separation from the common ancestor. For orthologs
> this is the speciation event. For in-paralogs this is the duplication event.
> Accordingly you would expect a situation in which, paradoxically, a protein and its
> paralog would be more similar than that protein and its ortholog in another
> species.

I think this should be common in 'lineage specific gene expansion' re: Eugene Koonin.

> That need not be universally true however, since the divergence of an in-paralog
> (post-duplication) may occur under reduced selective pressure since the original
> protein still fulfills its function. Thus the evolutionary rates need not be the
> same.

This has been argued to be one of the major driving forces for evolution (gene
function innovation). I am sorry but I can't remember the reference for this
concept.

> Accordingly: if you find paradoxical similarities as above, your best explanation
> will be "in-paralogs". But if you have a duplication event in a species (possible
> evidence could come from comparative genomics) you cannot necessarily conclude
> that the proteins must be unusually similar. In fact, looking for such events
> systematically and analysing divergence rates, would make an interesting project
> to quantify the evolutionary pressure on genes after duplication events.

This has been the focus of study for researchers such as Andreas Wagner, looking at
the innovation of interaction partners after duplication events using the
interactomes of different yeast strains and worm. Eugene Koonin et al. have also
looked at this systematically, but I think a general (conceptually clean) framework
for this analysis would be a major step forward.


Cheers.

>
>
>
> Best regards,
>
>
> Boris
>
> ---
> Boris Steipe
> University of Toronto
> Program in Proteomics & Bioinformatics
> Departments of Biochemistry & Molecular and Medical Genetics
> http://biochemistry.utoronto.ca/steipe


From shenhav at wicc.weizmann.ac.il  Thu Sep  4 06:07:04 2003
From: shenhav at wicc.weizmann.ac.il (Barak Shenhav)
Date: Thu, 4 Sep 2003 12:07:04 +0200
Subject: [BiO BB] Swissprot to Refseq entry mapping
Message-ID: <3F55F6AA@wiccweb>

Try GeneCards from Weizmann Institute of Science 
(http://bioinformatics.weizmann.ac.il/genecards/)
>===== Original Message From bio_bulletin_board at bioinformatics.org =====
>hi all
>
>I would like to know if there is any file which can
>map human proteins of swissprot to Locuslink or Refseq
>entries of NCBI or is there any roundabout way to
>achieve that.
>
>Thank you
>
>Regards
>M.K.Govind
>

====================================
Barak Shenhav
Department of Molecular Genetics
Weizmann Institute of Science
+972 8 9343098 (office)
+972 8 9344487 (fax)
+972 5 2955550 (cellular)


From charles at moulinette.dyndns.org  Thu Sep  4 09:31:53 2003
From: charles at moulinette.dyndns.org (Charles Plessy)
Date: Thu, 4 Sep 2003 15:31:53 +0200
Subject: [BiO BB] remove CTL-M and Buying a bioinformatics workstation
In-Reply-To: <BD10D92F-DECF-11D7-8607-000393BE23AE@poulet.org>
References: <50273.129.171.111.5.1062616174.squirrel@domino.rsmas.miami.edu> <BD10D92F-DECF-11D7-8607-000393BE23AE@poulet.org>
Message-ID: <20030904133153.GC3999@plessy.org>

On Thu, Sep 04, 2003 at 02:03:04PM +0200, Yannick Wurm wrote:
> Hi,
> According to the graph they show, using a word length of 40, the Dual 
> G5 ran 3 million nucleotides per second whereas the Linux boxes did 
> only about 0.75.

It would also be interesting to compare OS X on the G5 to Linux on
the G5...

-- 
Charles Plessy


From tfiedler at rsmas.miami.edu  Thu Sep  4 17:12:39 2003
From: tfiedler at rsmas.miami.edu (Tristan Fiedler)
Date: Thu, 4 Sep 2003 17:12:39 -0400 (EDT)
Subject: [BiO BB] Unigene FormatDB error on Mac OS X
In-Reply-To: <20030904160136.DEB43D284F@www.bioinformatics.org>
References: <20030904160136.DEB43D284F@www.bioinformatics.org>
Message-ID: <50701.129.171.111.5.1062709959.squirrel@domino.rsmas.miami.edu>

Thanks all for the hardware tips!

I am currently setting up a BLAST run of marine sequences against Ciona
intestinalis sequences from UniGene. I have hit a couple of problems, which
maybe someone has already overcome:


1.  In the file which I downloaded 'File: Cin.seq.uniq.Z', the various
entries in this FASTA format file have headers such as :

>gnl|UG|Cin#S6667694 Ciona intestinalis cDNA, clone:cits020m24, full insert sequence. /gb=AK117037 /gi=23589844 /ug=Cin.3 /len=1214
ATCAGATTAAAACATCGTCCATCGTTAGAGTTTATAATTTACATGTTTGAAAAAGTTTAA
AATGCCTTCAAATAAACCAATTGTTAAGGATATCCCAAGAAAATGTGGCGTTCCTAGAGA
A


How can I get a more descriptive/functional definition of the various
unigene clusters?
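
As a starting point, the cluster ID can be pulled out of each header and
then looked up wherever cluster-level descriptions are available. A minimal
Python sketch follows, assuming the header sits on one line with the
/key=value fields at the end; parse_unigene_header is a made-up name, and
the lookup step itself is not shown.

```python
import re


def parse_unigene_header(header):
    """Split a UniGene-style FASTA header into ID, description and fields."""
    header = header.lstrip(">").strip()
    seq_id, _, rest = header.partition(" ")
    # Collect the trailing /key=value annotations, e.g. /ug=Cin.3
    fields = dict(re.findall(r"/(\w+)=(\S+)", rest))
    # Everything before the first /key= is the free-text description.
    description = re.split(r"\s+/\w+=", rest, maxsplit=1)[0].strip()
    return seq_id, description, fields


example = (">gnl|UG|Cin#S6667694 Ciona intestinalis cDNA, clone:cits020m24, "
           "full insert sequence. /gb=AK117037 /gi=23589844 /ug=Cin.3 /len=1214")
seq_id, description, fields = parse_unigene_header(example)
cluster_id = fields["ug"]    # "Cin.3"
```

The /gb= accession could be used the same way to fetch the GenBank record
for a fuller definition line.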

2.  I used the 'formatdb -i Cin.seq.uniq -p F -o T -n unigene_ciona'
command.  Is this correct?  In the formatdb logfile, how can the following
errors be corrected (if necessary) :

/Users/tfiedler/Desktop/blast.darwin/UNIGENE_DOWNLOAD% more formatdb.log

========================[ Sep 4, 2003  4:53 PM ]========================
Version 2.2.6 [Apr-09-2003]
Started database file "Cin.seq.uniq"
NOTE: CoreLib [002.003]
FileOpen("/Users/tfiedler/Library/Preferences/formatdb.cnf","r") failed
NOTE: CoreLib [002.003]
FileOpen("/Users/tfiedler/Desktop/blast.darwin/Resources/formatdb.cnf","r")
failed
NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed
NOTE: CoreLib [002.003] FileOpen("/Users/tfiedler/.formatdbrc","r") failed
NOTE: [000.000] No number of link bits used found in config  file. Ignoring
NOTE: [000.000] No number of membership bits used found in config file.
Ignoring
Formatted 13699 sequences in volume 0

Thank you all very much for the assistance!!!

Cheers,

Tristan Fiedler


From lsrubin at wicc.weizmann.ac.il  Sun Sep  7 20:19:57 2003
From: lsrubin at wicc.weizmann.ac.il (Eitan Rubin)
Date: Mon, 8 Sep 2003 02:19:57 +0200
Subject: [BiO BB] Using blast to compare genomes: a warning
Message-ID: <3F56C9C2@wiccweb>

Hi,

  I used BLAST to compare genomes some time ago, and got lots of short, poor, 
statistically significant similarities. I then met Altschul at a conference, 
and asked him how to tell real similarities from chance ones. He said: "don't 
use BLAST for DNA. We did a poor job on the statistics for DNA - who thought 
people would actually use it?". When I asked what to do he said "use FASTA - 
Bill did a better job with DNA".

  Eitan

>Message: 3
>Date: Thu, 4 Sep 2003 17:12:39 -0400 (EDT)
>From: "Tristan Fiedler" <tfiedler at rsmas.miami.edu>
>To: bio_bulletin_board at bioinformatics.org
>Cc: bio_bulletin_board at bioinformatics.org
>Subject: [BiO BB] Unigene FormatDB error on Mac OS X
>Reply-To: bio_bulletin_board at bioinformatics.org
>
>Thanks all for the hardware tips!
>
>I am currently setting up a blast run of marine sequences against Ciona
>intestinalis from Unigene.  A couple of bugs, which  maybe someone has
>overcome :
>
>
>
>1.  In the file which I downloaded 'File: Cin.seq.uniq.Z', the various
>entries in this FASTA format file have headers such as :
>
>>gnl|UG|Cin#S6667694 Ciona intestinalis cDNA, clone:cits020m24, full
>insert sequence. /gb=AK117037 /gi=23589844 /ug=Cin.3 /len=12
>14
>ATCAGATTAAAACATCGTCCATCGTTAGAGTTTATAATTTACATGTTTGAAAAAGTTTAA
>AATGCCTTCAAATAAACCAATTGTTAAGGATATCCCAAGAAAATGTGGCGTTCCTAGAGA
>A
>
>
>How can I get a more descriptive/functional definition of the various
>unigene clusters?
>
>2.  I used the 'formatdb -i Cin.seq.uniq -p F -o T -n unigene_ciona'
>command.  Is this correct?  In the formatdb logfile, how can the following
>errors be corrected (if necessary) :
>
>/Users/tfiedler/Desktop/blast.darwin/UNIGENE_DOWNLOAD% more formatdb.log
>
>========================[ Sep 4, 2003  4:53 PM ]========================
>Version 2.2.6 [Apr-09-2003]
>Started database file "Cin.seq.uniq"
>NOTE: CoreLib [002.003]
>FileOpen("/Users/tfiedler/Library/Preferences/formatdb.cnf","r") failed
>NOTE: CoreLib [002.003]
>FileOpen("/Users/tfiedler/Desktop/blast.darwin/Resources/formatdb.cnf","r")
>failed
>NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed
>NOTE: CoreLib [002.003] FileOpen("/Users/tfiedler/.formatdbrc","r") failed
>NOTE: [000.000] No number of link bits used found in config  file. Ignoring
>NOTE: [000.000] No number of membership bits used found in config file.
>Ignoring
>Formatted 13699 sequences in volume 0
>
>Thank you all very much for the assistance!!!
>
>Cheers,
>
>Tristan Fiedler
>
>--__--__--
>
>_______________________________________________
>BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>
>
>End of BiO_Bulletin_Board Digest


From psijyoti at yahoo.com  Mon Sep  8 14:06:17 2003
From: psijyoti at yahoo.com (Jyoti Kapoor)
Date: Mon, 8 Sep 2003 11:06:17 -0700 (PDT)
Subject: [BiO BB] Queries regarding the certificate course in bioinformatics
Message-ID: <20030908180618.48336.qmail@web13207.mail.yahoo.com>

 
I am an Oracle database developer and have been looking for a job for quite some time now, but in vain. Hence, I have been thinking about changing my career path. I am considering getting into bioinformatics or biotechnology and need some help with a few queries:
 
1. which is a better career option - bioinformatics or biotechnology?
2. what type of career options are available for each of the above fields?
3. what kind of companies recruit biotech. or bioinfo. professionals?
4. considering the downward trend in the economy, what are the job opportunities in the Bay Area if I take either the bioinformatics or the biotechnology certificate course?
5. Will a certificate course with my background help me get a job (in the bay area)?
6. what is the salary range for an entry level bioinformatics/biotech. professional?
 
Thanks in advance for all your time.


Jyoti


---------------------------------
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20030908/0c343056/attachment.html>

From priyaa_b at yahoo.com  Tue Sep  9 08:25:37 2003
From: priyaa_b at yahoo.com (Priya)
Date: Tue, 9 Sep 2003 05:25:37 -0700 (PDT)
Subject: [BiO BB] Re: [inbios] ?: paralogy group
In-Reply-To: <20030909101828.41441.qmail@web14901.mail.yahoo.com>
Message-ID: <20030909122537.18584.qmail@web41202.mail.yahoo.com>

Hi!

It would be helpful if someone could tell me the definition and meaning
of "paralogy group" in the context of genes and proteins.

Thanx.


priya

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20030909/1f0e6e49/attachment.html>

From MEC at Stowers-Institute.org  Tue Sep  9 10:16:31 2003
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Tue, 9 Sep 2003 09:16:31 -0500
Subject: [BiO BB] QBlast implementing
Message-ID: <CED81D34E37D5043A1211565277A51E59F7FA3@exchkc02.stowers-institute.org>

The implementation of the NCBI QBlast server is NOT publicly available; however, the URL API which it implements is very well documented ( http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html ) 

I am considering implementing a blast server to support this API.

1) Has anyone already done this who can share code or tips, or recommend approaches?

2) Are there other parties who might be interested in the results of the effort?
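
To make the scope concrete: under the URL API, a client submits a search with CMD=Put and later polls for the result with CMD=Get plus the request ID (RID) returned by the submission, so a compatible server would start by routing on CMD. The Python sketch below only builds and parses query strings and makes no network calls. The helper names and the default program and database values are illustrative assumptions, and the parameter names used (CMD, QUERY, PROGRAM, DATABASE, RID, FORMAT_TYPE) should be checked against the document linked above.

```python
from urllib.parse import urlencode, parse_qs


def put_request(query, program="blastn", database="nt"):
    """Query string a client would send to submit a search (CMD=Put)."""
    return urlencode({"CMD": "Put", "PROGRAM": program,
                      "DATABASE": database, "QUERY": query})


def get_request(rid, format_type="Text"):
    """Query string a client would send to fetch results (CMD=Get)."""
    return urlencode({"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type})


def dispatch(query_string):
    """First step of a compatible server: decide what the client wants.

    Returns a (hypothetical) action name plus the decoded parameters.
    """
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    command = params.get("CMD", "").lower()
    if command == "put":
        return "submit", params      # queue a BLAST job, reply with a new RID
    if command == "get":
        return "fetch", params       # look up the job by RID, reply with results
    return "error", params


if __name__ == "__main__":
    print(dispatch(put_request("ACGTACGT")))
    print(dispatch(get_request("EXAMPLE-RID")))
```

Everything behind "submit" and "fetch" (job queue, RID bookkeeping, output formatting) is where the real implementation effort would go.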

Malcolm Cook
Database Applications Manager
Stowers Institute for Medical Research
1000 E 50th Street
Kansas City, MO 64110
tel: 816-926-4449
fax: (816) 926-2098


From landman at scalableinformatics.com  Tue Sep  9 10:55:14 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 9 Sep 2003 10:55:14 -0400 (EDT)
Subject: [BiO BB] QBlast implementing
In-Reply-To: <CED81D34E37D5043A1211565277A51E59F7FA3@exchkc02.stowers-institute.org>
Message-ID: <Pine.LNX.4.44.0309091054120.4701-100000@crunch.scalableinformatics.com>

Hi Malcolm:

  I am working on some similar things for one of my company's products.  
Contact me offlist if you would like to discuss.

Joe

On Tue, 9 Sep 2003, Cook, Malcolm wrote:

> 
> Implementation of the NCBI QBlast server is NOT publicly available, however the URL API which it implements is very well documented ( http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html ) 
> 
> I am considering implementing a blast server to support this API.
> 
> 1) has anyone done this already who can share either code, tips, or recommend approaches
> 
> 2) if there are other parties who might be interested in the results of the effort
> 
> Malcolm Cook
> Database Applications Manager
> Stowers Institute for Medical Research
> 1000 E 50th Street
> Kansas City, MO 64110
> tel: 816-926-4449
> fax: (816) 926-2098
> _______________________________________________
> BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
> 


From hjm at tacgi.com  Tue Sep  9 12:13:26 2003
From: hjm at tacgi.com (Harry Mangalam)
Date: Tue, 09 Sep 2003 09:13:26 -0700
Subject: [BiO BB] BLASTing SCO (re: Linux IP, IBM suite, shred algorithm, etc)
In-Reply-To: <Pine.LNX.4.44.0309091054120.4701-100000@crunch.scalableinformatics.com>
References: <Pine.LNX.4.44.0309091054120.4701-100000@crunch.scalableinformatics.com>
Message-ID: <3F5DFC26.4060703@tacgi.com>

I've been watching the SCO vs IBM suit with some interest, and this caught my 
attention.

Eric Raymond has apparently reworked some old 'shred' code which calculates MD5 
hashes for long (from the molbio perspective) words (~3 lines at a time) and 
then sorts the hashes to identify sections of the Linux source code tree which 
are identical to those from the SCO-owned System V Unix code base.

This sounds a bit like the initial pass for BLAT, which generates hashes for 
much smaller words and uses the hashes in comparisons.
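
A rough illustration of the hash-and-compare idea described above, in Python. This is not Raymond's actual code: the window size of three lines, the whitespace normalization, and the function names are assumptions made for the sketch, and a dictionary lookup stands in for the sort step.

```python
import hashlib


def shingle_hashes(lines, width=3):
    """Map the MD5 digest of every `width`-line window to the 1-based
    line numbers at which such a window starts."""
    cleaned = [" ".join(line.split()) for line in lines]  # collapse whitespace
    table = {}
    for start in range(len(cleaned) - width + 1):
        chunk = "\n".join(cleaned[start:start + width])
        digest = hashlib.md5(chunk.encode("utf-8")).hexdigest()
        table.setdefault(digest, []).append(start + 1)
    return table


def shared_windows(lines_a, lines_b, width=3):
    """Pairs (line in A, line in B) where identical windows begin.
    Only the first occurrence on each side is reported."""
    hashes_a = shingle_hashes(lines_a, width)
    hashes_b = shingle_hashes(lines_b, width)
    common = hashes_a.keys() & hashes_b.keys()
    return sorted((hashes_a[d][0], hashes_b[d][0]) for d in common)


if __name__ == "__main__":
    a = ["int x;", "x = 1;", "return x;", "}"]
    b = ["void f() {", "int   x;", "x = 1;", "return x;"]
    print(shared_windows(a, b))
```

Because only exact windows hash to the same value, a single changed character hides a match, which is the gap that an alignment-based tool would be expected to close.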

http://www.eweek.com/article2/0,4149,1257617,00.asp

Could BLAST not be used to identify, faster and much more sensitively, not only 
identical but also similar sections of code?

It would have to be modified to do an 'all against all' approach and would have 
to also take into account line numbers and file names, but here's a good 
undergrad programming project for someone, with the possibility of getting some 
good press and creating a tool that will undoubtedly be used again in litigation 
(read: it could be worth real money).

Then again, Raymond's shred approach is probably good enough.

Comments?
-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hjm at tacgi.com
             <<plain text preferred>>


From jhaveri at usc.edu  Tue Sep  9 15:03:45 2003
From: jhaveri at usc.edu (jinal jhaveri)
Date: Tue, 09 Sep 2003 12:03:45 -0700
Subject: [BiO BB] NCBI Viewer
Message-ID: <819d6e818c44.818c44819d6e@usc.edu>

Hi there,

I am developing a zoom viewer for chromosomes. Can anyone give tips on
available software to use for that? I want this zoom viewer to be
online, like the one NCBI has
(http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?org=arabid&chr=I), i.e.
the Entrez one.

thank you
--Jinal


From tfiedler at rsmas.miami.edu  Tue Sep  9 17:00:55 2003
From: tfiedler at rsmas.miami.edu (Tristan Fiedler)
Date: Tue, 9 Sep 2003 17:00:55 -0400 (EDT)
Subject: [BiO BB] Poly A tail length - script help please
In-Reply-To: <20030904160136.DEB43D284F@www.bioinformatics.org>
References: <20030904160136.DEB43D284F@www.bioinformatics.org>
Message-ID: <52557.129.171.111.5.1063141255.squirrel@domino.rsmas.miami.edu>

Thanks for the scripting tips!  I have a 'counting' issue which I need to
quickly resolve.  A typical sequence input file (5 - 700 bases) looks like
:

AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA

I have over 500 files, named *.seq.  I would like to create a script which :

a.  runs through all the files,
b.  counts the length of the 'poly A' tail (defined as the longest stretch
of A or N)
c. sends the output to a file, eg.

25 1.seq
87 2.seq
13 3.seq

Example valid poly A tails :

AAAANANANANAAANNAAAAAA

AAAAAAAAAAAAAA

NNNNNNNNNNNNN

AAANNNNNNNNNNNAAAAAAAAA

Thank you so much for your expertise!

Tristan

-- 
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami

tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626


From landman at scalableinformatics.com  Tue Sep  9 19:57:34 2003
From: landman at scalableinformatics.com (Joseph Landman)
Date: Tue, 09 Sep 2003 19:57:34 -0400
Subject: [BiO BB] Poly A tail length - script help please
In-Reply-To: <52557.129.171.111.5.1063141255.squirrel@domino.rsmas.miami.edu>
References: <20030904160136.DEB43D284F@www.bioinformatics.org>
	 <52557.129.171.111.5.1063141255.squirrel@domino.rsmas.miami.edu>
Message-ID: <1063151854.10843.144.camel@protein.scalableinformatics.com>

First one is free ... 

        #!/usr/bin/perl

        use strict;
        use warnings;

        my ($directory,$directory_handle,$file,@files,$sequence);
        my ($file_handle,$poly_a_tail,$rseq);

        $directory = "./";	# directory to open
        if (!(opendir $directory_handle,$directory))
           {
             die "FATAL ERROR: Unable to open directory = ".$directory."\n";
           }

        # select only the .seq files
        @files = grep { /\.seq$/ } readdir($directory_handle);
        closedir($directory_handle);

        # loop over these selected files
        foreach $file (@files)
          {
            # try to open the file
            if (!(open($file_handle,"<",$directory.$file)))
               {
                 # if we cannot open it, warn the user, and skip to the next file
                 warn "Warning: unable to open file = ".$file.". Skipping.\n";
                 next;
               }
              else
               {
                 # assume one line per file, or we will have to modify this
                 $sequence = <$file_handle>;
                 close($file_handle);
                 $sequence = "" if !defined $sequence;	# empty file
                 chomp($sequence);
                 # now time to bring out the heavy artillery
                 $rseq = reverse $sequence;	# poly-a is now at the head
                 $rseq =~ /^([AN]*)/;		# match A's and/or N's at the front (may be none)
                 $poly_a_tail = $1;		# return the match ...
                 printf "%i %s\n",length($poly_a_tail),$file;	# tell the world ...
               }
          }


On Tue, 2003-09-09 at 17:00, Tristan Fiedler wrote:
> Thanks for the scripting tips!  I have a 'counting' issue which I need to
> quickly resolve.  A typical sequence input file (5 - 700 bases) looks like
> :
> 
> AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA
> 
> I have over 500 files, named *.seq.  I would like to create a script which :
> 
> a.  runs through all the files,
> b.  counts the length of the 'poly A' tail (defined as the longest stretch
> of A or N)
> c. sends the output to a file, eg.
> 
> 25 1.seq
> 87 2.seq
> 13 3.seq
> 
> Example valid poly A tails :
> 
> AAAANANANANAAANNAAAAAA
> 
> AAAAAAAAAAAAAA
> 
> NNNNNNNNNNNNN
> 
> AAANNNNNNNNNNNAAAAAAAAA
> 
> Thank you so much for your expertise!
> 
> Tristan
-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615


From MEC at Stowers-Institute.org  Wed Sep 10 16:12:16 2003
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Wed, 10 Sep 2003 15:12:16 -0500
Subject: [BiO BB] Poly A tail length - script help please
Message-ID: <CED81D34E37D5043A1211565277A51E5AE97FF@exchkc02.stowers-institute.org>

But that does not compute the 'longest stretch'.

The attached perl script does, and will allow you to write:

> polyfind [-all] *.seq > polyfind.results

Enjoy,

Malcolm Cook

-------------- next part --------------
A non-text attachment was scrubbed...
Name: polyafind
Type: application/octet-stream
Size: 3438 bytes
Desc: polyafind
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20030910/7473c37a/attachment.obj>

From landman at scalableinformatics.com  Wed Sep 10 04:23:51 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 10 Sep 2003 04:23:51 -0400
Subject: [Biodevelopers] RE: [BiO BB] Poly A tail length - script help
	please
In-Reply-To: <CED81D34E37D5043A1211565277A51E5AE97FF@exchkc02.stowers-institute.org>
References: 	 <CED81D34E37D5043A1211565277A51E5AE97FF@exchkc02.stowers-institute.org>
Message-ID: <1063182231.5267.1.camel@squash.scalableinformatics.com>

Malcolm

  Good catch.  I paid attention to the tail part, not the longest
sequence part.  Should be easy to modify the regex, and generate a
length-sorted array of matches, but as you have already solved the
(correct) problem ...
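
For reference, the 'longest stretch' variant described above might be sketched as follows in Python. This is not the attached polyafind script, whose contents are not shown in this thread; the function names are illustrative, and the sketch assumes plain sequence files (no FASTA header line) and treats lower-case bases the same as upper-case ones.

```python
import re
from pathlib import Path

# One uninterrupted stretch of A and/or N
RUN = re.compile(r"[AN]+")


def longest_an_run(sequence):
    """Length of the longest uninterrupted stretch of A and/or N."""
    runs = RUN.findall(sequence.upper())
    return max((len(run) for run in runs), default=0)


def read_sequence(path):
    """Read a sequence file, joining lines and dropping all whitespace."""
    return "".join(Path(path).read_text().split())


def main(directory="."):
    """Print '<length> <file name>' for every *.seq file in the directory."""
    for path in sorted(Path(directory).glob("*.seq")):
        print(longest_an_run(read_sequence(path)), path.name)


if __name__ == "__main__":
    main()
```

Redirecting the output to a file gives the "25 1.seq" style listing asked for in the original request.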

Joe

On Wed, 2003-09-10 at 16:12, Cook, Malcolm wrote:
> But that does not compute the 'longest stretch'.
> 
> The attached perl script does, and will allow you to write:
> 
> > polyfind [-all] *.seq > polyfind.results
> 
> Enjoy,
> 
> Malcolm Cook
> 
-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615


From luo_2005 at yahoo.com  Thu Sep 11 17:09:55 2003
From: luo_2005 at yahoo.com (Phil Luo)
Date: Thu, 11 Sep 2003 14:09:55 -0700 (PDT)
Subject: [BiO BB] About Homolog Gene Database
Message-ID: <20030911210955.64091.qmail@web60205.mail.yahoo.com>

Dear all,
 
I know there are some databases of homologous genes, such as COG, HOBACGEN, etc.
But they are all based on computational prediction methods. I was wondering where I can find homolog information that is more accurate and based on biological experiments.
Because I want to do homolog analysis, I need some real data, especially for popular genomes such as E. coli.
 
Your help is really appreciated.
 
Best Regards,
Phil
 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20030911/f06a8070/attachment.html>

From mourad12345678 at yahoo.com  Wed Sep 10 11:14:48 2003
From: mourad12345678 at yahoo.com (Mourad Elloumi)
Date: Wed, 10 Sep 2003 08:14:48 -0700 (PDT)
Subject: [BiO BB] 4th IEEE Symposium on Bioinformatics and Bioengineering
Message-ID: <20030910151448.16753.qmail@web12301.mail.yahoo.com>

  
                   CALL FOR PAPERS 

       Fourth IEEE Symposium on Bioinformatics and 
                   Bioengineering 

                     (BIBE'04) 
 
                  May 19-21, 2004, 
               Taichung, Taiwan, ROC 

             http://bibe2004.ece.uci.edu/ 

       Sponsored by the IEEE Computer Society 
      Co-sponsored by IEEE Neural Network Society 

The BIBE Symposium provides a common platform for the cross-fertilization
of ideas and helps shape knowledge and scientific achievements by bridging
these two very important and complementary disciplines in an interactive
and attractive forum. Keeping this objective in mind, BIBE solicits
original contributions in the following non-exclusive list of areas: 

Biomedical Informatics and Computation 
Bio-molecular and Phylogenetic Databases, Query
Languages, Interoperability, 
Bio-Ontology and Data Mining, System Biology,
Identification and Classification of Genes, 
Sequence Search and Alignment, Protein Structure
Prediction and Molecular Simulation, 
Molecular Evolution and Phylogeny, Functional
Genomics, Proteomics, 
Drug Discovery Gene Expression Analysis, Biolanguages,
Bioinformatics Engineering, 
Data Visualization, Signaling and Computation
Biomedical Data Engineering, 
Medical Image Processing (Segmentation, Registration,
Fusion), Telemedicine, 
Modeling and Simulation, Biomedical Imaging 

Bio-Engineering 
Biological Systems and Models, Engineering Models in
Biomedicine, Biomedical Sensors, 
Computer Assisted Intervention Systems and Robotics,
Bionic Human, Cell Engineering, 
Molecular and Cellular Systems, Body's and Cell's
Bio-signatures, Tissue Engineering, Biomaterials 

Paper Submissions 
The written and spoken language of BIBE2004 is English. Authors should
submit a full paper via electronic submission <bibe2004 at ece.uci.edu>. 
Full papers must not exceed 20 pages printed using at least 11-point type
and double spacing. All papers should be in PDF or PostScript format. The
paper should include a 200-word abstract, a list of keywords (research
areas), and the author's phone number and e-mail address. 
The Conference Proceedings will be published by the IEEE Computer Society
Press. A number of the papers presented at the conference will be
selected, after a review process, for possible publication in the
Int. Journal on Bioinformatics Engineering (IJBE) and the Int. Journal on
AI Tools (IJAIT). 

Important Dates 
January 15, 2004 Submission deadline 
February 28, 2004 Notification of acceptance 
March 20, 2004 Camera-ready copy of accepted papers
and author registration due 

General Co-Chairs 
Satoru Miyano, University of Tokyo, 
Jeffrey J. P. Tsai, University of Illinois, Chicago, 
C. Y. Kao, Natl. Taiwan Univ. 

Program Chair 
Phillip C.Y. Sheu, University of California, Irvine 

Program Co-Chairs & Committee Members 
Ruediger W. Brause, JW Goethe-Univ., Germany, Du
Zhang, CSU, Sacramento, George Karypis, 
Univ. of Minnesota, Phoebe Chen Queensland Univ.
Technology, Australia, Ming-Jing Hwang, 
Academia Sinica, Chuan-Yi Tang, Natl. Tsing Hua Univ.,
Taiwan C. S. Wang, TARI, 
Jorng-Tzong Horng, Natl. Central Univ., Fong-Rong Hsu,
Taichung Healthcare and Management 
University, Taiwan Yuhwa Lo, UC San Diego, Limsoon
Wong, Institute for Infocomm Research, 
Singapore, Cheng Li, Harvard. Metin Akay, Dartmouth
Sang Yup Lee, KAIST, Korea, Arvind Bansal, 
Kent State Univ., Tao Jiang, UC Riverside, Philip E.
Bourne, UC San Diego, Robert L. Sah, 
UC San Diego, Yuval Shahar, Ben-Gurion Univ. of the
Negev and Stanford Univ., 
Kun-Mao Chao, Natl. Taiwan University, Wen-Lian Hsu
Academia Sinica Arif Ghafoor, Purdue University, 
Huerta, Michael, Natl. Institutes of Health, James C.
Gee, Univ. of Pennsylvania, 
Joerg Meyer, UC Irvine, Ghassan S. Kassab, UC Irvine,
Jung-Hsin Lin, Natl. Taiwan University, 
Stefano Lonardi, UC Riverside, Shugo Nakamura, Univ.
of Tokyo, Japan, Andrew F. Laine, Columbia., 
Huimin Zhao, Univ. of Illinois, Urbana Champaign,
Ronney B Panerai Leicester Royal, Infirmary, UK, 
Jinn-Moon Yang, Natl. Chiao Tung Univ. Jan-Ming Ho,
Academia Sinica, Maryellen L. Giger, Univ. of Chicago,

Jim Brody, UC Irvine, Andrew McCulloch, UC San Diego,
Yuh-Jyh Hu, Natl. Chiao Tung University, Taiwan, 
Mia K. Markey Univ. of Texas at Austin, Chuan-Hsiung
Chang, Natl. Yang-Ming University, Taiwan, 
Stanley M. Finkelstein, Univ. of Minnesota, Xia
Shunren, Zhejiang Univ., China, 
Zina Ben Miled, IUPU Indianapolis, Vittorio Christini,
UC Irvine Wesley Chu, UCLA, 
Gary Huber, UC San Diego, Shankar Subramaniam, UC San
Diego, William Tang, UC Irvine, 
Vasant Honavar, Iowa State University, Alfredo Colosimo,
Univ. of Rome "La Sapienza", Italy, 
Victor Maojo, Univ. of Madrid, Spain, Gisbert
Schneider, JW Goethe-Univ., Germany, 
Graham Kemp, Chalmers University of Technology,
Sweden, Hakan Ferhatosmanoglu, Ohio State Univ., 
Tatsuya Akutsu, Kyoto Univ., Japan 

Industrial Chair 
Chung-Cheng Liu, Industrial Technology Research
Institute, Taiwan 

Publicity Chair 
Stephen J.H. Yang, Natl. Central Univ., Taiwan 

Publication Chair 
Taehyung Wang, California State University, Northridge


Web Chair 
Donghua Deng, University of California, Irvine 

Financial Co-Chairs 
Rong-Ming Chen and Shih-Nung Chen, Taichung Healthcare
and Management University, Taiwan 

Local Arrangement Co-Chairs 
Han-Wen Hsiao and Anthony Y. H. Liao, Taichung
Healthcare and Management University, Taiwan 

Steering Committee Chair 
Nikolaos Bourbakis, ITRI, Wright State University 




From vkode78 at yahoo.com  Fri Sep 12 01:07:05 2003
From: vkode78 at yahoo.com (Venu Kode)
Date: Thu, 11 Sep 2003 22:07:05 -0700 (PDT)
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
Message-ID: <20030912050705.48283.qmail@web40508.mail.yahoo.com>

Hello everyone,
 
I am a graduate student in computer engineering, doing my thesis on the following topic:

"Matching Protein sequences with HMM models in FPGAs (Field Programmable Gate Arrays) using Run Time Reconfiguration"

If you are familiar with the Decypher tool from TimeLogic, my work involves something similar. Decypher is a very comprehensive and expensive tool, and it certainly delivers excellent performance compared with software tools such as HMMER or SAM. My work falls between the software solutions and Decypher.

Specifically, once developed, my tool would be considerably faster than the software tools currently available, and affordable at a fraction of the cost of Decypher.

I have just started digging into bioinformatics and have read quite a number of papers, but I am still a little confused and would like any comments or suggestions from you:

1) Does my tool make any sense at all?
2) What is the current customer base like for this technology?
3) What sort of companies do the work of matching protein/DNA sequences with existing models?
4) Is there a need for a less comprehensive and less expensive alternative to Decypher, for customers who want the job done a lot cheaper and wouldn't mind some penalty in performance (it would of course still be much faster than software searching)?
5) Any comments, questions, suggestions?
6) Any pointers for me in terms of websites or resources?
 
I would very much appreciate your comments
 
TIA
Kode



From prathibha_562 at yahoo.co.in  Fri Sep 12 09:51:22 2003
From: prathibha_562 at yahoo.co.in (=?iso-8859-1?q?prathibha=20bharathi?=)
Date: Fri, 12 Sep 2003 14:51:22 +0100 (BST)
Subject: [BiO BB] i am a graduate student requesting  help for my project work on protein sequence analysis
In-Reply-To: <20030912050705.48283.qmail@web40508.mail.yahoo.com>
Message-ID: <20030912135122.24608.qmail@web8102.in.yahoo.com>


Hi all,

My sincere wishes to the seniors and experts in this very interesting field.

I am a final-year B.Tech student in computer science at Sree Vidyaniketan Engg. College.

For my final-semester project work I have chosen this field, and selected "protein sequence analysis" as the title of my project.

I am planning to build a simple prototype database of protein sequences, and I am writing my own tool (a simple but useful one) for the sequence analysis.

In this tool I am planning to use a dynamic programming method to find related proteins, and then to plot a dot matrix showing the similarity, along with information about the proteins, for sequences having more than 20% sequence similarity (i.e. above the twilight zone).
After that I am also planning to perform phylogenetic analysis on the result dataset from my tool (for this I have downloaded PHYLIP).
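
As a rough illustration of the dynamic programming and dot matrix steps described above, here is a minimal sketch in Python (the post plans to use Java; Python is used here only for brevity). It assumes a plain Needleman-Wunsch global alignment with a flat match/mismatch/gap scoring scheme. The function names and the scoring values are made up for this example; a real protein tool would normally use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment (illustrative flat scoring).

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return score[len(a)][len(b)], ''.join(reversed(out_a)), ''.join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != '-')
    return 100.0 * same / len(aligned_a)


def dot_matrix(a, b):
    """1 where residues are identical, 0 elsewhere (no window filtering)."""
    return [[1 if x == y else 0 for y in b] for x in a]


if __name__ == "__main__":
    s, x, y = global_alignment("ACGT", "AGT")
    print(s, x, y, percent_identity(x, y))
```

Pairs whose percent identity falls above the chosen cutoff (20% in the post) could then be passed on to the dot matrix plot and to PHYLIP.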
 
After that I am also planning to do protein structure prediction, and for this I am planning to use RASMOL.

For converting one database format to another I am going to use READSEQ.

So I am requesting you all to please help me with this project work and send me your valuable suggestions and opinions about it. I will be eagerly waiting for your replies, and I hope to get a response to my message as soon as possible.

I have chosen Java for implementing my ideas. Please also suggest which database model (relational or object-oriented) I should use for building my simple database, or whether I can do all of this simply by using files.

I am sincerely praying to God to help me and to be with me throughout this project work and forever.

Thank you, one and all,
prathibha
                                       


From mgollery at unr.edu  Fri Sep 12 13:14:36 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Fri, 12 Sep 2003 10:14:36 -0700
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
In-Reply-To: <20030912050705.48283.qmail@web40508.mail.yahoo.com>
References: <20030912050705.48283.qmail@web40508.mail.yahoo.com>
Message-ID: <1063386876.3f61fefcc8c5a@webmail.unr.edu>

Hi Venu,
  Yes, this idea makes sense. Anyone who uses InterProScan wishes it were
faster! The current customer base includes all the big genome centers,
universities, many biotechs and the larger pharmas. Other people have done projects
in the field: Kestrel at UCSC has been around for years, though I don't know if it has
been updated recently. Hokiegene out of Virginia is a newer system that shows
great performance. There are several other commercial ventures that are in
stealth mode right now.
   HMMs really lend themselves well to acceleration of this type. Perhaps you
could focus on accelerating SAM instead of HMMER, as many people prefer it and
TimeLogic, Compugen and Paracel do not accelerate SAM. Also, nobody (to my
knowledge) has accelerated ClustalW, so this might be something else to
consider.

Best of luck! Sounds like an interesting project!

Marty
   



Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
New phone number! 775-784-7042


-------------------------------------------------
This mail sent through https://webmail.unr.edu


From Eitan.Rubin at weizmann.ac.il  Sat Sep 13 18:23:21 2003
From: Eitan.Rubin at weizmann.ac.il (Eitan Rubin)
Date: Sun, 14 Sep 2003 00:23:21 +0200
Subject: [BiO BB] RE: Hardware implemntation of HMMs
References: <20030913160139.825B9D2610@www.bioinformatics.org>
Message-ID: <001101c37a45$a55d19a0$0101c80a@weizmannnhc4zn>

Hi,

  As far as I know, Compugen sells hardware accelerators for HMMs (see
http://www.cgen.com). Their website specifically mentions HMM acceleration.

  Eitan
------------------------------------------------------------------------
Eitan Rubin, PhD
Head of Bioinformatics and Biological Computing
Dept. Biological Services
Weizmann Institute of Science
Tel: +972-8-9343456
Fax: +972-8-9346006


From johnjakson at yahoo.com  Sun Sep 14 03:00:07 2003
From: johnjakson at yahoo.com (John Jakson)
Date: Sun, 14 Sep 2003 00:00:07 -0700 (PDT)
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
In-Reply-To: <20030912050705.48283.qmail@web40508.mail.yahoo.com>
Message-ID: <20030914070007.4364.qmail@web13104.mail.yahoo.com>


Hi Venu

Interesting to see others interested in applying FPGAs to Bioinformatics. FPGAs don't get much mention here.

I am not convinced the Bio industry really cares for EE solutions it doesn't understand. Linux clusters are bad enough, but what the hell are FPGAs? As an EE VLSI/FPGA hardhat visiting the BioWorld show, held here in Boston not that long ago, all I saw was disinterest and plenty of tower server racks. Not one HW company showed up with anything but Linux clusters or the SGI/IBM/HP/... equivalent. TimeLogic & the 1 or 2 other (defunct?) accelerator companies were no-shows. On talking with the floor folks I found no interest in, or basic understanding of, possible HW alternatives.

The issue comes down to how the problem is stated and how it can be implemented in a solution that most Bio SW types can understand. That means whatever the engine is, it must just run C code, simple as that, preferably the free stuff from NCBI. That always leads to the same solution: clusters of ever faster & ever hotter farms of today's x86. Any rational computer scientist knows this is crazy, and that dedicated HW should be built. TimeLogic says it very well on their web site. In crypto, video or DSP processing, it is relatively easy to turn C code into HW, since those fields are all math intensive and the code is likely created by the same EEs.

It may come as a surprise to SW types, but HW is routinely modeled in C. That code is used only to double check the design, which is written in a decent HW description language like Verilog or VHDL, both of which are implicitly parallel languages. There is usually some formal mathematical model, often written in Matlab, for the real heavy stuff. It's also interesting that the Matlab code is usually floating point intensive, and the final ASIC/FPGA solutions are not expected to produce identical results, since HW is best built in integer fashion. One might regard the current Bio C codes as just simulations of HW that hasn't been built yet, since few know how to recode them in a HW language. TimeLogic did a few, but not in a way that can be easily duplicated across the industry.

To turn C code into really fast HW requires understanding what the C code is really doing, and having permission to make subtle but harmless changes to it to allow the really big speed ups. That means eliminating floating point. If the Bio author of such SW is also a HW expert (of which there are probably only a handful or even 0 in the whole world), then equivalent algorithms could be used that are relatively simple to map onto HW structures. I don't see the Bio world hiring too many HW EEs either; we are far too different culturally and we usually don't have PhDs, esp. not from the right schools.
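
To make the "eliminating floating point" point concrete, here is a small sketch (Python, purely for illustration, not HW code). It runs the same max-sum Viterbi recurrence twice over a toy two-state HMM: once with floating-point log probabilities and once with log probabilities rounded to integers at a fixed scale. The model, the state names "M" and "B", the observation string, and the scale factor of 1000 are all assumptions invented for this example. With integer scores the inner loop needs only adds and compares, which is the kind of arithmetic that maps easily onto FPGA logic.

```python
import math

SCALE = 1000  # hypothetical fixed-point scale: one unit = 0.001 nat

STATES = ("M", "B")  # toy "match" and "background" states
START = {"M": 0.5, "B": 0.5}
TRANS = {"M": {"M": 0.8, "B": 0.2}, "B": {"M": 0.2, "B": 0.8}}
EMIT = {"M": {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
        "B": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
OBS = "CGCAAAAAGCT"


def log_score(p, fixed):
    """Log probability, either as a float or as a scaled, rounded integer."""
    return round(SCALE * math.log(p)) if fixed else math.log(p)


def build(fixed):
    """Convert the probability tables into additive score tables."""
    start = {s: log_score(p, fixed) for s, p in START.items()}
    trans = {r: {s: log_score(p, fixed) for s, p in row.items()}
             for r, row in TRANS.items()}
    emit = {s: {c: log_score(p, fixed) for c, p in row.items()}
            for s, row in EMIT.items()}
    return start, trans, emit


def viterbi(obs, states, start, trans, emit):
    """Max-sum Viterbi over additive scores; only adds and compares."""
    best = {s: start[s] + emit[s][obs[0]] for s in states}
    back = []
    for symbol in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + trans[r][s])
            best[s] = prev[p] + trans[p][s] + emit[s][symbol]
            pointers[s] = p
        back.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return best[last], path[::-1]


float_score, float_path = viterbi(OBS, STATES, *build(fixed=False))
int_score, int_path = viterbi(OBS, STATES, *build(fixed=True))

if __name__ == "__main__":
    print("float:", round(float_score, 3), "".join(float_path))
    print("int:  ", int_score / SCALE, "".join(int_path))
```

In this toy case both versions pick the same state path and the scores agree to within rounding error. Whether that holds for a real profile HMM depends on the scale chosen and on how close competing paths are, which is the "subtle but harmless changes" question raised above.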

There are other ways to turn C code into HW, maybe using a C based HW language such as HandelC, which is based on Occam & CSP. And there's the clue. If the SW is broken up into the constituent parallel processes that are naturally there but impossible to describe in plain C, then it becomes almost trivial to map those parallel processes onto FPGA fabric or even something like a Transputer farm. The only difference is the granularity. FPGAs are hot today but can only readily be engineered by HW types, because their most efficient use requires detailed understanding of pipelines, combinatorial logic and basic cpu design. Transputers, if they still existed, would be the natural way to go because they are amenable to both SW & HW engineers, but they still worked best when SW & HW were both understood. Occam was just a way to describe parallel processes that described HW in a funny syntax. Transputers only died out because the implementation fell far behind x86 performance and was single sourced & underfunded. Most Transputer projects & users ultimately switched to standard DSPs & FPGAs, leaving the SW user base behind.
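
As a loose sketch of what "constituent parallel processes" connected by channels looks like, here is a CSP-flavoured pipeline in Python, with threads standing in for processes and bounded queues standing in for channels. This is not Occam or HandelC, and the names (stage, run_pipeline, DONE) are made up for the example. It only shows the structure: each stage is an independent sequential process that talks to its neighbours through a channel, which is the shape that maps onto a HW pipeline stage or a cpu node with links.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed down the channels


def stage(func, inbox, outbox):
    """Start one sequential process: read, apply func, write."""
    def run():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # pass the marker on and stop
                return
            outbox.put(func(item))
    worker = threading.Thread(target=run)
    worker.start()
    return worker


def run_pipeline(items, funcs):
    """Chain the stages with one-slot channels and collect the output."""
    channels = [queue.Queue(maxsize=1) for _ in range(len(funcs) + 1)]
    workers = [stage(f, channels[i], channels[i + 1])
               for i, f in enumerate(funcs)]

    def feed():
        for item in items:
            channels[0].put(item)
        channels[0].put(DONE)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        out = channels[-1].get()
        if out is DONE:
            break
        results.append(out)
    feeder.join()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["acgt", "ttga"], [str.upper, lambda s: s.count("T")]))
```

Python threads will not give a real speed up here; the point is only that code already written in this shape has explicit process boundaries and explicit communication, so nothing has to be rediscovered when it is moved to parallel HW.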

Another approach would be to use one of the cpu farms on a chip, such as those from Clearspeed or PicoChip or BOPs (RIP), who have developed RISC cpus with up to 420 instances on a chip running at 100MHz plus. It will be interesting to see if those devices can escape cell phone basestations.

So I have taken my passive interest in this subject back to the drawing board, to recreate a modern FPGA hosted Transputer that would naturally execute sequential C code, or parallel Occam code & even Verilog code. That means that if code can be partially migrated from seq C to par Occam style C (i.e. HandelC) and then to Verilog (a C'ish HW language), the same code still runs on the same cpu (but a little slower perhaps). Extra process scheduling HW is needed to support very fine grained concurrency in a modern Transputer, and also a logic simulator. The big pay off is that properly parallelized code, once in Verilog form, still runs either as compiled source code on a farm of cpus using message passing and links, or it can be synthesized with industry standard HW tools back onto the FPGA fabric for the desired speed ups. In effect, sequential procedures in C code can be morphed into on chip HW coprocessors using the reconfigurable features of many FPGAs. Stable FPGA coprocessor engines can then be turned into much faster and cheaper ASICs in return for nasty upfront NRE. Such solutions could go much farther than current TimeLogic products, for many industries besides Bio.

Xilinx & Intel can give us a clue here. A cluster cpu node based on a P4 at say 3GHz might run to $2K per node, depending on what's there, even though the fastest P4 chip is always say $600. An FPGA RISC cpu node based on MicroBlaze runs at maybe only 125MHz but will cost about $1.40 per node in volume, plus extra support. Now if the cpu can be farmed by adding those Transputer extensions, the 24x clock difference doesn't look so bad compared to the est. 400+ fold cpu cost difference. Also, a lot of slower cpus, each with local RLDRAM, don't have the memory latency that P4s suffer from, i.e. 1 DRAM cycle is a few cpu cycles instead of hundreds, and distributed bandwidth is much easier to manage.
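
The back-of-envelope comparison above works out as follows. This is a sketch in Python using only the round figures quoted in the paragraph; the variable names are made up, and the last line assumes, very crudely, that useful work scales with clock rate alone (it ignores instructions per clock, the "extra support" cost, and everything else on the board).

```python
P4_CLOCK_HZ = 3.0e9      # "a P4 at say 3GHz"
FPGA_CLOCK_HZ = 125e6    # MicroBlaze at "maybe only 125MHz"
P4_NODE_USD = 2000.0     # "$2K per node"
P4_CHIP_USD = 600.0      # "the fastest P4 chip is always say $600"
FPGA_NODE_USD = 1.40     # "about $1.40 per node in volume"

clock_ratio = P4_CLOCK_HZ / FPGA_CLOCK_HZ        # how much faster the P4 clocks
chip_cost_ratio = P4_CHIP_USD / FPGA_NODE_USD    # P4 chip vs FPGA cpu node
node_cost_ratio = P4_NODE_USD / FPGA_NODE_USD    # whole P4 node vs FPGA cpu node

# Crude assumption: work per second is proportional to clock rate only.
clock_per_dollar_gain = chip_cost_ratio / clock_ratio

if __name__ == "__main__":
    print(f"clock ratio:           {clock_ratio:.0f}x")
    print(f"chip cost ratio:       {chip_cost_ratio:.0f}x")
    print(f"node cost ratio:       {node_cost_ratio:.0f}x")
    print(f"clock per dollar gain: {clock_per_dollar_gain:.1f}x")
```

So the "24x" is the clock ratio and the "400+ fold" matches the chip-to-node cost ratio of roughly 429; under the crude clock-only assumption that leaves the FPGA farm ahead by a factor of about 18 in clock cycles per dollar.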

It's also interesting to see the changes at TimeLogic: the departure of Jim, and the merger with a company that, as far as I can see, has no obvious HW background.

Regards

John Jakson

sorry for long rant



From alokkumar_ait at rediffmail.com  Mon Sep 15 01:40:28 2003
From: alokkumar_ait at rediffmail.com (Alok  Kumar)
Date: 15 Sep 2003 05:40:28 -0000
Subject: [BiO BB] Re: BiO_Bulletin_Board digest, Vol 1 #518 - 5 msgs
Message-ID: <20030915054028.15697.qmail@webmail18.rediffmail.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20030915/eb0172f3/attachment.ksh>

From prathibha_562 at yahoo.co.in  Mon Sep 15 09:30:15 2003
From: prathibha_562 at yahoo.co.in (=?iso-8859-1?q?prathibha=20bharathi?=)
Date: Mon, 15 Sep 2003 14:30:15 +0100 (BST)
Subject: [BiO BB] feeling discouraging after seeing this much poor response
Message-ID: <20030915133015.92588.qmail@web8104.in.yahoo.com>

I am feeling discouraged after seeing such a poor response,
and I am also thinking about dropping my idea of doing my project on protein sequence analysis. Bye, all.



From val at vtek.com  Mon Sep 15 10:00:12 2003
From: val at vtek.com (val)
Date: Mon, 15 Sep 2003 10:00:12 -0400
Subject: [BiO BB] feeling discouraging after seeing this much poor response
References: <20030915133015.92588.qmail@web8104.in.yahoo.com>
Message-ID: <266201c37b91$b1b09540$6400a8c0@vt1000>

    Yes, a good point.  Drop it, and look around to
find a really exciting idea to invest your time..
    And keep on enjoying life..
my very best,
val

----- Original Message -----
From: prathibha bharathi
To: bio_bulletin_board at bioinformatics.org
Sent: Monday, September 15, 2003 9:30 AM
Subject: [BiO BB] feeling discouraging after seeing this much poor response


i am feeling discouraging after seeing this much poor response..........
and i am also thinking about dropping my idea of doing my project on protein
seq analysis..........................bye all.............................


From deletto at unisa.it  Mon Sep 15 10:14:44 2003
From: deletto at unisa.it (deletto at unisa.it)
Date: Mon, 15 Sep 2003 16:14:44 +0200
Subject: [BiO BB] functional clustering among Affymetrix data
In-Reply-To: <266201c37b91$b1b09540$6400a8c0@vt1000>
References: <20030915133015.92588.qmail@web8104.in.yahoo.com> <266201c37b91$b1b09540$6400a8c0@vt1000>
Message-ID: <1063635284.3f65c954e8941@webmail.unisa.it>

Dear All,
I am sorry to disturb you (any of you!!!) with a very trivial question.
I am wondering whether there is a simple online tool that would be useful
for my purpose: I would like to cluster a data set (collected from Affymetrix
GeneChip U34A, B & C) according to biological function. In other words, I'd
like to draw a statistical chart (like a nice pie chart) covering my whole
data set, selecting all genes with a clear and known function (biological,
physiological, and compartment localization).
Could anyone help me by showing how I can accomplish this task, which is
rather hard for me?
I would appreciate it very much :::)))

Thanks in advance, 
                   all the best
              Davide (phD student)


-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/


From boris.steipe at utoronto.ca  Mon Sep 15 10:28:39 2003
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 15 Sep 2003 10:28:39 -0400
Subject: [BiO BB] functional clustering among Affymetrix data
References: <20030915133015.92588.qmail@web8104.in.yahoo.com> <266201c37b91$b1b09540$6400a8c0@vt1000> <1063635284.3f65c954e8941@webmail.unisa.it>
Message-ID: <3F65CC95.2ECEE923@utoronto.ca>

deletto at unisa.it wrote:
> 
[...]
> I would like to cluster a data set (collected from Affym.
> GeneChip U34A,B & C) regarding on the biological functions. In other words, I'd
> like to draw a statistical graph (like a nice cake) where I can put in all my
> data set, selecting all genes with a clear and known function (biological,
> physiological and for compartment localization).
[...]


Have you tried the GenePublisher site ?
<http://www.cbs.dtu.dk/services/GeneMachine>


Best regards,


Boris

---
Boris Steipe
University of Toronto
Program in Proteomics & Bioinformatics
Departments of Biochemistry & Molecular and Medical Genetics
http://biochemistry.utoronto.ca/steipe


From mgollery at unr.edu  Mon Sep 15 13:12:04 2003
From: mgollery at unr.edu (Martin Gollery)
Date: Mon, 15 Sep 2003 10:12:04 -0700
Subject: [BiO BB] RE: Hardware implemntation of HMMs
In-Reply-To: <001101c37a45$a55d19a0$0101c80a@weizmannnhc4zn>
References: <20030913160139.825B9D2610@www.bioinformatics.org> <001101c37a45$a55d19a0$0101c80a@weizmannnhc4zn>
Message-ID: <1063645924.3f65f2e4ba007@webmail.unr.edu>

Yes, they do. They accelerate HMMer models, not SAM. 

Marty

Quoting Eitan Rubin <Eitan.Rubin at weizmann.ac.il>:

> Hi,
> 
>   As far as I know, Compugen sells hardware accelerators for HMMs (see
> http://www.cgen.com). Their website specifically mentions HMMs
> acceleration.
> 
>   Eitan


Martin Gollery
Associate Director of Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
New phone number! 775-784-7042




From val at vtek.com  Mon Sep 15 14:57:22 2003
From: val at vtek.com (val)
Date: Mon, 15 Sep 2003 14:57:22 -0400
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
References: <20030914070007.4364.qmail@web13104.mail.yahoo.com>
Message-ID: <271401c37bbb$343ab7b0$6400a8c0@vt1000>

Hi John/All:
    Thanks, John, for an interesting and refreshing post.
Your points sound very reasonable to me, although this is the
CS/cpu side of the story.  What about the other side, the biochip side,
a direction which might be taken more comfortably than
HW accelerators?  In other words, computational acceleration
seems to be a good thing, but it is just a fragment of the
whole cell analysis *pipeline*.
    Indeed, the final goal of bioinformatics, and of in-silico
cell analysis generally, is to understand cell mechanisms/processes
and then, based on that, proceed to drug design/discovery activities.
From that perspective, further evolution and advancement in biochip
design and functionality would be a step in the right direction.
And I mean silicon functionality when talking about biochips
and related data.  Silicon designed for (floating point)
computing, including multiprocessor and cluster options, is still
very much the silicon designed ~50 years ago, and has little to do
with analyzing cell mechanisms, understanding the results and then
using them for biomedical applications.
    So when designing silicon functionality, why not start right
from using silicon to implement a whole cell analysis pipeline?
Silicon, but not just computational silicon: rather a *biotechnology*
(bt) silicon.  That is, silicon directly interfaced electrically
with cells (in culture, 'a real sample').  The interface would
include an "input plane" (sensor plane) and an "output plane"
(driving plane), with recognition and storage logic in between.
This is indeed the quite well-known "system/lab-on-the-chip" approach, with
the lab directly interfaced with a sample, including
(in the following phases) electrical driving facilities designed
to move and/or immobilize cells and to perform transfection,
electroporation and other cell modification operations.
    Of course, such an active biochip would be a massively
parallel processor, and can be called a biotechnology (bt) processor
(vs. a computational processor), since it directly implements
a programmable cell analysis technology pipeline: input, processing
and modification.  Optical fluorescent binding patterns can also
be measured with such a chip.  Its obvious advantage is that dynamic
analysis in time can be performed on the same chip, say, yeast life cycle
dynamics with a fine time resolution (say, seconds or less instead of
minutes).
    What seems to be really good news is that such silicon can and needs
to be designed as an *array*, a massively parallel, fine-grain architecture
with a relatively simple microcell (vs. spaghetti-like x86s).  If the total
number of transistors on the "lab-on-the-chip" is ~10B (which is what is
possible now), a grain (microprocessor) in the mega-array (1000x1000
microprocessors) may have up to 10K transistors, which is quite enough to
implement the basic input/output and processing functionality at the grain
level.  For a ~1 sq.in. chip, a grain would be 250 um in size.  The input
plane of a grain might have up to 32x32 sensors, so that the linear spatial
resolution for cell analysis would be ~8 um, which is OK for mid-size and
large cells (on average, an animal cell is ~10 um in size).
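As a rough check of the numbers above, here is a small Python sketch. The
function name and its parameters are made up purely for illustration; only
the ~10B transistor budget, the 1000x1000 array, the ~1 sq.in. chip, the
250 um grain and the 32x32 sensors come from the text.

```python
def grain_budget(total_transistors, grains_per_side, grain_um, sensors_per_side):
    """Per-grain figures for a square array of identical grains (illustrative only)."""
    transistors_per_grain = total_transistors / grains_per_side ** 2
    sensor_pitch_um = grain_um / sensors_per_side
    array_edge_mm = grains_per_side * grain_um / 1000.0
    return transistors_per_grain, sensor_pitch_um, array_edge_mm

# Figures quoted above: ~10B transistors, 1000x1000 grains,
# 250 um grains, 32x32 sensors per grain.
per_grain, pitch_um, edge_mm = grain_budget(10e9, 1000, 250.0, 32)
print(per_grain)   # transistors available to each grain
print(pitch_um)    # sensor pitch in um (250 / 32)
print(edge_mm)     # edge length of the whole array in mm

# Grain size that would fit 1000 grains along a 1 inch (25.4 mm) edge.
grain_for_one_inch_um = 25.4 * 1000.0 / 1000
print(grain_for_one_inch_um)
```

With the quoted figures this gives 10K transistors per grain and a sensor
pitch of about 7.8 um, in line with the ~8 um above.  Note, though, that
1000 grains of 250 um span 250 mm, so to fit a ~1 inch edge either the grain
would have to shrink to ~25 um (sensor pitch ~0.8 um) or the array would be
closer to 100x100 grains at 250 um each.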
    So, I guess my point is that it does not make a lot of sense to
accelerate, optimize, etc. a fragment of the pipeline without looking at
the integrated cell/tissue analysis pipeline and how/where silicon
functionality can be applied.
cheers,
val

----- Original Message -----
From: John Jakson
To: bio_bulletin_board at bioinformatics.org
Sent: Sunday, September 14, 2003 3:00 AM
Subject: Re: [BiO BB] Implementing HMM models in Hardware (FPGA)


Hi Venu
Interesting to see others interested in applying FPGAs to Bioinformatics.
FPGAs don't get much mention here.
I am not convinced the Bio industry really cares for EE solutions it doesn't
understand. Linux clusters are bad enough, but what the hell are FPGAs? As an
EE VLSI/FPGA hardhat visitor at the BioWorld show, held here in Boston not
that long ago, all I saw was disinterest and plenty of tower server racks.
Not one HW company showed up with anything but Linux clusters or the
SGI/IBM/HP/... equivalent. TimeLogic & the 1 or 2 other (defunct?)
accelerator companies were no-shows. On talking with the floor folks I found
no interest in or basic understanding of possible HW alternatives.
The issue comes down to how the problem is stated and how it can be
implemented in a solution that most Bio SW types can understand. That means
whatever the engine is, it must just run C code, simple as that, preferably
the free stuff from NCBI. That always leads to the same solution, clusters
of ever faster & ever hotter farms of today's x86. Any rational computer
scientist knows this is crazy, and that dedicated HW should be built.
TimeLogic says it very well on their web site. In crypto, video or DSP
processing, it is relatively easy to turn C code into HW since they are all
math intensive and are likely created by the same EEs.
It may come as a surprise to SW types, but HW is routinely modeled in C;
that code is used only to double check the design written in a decent HW
description language like Verilog or VHDL, both of which are implicitly
parallel languages. There is usually some formal mathematical model, often
written in Matlab, for the real heavy stuff. It's also interesting that the
Matlab code is usually floating point intensive & the final ASIC/FPGA
solutions are not expected to produce identical results, since HW is best
built in integer fashion. One might regard the current Bio C codes as just
simulations of HW that hasn't been built yet, since few know how to recode
them in a HW language. TimeLogic did a few, but not in a way that can be
easily duplicated across the industry.
To turn C code into really fast HW requires understanding what the C code is
really doing and having permission to make subtle but harmless changes to it
to allow the really big speed ups. That means eliminating floating point. If
the Bio author of such SW is also a HW expert (of which there are probably
only a handful or even 0 in the whole world) then equivalent algorithms
could be used that are relatively simple to map onto HW structures. I don't
see the Bio world hiring too many HW EEs either; we are far too different
culturally and we usually don't have PhDs, esp. not from the right schools.
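To make the "eliminating floating point" step concrete, here is a minimal
Python sketch of one common trick: storing log-odds scores as scaled
integers so that scoring becomes plain integer addition. The scale factor
of 1000, the function names and the toy probabilities are assumptions for
illustration, not taken from any particular Bio code.

```python
import math

SCALE = 1000  # assumed fixed-point scale: score units are 1/1000 of a bit

def int_log_odds(p, background):
    """Scaled integer log2-odds of probability p against a background probability."""
    return int(round(SCALE * math.log2(p / background)))

def score_float(probs, background):
    """Reference floating point score: sum of log2-odds."""
    return sum(math.log2(p / background) for p in probs)

def score_int(probs, background):
    """Same score using only integer additions of pre-scaled terms."""
    return sum(int_log_odds(p, background) for p in probs)

# Toy emission probabilities along a short path; uniform 1/20 background.
probs = [0.30, 0.10, 0.05, 0.25]
bg = 1.0 / 20.0

f = score_float(probs, bg)
i = score_int(probs, bg)
print(f, i / SCALE)  # the two scores agree to within rounding of each term
```

Each integer term is off by at most half a scaled unit, so the totals differ
only slightly; in HW the table of scaled scores would be precomputed and the
inner loop would contain nothing but integer adds.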
There are other ways to turn C code into HW, maybe use a C based HW language
such as HandelC which is based on Occam & CSP. And there's the clue. If the
SW is broken up into the constituent parallel processes that are naturally
there but impossible to describe in plain C, then it becomes almost trivial
to map those parallel processes onto FPGA fabric or even something like a
Transputer farm. The only difference is the granularity. FPGAs are hot today
but can only readily be engineered by HW types because their most efficient
use requires detailed understanding of pipelines and combinatorial logic and
basic cpu design. Transputers, if they still existed, would be the natural way
to go because they are amenable to both SW & HW engineers, but they still
worked best when SW & HW were both understood. Occam was just a way to
describe parallel processes that described HW in a funny syntax. Transputers
only died out because the implementation fell far behind x86 performance
and was single sourced & underfunded. Most Transputer projects & users
ultimately switched to standard DSPs & FPGAs, leaving the SW user base behind.
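The "constituent parallel processes" idea can be sketched in ordinary
Python, with threads standing in for processes and bounded queues standing
in for channels. This is only a loose software analogy to Occam/CSP (Occam
channels are unbuffered rendezvous; a queue of size 1 is the nearest
standard-library stand-in), and all names here are made up for illustration.

```python
import threading
import queue

def producer(out_ch, items):
    """Send each item down the channel, then a None end marker."""
    for x in items:
        out_ch.put(x)
    out_ch.put(None)

def stage(in_ch, out_ch, fn):
    """Apply fn to every message and forward it; pass the end marker on."""
    while True:
        x = in_ch.get()
        if x is None:
            out_ch.put(None)
            return
        out_ch.put(fn(x))

def run_pipeline(items):
    """Three concurrent processes linked by channels: source -> +1 -> *2."""
    a, b, c = queue.Queue(maxsize=1), queue.Queue(maxsize=1), queue.Queue()
    workers = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=stage, args=(a, b, lambda x: x + 1)),
        threading.Thread(target=stage, args=(b, c, lambda x: x * 2)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    results = []
    while True:
        x = c.get()
        if x is None:
            return results
        results.append(x)

print(run_pipeline([1, 2, 3]))  # prints [4, 6, 8]
```

Each stage only knows its input and output channel, which is what makes the
same description mappable either to separate cpus with links or to separate
blocks of logic; only the granularity changes.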
Another approach would be to use one of the cpu farms on a chip such as
Clearspeed or PicoChip or BOPs (RIP), who have developed RISC cpus that can
be up to 420 instances on a chip running at 100MHz plus. Interesting to see
if those devices can escape cell phone basestations.
So I have taken my passive interest in this subject back to the drawing
board to recreate a modern FPGA hosted Transputer that would naturally
execute sequential C code, or parallel Occam code & even Verilog code. That
means that if code can be partially migrated from seq C to par Occam style C
(ie HandelC) then to Verilog (a C-ish HW language), the same code
still runs on the same cpu (but a little slower perhaps). Extra process
scheduling HW is needed to support very fine grained concurrency in a modern
Transputer and also a logic simulator. The big pay off is that properly
parallelized code once in Verilog form still runs either as compiled source
code on a farm of cpus using message passing and links, or it can be
synthesized with industry standard HW tools back onto the FPGA fabric for
the desired speed ups. In effect, sequential procedures in C code can be
morphed into on chip HW coprocessors using the reconfigurable features of
many FPGAs. Stable FPGA coprocessor engines can then be turned into much
faster and cheaper ASICs in return for nasty upfront NRE. Such solutions
could go much farther than current TimeLogic products for many industries
besides Bio.
Xilinx & Intel can give us a clue here. A cluster cpu node based on a P4 at
say 3GHz might run to $2K per node depending on what's there, even though the
fastest P4 chip is always say $600. An FPGA RISC cpu node based on
MicroBlaze runs at maybe only 125MHz but will cost about $1.40 per node in
volume plus extra support. Now if the cpu can be farmed by adding those
Transputer extensions, the 24x clock difference doesn't look so bad
compared to the est 400+ fold cpu cost difference. Also a lot of slower cpus
each with local RLDRAM don't have the memory latency that P4s suffer from, ie
1 DRAM cycle is a few cpu cycles instead of hundreds, and distributed
bandwidth is much easier to manage.
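The clock and cost ratios quoted above can be reproduced with a few lines of
Python. This is only a back-of-envelope sketch on the chip prices given in
the text; the function name is made up, and clock rate is used as a crude
stand-in for throughput (work per clock, memory and "extra support" costs
are ignored).

```python
def ratios(fast_mhz, fast_cost, slow_mhz, slow_cost):
    """Clock ratio, cost ratio, and clock-per-dollar advantage of the slow part."""
    clock_ratio = fast_mhz / slow_mhz
    cost_ratio = fast_cost / slow_cost
    return clock_ratio, cost_ratio, cost_ratio / clock_ratio

# P4 at 3GHz and $600 per chip vs. MicroBlaze at 125MHz and $1.40 per node.
clock_ratio, cost_ratio, advantage = ratios(3000.0, 600.0, 125.0, 1.40)
print(clock_ratio)          # 24.0
print(round(cost_ratio))    # 429
print(round(advantage, 1))  # 17.9
```

So on these figures the small cpus deliver roughly 18 times more clock
cycles per dollar, provided the work can actually be spread across them.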
It's also interesting to see the changes at TimeLogic, the departure of Jim
and the merger with a company that I see has no obvious HW background.
Regards
John Jakson
sorry for long rant




From vkode78 at yahoo.com  Mon Sep 15 16:37:23 2003
From: vkode78 at yahoo.com (Venu Kode)
Date: Mon, 15 Sep 2003 13:37:23 -0700 (PDT)
Subject: [BiO BB] Implementing HMM models in Hardware (FPGA)
In-Reply-To: <271401c37bbb$343ab7b0$6400a8c0@vt1000>
Message-ID: <20030915203723.32578.qmail@web40512.mail.yahoo.com>

Mark, John, and Val,  Thank you all for your perspectives. I couldn't have asked for anything better. I am sure one day (soon enough) there will be a happy family of EEs, CSs and bioinformaticians all on the same page. Compared to you guys, I am a novice in both hardware design and the Bio side. It was just great to get all these perspectives from insiders, and I thank you all for it. But for what it is worth, here is my take on it.
 
Clusters seem to be the buzzword in the bioinformatics world these days. They want to build bigger and better clusters, and I don't blame them. There are huge loads of data that need to be processed in these massively parallel implementations. Here are the probable costs involved in these clusters:

1) Each node ~ $600
2) Bottlenecks and overhead relating to the parallel implementation
 
This is almost analogous to brute force: multi-purpose processors and parallel implementations, all at huge cost. I am sure there can be a better solution to all this. Val mentioned biochips; that sure would be nice, but it certainly won't be a reality until a decade from now (ignorant guess, please pardon me). With the current status quo sticking to computational processors, one of the alternatives could be... ta-da... FPGAs!
 
FPGAs as Hardware Accelerators:
1) Run-Time Reconfigurable (RTR)
2) High-density devices that can implement highly parallel designs

There will be a huge cost/performance difference between clusters and arrays of FPGAs.
Computations like Viterbi decoding, the forward and backward algorithms, and log-odds scores can all be implemented very efficiently on FPGAs because of their look-up table architecture. That's just the beginning. And these devices are run-time reconfigurable. So I can have the Processing Engines (PEs), which I call fixed logic, working on the data (HMM profiles), which I call the Reconfigurable Logic (RC). These PEs will be working on the RC data, and I can independently reconfigure just the RC to load new data (the next HMM profile) using RTR. Of course, this is just one PE and RC; depending on the size of the FPGA device used, I can have more than one PE and RC, all working in parallel.
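As a reference point for what a single PE would have to compute, here is a minimal log-space Viterbi sketch in Python that uses integer scores only (additions and max, the operations that map well onto logic). The two-state model, its score tables and all names are toy assumptions, not a real profile HMM.

```python
def viterbi(obs, states, start, trans, emit):
    """Best state path under integer log scores (higher is better)."""
    # best[s] = best score of any path ending in state s at the current position
    best = {s: start[s] + emit[s][obs[0]] for s in states}
    back = []
    for symbol in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # choose the predecessor maximizing score so far + transition
            p = max(states, key=lambda r: prev[r] + trans[r][s])
            best[s] = prev[p] + trans[p][s] + emit[s][symbol]
            pointers[s] = p
        back.append(pointers)
    # trace back from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path

# Toy two-state model with made-up integer scores.
states = ("M", "I")
start = {"M": 0, "I": -2}
trans = {"M": {"M": -1, "I": -3}, "I": {"M": -2, "I": -1}}
emit = {"M": {"A": 2, "C": -2}, "I": {"A": -1, "C": 1}}

score, path = viterbi("AACC", states, start, trans, emit)
print(score, path)  # 1 ['M', 'M', 'I', 'I']
```

In this picture the transition and emission tables are the per-profile data that would be swapped by reconfiguration, while the add/compare/select loop is the fixed part.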
 
Having said that, the trick is to make sure that the PEs are always busy, and that the reconfiguration delay and the computing cost on the host processor together justify the cost/performance parameter. And since everything is generated on the fly, there has to be an optimal way of scheduling reconfiguration, and that can be hard as well.
And of course this eliminates the need for dedicated memory on or off chip, at the cost of the reconfiguration delay. The key to all this is the cost/performance factor.
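The "keep the PEs busy" condition can be put into a toy model. The sketch below is an assumption-laden illustration: the function name and the timing figures are hypothetical, and the overlapped case assumes the device can load the next profile while the current one is still being scored, which would need to be checked for any specific part.

```python
def pe_utilization(compute_s, reconfig_s, overlapped=False):
    """Fraction of each per-profile cycle that a PE spends computing (toy model)."""
    if overlapped:
        # next profile is loaded while the current one is being scored
        cycle = max(compute_s, reconfig_s)
    else:
        # PE sits idle during every reconfiguration
        cycle = compute_s + reconfig_s
    return compute_s / cycle

# Hypothetical timings: 8 ms of scoring per profile, 2 ms to reconfigure.
print(pe_utilization(0.008, 0.002))                   # serial reconfiguration
print(pe_utilization(0.008, 0.002, overlapped=True))  # reconfiguration hidden
print(pe_utilization(0.001, 0.004, overlapped=True))  # reconfiguration-bound
```

The third case shows the risk: if scoring one profile takes less time than loading the next, the PE idles most of the time no matter how the reconfiguration is scheduled.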
 
Please feel free to throw in your comments, suggestions & questions. I sure would like to know if there are any issues that you see with the route I am taking. I hope to have a website soon where I can post my progress.
 
Thanks again ,
Venu

From madhavi+bioml at cs.cmu.edu  Sat Sep 13 14:51:31 2003
From: madhavi+bioml at cs.cmu.edu (Madhavi Ganapathiraju)
Date: Sat, 13 Sep 2003 14:51:31 -0400
Subject: [BiO BB] BLC2003: Call for papers
Message-ID: <5.1.0.14.2.20030913143134.028e28b8@128.2.254.145>

This seemed an appropriate forum to post this information

Madhavi
=========================================
Biological Language Conference : Call for papers

Scope:
Integration of language technologies in bioinformatics/computational 
biology research
Protein sequences from different organisms may be viewed as texts written 
in different languages. The mapping of protein sequences to their structure, 
dynamics and function then becomes analogous to the mapping of words to 
meaning in natural languages. This analogy can be exploited by application 
of statistical language modeling and text classification techniques to 
biological sequences, thereby generating testable hypotheses regarding the 
fundamental building blocks of "protein sequence language".

The biology-language analogy enables novel applications of language 
technologies to the biology domain, but overlaps to a great extent 
with other existing computational biology/bioinformatics applications. The 
purpose of the Biological Language Conference is to facilitate scientific 
exchange between researchers using the language analogy approach directly 
and researchers using other approaches.

We invite papers in the following areas of interest:
------------------------------------------------------------
secondary structure prediction
tertiary structure prediction
repetitive fold prediction
membrane protein-specific prediction challenges
protein folding/misfolding
conformational changes
genome evolution/comparison
sequence alignment
protein family classification
immune system
protein-protein interactions
protein/gene networks and pathways

Because involving non-biology domain experts in biology research is a 
challenge in bioinformatics, we also encourage submission of papers 
describing new approaches to cross-disciplinary education.

Venue and dates:

November 20-21, 2003 at Carnegie Mellon University in Pittsburgh, USA

The conference is organized by:
        Profs. Raj Reddy and Judith Klein-Seetharaman
        of the NSF-funded Center for Biological Language Modeling 
(http://www.cs.cmu.edu/~blmt).

Papers should be a maximum of 15 pages and will be peer-reviewed. All 
accepted papers will appear in the conference proceedings. Papers should be 
prepared according to the guidelines 
(http://flan.blm.cs.cmu.edu/meeting2003/template.doc) and submitted online 
here. Further information on the conference will be available at 
http://flan.blm.cs.cmu.edu/meeting2003/.


September 20th, 2003
   (Optional) White-paper abstract and indication of intention to submit a 
paper by email to judithks at cs.cmu.edu

October 20th, 2003
   Deadline for electronic paper submission

November 1st, 2003
   Notification of acceptance

November 10th, 2003
   Final camera-ready manuscript due

November 10th, 2003
   Registration deadline


Contact
Judith Klein-Seetharaman,
Language Technologies Institute,
School of Computer Science,
Carnegie Mellon University,
Pittsburgh, PA 15213
USA
email: judithks at cs.cmu.edu,
phone: 412 383 7325,
fax: 412 648 1945.


From umajumdar at locuz.com  Mon Sep 15 11:00:06 2003
From: umajumdar at locuz.com (Uttam Majumdar)
Date: Mon, 15 Sep 2003 20:30:06 +0530
Subject: [BiO BB] feeling discouraging after seeing this much poor response
References: <20030915133015.92588.qmail@web8104.in.yahoo.com>
Message-ID: <002501c37b9a$0dd579a0$3fc0c0c0@UMAJUMDAR>

PLS UNSUBSCRIBE...
  ----- Original Message ----- 
  From: prathibha bharathi 
  To: bio_bulletin_board at bioinformatics.org 
  Sent: Monday, September 15, 2003 7:00 PM
  Subject: [BiO BB] feeling discouraging after seeing this much poor response


  I am feeling discouraged after seeing such a poor response,
  and I am also thinking about dropping my idea of doing my project on protein sequence analysis. Bye all.

From patents at patentworks.com  Mon Sep 15 11:11:53 2003
From: patents at patentworks.com (Goneau & Lemens)
Date: Mon, 15 Sep 2003 11:11:53 -0400
Subject: [BiO BB] To whom it may concern
Message-ID: <00e701c37b9b$b281bc60$6c00a8c0@denis1>


I am looking for GenBank accession number AB047240.

I do not know where to go. Can somebody help me?

Thanks in advance


Denis.


Goneau & Lemens
Intellectual Property Library
PatentWorks Group
Main Floor Reception
170 St-Joseph Boul.
Hull, Quebec J8Y 3W9

Phone (819)772-2770
Fax (819)772-0061

email: patents at patentworks.com
web: www.patentworks.com

____________________________
This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law.  If you are not the intended recipient, you should delete this message and are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited.

From lejm3 at hermes.cam.ac.uk  Mon Sep 15 12:23:30 2003
From: lejm3 at hermes.cam.ac.uk (Lucy McWilliam)
Date: Mon, 15 Sep 2003 17:23:30 +0100 (BST)
Subject: [BiO BB] Re: functional clustering among Affymetrix data
Message-ID: <Pine.SOL.4.58.0309151721180.19302@green.csi.cam.ac.uk>

deletto at unisa.it wrote:

> I am wondering whether there is a simple online tool I could find
> useful for my purpose: I would like to cluster a data set (collected
> from Affym. GeneChip U34A,B & C) according to biological function.
> In other words, I'd like to draw a statistical graph (like a
> nice pie chart) covering my whole data set, selecting all genes with
> a clear and known function (biological, physiological and
> compartment localization).

You can use the Gene Ontology mining tool in the analysis section of
Affymetrix's website (requires free registration).

http://www.affymetrix.com/analysis/index.affx


--
Lucy McWilliam
FlyChip Microarray Facility
Department of Genetics
University of Cambridge
http://www.flychip.org.uk/


From idoerg at burnham.org  Mon Sep 15 17:45:34 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 15 Sep 2003 14:45:34 -0700
Subject: [BiO BB] To whom it may concern
In-Reply-To: <00e701c37b9b$b281bc60$6c00a8c0@denis1>
References: <00e701c37b9b$b281bc60$6c00a8c0@denis1>
Message-ID: <3F6632FE.3090503@burnham.org>

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AB047240
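
For scripted lookups, the same record can also be addressed through NCBI's
E-utilities. Below is a minimal Python sketch that only builds the URL (no
network request is made). The helper name is made up, and the efetch
endpoint and parameter names are given as I understand the E-utilities
interface, so please check them against NCBI's documentation before
relying on them.

```python
from urllib.parse import urlencode

# E-utilities efetch endpoint (assumed current; verify against NCBI docs).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def genbank_url(accession):
    """Build an efetch URL for a nucleotide record in GenBank flat-file text."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)

print(genbank_url("AB047240"))
```

Fetching that URL (for example with urllib.request.urlopen) should return
the flat-file record for the accession.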

Goneau & Lemens wrote:
>  
>  
> I am looking for    Genbank accession number AB047240
> Do not know where to go, Can somebody help me.
>  
> Thanks in advance
>  
>  
> Denis.

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo


From drjohn08318 at yahoo.com  Mon Sep 15 18:12:01 2003
From: drjohn08318 at yahoo.com (John G. Hoey, Ph.D.)
Date: Mon, 15 Sep 2003 15:12:01 -0700 (PDT)
Subject: [BiO BB] To whom it may concern
In-Reply-To: <3F6632FE.3090503@burnham.org>
Message-ID: <20030915221201.45455.qmail@web14405.mail.yahoo.com>

If you're looking for just the information related to this clone, go to ncbi.nlm.nih.gov.  Type in the accession number. If you want the actual gene itself, you can use a company such as Napro Genomics to isolate it for you.  Incidentally, they do terrific work....highly recommended!
 
JGH


From drjohn08318 at yahoo.com  Mon Sep 15 18:15:46 2003
From: drjohn08318 at yahoo.com (John G. Hoey, Ph.D.)
Date: Mon, 15 Sep 2003 15:15:46 -0700 (PDT)
Subject: [BiO BB] To whom it may concern
In-Reply-To: <00e701c37b9b$b281bc60$6c00a8c0@denis1>
Message-ID: <20030915221546.37223.qmail@web14410.mail.yahoo.com>

I just went to the ncbi website, and hit the Entrez tab.  After typing in the accession #, this is the journal article I found.
 
1: Sugimoto J, Matsuura N, Kinjo Y, Takasu N, Oda T, Jinno Y.
Transcriptionally active HERV-K genes: identification, isolation, and chromosomal mapping.
Genomics. 2001 Mar 1;72(2):137-44. 
PMID: 11401426 [PubMed - indexed for MEDLINE]

 
JGH


Goneau & Lemens <patents at patentworks.com> wrote:
 
 
I am looking for    Genbank accession number AB047240

Do not know where to go, Can somebody help me.
 
Thanks in advance
 
 
Denis.
 
 
Goneau & Lemens
Intellectual Property Library
PatentWorks Group
Main Floor Reception
170 St-Joseph Boul.
Hull, Quebec J8Y 3W9
 
Phone (819)772-2770
Fax (819)772-0061
 
email: patents at patentworks.com
web: www.patentworks.com
 
____________________________
This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law.  If you are not the intended recipient, you should delete this message and are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited.



From drjohn08318 at yahoo.com  Mon Sep 15 18:18:44 2003
From: drjohn08318 at yahoo.com (John G. Hoey, Ph.D.)
Date: Mon, 15 Sep 2003 15:18:44 -0700 (PDT)
Subject: [BiO BB] To whom it may concern
In-Reply-To: <20030915221201.45455.qmail@web14405.mail.yahoo.com>
Message-ID: <20030915221844.6315.qmail@web14407.mail.yahoo.com>

For those who might be interested in their gene cloning/gene editing service, here is the link.
 
www.Naprobio.com

"John G. Hoey, Ph.D." <drjohn08318 at yahoo.com> wrote:
If you're looking for just the information related to this clone, go to ncbi.nlm.nih.gov.  Type in the accession number. If you want the actual gene itself, you can use a company such as Napro Genomics to isolate it for you.  Incidentally, they do terrific work....highly recommended!
 
JGH

Iddo Friedberg <idoerg at burnham.org> wrote:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=AB047240

Goneau & Lemens wrote:
> 
> 
> I am looking for Genbank accession number AB047240
> Do not know where to go, Can somebody help me.
> 
> Thanks in advance
> 
> 
> Denis.
> 
> 
> 
> Goneau & Lemens
> Intellectual Property Library
> PatentWorks Group
> Main Floor Reception
> 170 St-Joseph Boul.
> Hull, Quebec J8Y 3W9
> 
> Phone (819)772-2770
> Fax (819)772-0061
> 
> email: patents at patentworks.com 
> web: www.patentworks.com 
> 
> ____________________________
> This message (including any attachments) contains confidential 
> information intended for a specific individual and purpose, and is 
> protected by law. If you are not the intended recipient, you should 
> delete this message and are hereby notified that any disclosure, 
> copying, or distribution of this message, or the taking of any action 
> based on it, is strictly prohibited.

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo


From pooja at igc.gulbenkian.pt  Tue Sep 16 07:53:34 2003
From: pooja at igc.gulbenkian.pt (pooja at igc.gulbenkian.pt)
Date: Tue, 16 Sep 2003 12:53:34 +0100 (WEST)
Subject: [BiO BB] To whom it may concern
In-Reply-To: <20030915221546.37223.qmail@web14410.mail.yahoo.com>
References: <00e701c37b9b$b281bc60$6c00a8c0@denis1> 
     <20030915221546.37223.qmail@web14410.mail.yahoo.com>
Message-ID:      <2787.193.126.26.74.1063713214.squirrel@webmail.igc.gulbenkian.pt>

Please change the search option from PubMed to Nucleotide and try
searching again.
Hope this helps.
-Pooja
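
For scripted access, the same nucleotide lookup can be sketched with NCBI's E-utilities efetch endpoint. This is a minimal sketch, assuming the nucleotide database and GenBank flat-file text output are what is wanted; the helper name genbank_record_url is made up for illustration, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# E-utilities efetch endpoint (assumed to be the current public address).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession, db="nucleotide"):
    """Build an efetch URL for the GenBank flat file of one accession.

    Hypothetical helper: it only assembles the query string.
    """
    params = {"db": db, "id": accession, "rettype": "gb", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


print(genbank_record_url("AB047240"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the sequence record rather than the PubMed article.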
> I just went to the ncbi website, and hit the Entrez tab.  After typing in
> the accession #, this is the journal article I found.
>
> 1: Sugimoto J, Matsuura N, Kinjo Y, Takasu N, Oda T, Jinno Y.
> Transcriptionally active HERV-K genes: identification, isolation, and
> chromosomal mapping.
> Genomics. 2001 Mar 1;72(2):137-44.
> PMID: 11401426 [PubMed - indexed for MEDLINE]
>
>
>
> JGH
>
>
> Goneau & Lemens <patents at patentworks.com> wrote:
>
>
> I am looking for    Genbank accession number AB047240
>
> Do not know where to go, Can somebody help me.
>
> Thanks in advance
>
>
> Denis.
>
>
>
> Goneau & Lemens
> Intellectual Property Library
> PatentWorks Group
> Main Floor Reception
> 170 St-Joseph Boul.
> Hull, Quebec J8Y 3W9
>
> Phone (819)772-2770
> Fax (819)772-0061
>
> email: patents at patentworks.com
> web: www.patentworks.com
>
> ____________________________
> This message (including any attachments) contains confidential information
> intended for a specific individual and purpose, and is protected by law.
> If you are not the intended recipient, you should delete this message and
> are hereby notified that any disclosure, copying, or distribution of this
> message, or the taking of any action based on it, is strictly prohibited.
>
>


From Ingrid.Marchal at univ-lille1.fr  Tue Sep 16 13:54:14 2003
From: Ingrid.Marchal at univ-lille1.fr (Ingrid.Marchal at univ-lille1.fr)
Date: Tue, 16 Sep 2003 19:54:14 +0200 (MET DST)
Subject: [BiO BB] Find a region of identity in a set of sequences
Message-ID: <200309161754.TAA30283@cri.univ-lille1.fr>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20030916/a8d75d59/attachment.ksh>

From moyc at comcast.net  Wed Sep 17 12:18:04 2003
From: moyc at comcast.net (Chris Moy)
Date: 17 Sep 2003 12:18:04 -0400
Subject: [BiO BB] Re: Find a region of identity in a set of sequences
Message-ID: <1063815484.2214.2.camel@laptop>

You may want to try PipMaker (Percent Identity Plot). It is located at
http://bio.cse.psu.edu/pipmaker/. I have not used it but it may be what
you are looking for.
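
If recoding turns out to be necessary after all, the core of the task is small. The following is a minimal sketch, assuming the sequences have already been aligned to equal length (for example with Clustal) and that a gap character "-" never counts as identity; the function name identity_regions and the sample sequences are invented for illustration and are not part of PipMaker.

```python
def identity_regions(aligned, min_len=1):
    """Return (start, end, segment) for each maximal run of identical columns.

    `aligned` is a list of equal-length strings from a multiple alignment.
    Positions are 0-based and `end` is exclusive.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    regions = []
    start = None
    # Iterate one position past the end so a run touching the end is closed.
    for i in range(length + 1):
        # A column is identical when every sequence has the same non-gap residue.
        same = (
            i < length
            and aligned[0][i] != "-"
            and all(s[i] == aligned[0][i] for s in aligned)
        )
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                regions.append((start, i, aligned[0][start:i]))
            start = None
    return regions


seqs = ["ACGTTGCA-T", "ACGATGCAGT", "TCGTTGCAGT"]
print(identity_regions(seqs, min_len=2))
```

Raising min_len filters out short runs that are identical only by chance.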


From drjohn08318 at yahoo.com  Wed Sep 17 14:37:32 2003
From: drjohn08318 at yahoo.com (John G. Hoey, Ph.D.)
Date: Wed, 17 Sep 2003 11:37:32 -0700 (PDT)
Subject: [BiO BB] Find a region of identity in a set of sequences
In-Reply-To: <200309161754.TAA30283@cri.univ-lille1.fr>
Message-ID: <20030917183732.69102.qmail@web14405.mail.yahoo.com>

Sequencher will do this for you.  If you have access to Vector NTI, that program can do it as well.
 
drjohn

Ingrid.Marchal at univ-lille1.fr wrote:
Hi,

I am looking for a program that takes a set of sequences and outputs the region(s) of identity (not similarity), if there are any. I am pretty sure that such a program exists, but I can't remember which.

Does anyone have an idea? I would be very pleased not to have to recode it!
Thanks a lot
Ingrid 



From cat_worth at hotmail.com  Wed Sep 17 16:48:00 2003
From: cat_worth at hotmail.com (Catherine Worth)
Date: Wed, 17 Sep 2003 20:48:00 +0000
Subject: [BiO BB] Re: Find a region of identity in a set of sequences
Message-ID: <BAY2-F25nRL3BOEAFYX0001808c@hotmail.com>

An HTML attachment was scrubbed...
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20030917/98f9d778/attachment.html>

From hjm at tacgi.com  Mon Sep 22 16:01:54 2003
From: hjm at tacgi.com (Harry Mangalam)
Date: Mon, 22 Sep 2003 13:01:54 -0700
Subject: [BiO BB] HMM in Hardware (FPGA)
In-Reply-To: <20030914070007.4364.qmail@web13104.mail.yahoo.com>
References: <20030914070007.4364.qmail@web13104.mail.yahoo.com>
Message-ID: <3F6F5532.3040406@tacgi.com>

Re the previous thread about FPGAs, there's a nice intro to the process (and a 
bunch of good URLs) in the latest Linux Journal:

http://www.linuxjournal.com/article.php?sid=6857


Also a reference to Open Source tools and FPGAs here:
http://www.linuxdevices.com/articles/AT6411901280.html


-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (v&f) - hjm at tacgi.com
             <<plain text preferred>>


From Reichelt at gbf.de  Thu Sep 25 03:18:35 2003
From: Reichelt at gbf.de (Joachim Reichelt)
Date: Thu, 25 Sep 2003 09:18:35 +0200
Subject: [BiO BB] Blast from C or C++
Message-ID: <3F7296CB.1020806@gbf.de>

Dear all,

We want to submit jobs to QBlast at NCBI from a C/C++ program without 
installing Perl.
We tried it in Qt, but we cannot rely on that version: too often the job 
gets lost.

Joachim
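
One way to avoid losing jobs is to keep the request ID (RID) that QBlast returns and poll for it explicitly. The following is a rough sketch of that request/parse sequence under NCBI's BLAST URL API conventions (CMD=Put to submit, CMD=Get with FORMAT_OBJECT=SearchInfo to check status). It is written in Python only to show the steps; the same strings can be built and parsed from C/C++ with any HTTP client. The endpoint address is an assumption, the function names are invented, and no network call is made here.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint; adjust to whatever address NCBI currently documents.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_request_body(query, program="blastn", database="nt"):
    """Form-encoded body for submitting a search (CMD=Put)."""
    return urlencode({"CMD": "Put", "PROGRAM": program,
                      "DATABASE": database, "QUERY": query})


def parse_put_response(text):
    """Extract the request ID and estimated wait in seconds from a Put response."""
    rid = re.search(r"RID = (\S+)", text)
    rtoe = re.search(r"RTOE = (\d+)", text)
    if rid is None:
        raise ValueError("no RID in response; the submission was not accepted")
    return rid.group(1), int(rtoe.group(1)) if rtoe else None


def status_url(rid):
    """URL to poll for the status of a previously submitted job."""
    return BLAST_URL + "?" + urlencode(
        {"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid})


def parse_status(text):
    """Return the status word (e.g. WAITING or READY), or None if absent."""
    match = re.search(r"Status=(\w+)", text)
    return match.group(1) if match else None
```

Storing the RID as soon as the Put response arrives means a client can resume polling after a crash or timeout instead of resubmitting the search.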


From lhaifeng at dso.org.sg  Thu Sep 25 22:50:30 2003
From: lhaifeng at dso.org.sg (Liu Haifeng)
Date: Fri, 26 Sep 2003 10:50:30 +0800
Subject: [BiO BB] protein families
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAtLCII0a1PEWB9xUFtGpvPsKAAAAQAAAA5jvizldpQkiGkrqmd43IWQEAAAAA@biotechrecruiter.org>
Message-ID: <000e01c383d8$f32dde70$706712ac@GENETHON>

Hi,

I am trying to find a collection of protein sequences which have been
correctly assigned to different families. Can anybody suggest where to
obtain this kind of data? PDB and Swiss-Prot seem to provide sequences but
without family information.

Would appreciate your help, thanks a lot!

Sincerely

Haifeng Liu


From idoerg at burnham.org  Fri Sep 26 02:51:18 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu, 25 Sep 2003 23:51:18 -0700
Subject: [BiO BB] protein families
In-Reply-To: <000e01c383d8$f32dde70$706712ac@GENETHON>
Message-ID: <Pine.SGI.4.10.10309252343460.664310-100000@pines2.ljcrf.edu>

Depends on what you mean by "family"

CATH and SCOP assign proteins hierarchically to classes, folds,
superfamilies and families, based on sequence (SCOP) and structure
(CATH) similarities. Both are manually or semi-manually curated. FSSP does
the same task automatically. All three agree at a high rate
(75-80%) in their classifications. Of course, you are limited to
the proteins in PDB only: the ones for which there are solved structures.

If you wish to consider more sequences, then other assignments are
possible, depending on your purpose. Pfam is a good example.

I suggest you get a good bioinformatics textbook, such as David Mount's
or Baxevanis's, and look through the different classification schemes, and
choose the one which suits your purpose best.

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

On Fri, 26 Sep 2003, Liu Haifeng wrote:

> Hi,
> 
> I am trying to find a collection of protein sequences  which have been
> correctly assigned to different families.   Anybody can suggest where to
> obtain such kind of data?  PDB and Swiss-Prot seem to provide sequences but
> without family information.
> 
> Would appreciate your help, thanks a lot!
> 
> Sincerely
> 
> Haifeng Liu
> 


From pooja at igc.gulbenkian.pt  Fri Sep 26 05:57:17 2003
From: pooja at igc.gulbenkian.pt (pooja at igc.gulbenkian.pt)
Date: Fri, 26 Sep 2003 10:57:17 +0100 (WEST)
Subject: [BiO BB] protein families
In-Reply-To: <Pine.SGI.4.10.10309252343460.664310-100000@pines2.ljcrf.edu>
References: <000e01c383d8$f32dde70$706712ac@GENETHON> 
     <Pine.SGI.4.10.10309252343460.664310-100000@pines2.ljcrf.edu>
Message-ID:      <34487.194.117.22.137.1064570237.squirrel@webmail.igc.gulbenkian.pt>

Hi,
You can find proteins classified into superfamilies at either of the
following resources:
1. SUPERFAMILY - Classification is based on profile hidden Markov models
that represent all proteins of known structure, based on SCOP (Structural
Classification Of Proteins).

2. PIR SuperFamily (PIRSF) - The classification system is based on the
evolutionary relationships of whole proteins.

If you are interested in identifying a possible family for an
uncharacterized protein (as I currently am), you may want to try
InterPro.
InterPro is a joint effort of protein sequence databases such as SWISS-PROT
and TrEMBL; databases of functional sites, motifs and domains such as
PRINTS, Pfam and ProDom; and resources for protein families such as PIRSF
and SUPERFAMILY. Features such as functional sites, motifs and domains,
which are extracted from known protein sequences and known protein
families, are applied to an unknown protein sequence to make the prediction.

But for the sequences I am trying, InterPro always leaves me with the
same family status I started from: unknown protein or
hypothetical protein!

So I am also looking for a good protein family prediction tool, or gene
family prediction tool. I would be very grateful if someone on the list
could give me some insight.

I will soon look into the suggested bioinformatics readings. They may be
helpful for me as well.

Thank you.

Regards,
-Pooja


>
> Depends on what you mean by "family"
>
> CATH and SCOP assign proteins in a hierarchical manner to classes, folds,
> superfamilies, families based on sequence based (SCOP) and structure based
> (CATH) similarities. Both are manually or semi-manually curated. FSSP does
> the same task automatically. All three are in a high rate of agreement
> (75-80%) regarding their classifications. Of course, you are limited to
> the proteins in PDB only: the ones for which there are solved structures.
>
> If you wish to consider more sequences, then other assignments are
> possible, depending on your purpose. Pfam is a good example.
>
> I suggest you get a good bioinformatics textbook, such as David Mount's,
> or Baxevanis, and look through the different classification schemes, and
> choose the one which suits your purpose best.
>
> ./I
>
> --
> Iddo Friedberg, Ph.D.
> The Burnham Institute
> 10901 N. Torrey Pines Rd.
> La Jolla, CA 92037, USA
> Tel: +1 (858) 646 3100 x3516
> Fax: +1 (858) 646 3171
> http://ffas.ljcrf.edu/~iddo
>
> On Fri, 26 Sep 2003, Liu Haifeng wrote:
>
>> Hi,
>>
>> I am trying to find a collection of protein sequences  which have been
>> correctly assigned to different families.   Anybody can suggest where to
>> obtain such kind of data?  PDB and Swiss-Prot seem to provide sequences
>> but
>> without family information.
>>
>> Would appreciate your help, thanks a lot!
>>
>> Sincerely
>>
>> Haifeng Liu
>>


From hz5 at njit.edu  Fri Sep 26 08:32:00 2003
From: hz5 at njit.edu (hz5 at njit.edu)
Date: Fri, 26 Sep 2003 08:32:00 -0400 (EDT)
Subject: [BiO BB] protein families
In-Reply-To: <000e01c383d8$f32dde70$706712ac@GENETHON>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAtLCII0a1PEWB9xUFtGpvPsKAAAAQAAAA5jvizldpQkiGkrqmd43IWQEAAAAA@biotechrecruiter.org> <000e01c383d8$f32dde70$706712ac@GENETHON>
Message-ID: <1064579520.3f7431c01d534@webmail.njit.edu>

Pfam, SMART, ProDom, you can find links here:
http://afs13.njit.edu/~hz5/biolink.html#pro

Quoting Liu Haifeng <lhaifeng at dso.org.sg>:

> Hi,
> 
> I am trying to find a collection of protein sequences  which have been
> correctly assigned to different families.   Anybody can suggest where
> to
> obtain such kind of data?  PDB and Swiss-Prot seem to provide sequences
> but
> without family information.
> 
> Would appreciate your help, thanks a lot!
> 
> Sincerely
> 
> Haifeng Liu
> 


=========================================================
Haibo Zhang, PhD student
Computational Biology, NJIT & Rutgers University
Center for Applied Genomics, PHRI
http://afs13.njit.edu/~hz5


From gilbertd at bio.indiana.edu  Mon Sep 29 12:41:50 2003
From: gilbertd at bio.indiana.edu (Don Gilbert)
Date: Mon, 29 Sep 2003 11:41:50 -0500 (EST)
Subject: [BiO BB] Bioinformatics software reviews for Briefings in Bioinformatics
Message-ID: <200309291641.h8TGfo719895@cricket.bio.indiana.edu>

The journal Briefings in Bioinformatics is seeking reviews of
bioinformatics software.  If you have a favorite program,  or have
compared two or more programs and would like to write about their
benefits and drawbacks, please contact me.

The target audience includes a range of biologists and
bioinformaticians, from academia, government and industry. A review
should be about practical aspects of the software, and be helpful to
this range of readers. One caveat is that reviewers should not
have associations with the software reviewed, and should approach it
impartially.
 
Suggestions for software to be reviewed are also welcome.

See more at
http://www.henrystewart.com/journals/bib/
http://marmot.bio.indiana.edu/bibsoft/ -- software reviews

Don Gilbert, software editor, BiB.
gilbertd at indiana.edu
bioinformatics, biology dept., indiana university, bloomington, in 47405 USA


From pabloj79 at yahoo.com.ar  Tue Sep 30 17:49:07 2003
From: pabloj79 at yahoo.com.ar (pablo gonzalez)
Date: Tue, 30 Sep 2003 18:49:07 -0300 (ART)
Subject: [BiO BB] searching information
In-Reply-To: <3F6B1913.2080408@bioinformatics.org>
Message-ID: <20030930214908.66913.qmail@web41905.mail.yahoo.com>

Mr. Jeff W. Bizzaro:
Thank you for answering my letter. I'm working with Pseudomonas, and I would like to ask how I can reduce the number of hypothetical proteins (if that's possible) and obtain more information about known proteins when I use the BLAST program.
 
Yours sincerely, Pablo J. Gonzalez
Universidad Nacional de Rio Cuarto - Córdoba - ARGENTINA
 
 
"J.W. Bizzaro" <jeff at bioinformatics.org> wrote:
Hi Pablo.

You're probably looking for sequence alignment tools such as BLAST or 
Clustal. You may want to post the question with more detail (what 
specifically do you want to do) to the bio_bulletin_board at bioinformatics.org

Cheers.
Jeff

Pablo J.Gonzalez wrote:
> I am Argentinian and I am studying microbiology, and I want 
> information about how to use bioinformatics tools (to 
> search for homologies between microorganism sequences and 
> proteins). Because of the bad economic situation I can't buy a 
> book to learn the keys to handling the tools 
> that the internet offers well, so I would be grateful for 
> information about free sites where I can find the information 
> I want.
> Thank you for your time.
> PABLO J. GONZALEZ


-- 
J.W. Bizzaro jeff at bioinformatics.org
President, Bioinformatics.Org http://bioinformatics.org/~jeff
"As we enjoy great advantages from the inventions of others, we
should be glad of an opportunity to serve others by any invention
of ours; and this we should do freely and generously."
-- Benjamin Franklin
--

