From jiye at eden.rutgers.edu  Mon Mar  1 16:26:43 2004
From: jiye at eden.rutgers.edu (jiye at eden.rutgers.edu)
Date: Mon, 1 Mar 2004 16:26:43 -0500 (EST)
Subject: [BiO BB] confusion about the psi-blast profile
Message-ID: <3633168464jiye@eden.rutgers.edu>

Hi,

I'm a graduate student at Rutgers University, NJ, USA. I'm looking
for some help with a question about the profile produced by
PSI-BLAST.

I use the standalone BLAST programs: I run blastpgp for 3
iterations and then makemat to get the position-specific
scoring matrix. The commands I use are:

blastpgp -d protein/nr -i test.seq -o test.rst -j 3 -C test.chk
makemat -P test -d protein/nr

I put the following sequence in test.seq; the nr protein
database was downloaded from the NCBI web site.

> S0_Sinorhizobium meliloti aa_pep18
mggfidiqapleqegtkavvrnwlrkigdpvksgdplveletdkvtqevs
apadgvlaeilmrngddatpgavlgrigseaagaghaphyspavrhaaee
ygldpatvtgtgrggrvtradmdraftarqegpasvaaeagdrgaapksr
riphsgmraaiaehmlnsvttaphvtavfeadfsavmrhrdehgkrlaad
gtklsytayvvsacvaamravpevnsrwhedaletfddinigvgislgdk
glvvpvihraqdlslaeiaarlqdlttrarsnalsradvtggtftisnhg
asgsllaapiiinqpqsailgvgkldkrvivrevdgadtiqirpmayvsl
tidhraldghqtnawlthfvrvietwpk

The scores I read back from test.mtx look like this:

-32768  -291  -32768  -341  -521  -407  -185  -480  -359  -64  -338  80  980
-425  -457  -238  -338  -351  -261  -113  -343  -100  -296  -32768  -32768  -397

-32768  153  -32768  -361  -298  -178  -500  423  -364  -495  -291  -502  -408
-216  -352  -288  -360  477  -70  -412  -504  -100  -448  -32768  -32768  -397
.....
.....

Since I have no previous experience with PSI-BLAST, I'm not sure
whether this is right, but I feel that something is wrong. The minimum
value -32768 appears repeatedly, at every position, in columns 1, 3,
19, and 20. The other numbers also look either too small or too large.
However, I couldn't find where the problem is. I also tried the
swissprot database, and the result is similar. I would really
appreciate it if someone could tell me whether there is anything wrong
with this kind of result.
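One way to sanity-check a row like the first one quoted above is
sketched below. It rests on assumptions that nothing in this thread
confirms: that -32768 is a "no score" sentinel rather than a real
score, that the other values are scores multiplied by 100, and that
the 26 columns follow the residue order given in ALPHABET. The names
SENTINEL, SCALE, ALPHABET and parse_row are made up for illustration.

```python
# Hypothetical helper for inspecting one row of test.mtx.
# All three constants are ASSUMPTIONS, not documented facts from the thread.
SENTINEL = -32768                        # assumed "no score" marker
SCALE = 100.0                            # assumed scaling factor of the scores
ALPHABET = "-ABCDEFGHIKLMNPQRSTVWXYZU*"  # assumed column order (26 columns)

# First row exactly as quoted in the message, with the wrapped lines rejoined.
ROW1 = (
    "-32768 -291 -32768 -341 -521 -407 -185 -480 -359 -64 -338 80 980 "
    "-425 -457 -238 -338 -351 -261 -113 -343 -100 -296 -32768 -32768 -397"
)


def parse_row(text):
    """Map each assumed residue letter to a rescaled score (None for the sentinel)."""
    values = [int(token) for token in text.split()]
    if len(values) != len(ALPHABET):
        raise ValueError("expected %d columns, got %d" % (len(ALPHABET), len(values)))
    return {
        residue: (None if value == SENTINEL else value / SCALE)
        for residue, value in zip(ALPHABET, values)
    }


scores = parse_row(ROW1)
# Residue with the highest score, ignoring the sentinel columns.
best = max((r for r in scores if scores[r] is not None), key=lambda r: scores[r])
print(best, scores[best])
```

Under these assumptions the highest score in the first row falls in
the column labelled M, which at least agrees with the query sequence
starting with "m".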

Best regards, 

-Jiankuan


From B.A.T.Svensson at lumc.nl  Mon Mar  1 20:13:28 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: Tue, 2 Mar 2004 02:13:28 +0100 
Subject: [BiO BB] RE: Normalization WAS: database design question
Message-ID: <D291F33C586C8E48B95C26F8C805513A01A3D970@mail5.lumc.nl>

> Normalization is the process of designing a good data model.

Please, explain this statement.


From jrambla at hotmail.com  Tue Mar  2 04:28:30 2004
From: jrambla at hotmail.com (JRambla)
Date: Tue, 2 Mar 2004 10:28:30 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <D291F33C586C8E48B95C26F8C805513A01A3D970@mail5.lumc.nl>
Message-ID: <BAY2-DAV45DjutrCXJD00014c70@hotmail.com>

Hi all,

Normalization is a concept that comes from relational database theory,
created by the recently deceased Dr. Edgar F. Codd, a mathematician at IBM.
That theory is the basis of the whole SQL world.

Normalization is a set of rules (5, if my memory is right) applied to
table design in order, basically, to eliminate redundancy in the data. That
redundancy shows up as embarrassing, and sometimes hard to find,
consistency problems in the data stored in the database.

As was suggested, following those rules is a good starting point for
designing a database.

You can find a good introduction at

http://www.sequoia.be/consult/method/english.htm

Hope this helps,

Jordi Rambla
Barcelona (Spain)

-----Original Message-----
From: bio_bulletin_board-admin at bioinformatics.org
[mailto:bio_bulletin_board-admin at bioinformatics.org] On behalf of Svensson,
B.A.T. (HKG)
Sent: Tuesday, 2 March 2004 2:13
To: 'bio_bulletin_board at bioinformatics.org '
Subject: [BiO BB] RE: Normalization WAS: database design question

> Normalization is the process of designing a good data model.

Please, explain this statement.
_______________________________________________
BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board


From B.A.T.Svensson at lumc.nl  Tue Mar  2 05:46:01 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: Tue, 2 Mar 2004 11:46:01 +0100 
Subject: [BiO BB] RE: Normalization WAS: database design question
Message-ID: <D291F33C586C8E48B95C26F8C805513A01A3D977@mail5.lumc.nl>

Thank you for the suggested reading, but I was seeking an elaboration on
why (high/higher?) normalization should be regarded as a good design.



From jrambla at hotmail.com  Tue Mar  2 08:21:01 2004
From: jrambla at hotmail.com (JRambla)
Date: Tue, 2 Mar 2004 14:21:01 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <D291F33C586C8E48B95C26F8C805513A01A3D977@mail5.lumc.nl>
Message-ID: <BAY2-DAV46mfYzH5xGM00014ffb@hotmail.com>

Hi,

In my experience (nearly 20 years now) of designing and consulting on
enterprise databases:

- Normalization is good/desirable in all online systems (those where
several users can be reading and updating data simultaneously), usually
called OLTP systems. Exceptions are not significant at all.
- De-normalization is good (indeed mandatory) for data warehouse and data
mining systems, where grouping, sorting and summarized data are the real
interest. This is for performance reasons associated with intensive
calculations. We also apply de-normalization in history files or logs,
where you actually need a snapshot of the relationships and data at the
moment of the entry.
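The history/log case in the second point can be sketched as follows.
This is only an illustration using Python's built-in sqlite3 module;
the tables customers and shipment_log and their columns are
hypothetical and not part of any database discussed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT NOT NULL);
    -- De-normalized on purpose: the address is copied at the moment of the entry.
    CREATE TABLE shipment_log (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        shipped_to TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Old Street 1')")
# The log entry takes a snapshot of the address as it is right now.
conn.execute(
    "INSERT INTO shipment_log (customer_id, shipped_to) "
    "SELECT id, address FROM customers WHERE id = 1"
)
# The customer moves later; only the master record changes.
conn.execute("UPDATE customers SET address = 'New Avenue 2' WHERE id = 1")

current = conn.execute("SELECT address FROM customers WHERE id = 1").fetchone()[0]
logged = conn.execute("SELECT shipped_to FROM shipment_log").fetchone()[0]
print(current, "|", logged)
```

The log keeps the address that was valid when the entry was written,
which is exactly the redundancy a normalized design would remove.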

The kind of database that, as I remember, started the thread is a sequence
database. I would classify it in the first group, although I have little
experience in that field nowadays.

As I mentioned in the previous e-mail, normalization (usually only the
higher levels count as normalized) means not allowing repeated data to
live in the system, e.g. not copying the customer's address data into every
invoice in the Invoices table.

That way any change to the data is made only in the "master" record, and you
don't need to keep track of all the places the data may have been copied
to. Keeping track of data copies is usually a tricky and error-prone
affair, so you keep out of it as much as you can.

In contrast, using record keys (primary and foreign keys) is good,
because you define relationships at database design time, and the database
engine helps enforce those relationships when data is entered.
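A minimal sketch of the invoice example, using Python's built-in
sqlite3 module. The schema (customers, invoices, and their columns) is
made up for illustration; the only engine-specific detail is that
SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        address TEXT NOT NULL          -- stored once, in the "master" record
    );
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL           -- no copy of the address here
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Example Lab', 'Old Street 1')")
conn.executemany(
    "INSERT INTO invoices (customer_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 25.0)],
)

# One update in the master record is seen by every invoice through the key.
conn.execute("UPDATE customers SET address = 'New Avenue 2' WHERE id = 1")
addresses = [
    row[0]
    for row in conn.execute(
        "SELECT c.address FROM invoices i JOIN customers c ON c.id = i.customer_id"
    )
]

# The engine rejects an invoice that points at a customer that does not exist.
try:
    conn.execute("INSERT INTO invoices (customer_id, amount) VALUES (99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(addresses, rejected)
```

Both invoices report the new address after a single update, and the
invoice for the unknown customer is refused by the engine itself.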

Further details or more concrete questions would allow me to be more specific.

Hope this clarifies things a bit more.

Jordi Rambla
Barcelona (Spain)




From B.A.T.Svensson at lumc.nl  Tue Mar  2 08:57:02 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 02 Mar 2004 14:57:02 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <BAY2-DAV46mfYzH5xGM00014ffb@hotmail.com>
References: <BAY2-DAV46mfYzH5xGM00014ffb@hotmail.com>
Message-ID: <1078235822.26146.24.camel@ander>

The only problem with (de)normalization theory is that
it is not very useful for everyday practical purposes.




From jrambla at hotmail.com  Tue Mar  2 09:29:40 2004
From: jrambla at hotmail.com (JRambla)
Date: Tue, 2 Mar 2004 15:29:40 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078235822.26146.24.camel@ander>
Message-ID: <BAY2-DAV50SSf7roC0U00014fd1@hotmail.com>

Although I'm not sure I understand your comment, normalization must
eventually become an attitude, an inner practice, essential for a good
design with a long lifetime. It is like good laboratory practice: you can
obtain results without it, but you are a lot more likely to get good
results if you follow it by default.

The rules can look a bit abstract, but once understood they are very
practical, close to a methodology.
However, you're right that they're not a checklist or a step-by-step guide.

Jordi Rambla
Barcelona (Spain)



From B.A.T.Svensson at lumc.nl  Tue Mar  2 10:29:23 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 02 Mar 2004 16:29:23 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <BAY2-DAV50SSf7roC0U00014fd1@hotmail.com>
References: <BAY2-DAV50SSf7roC0U00014fd1@hotmail.com>
Message-ID: <1078241362.26146.117.camel@ander>

Second-order tensor calculus in fluid mechanics is also "easy"
once you get used to the concept. However, the difference
we are talking about here is that tensor calculus simplifies
the task of understanding the problem, while normalization
theory does not; rather the opposite.

Any student of computer science who has been forced to do
formal deduction with Horn clauses knows that this is the
most rigorous and silly way to conclude the most trivial
and obvious facts, and as a matter of fact normalization
theory is based on this kind of logical reasoning.


P.S.
Speaking of Barcelona, I just returned from there yesterday
after a four-day visit. It is a very fine city, and I was
blessed with a very nice Monday, though the snow last
Sunday was less appreciated. ;)
D.S.




From mgruenb at gmx.net  Tue Mar  2 10:50:41 2004
From: mgruenb at gmx.net (Michael Gruenberger)
Date: Tue, 02 Mar 2004 15:50:41 +0000
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078241362.26146.117.camel@ander>
References: <BAY2-DAV50SSf7roC0U00014fd1@hotmail.com>
	 <1078241362.26146.117.camel@ander>
Message-ID: <1078242641.3392.67.camel@vogel>

I don't quite understand your reasoning: You claimed that normalization
has no practical use and that is because it's hard to understand?!

Normalization is not meant to simplify database design, but rather to
formalise it and to give guidelines as to what works and what doesn't.

I completely agree with Jordi, normalization just becomes part of your
'inner feeling' of what works and what doesn't after you've done it a
couple of times. Of course it isn't applicable in all cases, but it
surely is better than no guidelines at all, especially for people who
have never designed a database before and who are looking for some
guidelines to point them in the right direction. And there are ways to
explain normalization in simple terms and with good examples.

So if you are saying normalization hasn't got any practical use, what
would you suggest to a newcomer to database design?

On Tue, 2004-03-02 at 15:29, Svensson, B.A.T. (HKG) wrote:
> Second-order tensor calculus in fluid mechanics is also "easy"
> once you get used to the concept. However, the difference
> we are talking about here is that tensor calculus simplifies
> the task of understanding the problem, while normalization
> theory does not; rather the opposite.
> 
> Any student of computer science who has been forced to do
> formal deduction with Horn clauses knows that this is the
> most rigorous and silly way to conclude the most trivial
> and obvious facts, and as a matter of fact normalization
> theory is based on this kind of logical reasoning.
> 


From B.A.T.Svensson at lumc.nl  Tue Mar  2 12:00:45 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 02 Mar 2004 18:00:45 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078242641.3392.67.camel@vogel>
References: <1078242641.3392.67.camel@vogel>
Message-ID: <1078246845.26152.259.camel@ander>

On Tue, 2004-03-02 at 16:50, Michael Gruenberger wrote:
> I don't quite understand your reasoning: You claimed that normalization
> has no practical use and that is because it's hard to understand?!

That's not what I tried to say. Applying the theory makes things
more complicated than they are. In short: it's a waste of time
in everyday work.

Also, my apologies for forgetting to stress my point:

I don't suggest throwing out the baby with the bathwater. By
no means is /learning/ the theory a waste of time; however,
trying to /apply/ the theory in everyday work is a) a waste
of time and b) a misunderstanding of how the concept is meant
to be used.

It might be a bit too tough for a beginner to start out with
these subjects; there are other things that are better learned
first (e.g. principles of programming, etc.), and then these
issues will, in time, resolve themselves.

However, the original question was about why normalization
equals a good design. As already stressed by Jordi, this
can't be held as a simple truth.

In addition to the answers already provided: high normalization
optimizes storage efficiency and insert/update speed, while
low normalization optimizes reading speed but makes
inserts/updates a more complex task.

Related to these problems, any extreme normal form
will create difficulties of its own. Too low and updates become
a nightmare for a database programmer; too high and even the
simplest query turns into a monstrous construction.
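
To make the trade-off above concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The table and column names (organism,
gene, gene_flat) are made up for illustration and are not from this
thread. It only shows the shape of the problem: a rename touches one row
in the normalized layout but every copy in the flat layout, while reading
the normalized data back needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: the normalized layout stores each organism name once;
# the flat layout repeats it on every gene row.
cur.executescript("""
CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE gene (
    id INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(id)
);
CREATE TABLE gene_flat (symbol TEXT NOT NULL, organism_name TEXT NOT NULL);
""")

genes = ["lacZ", "lacY", "lacA"]
cur.execute("INSERT INTO organism (id, name) VALUES (1, 'E. coli')")
cur.executemany("INSERT INTO gene (symbol, organism_id) VALUES (?, 1)",
                [(g,) for g in genes])
cur.executemany("INSERT INTO gene_flat VALUES (?, 'E. coli')",
                [(g,) for g in genes])

# Update cost: one row in the normalized layout, every copy in the flat one.
normalized_rows = cur.execute(
    "UPDATE organism SET name = 'Escherichia coli' WHERE id = 1").rowcount
flat_rows = cur.execute(
    "UPDATE gene_flat SET organism_name = 'Escherichia coli' "
    "WHERE organism_name = 'E. coli'").rowcount

# Read cost: the normalized layout needs a join, the flat one does not.
joined = cur.execute(
    "SELECT g.symbol, o.name FROM gene AS g "
    "JOIN organism AS o ON o.id = g.organism_id ORDER BY g.symbol").fetchall()
flat = cur.execute(
    "SELECT symbol, organism_name FROM gene_flat ORDER BY symbol").fetchall()

print(normalized_rows, flat_rows)  # rows touched by each update
print(joined == flat)              # both layouts return the same answer
```

With three genes the difference is trivial; the point is only that the
update cost grows with the number of copies in the flat table, and the
query complexity grows with the number of tables in the normalized one.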

In most cases one needs to find a balance between these two
factors, because pulling toward one end will only create
problems at the other end. I think BCNF (Boyce-Codd Normal
Form) tried to address these issues, but I can't tell for sure.

However, this balance is found with experience (system stress
testing) and common-sense judgment; no data decomposition
course will teach you how to do this for a particular real system.

One way to work around this is to enforce a list of constraints
protected by a set of additional triggers in the database,
but even this has a tendency to make an otherwise simple data
model become a dinosaur in its implementation. In addition,
nothing comes for free: a performance hit will be put on top
of this. Sometimes it might be worth it, other times not.

There is no way to establish what is meant by a good design.
This depends on the project requirements, and in the end it is
up to the designer and users of the system to judge it.

In any case, normalization theory won't tell you how to deal with it,
and at the end of the day it won't even tell you whether you made any
good decisions whatsoever.

> Normalization is not meant to simplify database design, but rather to
> formalise it and to give guidelines as to what works and what doesn't.


> I completely agree with Jordi, normalization just becomes part of your
> 'inner feeling' of what works and what doesn't after you've done it a
> couple of times. Of course it isn't applicable in all cases, but it
> surely is better than no guidelines at all, especially for people who
> have never designed a database before and who are looking for some
> guidelines to point them in the right direction. And there are ways to
> explain normalization in simple terms and with good examples.

I do not object to this.

> So if you are saying normalization hasn't got any practical use,
>  what would you suggest to a newcomer to database design?

To learn the theory, and then forget it.

Data decomposition can be taught from books; however,
there is no book that tells you how to decompose a real problem
in general, since decomposition can be done in several ways for
the very same data set, i.e. how to decompose depends on the
questions you want to ask of the data set.

For a newcomer, I do not think learning abstract things like
relational algebra and normalization theory is the best place to
start. Programming and design are difficult enough as they are,
mainly because they do not reflect the way we humans normally
solve a problem (that's why, for instance, programming is tricky
to learn in the first place).

But in any case, design and programming are learned skills, and it
takes many years to acquire them. Nobody would suggest that anybody
can be a surgeon just because everybody can walk into the closest
store and buy a knife, but it seems like most people believe
anybody can do database development just because the platforms
are available. That is simply not the case.

So what do I suggest? Well, we all seem to forget the
difficulties and troubles we had to go through when we once
had to learn to walk.

One may get oneself a good education, either by self-study
or by formal education, and then add a few years of practical
experience, or just buy the knowledge from somebody who knows how
to do it. That's my advice, because there is no simple
way to learn these things in "24 hours" or "7 days" courses;
that's just an illusion.




From mgruenb at gmx.net  Tue Mar  2 12:50:21 2004
From: mgruenb at gmx.net (Michael Gruenberger)
Date: Tue, 02 Mar 2004 17:50:21 +0000
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078246845.26152.259.camel@ander>
References: <1078242641.3392.67.camel@vogel>
	 <1078246845.26152.259.camel@ander>
Message-ID: <1078249821.3394.105.camel@vogel>

Thanks for the clarifications! I pretty much agree with everything you
said. Of course you need a lot of experience to be able to design good
data models and once you have the experience you probably don't have to
go through the formal process of normalizing a database, because you
just instinctively know what works best.

But still in order to get the experience you have to start somewhere and
you have to give a beginner an entry point and showing them
normalization with some good examples has worked for me in most cases...

This thread was started by someone asking for some guidance on how to
design his database, and I still think that pointing that guy to
normalization was a good idea. It would be interesting to get some feedback
though, and to know whether reading about normalization helped that guy!

Cheers,

Michael.

From jrambla at hotmail.com  Tue Mar  2 13:36:28 2004
From: jrambla at hotmail.com (JRambla)
Date: Tue, 2 Mar 2004 19:36:28 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078249821.3394.105.camel@vogel>
Message-ID: <BAY2-DAV58ceam2e0cd0001536f@hotmail.com>

Svensson,

For me, learning normalization is like learning to drive.
When you're experienced, you can apply the driving rules comfortably:
traffic-light timing, how to judge the speed of oncoming cars, the safe
car-to-car distance for a given speed, whether your car will fit in a
parking space, and so on.

However, if you're a beginner, you had better stay with strict rules and
the driving instructor's rules of thumb.

I would advise any beginner to stay with normalization unless the problem
domain mandates otherwise. In my experience, when designers want to be
smarter than that, they fail 99% of the time. This means having to redesign
the database after lots of data coherence pains. Most of the time, you will
need to edit some data, too.

Some of your issues with normalization (too many joins, etc.) can be
addressed by database objects like views. That's a principle for me: if you
stay normalized, the database engine can probably help you with your
troubles; if you stray from it, you're usually on your own. You're a
"desperado".
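
As a rough illustration of the point about views, here is a small sketch
with Python's sqlite3 module. The schema and the view name gene_report
are invented for the example. The join is written once inside the view,
and everyday queries then read from the view as if it were a flat table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalized schema plus a view that hides the join.
conn.executescript("""
CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE gene (
    id INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(id)
);
INSERT INTO organism (id, name) VALUES (1, 'Escherichia coli');
INSERT INTO organism (id, name) VALUES (2, 'Homo sapiens');
INSERT INTO gene (symbol, organism_id) VALUES ('lacZ', 1);
INSERT INTO gene (symbol, organism_id) VALUES ('TP53', 2);

CREATE VIEW gene_report AS
    SELECT g.symbol AS symbol, o.name AS organism
    FROM gene AS g
    JOIN organism AS o ON o.id = g.organism_id;
""")

# The caller no longer needs to know how the tables are related.
rows = conn.execute(
    "SELECT symbol, organism FROM gene_report ORDER BY symbol").fetchall()
for symbol, organism in rows:
    print(symbol, organism)
```

The tables stay normalized for inserts and updates; only the reading side
gets the convenience of a flat layout.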

When not talking to beginners, and they can't hear us, we can use another
principle: "queries drive database design". You must design your database
(or a version of it) by optimizing for the queries you will perform most
often or that are most critical.

In my humble opinion,

Jordi Rambla
Barcelona (Spain)



From B.A.T.Svensson at lumc.nl  Wed Mar  3 07:23:07 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 03 Mar 2004 13:23:07 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <BAY2-DAV58ceam2e0cd0001536f@hotmail.com>
References: <BAY2-DAV58ceam2e0cd0001536f@hotmail.com>
Message-ID: <1078316587.26146.383.camel@ander>

Dear Jordi,

It seems to me that your "beginner" is quite an advanced fellow(?).
I still argue that one needs to crawl before one can start to walk.

Even in advanced textbooks on database systems, normalization
(i.e. a database design issue) is not described until the
basics of database systems have been covered: conceptual
data models, schemas, keys, attributes, relationships, roles,
indexes, domains, anomalies, data definition/manipulation
languages, etc.

In my opinion, starting with normalization without learning
the basic concepts above will result in nothing, since
you won't know what it is you want to decompose anyway.

On the other hand, learning the above will implicitly teach
you how to do normalization, namely with the common-sense
approach. Normalization theory is just a formal way to do
(read: a way to make a machine able to do) what we humans
already know by heart and gut.

But even before learning these concepts you still need another
foundation to stand on. In my opinion it is simply not enough
to buy a desktop DBMS and think everything will solve
itself once one knows normalization theory; it is a little bit
trickier than that.

Put another way: let's say the novice knows how to do
normalization. Further assume the novice has normalized
some data set. And now what? What to do next? How does the
novice get raw data into the right format to load it into
the RDBMS? And what means should be used to load the data?
Does the data need to be updated/replaced frequently? If
we are talking about genomic databases, we are into a nightmare.
But even if these things are achieved, how will the
novice know how to query the data? Etc.

Yes, there are many roads to Rome, but they are all long...


  * * *


You also claim there is a 1 in 100 success rate when deviating
from normalization. As you might understand, referring to your
experience can't be regarded as solid proof of your claim.

Especially when there are unanswered questions as to which
normalization you are referring to, what is being normalized,
the expertise level of the designer, a comparison with the
unknown, etc.


The primary goal of /any/ software development is:


		*** MAKE IT WORK! ***


This is the primary goal for the beginner to achieve, because
it doesn't matter how fancy your design is if it does not work
at the end of the day. By creating things that work, the novice
will learn from experience what works smoothly and what does
not, and eventually become a pro.


If the novice just gets the data into an RDBMS, in whatever
normal form, he will eventually be able to construct
a query that solves a specific problem for him, simply
because humans are creative beings. However, an expert may
say that "the solution was a bit unusual".


In fact, quite a lot of novices create, or use, a single
tab-delimited file (in God knows what normal form),
dump it into some kind of database manager (Excel, MS Access),
and then happily ignore any database design issues and just use
a "flat" query to get whatever data they need. And this works
in the simplest cases for the novices. In this way there will
never be a need to change the layout: they just pick what they
need, and if something is missing, another column will fix that.


I assume you never counted these guys in your 99% failure rate?


  * * *

On database views:

As you know, views are only predefined queries in the system
and are only of limited help (a database system is dynamic
over time), since they won't address the issues I have already
mentioned in the longer term.


  * * * 


I am satisfied with your last remark, that queries drive the
design; that is, they identify the mini-world we want to model.

And when one designs like this, a "good design" will no longer
be measured by the normalization level of the system, but by
the real usability of the system, i.e. according to the
primary goal: make it work!


Kind regards,

	//Anders




From roy at colibase.bham.ac.uk  Wed Mar  3 07:49:12 2004
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Wed, 03 Mar 2004 12:49:12 +0000
Subject: [BiO BB] clustalw
In-Reply-To: <20040303122417.78F24D1F1D@www.bioinformatics.org>
References: <20040303122417.78F24D1F1D@www.bioinformatics.org>
Message-ID: <4045D448.2050405@colibase.bham.ac.uk>

> Has anyone used Clustalw in the profile alignment mode? I tried to and found
> that none of the menu items except the first one work. I downloaded the latest
> version (1.83 for DOS) but it seems to have the same problem. If anyone has
> successfully used Clustalw in the profile alignment mode could you please let
> me know how you went about it?

A problem with ClustalW is that error messages sometimes get hidden, as
the menu is redisplayed. I'm guessing there is some problem with your
second profile/sequence set (such as a duplicated sequence name). This
would prevent the second set from loading correctly, and hence disable
the other options. Try scrolling up in your DOS window after attempting
to load the second set, and you should be able to see the error message
and fix the problem. It should say "(loaded)" next to options 1 and 2 if
you have been successful.
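
If a duplicated sequence name is indeed the cause (that is only my guess),
a quick check outside ClustalW might look like the sketch below. It
assumes both profiles are in FASTA format and that the name is the first
word after the ">"; the helper names and the toy sequences are made up
for the example.

```python
from collections import Counter


def fasta_names(text):
    """Return the first word of every FASTA header line in `text`."""
    names = []
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def duplicated_names(*fasta_texts):
    """Return the sorted names that occur more than once across all inputs."""
    counts = Counter(
        name for text in fasta_texts for name in fasta_names(text))
    return sorted(name for name, n in counts.items() if n > 1)


# Toy data standing in for the two profile files.
profile1 = ">seqA\nMKV\n>seqB\nMKI\n"
profile2 = ">seqB first copy\nMRV\n>seqC\nMRL\n"

dups = duplicated_names(profile1, profile2)
print(dups)
```

For real files, read each one with open(path).read() and pass the
contents in; renaming any name that is reported should rule this cause
out.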

Roy.

--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow,
Division of Immunity and Infection,
University of Birmingham, UK

http://colibase.bham.ac.uk


From john_abraham_bio at yahoo.com  Mon Mar  8 10:04:38 2004
From: john_abraham_bio at yahoo.com (John Abraham)
Date: Mon, 8 Mar 2004 07:04:38 -0800 (PST)
Subject: [BiO BB] Universal Primer
Message-ID: <20040308150438.56831.qmail@web60805.mail.yahoo.com>

Hi,
I am looking at universal primer design (using either a single rRNA gene or a cocktail of rRNA genes, for both prokaryotes and eukaryotes) with traditional bioinformatics tools.
Does anyone in the group have experience with this? Any suggestions or references on how to proceed would be very helpful.
I look forward to valuable input from members and experts.
Thanks,
John



From kalita at pikespeak.uccs.edu  Tue Mar  9 22:27:55 2004
From: kalita at pikespeak.uccs.edu (J Kalita)
Date: Tue, 9 Mar 2004 20:27:55 -0700 (MST)
Subject: [BiO BB] A question about BLOSUM matrices
Message-ID: <52031.128.198.168.105.1078889275.squirrel@pikespeak.uccs.edu>

Hello,

I have a question about BLOSUM matrices such as the one you see
on the Web page at
http://eta.embl-heidelberg.de:8000/misc/mat/blosum80.html.
Can anyone please explain to me what the last four columns and last four
rows labeled "B", "Z", "X" and "*" are?

Thank you!

Jugal Kalita
Computer Science Department
University of Colorado at Colorado Springs


From pankaj at nii.res.in  Wed Mar 10 00:06:12 2004
From: pankaj at nii.res.in (Pankaj)
Date: Wed, 10 Mar 2004 10:36:12 +0530 (IST)
Subject: [BiO BB] structural bioinformatics
Message-ID: <E1B0vv2-0005zh-00@mail.nii.res.in>

Hi all,
I would like to know of any good structural bioinformatics tutorial/book
that covers (from preliminary to detailed) the basics of
structural bioinformatics, such as:
what is molecular dynamics
what is molecular mechanics
various software packages for them
etc.
If anyone knows of one, kindly tell me.
Thanks in advance,
Pankaj


From idoerg at burnham.org  Wed Mar 10 02:56:11 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Tue, 9 Mar 2004 23:56:11 -0800
Subject: [BiO BB] A question about BLOSUM matrices
In-Reply-To: <52031.128.198.168.105.1078889275.squirrel@pikespeak.uccs.edu>
Message-ID: <Pine.SGI.4.10.10403092343270.18412894-100000@pines2.ljcrf.edu>

Hi Jugal,

B: aspartic acid / asparagine
Z: glutamic acid / glutamine
X: unknown residue
*: anything which is not in {ABCDEFGHIKLMNPQRSTVWXYZ}. Alternatively, it
might be the "translation" of an end codon.
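
In code, the extra symbols can be treated as sets of possible residues.
The sketch below is only an illustration of that idea; the names STANDARD,
AMBIGUOUS and expand are made up here, and it says nothing about how the
scores in those rows and columns were computed.

```python
# The 20 standard amino acids in one-letter code.
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Ambiguity codes expressed as the residues they may stand for.
AMBIGUOUS = {
    "B": frozenset("DN"),  # aspartic acid or asparagine
    "Z": frozenset("EQ"),  # glutamic acid or glutamine
    "X": STANDARD,         # unknown: could be any standard residue
}


def expand(code):
    """Return the set of standard residues a one-letter code may represent.

    '*' (and any other unrecognized symbol) maps to the empty set, since
    it does not stand for a standard residue.
    """
    code = code.upper()
    if code in STANDARD:
        return frozenset(code)
    return AMBIGUOUS.get(code, frozenset())


for symbol in "BZX*":
    print(symbol, "".join(sorted(expand(symbol))))
```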


--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo



From idoerg at burnham.org  Wed Mar 10 03:04:22 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 10 Mar 2004 00:04:22 -0800
Subject: [BiO BB] structural bioinformatics
In-Reply-To: <E1B0vv2-0005zh-00@mail.nii.res.in>
Message-ID: <Pine.SGI.4.10.10403092358260.18412894-100000@pines2.ljcrf.edu>

Try: 

Structural Bioinformatics H. Weissig & P. Bourne eds.

http://www.amazon.com/exec/obidos/tg/detail/-/0471201995/qid=1078905457/sr=1-1/ref=sr_1_1/102-4375680-0112137?v=glance&s=books

Good basic coverage. Each chapter written by a different
researcher in the field.

If you would like to learn more about protein physics (as implied by your
question):

Protein Physics (A course of lectures) A. Finkelstein and O. Ptitsyn

Not much bioinformatics per-se, but a treasure trove for learning about
the physical principles of folding, structure & function.

http://www.amazon.com/exec/obidos/tg/detail/-/0122567811/qid=1078905680/sr=1-1/ref=sr_1_1/102-4375680-0112137?v=glance&s=books

Hope this helps,

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo



From lon at bio-code.com  Wed Mar 10 03:32:58 2004
From: lon at bio-code.com (L. James)
Date: Wed, 10 Mar 2004 00:32:58 -0800
Subject: [BiO BB] open source bioinformatics project | Caltech
In-Reply-To: <Pine.SGI.4.10.10403092358260.18412894-100000@pines2.ljcrf.edu>
Message-ID: <BJEGKHOJKHPLDCCBKJMFOEAFDAAA.lon@bio-code.com>

Anybody using Cartwheel - an open source bioinformatics project out of
Caltech?

lon james
managing director
postgres, inc.
san francisco, ca 94110
415-573-9192
lon at efcodd.org
lon at bio-code.com
bizscience at jabber.com



From dana.reichmann at weizmann.ac.il  Wed Mar 10 12:20:37 2004
From: dana.reichmann at weizmann.ac.il (Dana Reichmann)
Date: Wed, 10 Mar 2004 19:20:37 +0200
Subject: [BiO BB] computational mutagenesis
Message-ID: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il>

Hi all,

I am looking for good programs and algorithms for computational 
mutagenesis that can define hot spots in protein-protein interactions. 
I am interested in different approaches, such as thermodynamic, 
structural, etc. I want to understand which kinds of approach are better 
and what the maximal prediction accuracy (%) of each is.

Thanks a lot,
Dana

From idoerg at burnham.org  Wed Mar 10 12:50:36 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 10 Mar 2004 09:50:36 -0800
Subject: [BiO BB] computational mutagenesis
In-Reply-To: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il>
References: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il>
Message-ID: <404F556C.10600@burnham.org>

Dana,

Have you looked at the CAPRI site (http://capri.ebi.ac.uk/)
and papers in Proteins?

(http://www3.interscience.wiley.com/cgi-bin/jissue/104531597)

Iddo


-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo


From dana.reichmann at weizmann.ac.il  Wed Mar 10 13:30:30 2004
From: dana.reichmann at weizmann.ac.il (Dana Reichmann)
Date: Wed, 10 Mar 2004 20:30:30 +0200
Subject: [BiO BB] computational mutagenesis
In-Reply-To: <404F556C.10600@burnham.org>
References: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il> <404F556C.10600@burnham.org>
Message-ID: <02963A84-72C1-11D8-8B14-000393BB411E@weizmann.ac.il>

Iddo hi,

Thanks for the suggestion, I'll check it.

I am more interested in evaluating the changes upon mutation (structural 
and thermodynamic changes) when the structure of the wild-type complex is 
known. Something similar to the FOLDEF program (from L. Serrano's group) or 
Rosetta from D. Baker.

Thanks,
Dana




From idoerg at burnham.org  Wed Mar 10 13:45:59 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Wed, 10 Mar 2004 10:45:59 -0800
Subject: [BiO BB] computational mutagenesis
In-Reply-To: <02963A84-72C1-11D8-8B14-000393BB411E@weizmann.ac.il>
References: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il> <404F556C.10600@burnham.org> <02963A84-72C1-11D8-8B14-000393BB411E@weizmann.ac.il>
Message-ID: <404F6267.5040806@burnham.org>

Dana,

Andrej Sali has been doing some work on extending MODELLER to include 
protein complexes. You might want to check his site to see what's new on 
this front.

./I



-- 
--

Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 646 3171
http://ffas.ljcrf.edu/~iddo


From dmb at mrc-dunn.cam.ac.uk  Wed Mar 10 18:38:47 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Wed, 10 Mar 2004 23:38:47 +0000 (GMT)
Subject: [BiO BB] computational mutagenesis
In-Reply-To: <02963A84-72C1-11D8-8B14-000393BB411E@weizmann.ac.il>
Message-ID: <Pine.LNX.4.21.0403102337110.20364-100000@mail.mrc-dunn.cam.ac.uk>


I heard about a technique called 'computational alanine scanning' which
simulates an experimental technique to probe protein-protein interaction
(I think). Sounds like a good thing to play with / compare with other
measures.

Dan

On Wed, 10 Mar 2004, Dana Reichmann wrote:

> Iddo hi,
> 
> thanks for the suggestion, i'll check it,
> 
> I am more interesting on evaluation of changes upon mutation (structure 
> and thermodynamic changes) when the structure of a wild type complex is 
> known. Something similar to FOLDEFF program (from L.Serrano group) or 
> Rosetta from D. Baker.
> 
> Thanks,
> Dana
> 
> 
> On Mar 10, 2004, at 7:50 PM, Iddo Friedberg wrote:
> 
> > Dana,
> >
> > Have you looked at the CAPRI site (http://capri.ebi.ac.uk/)
> > and papers in Proteins?
> >
> > (http://www3.interscience.wiley.com/cgi-bin/jissue/104531597)
> >
> > Iddo
> >
> > Dana Reichmann wrote:
> >> Hi all,
> >> I am looking for good programs and algorithms for computational 
> >> mutagenesis that can define hot spots in *protein-protein 
> >> *interactions. I am interesting in different approach such as 
> >> thermodynamical, structural etc. I want to understand what kind of 
> >> approach are better, what is maximal prediction % of each.
> >> Thanks a lot,
> >> Dana
> >
> > -- 
> > Iddo Friedberg, Ph.D.
> > The Burnham Institute
> > 10901 N. Torrey Pines Rd.
> > La Jolla, CA 92037
> > USA
> > Tel: +1 (858) 646 3100 x3516
> > Fax: +1 (858) 713 9930
> > http://ffas.ljcrf.edu/~iddo
> >
> 
> _______________________________________________
> BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
> 


From prathibha_562 at yahoo.co.in  Thu Mar 11 04:23:36 2004
From: prathibha_562 at yahoo.co.in (=?iso-8859-1?q?prathibha=20bharathi?=)
Date: Thu, 11 Mar 2004 09:23:36 +0000 (GMT)
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of time to complete a single database similarity search
In-Reply-To: <404F6267.5040806@burnham.org>
Message-ID: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>

Hi all,
 
My protein sequence analysis tool takes a long time to complete a single
database similarity search request. My database is a relational MySQL
database containing 16 tables and 283,366 sequence entries.
 
The tool currently runs on a local intranet server with a 1.9 GHz
processor and 256 MB RAM.
 
A single pairwise alignment takes around 10 ms, depending on the length
of the query sequence. A single request was taking more than 24 hours
with 4 threads working on 4 partitions. After limiting it to 2 live
threads at a time, working on 2 partitions (I partitioned my database
into 8 based on sequence checksum), a single request now takes around
9 hours.
 
Is it possible to reduce the time further on this hardware (1.9 GHz,
256 MB RAM), or do I have to move to a more powerful configuration?
I am currently using the MySQL database server and the Apache HTTP
server with the JRun application server. Do I need a more powerful
application server than JRun?
My implementation platform is Java, and the algorithm used is
Smith-Waterman local alignment.

Thanking you,
Prathibha



From mgollery at unr.edu  Thu Mar 11 09:22:12 2004
From: mgollery at unr.edu (Martin Gollery)
Date: Thu, 11 Mar 2004 06:22:12 -0800
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of time to complete a single database similarity search
In-Reply-To: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
References: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
Message-ID: <1079014932.40507614190a0@webmail.unr.edu>

The Smith-Waterman algorithm is quite sensitive, but yes, it is slow. Switch to 
FASTA or BLAST with some of the sequences that you have already run and see 
what you miss with your particular data. In most cases you will miss several 
percent of the hits, but it might be worth it if it allows you to get 
the job done in a reasonable amount of time.
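
To make the cost concrete, here is a minimal sketch of Smith-Waterman
scoring in Python. It is not the poster's Java implementation: the
function name and the match/mismatch/gap values are illustrative
assumptions, and a real protein search would use a substitution matrix
and affine gap penalties. It does show why the method is slow: every
residue of the query is compared against every residue of each database
sequence.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions (linear gap penalty),
    not the parameters of any particular tool.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best
```

The two nested loops do len(a) * len(b) cell updates per pair, which is
the work that heuristics such as FASTA and BLAST avoid for most of the
database.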


Cheers,
Marty

Quoting prathibha bharathi <prathibha_562 at yahoo.co.in>:

> Hai all,
>  
>          My protein sequence analysis tool is taking a lot of time to
> complete a single request for database similarity search.My database is a
> relational database for MySQL which contains 16 tables and 2,83,366 sequence
> entries.
>  
> My Sequence analysis tool is currently running on a Local intranet server 
> with 1.9GHz processor and 256MB RAM.
>  
> For a single pairwise alignment it is taking around 10msecs depending on the
> length of query sequence and was  taking more than 24 hours to complete
> single request with 4 threads working on 4 partitions .By making only 2
> threads to be alive at a time working on 2 partitions(I partitioned my
> Database in to 8 based on sequence chesk sum) ,now it is taking around 9
> hours to complete a single request for database similarity search.
>  
> Is it really possible to reduce the time further with hardware configuration
> of 1.9Ghz and 256MB RAM.
> Or have I to go for more more powerful hardware  configuration.
> Now i'm using MySQL database server and Apache HTTP server with JRun
> application server.Have i to go for more powerful application server than
> JRun .
> My implementation platform is Java and algorithm being used is"
> SMITH-WATERMAN LOCAL ALIGNMENT" algorithm.
>                   Thanking You,
>                                                           Prathibha.
> 
> 
> 
> Yahoo! India Insurance Special: Be informed on the best policies, services,
> tools and more.


Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042




From hamid at ibb.ut.ac.ir  Thu Mar 11 09:44:46 2004
From: hamid at ibb.ut.ac.ir (hamid)
Date: Thu, 11 Mar 2004 19:14:46 +0430
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of
	time to complete a single database similarity search
In-Reply-To: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
References: <404F6267.5040806@burnham.org> <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
Message-ID: <WorldClient-F200403111914.AA14460015@ibb.ut.ac.ir>

 Why do you use 16 tables? The relations among these tables reduce 
speed! I think it's better to use fewer tables. I think most of the 
time is spent searching for sequences in the tables!
/*
 Hamid Nikbakht,
 M.Sc of Cell and Molecular Biology,
 Laboratory of Biophysics and Molecular Biology,
 Bioinformatics Department,
 Institute of Biochemistry and Biophysics (IBB),
 University of Tehran,
 Tehran, Iran.
 Tel: +98-21-611-3322
 Fax: +98-21-640-4680
 E-Mail: hamid at ibb.ut.ac.ir
 Alt. E-mail: nikbakht at ibb.ut.ac.ir
*/


-----Original Message-----
From: prathibha bharathi <prathibha_562 at yahoo.co.in>
To: bio_bulletin_board at bioinformatics.org
Date: Thu, 11 Mar 2004 09:23:36 +0000 (GMT)
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of 
time to complete a single database similarity search

> Hai all,
>  
>          My protein sequence analysis tool is taking a lot of time to
> complete a single request for database similarity search.My database is
> a relational database for MySQL which contains 16 tables and 2,83,366
> sequence entries.
>  
> My Sequence analysis tool is currently running on a Local intranet
> server 
> with 1.9GHz processor and 256MB RAM.
>  
> For a single pairwise alignment it is taking around 10msecs depending
> on the length of query sequence and was  taking more than 24 hours to
> complete single request with 4 threads working on 4 partitions .By
> making only 2 threads to be alive at a time working on 2 partitions(I
> partitioned my Database in to 8 based on sequence chesk sum) ,now it is
> taking around 9 hours to complete a single request for database
> similarity search.
>  
> Is it really possible to reduce the time further with hardware
> configuration of 1.9Ghz and 256MB RAM.
> Or have I to go for more more powerful hardware  configuration.
> Now i'm using MySQL database server and Apache HTTP server with JRun
> application server.Have i to go for more powerful application server
> than JRun .
> My implementation platform is Java and algorithm being used is"
> SMITH-WATERMAN LOCAL ALIGNMENT" algorithm.
>                   Thanking You,
>                                                           Prathibha.
> 
> 
> 
> Yahoo! India Insurance Special: Be informed on the best policies,
> services, tools and more.


From lxyiwc at yahoo.com  Thu Mar 11 12:03:33 2004
From: lxyiwc at yahoo.com (l x yi)
Date: Thu, 11 Mar 2004 09:03:33 -0800 (PST)
Subject: [BiO BB] protein database
Message-ID: <20040311170333.96300.qmail@web21207.mail.yahoo.com>

Hi, 
For my research, I need to download about 800 protein sequences, each longer than 1000 amino acids. Could anyone tell me the best way to do this? I've looked at some batch retrieval options on the internet, but they only allow text searching by ID, not by sequence length.
 
Thanks very much for your help. 
 
Lily



From schlitt at ebi.ac.uk  Thu Mar 11 12:57:15 2004
From: schlitt at ebi.ac.uk (Thomas Schlitt)
Date: Thu, 11 Mar 2004 17:57:15 +0000 (GMT)
Subject: [BiO BB] protein database
In-Reply-To: <20040311170333.96300.qmail@web21207.mail.yahoo.com>
Message-ID: <Pine.LNX.4.10.10403111753370.26984-100000@ma-login.ebi.ac.uk>

Hi Lily,
Where do you want to download the data from?
SRS lets you do something like this (I think) for the databases at EBI:
try www.ebi.ac.uk/srs
select a protein database (UniProt?)
go to query and click on the field to choose Sequence length
you might want to define a "view" which contains the data fields you want
... don't know if this helps...
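
If the sequences are already on disk as a FASTA file, the length filter
can also be done locally. The following is a minimal Python sketch, not
an SRS feature; the function names and the file name in the usage note
are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longer_than(records, min_length=1000, limit=800):
    """Return up to `limit` records whose sequence is longer than `min_length`."""
    selected = []
    for header, seq in records:
        if len(seq) > min_length:
            selected.append((header, seq))
            if len(selected) == limit:
                break
    return selected
```

Usage would look like `with open("proteins.fasta") as fh: hits =
longer_than(read_fasta(fh))`, where "proteins.fasta" stands for whatever
file you downloaded.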

Cheers
Thomas


On Thu, 11 Mar 2004, l x yi wrote:

> Hi, 
> for my research, I need to download about 800 protein sequences each with length more than 1000 amino acids, could anyone tell me what is the best way to do this? I've looked at some batch retrieval options on the internet, but they only allow text searching using ids instead of specifying length.. 
>  
> Thanks very much for your help. 
>  
> Lily
> 
> 
> ---------------------------------
> Do you Yahoo!?
> Yahoo! Search - Find what you?re looking for faster.

_____________________________________________________________
     Thomas Schlitt - Bioinformatics Research Fellow 
    
EMBL-EBI, Hinxton             British Antarctic Survey 
Wellcome Trust Genome Campus  High Cross, Madingley Road                        
Cambridge CB10 1SD, UK        Cambridge CB3 0ET, UK 
Tel. ++44-1223-494651         Tel. ++44-1223-221656
eMail schlitt at ebi.ac.uk       tsc at bas.ac.uk  


From logan at cacs.louisiana.edu  Thu Mar 11 14:01:25 2004
From: logan at cacs.louisiana.edu (Rasaiah Loganantharaj)
Date: Thu, 11 Mar 2004 13:01:25 -0600
Subject: [BiO BB] Day symposium on Bioinformatics at University of Louisiana on
 April 8
Message-ID: <1079031685.3056.94.camel@logan.cacs.louisiana.edu>

We are organizing a one-day symposium on Bioinformatics at the University
of Louisiana, Lafayette. The keynote presenters at the symposium include
Dr. David Mount from the University of Arizona, Dr. Peter Good from NIH,
Dr. Luxmy Parida from the IBM T. J. Watson Lab, and Dr. Mark Borodovsky
from the Georgia Institute of Technology.

The objective of the symposium is to help new and existing
bioinformatics researchers and to facilitate networking for future
collaborative research and funding opportunities.

There is no registration fee to attend or present at the symposium, but
you must register to attend. Those who want to present at the
symposium must submit their abstract before March 19th, 2004. 

The details of the symposium are given at 
http://www.cacs.louisiana.edu/bioinformatics/index.html

Best regards from
Raja Loganantharaj


From B.A.T.Svensson at lumc.nl  Fri Mar 12 04:57:32 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 12 Mar 2004 10:57:32 +0100
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of
	time to complete a single database similarity search
In-Reply-To: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
References: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
Message-ID: <1079024916.2958.259.camel@ander>

Is this 10 ms the true execution time, or just the
measured time between call and return?

~2 GHz means that the CPU executes instructions on a
nanosecond time scale. So any single operation that
consumes 1 ms of CPU time has either done quite a lot of
things or has just idled for a while.
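
As a rough check on the reported figures, here is a back-of-envelope
sketch in Python. It assumes the 10 ms applies to every one of the
283,366 entries and that the alignments run strictly one after another;
the function name is made up for illustration.

```python
def serial_scan_seconds(n_sequences, ms_per_alignment):
    """Total time, in seconds, to align one query against every entry in turn."""
    return n_sequences * ms_per_alignment / 1000.0


seconds = serial_scan_seconds(283_366, 10)   # figures from the original post
print(f"{seconds:.0f} s, about {seconds / 60:.0f} min")
```

Under those assumptions the alignments alone account for roughly 47
minutes, far less than the reported 9 hours, so most of the time would
be going somewhere other than the alignment itself.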

It might be that your machine is swapping and is spending
time paging (disk I/O /IS/ slow). The standard
solution for this is to put more RAM in the machine.

You mention that 2 threads run faster than 4. This might
just be a coincidence (depending on current load, whether you
use lightweight threads, available system resources, etc.),
but it might also suggest that you are low on memory and the
operating system is forced to swap memory pages when doing
context switches. (This can be true for lightweight threads
if they use a lot of private memory for data.)


But you say you are using Java. And for me that explains it all. :)

In my experience (having watched Java processes executing), Java is
really bad at handling garbage collection when it gets
stressed. This is because the decision on when to do garbage
collection is left to the machine, so memory might not be
released at the most optimal time. That might very well be
your case. You can figure this out by watching the memory
allocation/deallocation statistics from your application.

In any case, you may be able to handle this in two ways: write
your application in C++ and take control of the memory handling,
or just buy enough memory so the Java app never runs
out of memory. 


On Thu, 2004-03-11 at 10:23, prathibha bharathi wrote:
> Hai all,
>  
>          My protein sequence analysis tool is taking a lot of time to
> complete a single request for database similarity search.My database
> is a relational database for MySQL which contains 16 tables and
> 2,83,366 sequence entries.
>  
> My Sequence analysis tool is currently running on a Local intranet
> server with 1.9GHz processor and 256MB RAM.
>  
> For a single pairwise alignment it is taking around 10msecs depending
> on the length of query sequence and was  taking more than 24 hours to
> complete single request with 4 threads working on 4 partitions .By
> making only 2 threads to be alive at a time working on 2 partitions(I
> partitioned my Database in to 8 based on sequence chesk sum) ,now it
> is taking around 9 hours to complete a single request for database
> similarity search.
>  
> Is it really possible to reduce the time further with hardware
> configuration of 1.9Ghz and 256MB RAM.
> Or have I to go for more more powerful hardware configuration.
> Now i'm using MySQL database server and Apache HTTP server with JRun
> application server.Have i to go for more powerful application server
> than JRun .
> My implementation platform is Java and algorithm being used is"
> SMITH-WATERMAN LOCAL ALIGNMENT" algorithm.
>                   Thanking You,
>                                                           Prathibha.
> 
> 
> 
> Yahoo! India Insurance Special: Be informed on the best policies,
> services, tools and more.


From ssaikumar at yahoo.co.uk  Sun Mar 14 21:49:38 2004
From: ssaikumar at yahoo.co.uk (sadhu saikumar)
Date: Sun, 14 Mar 2004 18:49:38 -0800 (PST)
Subject: [BiO BB] problems encountered while using tools
Message-ID: <20040315024938.18805.qmail@web9608.mail.yahoo.com>

Hi all,
We are doing a survey in which we are trying to talk
to bioinformatics and biology people about:
1. The problems they face while using web-based
tools. Ex: BLAST offers no filtering of records by
species.
2. Tools you would like to have online instead of
installing them on your local system. Ex: Sequence
Viewer is not hosted on the web, whereas BLAST is
hosted by NCBI.
3. Missing features, such as results from multiple
databases.
And so on.

I know this community uses these tools heavily in
daily work, so we would be glad to hear about such
problems and to solve them for you.
Thanks,
Sai.




From landman at scalableinformatics.com  Mon Mar 15 07:14:59 2004
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 15 Mar 2004 07:14:59 -0500
Subject: [BiO BB] NCBI BLAST 2.2.8 RPMs available
Message-ID: <40559E43.7010004@scalableinformatics.com>

Folks:

  Redid packaging of the NCBI toolkit RPMs.  The binaries are better 
captured by the newer system, and the sizes are somewhat larger.  
Packages built for P4 (labeled i686), Opteron (x86_64), Athlon, and 
source (src).  The packages are located at 
http://downloads.scalableinformatics.com/downloads/ncbi/  and labeled 
2.2.8-1 .  The x86_64 was built under Fedora Core for AMD64, working on 
getting a SUSE load on the machine as well.  The athlon and p4 packages 
were built under RH9.0.  If you try to install a binary and it fails to 
work (crashes and dumps core), pull the source and run (as root)

       rpmbuild --rebuild NCBI-2.2.8-1.src.rpm

on your machine.  It will eventually generate the RPM barring 
compilation/permissions problems.  You might need to do this if you are 
using a pre-RedHat 9.0 machine (the NPTL issue).  

Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615


From t.fiedler at umiami.edu  Thu Mar 18 13:16:10 2004
From: t.fiedler at umiami.edu (Tristan Fiedler)
Date: Thu, 18 Mar 2004 13:16:10 -0500 (EST)
Subject: [BiO BB] blastall in php
Message-ID: <53089.129.171.111.22.1079633770.squirrel@webmail.rsmas.miami.edu>

Greetings to All!

If possible, please comment on

http://www.phphelp.com/phpBB2/viewtopic.php?t=6040

Thanks


-- 
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow - Walsh Laboratory
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami

tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626


From B.A.T.Svensson at lumc.nl  Thu Mar 18 13:28:06 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: Thu, 18 Mar 2004 19:28:06 +0100
Subject: [BiO BB] blastall in php
Message-ID: <D291F33C586C8E48B95C26F8C805513A01A3D9C3@mail5.lumc.nl>

If you want to execute an external program from PHP in general,
you might like to have a look at http://www.php.net/popen.

I see no reason in general that would prevent you from
starting a BLAST run from PHP. 

As for that guy's question about his program,
it does not make sense at all. 

-----Original Message-----
From: Tristan Fiedler
To: bio_bulletin_board at bioinformatics.org
Sent: 18-3-2004 19:16
Subject: [BiO BB] blastall in php

Greetings to All!

If possible, please comment on

http://www.phphelp.com/phpBB2/viewtopic.php?t=6040

Thanks


-- 
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow - Walsh Laboratory
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami

tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626
_______________________________________________
BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board


From idoerg at burnham.org  Thu Mar 18 13:53:11 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu, 18 Mar 2004 10:53:11 -0800
Subject: [BiO BB] blastall in php
In-Reply-To: <53089.129.171.111.22.1079633770.squirrel@webmail.rsmas.miami.edu>
References: <53089.129.171.111.22.1079633770.squirrel@webmail.rsmas.miami.edu>
Message-ID: <4059F017.5020609@burnham.org>

Tristan,

Could it have something to do with the .ncbirc file not being properly 
read when called from php? Just a thought.

./I


Tristan Fiedler wrote:
> Greetings to All!
> 
> If possible, please comment on
> 
> http://www.phphelp.com/phpBB2/viewtopic.php?t=6040
> 
> Thanks
> 
> 
> 

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo


From pculpep at hotmail.com  Thu Mar 18 14:30:15 2004
From: pculpep at hotmail.com (Pamela Culpepper)
Date: Thu, 18 Mar 2004 19:30:15 +0000
Subject: [BiO BB] blastall in php
Message-ID: <BAY9-F22LsJd9tiTcwY0001f4c1@hotmail.com>

Tristan,

You will need to fork another process to run the blastall program.

Pam
LifeFormulae, L.L.C


>From: Iddo Friedberg <idoerg at burnham.org>
>Reply-To: bio_bulletin_board at bioinformatics.org
>To: bio_bulletin_board at bioinformatics.org
>Subject: Re: [BiO BB] blastall in php
>Date: Thu, 18 Mar 2004 10:53:11 -0800
>
>Tristan,
>
>Could it have something to do with the .ncbirc file not being properly read 
>when called from php? Just a thought.
>
>./I
>
>
>Tristan Fiedler wrote:
>>Greetings to All!
>>
>>If possible, please comment on
>>
>>http://www.phphelp.com/phpBB2/viewtopic.php?t=6040
>>
>>Thanks
>>
>>
>>
>
>--
>Iddo Friedberg, Ph.D.
>The Burnham Institute
>10901 N. Torrey Pines Rd.
>La Jolla, CA 92037
>USA
>Tel: +1 (858) 646 3100 x3516
>Fax: +1 (858) 713 9930
>http://ffas.ljcrf.edu/~iddo
>
>_______________________________________________
>BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bio_bulletin_board



From rwang at bccancer.bc.ca  Thu Mar 18 18:25:35 2004
From: rwang at bccancer.bc.ca (Renxue Wang)
Date: Thu, 18 Mar 2004 15:25:35 -0800
Subject: [BiO BB] Error message with nrdb
Message-ID: <6BAF4D075F07D411B30900508B94CBA00D846708@SERVER20>

Hi there,

I am using NRDB, as implemented in WUBLAST, to eliminate replicated entries 
in my custom sequence database. While processing one of my sequence files 
(about 50 MB), NRDB gave me an error message that reads like this:

$ nrdb test.fa >testnr
FATAL:  Report:  fwrite error:  Success
$

and the operation appears to stop at this point. I tried the same thing
both on the server and on my Linux machine, and both stopped at the same
sequence. (There does not seem to be anything wrong with this sequence:
when I moved it to the end of the sequence file, the program stopped
somewhere else.) The last line of the output file is:

>gi|28574491:CDS(1), original length:2213. ORF(+) length:465,1..465,
FATAL:  Report:  fwrite error:  Success

Does anyone know what the error message means and how to deal with it?

Thanks a lot.  

Renxue

From brent at scihelp.com  Tue Mar  9 23:52:47 2004
From: brent at scihelp.com (Brent Silver)
Date: Tue, 9 Mar 2004 20:52:47 -0800 (PST)
Subject: [BiO BB] request for info
Message-ID: <20040310045247.92605.qmail@web109.biz.mail.yahoo.com>

Hi,
 
I am a Bay area biotech researcher who is interested in career opportunities in the Bay area (bioinformatics specifically). Can you please advise?
 
Regards,
 
Brent Silver
www.scihelp.com/cv.htm

From straberry_fizza at yahoo.co.in  Mon Mar 15 23:59:29 2004
From: straberry_fizza at yahoo.co.in (=?iso-8859-1?q?sunil=20kumar?=)
Date: Tue, 16 Mar 2004 04:59:29 +0000 (GMT)
Subject: [BiO BB] about bioinformatics
Message-ID: <20040316045929.20908.qmail@web8304.mail.in.yahoo.com>

Respected sir,
     I am studying for an M.Sc. at Madras University. Nobody here can give good suggestions about bioinformatics projects or about the course. I will complete my course this April. Please advise me on what jobs are available after this course, which projects are good, and which companies are good. I will always be grateful to you, sir.
                          Thanking you, sir
                                                                                SUNIL


From mourad12345678 at yahoo.com  Wed Mar 17 21:08:15 2004
From: mourad12345678 at yahoo.com (Mourad Elloumi)
Date: Wed, 17 Mar 2004 18:08:15 -0800 (PST)
Subject: [BiO BB] CfP : Algorithms in Molecular Biology (ALBIO'04)
Message-ID: <20040318020815.90890.qmail@web12304.mail.yahoo.com>

                   CALL FOR PAPERS

            Algorithms in Molecular Biology
                     (ALBIO'04)

                     Session of

  6th Int.Conf. on Algorithms, Scientific Computing,
              Modelling and Simulation 
                    (ASCOMS'04) 

                   Cancun, Mexico,
                   May 12-15, 2004

http://www.worldses.org/conferences/2004/mexico/ascoms/index.html

Computational Molecular Biology has emerged from the
Human Genome Project as an important discipline for
academic research and industrial application. The
growing size of biological databases, the complexity
of biological problems and the necessity to deal with
errors in biological sequences all result in large run
time and memory requirements. Biological sequence
databases are growing at an exponential rate. All of
these factors will make the development of fast,
low-memory, high-performance algorithms
increasingly important in Computational Molecular
Biology. 

In our session, we are interested in papers that deal
with all aspects of algorithms in Molecular Biology.
We are, particularly, interested in algorithms that
address fundamental and/or applied problems in
Molecular Biology, that are computationally efficient,
that have been implemented and experimented on
simulated and/or on real biological sequences and that
provide interesting new results. The submitted papers
should present recent research results and identify
and explore directions for future research. 
Topics include, but are not limited to: (i) string
processing, (ii) biological sequence comparison,
(iii) structure prediction, (iv) phylogeny
reconstruction, (v) DNA sequence assembly,
clustering, and mapping, (vi) molecular evolution,
(vii) gene prediction/recognition, (viii) gene
expression.

INSTRUCTIONS TO AUTHORS
You are invited to submit a hardcopy or a pdf version
of a draft paper, about 4 to 5 pages including figures
and references, before April 2, 2004 to the Session
Chair :

Dr. Mourad Elloumi :
Mailing Address : Cité Intilak bloc 6, app. 7,
                  El Menzah 6,
                  2091 Tunis,
                  Tunisia.
E.Mail: Mourad.Elloumi at fsegt.rnu.tn 
 
How to send your paper:

Please fill in the form at

http://www.worldses.org/conferences/2004/mexico/internet.htm


and declare that your paper belongs to this session.

 


From mourad12345678 at yahoo.com  Thu Mar 18 16:38:06 2004
From: mourad12345678 at yahoo.com (Mourad Elloumi)
Date: Thu, 18 Mar 2004 13:38:06 -0800 (PST)
Subject: [BiO BB] 2004 RECOMB Satellite Meeting on DNA Sequencing Technologies and Computation
Message-ID: <20040318213806.40829.qmail@web12308.mail.yahoo.com>

Dear colleagues,

The fourth annual RECOMB Satellite meeting on DNA
Sequencing Technologies and Computation
(http://recomb-satellite.stanford.edu/) will take
place on May 22-23, at Stanford University. This year
we will have 
another exciting, focussed meeting, with emphasis on
new sequencing technologies and on the future of
sequencing efforts. 

Confirmed speakers include Robert Waterston
(Washington University School of Medicine), Jeff
Schloss (NIH), Bjorn Andersson (Karolinska 
Institute, Sweden), Lene V. Hau (Harvard University),
Tony Smith (Solexa), Jonathan Rothberg (CuraGen),
Mostafa Ronaghi (Stanford Genome Technology 
Center), Paul Havlak (Baylor College of Medicine),
Serafim Batzoglou (Stanford University), James Galagan
(Whitehead Institute/MIT Center for Genome Research). 

CALL FOR ABSTRACTS

Fourth Annual RECOMB Satellite meeting on DNA
Sequencing Technologies and Computation 

May 22-23, Stanford University, Stanford, CA

Abstract Submission Deadline: April 4; Notification of
acceptance: April 20, 2004.

Genome sequencing has been truly flourishing the past
several years. Recent achievements include the
completion of the human, mouse, rat, fugu, mosquito,
malaria, and several other genomes. April 2003 marked
the completion of the finished version of the human
genome. The recent sequencing achievements have been
possible because of advances in both lab techniques,
and computational methods and capabilities. Notably,
computational assembly has been an essential part of
sequencing since the conception of the sequencing
technology, and recent advances to computational
assembly systems and algorithms were instrumental in
recent sequencing successes. 

Despite the success of recent sequencing projects,
genome sequencing is still extremely costly,
time-consuming, and error-prone. Some efforts to make
sequencing vastly easier, potentially reducing
time and cost by several orders of magnitude, are
starting to emerge. Novel sequencing methods hold
great potential for the future, and developing such 
technologies will be a focus of NIH for the next 5 to
10 years. The ultimate goal is to sequence or
re-sequence a mammalian-size genome for as little as
$1,000. 

Once the genome is at hand, the next step is analysis.
The first steps in analyzing genomes are to annotate
genes, common repeats, and other biologically
important elements, and to compare genomes of related 
organisms. High-throughput pipelines and servers for
that purpose are instrumental to making the genomic
data useful to the research community. 

The purpose of this meeting is to bring together many
of the people working on algorithms and software for
large-scale sequencing and analysis of genomes, and
novel technologies for genome determination. The main 
themes will be:

- Whole Genome Sequencing and Assembly. 

- New and exotic sequencing technologies. 

- Whole-genome analysis. 

Topics of interest include: whole-genome sequencing
and assembly, novel sequencing technologies and the
computational assembly problems they motivate,
improved methods for sequencing and finishing,
comparison and reconciliation of whole genome
assemblies, high-throughput experimental techniques
for genome analysis, and pipelines for whole-genome
annotation, comparison, and analysis. Authors of
successful submissions will be invited to give a
15-minute presentation, and a 1-2 page abstract will
be printed in the conference proceedings, to be
distributed to the meeting attendees. 

Abstracts should be 1 to 2 pages, and submitted in
plain text or WORD format.

Abstract Submission Deadline: April 4; Notification of
acceptance: April 20, 2004.

http://recomb-satellite.stanford.edu/ 




From biotelerock at yahoo.com  Mon Mar 22 12:29:36 2004
From: biotelerock at yahoo.com (Samantha Austin)
Date: Mon, 22 Mar 2004 09:29:36 -0800 (PST)
Subject: [BiO BB] Bay Area Bioinformatic Startup
Message-ID: <20040322172936.31541.qmail@web41410.mail.yahoo.com>

Hello Folks,
    I'm looking for bench trained, bay area molecular
biologists (PhD) who would be interested in starting
up a bioinformatic support company for life
scientists.  Skills should include Java/HTML/MySQL,
extensive hands on experience with molecular biology
techniques, and a bioentrepreneurial
 spirit.  This isn't a job offer, just a request for
folks interested in starting up something that might
grow into a great day job.



From lon at bio-code.com  Mon Mar 22 12:52:42 2004
From: lon at bio-code.com (L. James)
Date: Mon, 22 Mar 2004 09:52:42 -0800
Subject: [BiO BB] Bay Area Bioinformatic Startup
In-Reply-To: <20040322172936.31541.qmail@web41410.mail.yahoo.com>
Message-ID: <BJEGKHOJKHPLDCCBKJMFKEHDDAAA.lon@bio-code.com>

i'm interested. 415-573-9192
lon

-----Original Message-----
From: bio_bulletin_board-admin at bioinformatics.org
[mailto:bio_bulletin_board-admin at bioinformatics.org]On Behalf Of
Samantha Austin
Sent: Monday, March 22, 2004 9:30 AM
To: bio_bulletin_board at bioinformatics.org
Subject: [BiO BB] Bay Area Bioinformatic Startup


Hello Folks,
    I'm looking for bench trained, bay area molecular
biologists (PhD) who would be interested in starting
up a bioinformatic support company for life
scientists.  Skills should include Java/HTML/MySQL,
extensive hands on experience with molecular biology
techniques, and a bioentrepreneurial
 spirit.  This isn't a job offer, just a request for
folks interested in starting up something that might
grow into a great day job.

_______________________________________________
BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board


From monty_afs at yahoo.com  Wed Mar 24 05:54:15 2004
From: monty_afs at yahoo.com (monty monty)
Date: Wed, 24 Mar 2004 02:54:15 -0800 (PST)
Subject: hello MR sunil ----Re: [BiO BB] about bioinformatics
In-Reply-To: <20040316045929.20908.qmail@web8304.mail.in.yahoo.com>
Message-ID: <20040324105415.86367.qmail@web40707.mail.yahoo.com>

Hello Sunil,
   Regarding projects in bioinformatics: at present
there is no good research going on in it, and most of
the bioinformatics companies are placing only computer
science students. I suggest you apply to all the big
companies in life sciences, biomedicine and
bioinformatics. Even though you don't have work
experience, I am sure that if you try hard you will be
placed in a good position.

 all the best.
 sridhar.M

--- sunil kumar <straberry_fizza at yahoo.co.in> wrote:
> Respected sir,
>      I am studying for an MSc at Madras University. Here
> nobody is available to give good suggestions about
> bioinformatics projects or about the course.
> This April I am going to complete my course. Please
> advise me on what jobs are available after this course
> and what would be a good project. Please also tell me
> which companies are good. For this I will always be
> grateful to you, sir.
>                           Thanking you sir
>                                                     
>                            SUNIL




From sean_s_sun at hotmail.com  Thu Mar 25 11:26:42 2004
From: sean_s_sun at hotmail.com (S S)
Date: Thu, 25 Mar 2004 16:26:42 +0000
Subject: [BiO BB] Bay Area Bioinformatic Startup
Message-ID: <Sea2-F532kIsfkqdIie00007fd6@hotmail.com>

An HTML attachment was scrubbed...
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20040325/40aa7fd0/attachment.html>

From pkerrwall at psu.edu  Thu Mar 25 13:51:38 2004
From: pkerrwall at psu.edu (Kerr Wall)
Date: Thu, 25 Mar 2004 13:51:38 -0500
Subject: [BiO BB] BLAST problem: limiting # of HSPs
Message-ID: <BC88946A.D348%pkerrwall@psu.edu>

We are running tBLASTx locally against our own data set.  We're looking for
ways to reduce the size of the output produced by BLAST and have set the
alignment view to tabular (-m 8).  The problem we've come across is that a
query can have multiple hits to the same sequence, one for each HSP.  We
need BLAST to retain only one result per sequence instead of filling the
report with multiple E-values for HSPs from the same gene.

In the default BLAST output, there are summary statistics for the overall
hit.  Is there an option for the tab-delimited BLAST output that would give
us this overall hit statistic instead of one for each HSP?

If not, is there an option to limit the number of HSPs returned in the
tab-delimited output?

Thanks,

Kerr Wall


From dmb at mrc-dunn.cam.ac.uk  Thu Mar 25 15:03:48 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Thu, 25 Mar 2004 20:03:48 +0000 (GMT)
Subject: [BiO BB] BLAST problem: limiting # of HSPs
In-Reply-To: <BC88946A.D348%pkerrwall@psu.edu>
Message-ID: <Pine.LNX.4.21.0403251948280.31968-100000@mail.mrc-dunn.cam.ac.uk>

> In the default blast output, there are summary statistics for the overall
> hit, is there an option for the tab-deliminated BLAST output that would give
> us this overall hit statistic instead of one for each HSP?


I think you can simply sum the e-values for each non overlapping HSP (I 
think they shouldn't overlap). Anybody know the correct formula?


> If not, is there an option to limit the number of HSPs returned in the
> tab-deliminated output?


I am sure there is a way to do this, but I can't find any mention of this
option in the 

ncbi/doc/blast.txt

file.

Hmm.... Not sure if these have anything to do with it...

-K N (blastall, blastcl3, blastpgp)
       Number  of  best  hits from a region to keep (off by default, if
       used a value of 100 is recommended)

-P N (blastall, blastpgp, rpsblast)
       Set to  1  for  single-hit  mode  or  0  for  multiple-hit
       mode (default)

-b N (blastall, blastcl3, blastpgp, impala, megablast, rpsblast, seed-
      top)
       Number of database sequences to show alignments for (B) (default
       is 250)

If you get an answer from blast-help at ncbi.nlm.nih.gov can you please post
it up? (these emails get archived).

Cheers,
Dan.



From pkerrwall at psu.edu  Fri Mar 26 17:18:42 2004
From: pkerrwall at psu.edu (Kerr Wall)
Date: Fri, 26 Mar 2004 17:18:42 -0500
Subject: [BiO BB] BLAST problem: limiting # of HSPs
In-Reply-To: <20040326170107.D2A93D1F11@www.bioinformatics.org>
Message-ID: <BC8A1672.D388%pkerrwall@psu.edu>

On 3/26/04 12:01 PM, "Dan Bolser <dmb at mrc-dunn.cam.ac.uk>" wrote:

>> In the default blast output, there are summary statistics for the overall
>> hit, is there an option for the tab-deliminated BLAST output that would give
>> us this overall hit statistic instead of one for each HSP?
> 
> 
> I think you can simply sum the e-values for each non overlapping HSP (I
> think they shouldn't overlap). Anybody know the correct formula?

I can handle non-overlapping HSPs because I would only be parsing out the
best E-value from each hit.  I'm just trying to avoid that step if at all
possible.  I'm running a tblastx of ~1,000,000 cDNAs against themselves to
produce a similarity matrix.  Therefore, I'm more worried about the size of
the output files and about losing similarities between more distantly
related genes, which might get left out of the output when the maximum
number of hits is reached (for some of the larger gene families).  I need to
make sure the matrix is as symmetrical as possible.
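
The parsing step described above (keep only the best E-value per hit, then
make the result symmetrical) could be sketched roughly as follows.  This is
a minimal sketch, not a tested pipeline: it assumes the standard 12-column
tabular layout written by -m 8, and the function names are made up for
illustration.  For ~1,000,000 sequences the dictionaries would probably not
fit in memory, so a real run would need to stream or sort the file instead.

```python
import csv


def parse_evalue(text):
    """Convert an E-value string to float.

    Precaution (assumption): tolerate a bare exponent such as "e-100",
    which float() alone would reject.
    """
    if text.lower().startswith("e"):
        text = "1" + text
    return float(text)


def best_hsp_per_pair(lines):
    """Keep the lowest-E-value HSP for each (query, subject) pair.

    Expects the 12 tab-separated columns of BLAST tabular output (-m 8):
    query, subject, % identity, alignment length, mismatches, gap openings,
    q. start, q. end, s. start, s. end, E-value, bit score.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        key = (row[0], row[1])
        score = (parse_evalue(row[10]), -float(row[11]))
        # Lower E-value wins; ties are broken by the higher bit score.
        if key not in best or score < best[key]:
            best[key] = score
    # Store (E-value, bit score) with the bit score positive again.
    return {k: (e, -b) for k, (e, b) in best.items()}


def symmetrize(best):
    """Give (a, b) and (b, a) one shared entry: the better of the two."""
    sym = {}
    for (a, b), (evalue, bits) in best.items():
        key = tuple(sorted((a, b)))
        if key not in sym or (evalue, -bits) < (sym[key][0], -sym[key][1]):
            sym[key] = (evalue, bits)
    return sym
```

Taking the better of the two directions is only one possible way to force
symmetry; averaging the bit scores would be another.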

>> If not, is there an option to limit the number of HSPs returned in the
>> tab-deliminated output?
> 
> I am sure there is a way to do this, but I can't find any mention of this
> option in the 
> 
> ncbi/doc/blast.txt

Yes, I know.  They don't even discuss all of the options in that file.  You
would think that the documentation for blast would be complete considering
how long it has been around.

> Hmm.... Not sure if these have anything to do with it...
> 
> -K N (blastall, blastcl3, blastpgp)
>      Number  of  best  hits from a region to keep (off by default, if
>      used a value of 100 is recommended)
> 
> -P N (blastall, blastpgp, rpsblast)
>      Set to  1  for  single-hit  mode  or  0  for  multiple-hit
>      mode (default)
> 
> -b N (blastall, blastcl3, blastpgp, impala, megablast, rpsblast, seed-
>     top)
>      Number of database sequences to show alignments for (B) (default
>      is 250)

Thanks.  Those are the parameters I've been working with so far.  I did find
a paragraph in the documentation that might be on this same track.
Specifically #4 in the section "Notes for 2.0.6 release":


############################################################################
Notes for 2.0.6 release:

Enhancements:

...

4.) BLAST has been changed to reduce the number of redundant hits that a
user may see.  This is achieved by keeping track of the number of hits
completely contained in a certain region and eliminating those lower scoring
hits that are redundant with others.  This behavior may be controlled with
the -K and -L options:

  -K  Number of best hits from a region to keep [Integer]
    default = 50
  -L  Length of region used to judge hits [Integer]
    default = 20

Setting -K to zero turns off this feature.  This is the default only on
blastall.
############################################################################

Of course, when you get a list of all the options with 'blastall -', the L
option is labeled as '-L  Location on query sequence [String]  Optional'.
Not sure what to make of that.  I wonder if they have changed parameter
names between 2.0.6 and 2.2.8?

It looks as if setting K = 1 and using L > 100 (or much larger) would help
me reduce the amount of output.  I think using P = 1, as you listed above,
would probably help the most.

> If you get an answer from blast-help at ncbi.nlm.nih.gov can you please post
> it up? (these emails get archived).

I will.  I sent them an email yesterday afternoon so I won't be expecting
anything back until sometime next week.  I usually have solved the problem
by the time they get back to me.

Thanks for the help,

Kerr




From dmb at mrc-dunn.cam.ac.uk  Sat Mar 27 07:47:15 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Sat, 27 Mar 2004 12:47:15 +0000 (GMT)
Subject: [BiO BB] BLAST problem: limiting # of HSPs
In-Reply-To: <BC8A1672.D388%pkerrwall@psu.edu>
Message-ID: <Pine.LNX.4.21.0403271234390.18156-100000@mail.mrc-dunn.cam.ac.uk>

On Fri, 26 Mar 2004, Kerr Wall wrote:

> On 3/26/04 12:01 PM, "Dan Bolser <dmb at mrc-dunn.cam.ac.uk>" wrote:
> 
> >> In the default blast output, there are summary statistics for the overall
> >> hit, is there an option for the tab-deliminated BLAST output that would give
> >> us this overall hit statistic instead of one for each HSP?
> > 
> > 
> > I think you can simply sum the e-values for each non overlapping HSP (I
> > think they shouldn't overlap). Anybody know the correct formula?
> 
> I can handle non overlapping HSP's because I would only be parsing out the
> best evalue from each hit.  I'm just trying to avoid it if at all possible.
> I'm running a tblastx of ~ 1,000,000 cdna's against themselves to produce a
> similarity matrix.  Therefore, I'm more worried about the size of the output
> files and making sure that I don't run out of similarities between more
> distantly related genes that might get left out of the output when the
> maximum number of hits is reached (for some of the larger gene families).  I
> need to make sure the matrix is as symmetrical as possible.


Have you seen 

http://www.ebi.ac.uk/research/cgg/tribe/

and

http://micans.org/mcl/

?

They provide tools to make a symmetrical all-vs-all similarity matrix (I
think it is an interface to blastall).


> >> If not, is there an option to limit the number of HSPs returned in the
> >> tab-deliminated output?
> > 
> > I am sure there is a way to do this, but I can't find any mention of this
> > option in the 
> > 
> > ncbi/doc/blast.txt
> 
> Yes, I know.  They don't even discuss all of the options in that file.  You
> would think that the documentation for blast would be complete considering
> how long it has been around.

:)

Have you tried the man pages?

ncbi/doc/man/


> > Hmm.... Not sure if these have anything to do with it...
> > 
> > -K N (blastall, blastcl3, blastpgp)
> >      Number  of  best  hits from a region to keep (off by default, if
> >      used a value of 100 is recommended)
> > 
> > -P N (blastall, blastpgp, rpsblast)
> >      Set to  1  for  single-hit  mode  or  0  for  multiple-hit
> >      mode (default)
> > 
> > -b N (blastall, blastcl3, blastpgp, impala, megablast, rpsblast, seed-
> >     top)
> >      Number of database sequences to show alignments for (B) (default
> >      is 250)
> 
> Thanks.  Those are the parameters I've been working with so far.  I did find
> a paragraph in the documentation that might be on this same track.
> Specifically #4 in the section "Notes for 2.0.6 release":
> 
> 
> ############################################################################
> Notes for 2.0.6 release:
> 
> Enhancements:
> 
> ...
> 
> 4.) BLAST has been changed to reduce the number of redundant hits that a
> user may see.  This is achieved by keeping track of the number of hits
> completely contained in a certain region and eliminating those lower scoring
> hits that are redundant with others.  This behavior may be controlled with
> the -K and -L options:
> 
>   -K  Number of best hits from a region to keep [Integer]
>     default = 50
>   -L  Length of region used to judge hits [Integer]
>     default = 20
> 
> Setting -K to zero turns off this feature.  This is the default only on
> blastall.
> ############################################################################


Cheers.


> Of course, when you get a list of all the options 'blastall -', the L option
> is labeled as '-L  Location on query sequence [String]  Optional'.  Not sure
> what to make of that?  I wonder if they have changed parameter names from
> 2.0.6 to 2.2.8?

Typical problem!

blast.1

-L start,stop (blastall, blastcl3, megablast, rpsblast)
   Location on query sequence (for rpsblast, only valid  in blastp mode)

blastclust.1

-L X   Length coverage threshold (default = 0.9)

?
 

> It looks as if setting K = 1 and using L > 100 (or much larger) would help
> me reduce the number of output.  I think also using P = 1 as you stated
> above would probably help out the most.
> 
> > If you get an answer from blast-help at ncbi.nlm.nih.gov can you please post
> > it up? (these emails get archived).
> 
> I will.  I sent them an email yesterday afternoon so I won't be expecting
> anything back until sometime next week.  I usually have solved the problem
> by the time they get back to me.


They are very busy I guess. 

Best of luck!

Dan.




From jeff at bioinformatics.org  Sun Mar 28 21:24:44 2004
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Sun, 28 Mar 2004 21:24:44 -0500
Subject: [BiO BB] Agenda for the 4th Annual Meeting of Bioinformatics.Org
Message-ID: <406788EC.5000700@bioinformatics.org>

Greetings.

The following is the agenda for the 4th Annual Meeting of 
Bioinformatics.Org, taking place this week at the Bio-IT World 
Conference + Expo in Boston, Mass.

Tuesday, March 30
-----------------
Bioinformatics.Org is co-organizing the Bioclusters Workshop 2004, which 
developed out of our very own Bioclusters mailing list:
10:00 am to 5:00 pm

We also have a booth (#523) in the exhibit hall, which opens briefly in 
the evening.  We will be giving away BioBrew Linux DVDs while they last 
(50 total).  Booth personnel include J.W. Bizzaro, Organization 
president, and Glen Otero, BioBrew Linux Brewmeister:
6:00 pm to 7:15 pm

Wednesday, March 31
-------------------
Bioinformatics.Org is presenting its 2004 Benjamin Franklin Award in 
Bioinformatics to Lincoln Stein of Cold Spring Harbor Laboratory. 
Lincoln will give a 20 minute talk after receiving the Award from J.W. 
Bizzaro:
8:30 am to 9:00 am

Stop by and see us at our booth (#523).  We will be giving away BioBrew 
Linux DVDs while they last.  Booth personnel include J.W. Bizzaro and 
Gene Ioffe, Organization treasurer:
10:00 am to 6:00 pm

Thursday, April 1
-----------------
Bioinformatics.Org will have banquet room #204 for the entire day.  We 
will try to have coffee and other things available as we can afford 
them.  At 3:00 pm (tentative), J.W. Bizzaro will give a talk about the 
Organization and future directions:
8:00 am to 5:00 pm

Stop by and see us at our booth (#523).  We will be giving away BioBrew 
Linux DVDs while they last:
10:00 am to 2:00 pm

Registration
------------
The free guest pass will give you access to everything 
Bioinformatics.Org is involved in, except for the Workshop.  Just print 
out this pass (PDF) and bring it to the registration counter:
http://bioinformatics.org/events/2004/guest_pass.pdf

Registration for the full expo and conference will give you access to 
other events (and will make Bio-IT World want to invite us back next year):
http://www.bioitworldexpo.com/


Cheers.
Jeff
-- 
J.W. Bizzaro                                jeff at bioinformatics.org
President, Bioinformatics.Org       http://bioinformatics.org/~jeff
"As we enjoy great advantages from the inventions of others, we
should be glad of an opportunity to serve others by any invention
of ours; and this we should do freely and generously."
                    -- Benjamin Franklin
--


From nidhi.jain at sbcglobal.net  Mon Mar 29 12:37:27 2004
From: nidhi.jain at sbcglobal.net (Nidhi Jain)
Date: Mon, 29 Mar 2004 09:37:27 -0800
Subject: [BiO BB] Numerical libraries
Message-ID: <HKENJBOKELAMNNGNIMDIMENMCAAA.nidhi.jain@sbcglobal.net>

Hi,

I am working with a software company which is building the platform for
bioinformatics. I was wondering if there are good, high performance
numerical libraries like LAPACK available on the desktop.

I would really appreciate it if you could share this knowledge with me.

Thanks
Nidhi


From B.A.T.Svensson at lumc.nl  Mon Mar 29 15:40:16 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: Mon, 29 Mar 2004 22:40:16 +0200
Subject: [BiO BB] Numerical libraries
Message-ID: <D291F33C586C8E48B95C26F8C805513A01A3D9E1@mail5.lumc.nl>

A google search with "numerical libraries C high performance" 
gave the two first hits as:

Java: http://hoschek.home.cern.ch/hoschek/colt/
C/Fortran: http://www.nag.com/numeric/numerical_libraries.asp 


-----Original Message-----
From: Nidhi Jain
To: bio_bulletin_board at bioinformatics.org
Sent: 29-3-2004 19:37
Subject: [BiO BB] Numerical libraries

Hi,

I am working with a software company which is building the platform for
bioinformatics. I was wondering if there are good, high performance
numerical libraries like LAPACK available on the desktop.

I will really appreciate sharing this knowledge with me.

Thanks
Nidhi


From Austin.Tanney at arragen.com  Tue Mar 30 08:19:56 2004
From: Austin.Tanney at arragen.com (Austin Tanney)
Date: Tue, 30 Mar 2004 14:19:56 +0100
Subject: [BiO BB] Bay Area Bioinformatic Startup
Message-ID: <DE56FC0C3E54E846AE75AEAA106D8942012B8090@ni-cr-svc-ex1.pharms-services.com>

Hi Samantha

How are your plans going for this startup?

Austin


Dr. Austin Tanney
Senior Scientist
Arragen Ltd.

E-mail: Austin.Tanney at Arragen.com
Phone: +44 283839 5750
Fax: +44 283839 8676
Mobile: +44 7968 013939


-----Original Message-----
From: Samantha Austin [mailto:biotelerock at yahoo.com]
Sent: 22 March 2004 17:30
To: bio_bulletin_board at bioinformatics.org
Subject: [BiO BB] Bay Area Bioinformatic Startup


Hello Folks,
    I'm looking for bench trained, bay area molecular
biologists (PhD) who would be interested in starting
up a bioinformatic support company for life
scientists.  Skills should include Java/HTML/MySQL,
extensive hands on experience with molecular biology
techniques, and a bioentrepreneurial
 spirit.  This isn't a job offer, just a request for
folks interested in starting up something that might
grow into a great day job.





From rfsouza at citri.iq.usp.br  Fri Mar 26 12:48:18 2004
From: rfsouza at citri.iq.usp.br (Robson Francisco de Souza)
Date: Fri, 26 Mar 2004 14:48:18 -0300
Subject: [BiO BB] GI numbers
Message-ID: <20040326174818.GH23629@genoma4.iq.usp.br>

Hi,

I'm analyzing a set of sequences with regard to their classification as
homologs in both the COG and Kegg databases of orthologs. Although both
COG and Kegg provide tables relating gene names to GI (PID) numbers,
I have so far been unable to map GIs from one dataset to the other
in order to check the classifications of genes in both catalogs.

GIs from COG appear to be from RefSeq and those from Kegg seem to be
from GenPept. How can I map GI numbers from Kegg to GI numbers from COG
database? Is there any query I can make to download such info for 185904
proteins in COG and their equivalents on Kegg Orthologs database?

Here is an example:

Sequence 14600509 is the protein coded by gene APE0180 from the Aeropyrum 
pernix complete genome, as described in COG's table myva=gb. The same 
sequence is identified by GI 5103570 in Kegg. In this case, I was able to
map COG's GI to Kegg's GI by using the gene identifier and annotation, a
procedure that is not easily automated.
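
For the part of that procedure that relies on the gene identifier alone, a
rough sketch of the join is given below. It is only an illustration under
assumptions: both tables are taken to be already loaded as (gene name, GI)
pairs, gene names are assumed to be unique across the genomes being
compared, and the function name is hypothetical. Anything that does not
match exactly once is returned separately so it can be checked by hand
against the annotation.

```python
def map_gis_by_gene(cog_rows, kegg_rows):
    """Map COG GIs to Kegg GIs through a shared gene name.

    cog_rows, kegg_rows: iterables of (gene_name, gi) string pairs.
    Returns (mapping, unmatched), where mapping is {cog_gi: kegg_gi} and
    unmatched lists the COG entries with zero or several Kegg candidates.
    """
    kegg_by_gene = {}
    for gene, gi in kegg_rows:
        # Compare names case-insensitively (an assumption about the data).
        kegg_by_gene.setdefault(gene.upper(), set()).add(gi)

    mapping, unmatched = {}, []
    for gene, cog_gi in cog_rows:
        candidates = kegg_by_gene.get(gene.upper(), set())
        if len(candidates) == 1:
            mapping[cog_gi] = next(iter(candidates))
        else:
            unmatched.append((gene, cog_gi))
    return mapping, unmatched


# The example from this message: gene APE0180 of Aeropyrum pernix.
cog = [("APE0180", "14600509")]
kegg = [("APE0180", "5103570")]
mapping, unmatched = map_gis_by_gene(cog, kegg)
```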

How can I retrieve equivalent IDs for the whole COG gene set?

Thanks in advance for any help.
Robson


From idh at poulet.org  Tue Mar 30 10:04:33 2004
From: idh at poulet.org (Yannick Wurm)
Date: Tue, 30 Mar 2004 10:04:33 -0500
Subject: [BiO BB] DNA Strider
Message-ID: <8D9C0586-825B-11D8-AF58-000393CAA04A@poulet.org>

Hi,
I'm a student in Bioinformatics and Modeling at a French engineering 
school in Lyon, France (http://biosciences.insa-lyon.fr). I'm in my 
last year and currently doing a six-month internship in a C. 
elegans lab at McGill University in Montreal.
The lab's computers are Macs, and besides standard browsing, word 
processing and image processing, lab members also use them to help 
with their molecular biology work.
One of the programs they use is called DNA Strider. This piece of 
software has not been updated in a long time (probably since Apple's 
System 6.x - window sizes are fixed to the small old Mac screen size!) 
and could use a face-lift.

In the lab, it is mainly used for managing and manipulating sequences 
of genes, primers and constructs. The main features of interest here 
are:
	- Sequence management
	- Graphical (circular or linear) restriction maps of a given 
sequence (or part of it), showing restriction site data for the part 
or whole sequence (for each enzyme, you get the number of restriction 
sites and the resulting fragment sizes)
	- Reverse complementary sequence
	- Quick and simple alignment between two sequences
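
To make two of these features concrete (reverse complement, and the number 
of restriction sites with the resulting fragment sizes), here is a small 
sketch. It is not how DNA Strider does it; the function names are invented 
for illustration, only plain A/C/G/T sequences are handled, and on 
circular sequences a site that spans the origin is not detected. EcoRI 
(G^AATTC, cutting after the first base) is used as the example enzyme.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_positions(seq, site, cut_offset):
    """Return the 0-based top-strand cut positions for one enzyme.

    site is the recognition sequence (e.g. "GAATTC") and cut_offset is
    the number of bases of the site that lie before the cut (1 for EcoRI).
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)  # allow overlapping sites
    return positions


def fragment_sizes(seq, site, cut_offset, circular=False):
    """Return the fragment lengths after a complete digest."""
    cuts = cut_positions(seq, site, cut_offset)
    if not cuts:
        return [len(seq)]
    if circular:
        # Each fragment runs from one cut to the next; the last one
        # wraps around the origin back to the first cut.
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])
        return sizes
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


seq = "AAGAATTCCCGAATTCTT"
sites = cut_positions(seq, "GAATTC", 1)
linear = fragment_sizes(seq, "GAATTC", 1)
circle = fragment_sizes(seq, "GAATTC", 1, circular=True)
```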

I've searched the web and could not find an all-in-one package that 
seemed as user friendly and coherent as DNA Strider. Individual web 
sites and software tools do offer these features, but
	- the internet is slow (you click and need to wait before getting your 
result)
	- having everything in one place is nice

Sequence Analysis (for Mac OS X) http://informagen.com/SA/ seems to be 
aiming to do what DNA Strider does, but is still very young (and 
closed-source, but that's a different debate).

http://www.mekentosj.com/ has some very nice tools as well, but they're 
very problem-specific.

Have I missed something? Is there a really cool Java app or web 
software (that I could install locally for speed) that would replace 
DNA Strider? What does your molecular biology lab use for its 
day-to-day work?
Oh, and buying something expensive is not a solution.

Thanks for any leads,

Yannick.

\\\\\\\\\\\\\\\\\\\
\\  http://yannick.poulet.org icq: 22044361
\\  idh at poulet.org  tel: ++33.6.16.41.71.92


From ryangolhar at hotmail.com  Tue Mar 30 18:30:36 2004
From: ryangolhar at hotmail.com (Ryan Golhar)
Date: Tue, 30 Mar 2004 18:30:36 -0500
Subject: [BiO BB] DNA Strider
In-Reply-To: <8D9C0586-825B-11D8-AF58-000393CAA04A@poulet.org>
Message-ID: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>

You know, I'm constantly finding different programs to perform different
tasks, either client applications or web-based.  Some run on Linux,
others on Windows.  

I would like to see one application for multiple platforms that performs
DNA sequence analysis.  I started writing something in Java to do this but
haven't touched it in a while.

I'm wondering how many people would be interested in helping to develop
a  platform-independent application to perform all sorts of sequence
analysis - alignments, snp analysis, assembly, etc.  Sort of like GCG,
but free and actually user-friendly and useful.  If people are
interested, I think we should talk about a framework and start building
something as needed.  

Any comments?

-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam at umdnj.edu



From mgollery at unr.edu  Tue Mar 30 18:38:33 2004
From: mgollery at unr.edu (Martin Gollery)
Date: Tue, 30 Mar 2004 15:38:33 -0800
Subject: [BiO BB] DNA Strider
In-Reply-To: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
References: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
Message-ID: <406A04F9.6030902@unr.edu>

Sounds like EMBOSS...

Ryan Golhar wrote:

> You know, I'm constantly finding different programs to perform different
> tasks.  Either client applications, or web-based.  Some run on Linux,
> others Windows.  
> 
> I would like to see one application for multiple platforms that performs
> DNA sequence analysis.  I started writing something in Java to do this but
> haven't touched it in a while.
> 
> I'm wondering how many people would be interested in helping to develop
> a  platform-independent application to perform all sorts of sequence
> analysis - alignments, snp analysis, assembly, etc.  Sort of like GCG,
> but free and actually user-friendly and useful.  If people are
> interested, I think we should talk about a framework and start building
> something as needed.  
> 
> Any comments?
> 
> -----
> Ryan Golhar
> Computational Biologist
> The Informatics Institute at
> The University of Medicine & Dentistry of NJ
> 
> Phone: 973-972-5034
> Fax: 973-972-7412
> Email: golharam at umdnj.edu
> 

-- 
Martin Gollery
Associate Director
Center For Bioinformatics
University of Nevada at Reno
Dept. of Biochemistry / MS330
775-784-7042
'Don't worry that the world will end today- it's already tomorrow in 
Australia...' -Charles Schulz


From idoerg at burnham.org  Tue Mar 30 18:49:38 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Tue, 30 Mar 2004 15:49:38 -0800
Subject: [BiO BB] DNA Strider
In-Reply-To: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
References: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
Message-ID: <406A0792.3000407@burnham.org>

Hi Ryan,

EMBOSS works on Un*x, Linux and FreeBSD machines, and on Mac OS-X, so 
that covers everything but Windows.

http://emboss.org

Apparently you can get full functionality using Cygwin on Windows, or 
there is an EMBOSS for Windows project going on:

http://perso.wanadoo.fr/ablavier/embosswin/embosswin.html

Seems to be alive, their latest update is from 02/04. They claim to have 
158 programs from the original EMBOSS suite implemented.

JEMBOSS is the Java-based point-and-click interface; it works on Linux, 
Mac OS-X, AND Windows.

Cheers,

Iddo


Ryan Golhar wrote:
> You know, I'm constantly finding different programs to perform different
> tasks.  Either client applications, or web-based.  Some run on Linux,
> others Windows.  
> 
> I would like to see one application for multiple platforms that performs
> DNA sequence analysis.  I started writing something in Java to do this but
> haven't touched it in a while.
> 
> I'm wondering how many people would be interested in helping to develop
> a  platform-independent application to perform all sorts of sequence
> analysis - alignments, snp analysis, assembly, etc.  Sort of like GCG,
> but free and actually user-friendly and useful.  If people are
> interested, I think we should talk about a framework and start building
> something as needed.  
> 
> Any comments?
> 
> -----
> Ryan Golhar
> Computational Biologist
> The Informatics Institute at
> The University of Medicine & Dentistry of NJ
> 
> Phone: 973-972-5034
> Fax: 973-972-7412
> Email: golharam at umdnj.edu
> 

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo


From gary at www.bioinformatics.org  Tue Mar 30 19:14:25 2004
From: gary at www.bioinformatics.org (Gary Van Domselaar)
Date: Tue, 30 Mar 2004 19:14:25 -0500 (EST)
Subject: [BiO BB] DNA Strider
In-Reply-To: <406A0792.3000407@burnham.org>
Message-ID: <Pine.LNX.4.44.0403301901260.14221-100000@www.bioinformatics.org>

Hey Gang,

For platform-independent sequence _manipulation_ there is the Sequence 
Manipulation Suite:

http://bioinformatics.org/sms2

SMS2 and EMBOSS together cover most of what you can expect to get out of 
an old sequence analysis package like DNA Strider.

Decidedly,

g.


On Tue, 30 Mar 2004, Iddo Friedberg wrote:

> Hi Ryan,
> 
> EMBOSS works on Un*x, Linux and FreeBSD machines, and on Mac OS-X, so 
> that covers everything but Windows.
> 
> http://emboss.org
> 
> Apparently you can get full functionality using Cygwin in Windows, or
> 
> there is an EMBOSS for Windows project going on,
> 
> http://perso.wanadoo.fr/ablavier/embosswin/embosswin.html
> 
> Seems to be alive, their latest update is from 02/04. They claim to have 
> 158 programs from the original EMBOSS suite implemented.
> 
> JEMBOSS is the Java-based point-and-click interface, works on Linux, Mac 
> OS-X, AND Windows.
> 
> Cheers,
> 
> Iddo
> 
> 


From idh at poulet.org  Tue Mar 30 19:47:35 2004
From: idh at poulet.org (Yannick Wurm)
Date: Tue, 30 Mar 2004 19:47:35 -0500
Subject: [BiO BB] DNA Strider
In-Reply-To: <Pine.LNX.4.44.0403301901260.14221-100000@www.bioinformatics.org>
References: <Pine.LNX.4.44.0403301901260.14221-100000@www.bioinformatics.org>
Message-ID: <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>

Hi,
Thanks for all the feedback!
I have checked out both SMS2 and EMBOSS, and both seem very powerful. 
From what I gather, though, both only seem to output text!
For restriction enzyme cleavage sites, an overview diagram such as this 
one can be a great help: 
http://bcr.musc.edu/images/dnastrider.gif

I know I won't be able to replace DNA Strider until I find something 
that makes nice visual maps as well...

Did I miss something?

yannick.

On 30-Mar-04, at 7:14 PM, Gary Van Domselaar wrote:
> Hey Gang,
>
> For platform-independent sequence _manipulation_ there is the Sequence
> Manipulation Suite:
>
> http://bioinformatics.org/sms2
>
> SMS2 and EMBOSS together cover most of what you can expect to get out 
> of an old sequence analysis package like DNA Strider.
>
> Decidedly,
>
> g.
\\\\\\\\\\\\\\\\\\\
\\  http://yannick.poulet.org icq: 22044361
\\  idh at poulet.org  tel: ++33.6.16.41.71.92


From idoerg at burnham.org  Tue Mar 30 20:19:11 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Tue, 30 Mar 2004 17:19:11 -0800
Subject: [BiO BB] DNA Strider
In-Reply-To: <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>
References: <Pine.LNX.4.44.0403301901260.14221-100000@www.bioinformatics.org> <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>
Message-ID: <406A1C8F.2080701@burnham.org>


Yannick Wurm wrote:
> Hi,
>
> 
> I know I won't be able to replace DNA Strider before I find something 
> that makes nice visual maps as well...
> 
> Did I miss something?
> 
Yes. EMBOSS does have graphics. Which commands were you looking at?


./I


-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo


From pagarwal at linus.ornl.gov  Tue Mar 30 20:20:23 2004
From: pagarwal at linus.ornl.gov (Pratul K. Agarwal)
Date: Tue, 30 Mar 2004 20:20:23 -0500 (EST)
Subject: [BiO BB] Announcing live CD for bio/chemical modeling
Message-ID: <Pine.LNX.4.44.0403302020100.8145-100000@linus.ornl.gov>

http://www.vigyaancd.org/

Vigyaan is an electronic workbench for computational biology
and computational chemistry. It has been designed to meet
the needs of both beginners and experts. VigyaanCD is a
Linux live CD containing all the software required to boot
the computer with ready-to-use modeling software.


From gary at www.bioinformatics.org  Tue Mar 30 23:34:05 2004
From: gary at www.bioinformatics.org (Gary Van Domselaar)
Date: Tue, 30 Mar 2004 23:34:05 -0500 (EST)
Subject: [BiO BB] DNA Strider
In-Reply-To: <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>
Message-ID: <Pine.LNX.4.44.0403302331350.23551-100000@www.bioinformatics.org>

Hi Yannick,

The guy who made sms2 also makes a map viewer called CGView.  It has been 
implemented in the web server 'PlasMapper':
http://wishart.biology.ualberta.ca/PlasMapper/index.html

You can contact him to see about the availability of CGView itself:
stothard at ualberta.ca

Regards,

g.

On Tue, 30 Mar 2004, Yannick Wurm wrote:

> Hi,
> Thanks for all the feedback!
> I have checked out both sms2 and emboss, and both seem very powerful. 
> From what I gather though, both only seem to output text!
> For restriction enzyme cleavage sites, getting an overview with a 
> schema such as this one can be a great help though: 
> http://bcr.musc.edu/images/dnastrider.gif
> 
> I know I won't be able to replace DNA Strider before I find something 
> that makes nice visual maps as well...
> 
> Did I miss something?
> 
> yannick.
> 


From stefanielager at fastmail.ca  Wed Mar 31 00:11:57 2004
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Wed, 31 Mar 2004 05:11:57 +0000 (UTC)
Subject: [BiO BB] DNA Strider
In-Reply-To: <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>
Message-ID: <20040331051157.37FDF863566@mail.interchange.ca>

The TACG program handles restriction cleavage really nicely, and
complements EMBOSS for some functions missing there:
http://tacg.sourceforge.net/
For plasmid maps I think you would have to buy a commercial software
package.

Stefanie

> Hi,
> Thanks for all the feedback!
> I have checked out both sms2 and emboss, and both seem very powerful.
> From what I gather though, both only seem to output text!
> For restriction enzyme cleavage sites, getting an overview with a
> schema such as this one can be a great help though:
> http://bcr.musc.edu/images/dnastrider.gif
> 
> I know I won't be able to replace DNA Strider before I find something
> that makes nice visual maps as well...
> 
> Did I miss something?
> 
> yannick.
> 


From idh at poulet.org  Wed Mar 31 00:22:19 2004
From: idh at poulet.org (Yannick Wurm)
Date: Wed, 31 Mar 2004 00:22:19 -0500
Subject: [BiO BB] DNA Strider
In-Reply-To: <20040331051157.37FDF863566@mail.interchange.ca>
References: <20040331051157.37FDF863566@mail.interchange.ca>
Message-ID: <61C5E606-82D3-11D8-AF58-000393CAA04A@poulet.org>

Thanks so much!
The generated maps look great!
Maybe I will be able to convince my biologists to trash DNA Strider 
after all :)

Cheers,
Yannick.

On 31-Mar-04, at 12:11 AM, Stefanie Lager wrote:

> The TACG program does restriction cleavage really nice, and complements
> EMBOSS for some functions missing there http://tacg.sourceforge.net/ .
> For plasmid maps I think you would have to buy a commercial software
> package.
>
> Stefanie

and

On 30-Mar-04, at 11:34 PM, Gary Van Domselaar wrote:

> Hi Yannick,
>
> The guy who made sms2 also makes a map viewer called CGView.  It has 
> been
> implemented in the web server 'PlasMapper':
> http://wishart.biology.ualberta.ca/PlasMapper/index.html
>
> You can contact him to see about the availability of CGView itself:
> stothard at ualberta.ca
>
> Regards,
>
> g.

\\\\\\\\\\\\\\\\\\\
\\  http://yannick.poulet.org icq: 22044361
\\  idh at poulet.org  tel: ++33.6.16.41.71.92


From stefanielager at fastmail.ca  Wed Mar 31 05:44:26 2004
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Wed, 31 Mar 2004 10:44:26 +0000 (UTC)
Subject: [BiO BB] GI numbers
In-Reply-To: <20040326174818.GH23629@genoma4.iq.usp.br>
Message-ID: <20040331104426.AB5DF86258A@mail.interchange.ca>

Try linking them through LocusLink, either using one of the mapping
tables found at: ftp://ftp.ncbi.nih.gov/refseq/LocusLink/ or (a bit more
complicated) using a system like OpenBNS: http://openbns.sourceforge.net/

Stefanie 

  
> Hi,
> 
> I'm analyzing a set of sequences with regard to their classifications
> as homologs from both COG and Kegg databases of orthologs. Although
> both COG and Kegg provide tables relating gene names to GI (PID)
> numbers, I'm, up to this moment, unable to map GIs from one dataset to
> the other, in order to check classifications for genes in both
> catalogs.
> 
> GIs from COG appear to be from RefSeq and those from Kegg seem to be
> from GenPept. How can I map GI numbers from Kegg to GI numbers from
> COG database? Is there any query I can make to download such info for
> 185904 proteins in COG and their equivalents on Kegg Orthologs
> database?
> 
> Here is an example:
> 
> Sequence 14600509 is the protein coded by gene APE0180 from Aeropyrum
> pernix complete genome, as described in COG's table myva=gb. The same
> sequence is identified by GI 5103570 in Kegg. In this case, I was able
> to map COG's GI to Kegg's GI by using the gene identifier and
> annotation, a procedure that is not easily automated.
> 
> How can I retrieve equivalent IDs for the whole COG gene set?
> 
> Thanks in advance for any help.
> Robson


From ml at mb.au.dk  Wed Mar 31 06:38:20 2004
From: ml at mb.au.dk (Martin Luetzelberger)
Date: Wed, 31 Mar 2004 13:38:20 +0200 (CEST)
Subject: [BiO BB] Seqio and fmtseq
In-Reply-To: <20040331112503.B6D20D1F06@www.bioinformatics.org>
References: <20040331112503.B6D20D1F06@www.bioinformatics.org>
Message-ID: <Pine.LNX.4.53.0403311330420.7455@dna-47-147.mb.au.dk>

Hi,

I recently saw a message on this board about James Knight's
Seqio package. There seem to be some problems compiling it under
Linux with gcc-3.2. Has anybody solved this problem yet?
Is there an official website where the package is maintained?

Martin


From micheld at mshri.on.ca  Wed Mar 31 10:40:37 2004
From: micheld at mshri.on.ca (Michel Dumontier)
Date: Wed, 31 Mar 2004 10:40:37 -0500
Subject: [BiO BB] GI numbers
References: <20040326174818.GH23629@genoma4.iq.usp.br>
Message-ID: <002601c41736$849a3cf0$6400000a@moose>

Hi Robson,

  Since 14600509 and 5103570 are identifiers for identical sequences
from different sources, they can be found in the same definition line in the
non-redundant FASTA file that NCBI provides on its FTP site (as a BLAST
database distribution - nr.gz).
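
A minimal sketch of how that lookup could be scripted, assuming the usual
nr layout in which the entries merged into one definition line are separated
by a Ctrl-A character and each entry starts with "gi|<number>|". The
function names are made up for this example, and the accessions in the
sample line further down are placeholders, not real records.

```python
import re

# Each merged entry on an nr definition line is assumed to begin "gi|12345|...".
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def gi_groups(fasta_lines):
    """Yield the list of GI numbers found on each FASTA definition line."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no identifiers
        gis = []
        # Entries for identical sequences are separated by Ctrl-A (\x01).
        for entry in line[1:].split("\x01"):
            match = GI_PATTERN.match(entry)
            if match:
                gis.append(int(match.group(1)))
        if gis:
            yield gis


def build_equivalents(fasta_lines):
    """Map every GI to the frozenset of GIs sharing its definition line."""
    equivalents = {}
    for gis in gi_groups(fasta_lines):
        group = frozenset(gis)
        for gi in gis:
            equivalents[gi] = group
    return equivalents


# Placeholder definition line; the "XX..." accessions are hypothetical.
sample = [
    ">gi|14600509|ref|XX_000001.1| hypothetical protein"
    "\x01gi|5103570|dbj|XX000002.1| hypothetical protein",
    "MKVLAA",
]
table = build_equivalents(sample)
```

On the real file one would pass an open handle instead of the sample list,
for example gzip.open("nr.gz", "rt"), and then look up each COG GI in the
resulting table to find the GIs that share its definition line. Holding the
whole table in memory may not be practical for the full nr; filtering on the
GIs of interest while reading would be one way around that.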

This file, along with each definition line entry, has been imported into
SeqHound (http://seqhound.mshri.on.ca), and is searchable under the redundant
group module through a variety of programming interfaces (C/C++/Perl/Java).

-=Michel=-


----- Original Message ----- 
From: "Robson Francisco de Souza" <rfsouza at citri.iq.usp.br>
To: <info at ncbi.nlm.nih.gov>
Cc: <bio_bulletin_board at bioinformatics.org>
Sent: Friday, March 26, 2004 12:48 PM
Subject: [BiO BB] GI numbers


> Hi,
>
> I'm analyzing a set of sequences with regard to their classifications as
> homologs from both COG and Kegg databases of orthologs. Although both
> COG and Kegg provide tables relating gene names to GI (PID) numbers,
> I'm, up to this moment, unable to map GIs from one dataset to the other,
> in order to check classifications for genes in both catalogs.
>
> GIs from COG appear to be from RefSeq and those from Kegg seem to be
> from GenPept. How can I map GI numbers from Kegg to GI numbers from COG
> database? Is there any query I can make to download such info for 185904
> proteins in COG and their equivalents on Kegg Orthologs database?
>
> Here is an example:
>
> Sequence 14600509 is the protein coded by gene APE0180 from Aeropyrum
> pernix complete genome, as described in COG's table myva=gb. The same
> sequence is identified by GI 5103570 in Kegg. In this case, I was able to
> map COG's GI to Kegg's GI by using the gene identifier and annotation, a
> procedure that is not easily automated.
>
> How can I retrieve equivalent IDs for the whole COG gene set?
>
> Thanks in advance for any help.
> Robson
>


From pculpep at hotmail.com  Wed Mar 31 11:44:06 2004
From: pculpep at hotmail.com (Pamela Culpepper)
Date: Wed, 31 Mar 2004 16:44:06 +0000
Subject: [BiO BB] GI numbers
Message-ID: <BAY9-F40Wd7KKQpQq5p0005cde7@hotmail.com>

A package, the Integrated Genomics Data System, has been submitted to the 
Baylor College of Medicine Office of Technology.  A description of the 
system follows:

Integrated Genomics Data System (IGDS)

OVERVIEW

The Integrated Genomics Data System (IGDS) integrates data from multiple 
publicly available genomic databases into a relational database format.

The core of the IGDS system is a C/C++ program that data mines National 
Center for Biotechnology Information (NCBI) binary ASN.1 files for sequence 
data.  This data is integrated by means of Perl scripts with data from 
LocusLink, UniGene, Gene Ontology Association, Protein Data Bank, and other 
sites.   The resulting information is uploaded by a Java program into a 
relational database defined by the Integrated Genomics Database System 
schema.

FEATURES

The IGDS is a computational tool for data gathering and interpretation of 
genomic data, which saves time and reduces repetition of rote processes.

A methodology using tested relationships among various pieces of data in 
different files reduces the necessity for the accrual and processing of 
massive amounts of data.

The data mining tactic is based on the NCBI toolkit, thereby utilizing code 
that has been approved for interpretation of NCBI ASN.1.  Native 
representation of pertinent data elements is maintained, as are the nesting 
levels inherent in the ASN.1 structure.

Data download and interpretation is performed on the most compact 
representation of NCBI data - ASN.1 binary files.

Less computer disk space is required to store data files when data mining 
processes are invoked.

Processing of NCBI data in binary format provides optimal computer 
performance with quick results.

Configurable interface affords various levels of processing granularity.  
Processing may be allocated among many processes on one computer or across 
several computers.

Final relational representation of genomic data provides dynamic inference 
not possible with flat file or ASN.1 data representation.

The system is fully configurable and will download and interpret the entire 
NCBI ASN.1 sequence library or a few select sequence sets.

A separate series of Perl scripts cross-references the NCBI 
LocusLink/UniGene libraries, providing Accession and GI Number, Gene Names, 
Alias Gene Names, Preferred Gene Names, Clone, Lib, UniGene Id, Tissue, 
Vector, Organ, Cyto_Genetic_Loc, and relevant Gene Ontology information such 
as GO Id, categories, etc.  This data can be merged with the ASN.1 data to 
create a fully integrated DB system of genomic information.

A Rational Rose UML Data Model is provided as well as relevant SQL tables.

SYSTEM REQUIREMENTS
C/C++, Perl, a compiled version of the NCBI toolkit, and a relational 
database management system.

Contact information for the Baylor College of Medicine Office of Technology 
is as follows --

Larry Hope
Baylor College of Medicine
Office of Technology Administration (i.e. Baylor Licensing)
One Baylor Plaza
Mail Stop:  BCM210 600D
Houston, TX 77030
P (713) 798-6821
F (713) 798-1252
lhope at bcm.tmc.edu
http://research.bcm.tmc.edu/OTA/index.htm

Sincerely,

Pam Culpepper

>From: "Stefanie Lager" <stefanielager at fastmail.ca>
>Reply-To: bio_bulletin_board at bioinformatics.org
>To: bio_bulletin_board at bioinformatics.org
>Subject: Re: [BiO BB] GI numbers
>Date: Wed, 31 Mar 2004 10:44:26 +0000 (UTC)
>
>Try linking them through LocusLink, either using one of the mapping
>tables found at: ftp://ftp.ncbi.nih.gov/refseq/LocusLink/ or (a bit more
>complicated) using a system like OpenBNS: http://openbns.sourceforge.net/
>
>Stefanie
>
>
> > Hi,
> >
> > I'm analyzing a set of sequences with regard to their classifications
> > as homologs from both COG and Kegg databases of orthologs. Although
> > both COG and Kegg provide tables relating gene names to GI (PID)
> > numbers, I'm, up to this moment, unable to map GIs from one dataset to
> > the other, in order to check classifications for genes in both
> > catalogs.
> >
> > GIs from COG appear to be from RefSeq and those from Kegg seem to be
> > from GenPept. How can I map GI numbers from Kegg to GI numbers from
> > COG database? Is there any query I can make to download such info for
> > 185904 proteins in COG and their equivalents on Kegg Orthologs
> > database?
> >
> > Here is an example:
> >
> > Sequence 14600509 is the protein encoded by gene APE0180 from the Aeropyrum
> > pernix complete genome, as described in COG's table myva=gb. The same
> > sequence is identified by GI 5103570 in KEGG. In this case, I was able
> > to map COG's GI to KEGG's GI by using the gene identifier and annotation,
> > a procedure that is not easily automated.
> >
> > How can I retrieve equivalent IDs for the whole COG gene set?
> >
> > Thanks in advance for any help.
> > Robson
>_______________________________________________
>BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
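
The manual mapping in the APE0180 example quoted above (joining the two tables on the gene identifier) can be automated once both tables are available locally. Below is a minimal sketch in Python. It assumes each table has already been reduced to tab-separated lines of the form gene<TAB>GI; that layout and the function names are assumptions for illustration, not the actual format of the COG or KEGG files. Only the APE0180 row is taken from this thread.

```python
import csv
import io


def read_gene_to_gi(handle):
    """Read tab-separated 'gene<TAB>GI' lines into a dict (layout is assumed)."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        gene, gi = row[0].strip(), row[1].strip()
        mapping[gene] = gi
    return mapping


def map_gi_by_gene(cog_genes, kegg_genes):
    """Return {COG GI: KEGG GI} for every gene name present in both tables."""
    shared = cog_genes.keys() & kegg_genes.keys()
    return {cog_genes[gene]: kegg_genes[gene] for gene in shared}


# Toy tables; only the APE0180 row comes from the example in this thread.
cog_table = io.StringIO("APE0180\t14600509\n")
kegg_table = io.StringIO("APE0180\t5103570\n")

gi_map = map_gi_by_gene(read_gene_to_gi(cog_table), read_gene_to_gi(kegg_table))
print(gi_map)
```

Genes whose names differ between the two catalogs would be missed by this join and would still need the annotation-based check described in the question.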



From ulimard at yahoo.com.br  Wed Mar 31 12:50:41 2004
From: ulimard at yahoo.com.br (Ulisses)
Date: Wed, 31 Mar 2004 14:50:41 -0300
Subject: [BiO BB] Distributed System and Bioinformatic
References: <D291F33C586C8E48B95C26F8C805513A01A3D9E1@mail5.lumc.nl>
Message-ID: <00d001c41748$af3ed030$8902100a@clonline>

I'm graduating in computer science and I'm writing a paper about distributed
systems, and I would like to tie it in with some concepts from bioinformatics.
So I'd like to receive some links to existing work in this area.

Thank you for your attention.
         Ulisses Dias


From boris.steipe at utoronto.ca  Wed Mar 31 17:43:11 2004
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Wed, 31 Mar 2004 17:43:11 -0500
Subject: [BiO BB] Distributed System and Bioinformatic
In-Reply-To: <00d001c41748$af3ed030$8902100a@clonline>
Message-ID: <CA17AE56-8364-11D8-B9FA-000A9577512E@utoronto.ca>

Google for "mygrid".


best,


Boris

On Wednesday, Mar 31, 2004, at 12:50 Canada/Eastern, Ulisses wrote:

> I'm graduating in computer science and I'm writing a paper about
> distributed systems, and I would like to tie it in with some concepts
> from bioinformatics. So I'd like to receive some links to existing
> work in this area.
>
> Thank you for your attention.
>          Ulisses Dias
>


From operon at www.ufsm.br  Tue Mar 30 19:21:28 2004
From: operon at www.ufsm.br (Marcos Oliveira de Carvalho)
Date: Tue, 30 Mar 2004 21:21:28 -0300
Subject: [BiO BB] DNA Strider
In-Reply-To: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
References: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
Message-ID: <opr5o9x2smx72l2q@coral.ufsm.br>

It would be nice to have a bioinformatics IDRE (Integrated Development 
and Research Environment) with some basic features (sequence 
visualization, editing, annotation, data management, etc.) and a 
well-designed plug-in API for easy extension. One idea would be to build it 
on top of the NetBeans platform; with Java we get platform-independent 
software (other languages offer that too, but Java does the job).
One could also add bindings for Python/Perl/Ruby/Java and their Bio* 
libraries, and interfaces to bioinformatics tools, including output capture 
(EMBOSS, phred, phrap, BLAST, FASTA, Clustal, R/Bioconductor, MUMmer, ...). 
Interfaces to web services and data retrieval from major databases would 
also fit. Well, there are lots of possibilities.
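
To make the plug-in idea a little more concrete, here is a minimal sketch of a registry-style plug-in API. It is written in Python rather than Java/NetBeans purely for brevity, and the `PLUGINS` registry, the `register` decorator, `run_plugin` and the two example plug-ins are all hypothetical names invented for this illustration, not part of any existing project.

```python
from typing import Callable, Dict

# Registry mapping a plug-in name to a function that takes a sequence string.
PLUGINS: Dict[str, Callable[[str], object]] = {}


def register(name: str):
    """Decorator that adds a function to the PLUGINS registry under `name`."""
    def decorator(func):
        if name in PLUGINS:
            raise ValueError(f"plug-in {name!r} is already registered")
        PLUGINS[name] = func
        return func
    return decorator


@register("length")
def sequence_length(seq: str) -> int:
    """Number of residues in the sequence."""
    return len(seq)


@register("gc_content")
def gc_content(seq: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def run_plugin(name: str, seq: str):
    """Look up a plug-in by name and apply it to the sequence."""
    return PLUGINS[name](seq)


print(run_plugin("length", "ATGC"))
print(run_plugin("gc_content", "ATGC"))
```

The point of the design is that the core application only knows the registry; new analyses are added by registering another function, without touching the core.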

cheers
Marcos


On Tue, 30 Mar 2004 18:30:36 -0500, Ryan Golhar <ryangolhar at hotmail.com> 
wrote:

> You know, I'm constantly finding different programs to perform different
> tasks.  Either client applications, or web-based.  Some run on Linux,
> others Windows.
>
> I would like to see one application for multiple platforms that performs DNA
> sequence analysis.  I started writing something in Java to do this but
> haven't touched it in a while.
>
> I'm wondering how many people would be interested in helping to develop
> a platform-independent application to perform all sorts of sequence
> analysis - alignments, SNP analysis, assembly, etc.  Sort of like GCG,
> but free and actually user-friendly and useful.  If people are
> interested, I think we should talk about a framework and start building
> something as needed.
>
> Any comments?
>
> -----
> Ryan Golhar
> Computational Biologist
> The Informatics Institute at
> The University of Medicine & Dentistry of NJ
>
> Phone: 973-972-5034
> Fax: 973-972-7412
> Email: golharam at umdnj.edu
>
> -----Original Message-----
> From: bio_bulletin_board-admin at bioinformatics.org
> [mailto:bio_bulletin_board-admin at bioinformatics.org] On Behalf Of
> Yannick Wurm
> Sent: Tuesday, March 30, 2004 10:05 AM
> To: bio_bulletin_board at bioinformatics.org
> Subject: [BiO BB] DNA Strider
>
>
> Hi,
> I'm a student in Bioinformatics and Modeling at a French engineering
> school in Lyon, France (http://biosciences.insa-lyon.fr). I'm in
> my last year and currently doing a six-month internship in a C.
> elegans lab at McGill University in Montreal.
> The lab's computers are Macs, and besides standard browsing, word
> processing and image processing, lab members also use them to help
> with their molecular biology work.
> One of the programs they use is called DNA Strider. This piece of
> software has not been updated in a long time (probably since Apple's
> System 6.x - window sizes are fixed to the small old Mac screen size!)
> and could use a face-lift.
>
> In the lab, it is mainly used for managing and manipulating sequences
> of genes, primers and constructs. The main features of interest here
> are:
> 	- Sequence management
> 	- Graphical (circular or linear) restriction maps of a given
> 	  sequence (or part of it), showing restriction site data for the
> 	  part or whole sequence (for each enzyme, you get the number of
> 	  restriction sites and the resulting fragment sizes)
> 	- Reverse complementary sequence
> 	- Quick and simple alignment between two sequences
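
Two of the features in the list above are small enough to sketch directly. The following Python is an illustration only and says nothing about how DNA Strider implements them: it handles plain A/C/G/T sequences, treats the molecule as linear, and the function names are made up for this example. EcoRI (site GAATTC, cut after the first G) is used as the example enzyme.

```python
# Translation table for complementing A/C/G/T in either case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_sizes(seq, site, cut_offset):
    """Fragment lengths of a linear molecule cut `cut_offset` bases into each site."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


dna = "AAGAATTCGGGGAATTCTT"  # toy sequence with two EcoRI sites
print(reverse_complement(dna))
print(find_sites(dna, "GAATTC"))
print(fragment_sizes(dna, "GAATTC", cut_offset=1))
```

A circular map would need the first and last fragments joined, which this sketch deliberately leaves out.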
>
> I've searched the web and could not find an all-in-one package that
> seemed as user friendly and coherent as DNA Strider. Individual web
> sites and software tools do offer these features, but
> 	- the internet is slow (you click and need to wait before
> 	  getting your result)
> 	- having everything in one place is nice
>
> Sequence Analysis (for Mac OS X) http://informagen.com/SA/ seems to be
> aiming to do what DNA Strider does, but is still very young (and
> closed-source, but that's a different debate).
>
> http://www.mekentosj.com/ has some very nice tools as well, but they're
> very problem-specific.
>
> Have I missed something? Is there a really cool Java app or web
> software (that I could install locally for speed) that would replace
> DNA Strider? What does your molecular biology lab use for its
> day-to-day work?
> Oh and buying something expensive is not a solution.
>
> Thanks for any leads,
>
> Yannick.
>
> \\\\\\\\\\\\\\\\\\\
> \\  http://yannick.poulet.org icq: 22044361
> \\  idh at poulet.org  tel: ++33.6.16.41.71.92
>


-- 
"Science knows no country, because knowledge belongs to humanity, and is 
the torch which illuminates the world. "
Louis Pasteur


From anthony.boureux at crbm.cnrs-mop.fr  Wed Mar 31 03:46:37 2004
From: anthony.boureux at crbm.cnrs-mop.fr (Anthony Boureux)
Date: Wed, 31 Mar 2004 10:46:37 +0200
Subject: [BiO BB] DNA Strider
In-Reply-To: <61C5E606-82D3-11D8-AF58-000393CAA04A@poulet.org>
References: <20040331051157.37FDF863566@mail.interchange.ca> <61C5E606-82D3-11D8-AF58-000393CAA04A@poulet.org>
Message-ID: <406A856D.3090604@crbm.cnrs-mop.fr>

If you want, you can still use DNA Strider on Mac OS X: there is a new 
version (1.4). It is a Carbon version, so it runs natively under 
Mac OS X. To get it, contact the author (his e-mail is in the About box; he 
works at CEA Saclay, France).
Anthony

Yannick Wurm wrote:
> Thanks so much!
> the generated maps look great!
> Maybe I will be able to convince my biologists to trash DNA Strider 
> after all :)
> 
> Cheers,
> Yannick.
> 
> On 31-Mar-04, at 12:11 AM, Stefanie Lager wrote:
> 
>> The TACG program handles restriction cleavage really nicely, and complements
>> EMBOSS for some functions that are missing there: http://tacg.sourceforge.net/ .
>> For plasmid maps I think you would have to buy a commercial software
>> package.
>>
>> Stefanie

-- 

Anthony Boureux
Tyrosine kinase Lab.                      CRBM, CNRS FRE2593
1919, route de Mende                     34293 Montpellier Cedex 5
+33 (0)467613373                           Anthony.Boureux at crbm.cnrs-mop.fr



From lj at bio-code.com  Wed Mar 31 16:23:02 2004
From: lj at bio-code.com (LJ)
Date: Wed, 31 Mar 2004 13:23:02 -0800
Subject: [BiO BB] DNA Strider
In-Reply-To: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
Message-ID: <BJEGKHOJKHPLDCCBKJMFCELJDAAA.lj@bio-code.com>

I like this idea quite a bit. I know several programmers/bioinformaticists
who might want to help out. They are really impressed with Python these days.

lon james
postgres, inc
1 fair oaks
san francisco, ca 94110
415-573-9192
lon at efcodd.org
