From jiye at eden.rutgers.edu Mon Mar 1 16:26:43 2004
From: jiye at eden.rutgers.edu (jiye at eden.rutgers.edu)
Date: Mon, 1 Mar 2004 16:26:43 -0500 (EST)
Subject: [BiO BB] confusion about the psi-blast profile
Message-ID: <3633168464jiye@eden.rutgers.edu>

Hi,

I'm a graduate student at Rutgers University, NJ, USA. I'm seeking some
help with a question I recently ran into about the profile produced by
PSI-BLAST.

I use the standalone BLAST programs: I run blastpgp for 3 iterations and
then makemat to get the position-specific scoring matrix. The commands I
use are:

blastpgp -d protein/nr -i test.seq -o test.rst -j 3 -C test.chk
makemat -P test -d protein/nr

I put the following sequence in test.seq; the nr protein database was
downloaded from the NCBI web site.

> S0_Sinorhizobium meliloti aa_pep18
mggfidiqapleqegtkavvrnwlrkigdpvksgdplveletdkvtqevs
apadgvlaeilmrngddatpgavlgrigseaagaghaphyspavrhaaee
ygldpatvtgtgrggrvtradmdraftarqegpasvaaeagdrgaapksr
riphsgmraaiaehmlnsvttaphvtavfeadfsavmrhrdehgkrlaad
gtklsytayvvsacvaamravpevnsrwhedaletfddinigvgislgdk
glvvpvihraqdlslaeiaarlqdlttrarsnalsradvtggtftisnhg
asgsllaapiiinqpqsailgvgkldkrvivrevdgadtiqirpmayvsl
tidhraldghqtnawlthfvrvietwpk

The scores I read back from test.mtx look like this:

-32768 -291 -32768 -341 -521 -407 -185 -480 -359 -64 -338 80 980 -425 -457 -238 -338 -351 -261 -113 -343 -100 -296 -32768 -32768 -397
-32768 153 -32768 -361 -298 -178 -500 423 -364 -495 -291 -502 -408 -216 -352 -288 -360 477 -70 -412 -504 -100 -448 -32768 -32768 -397
.....
.....

Since I have no previous experience with PSI-BLAST, I'm not sure whether
this is right, but I feel that something is wrong. The minimum number
-32768 appears repeatedly at every position in columns 1, 3, 19, and 20.
The other numbers also look either too small or too large. However, I
couldn't find where the problem is. I also tried the Swiss-Prot
database; the result is similar.
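A note on the numbers above: -32768 is the smallest value a signed 16-bit
integer can hold, and programs commonly use it as a "no score"
placeholder rather than as a real score. The Python sketch below decodes
the two rows quoted in the message under two assumptions that are not
stated anywhere in this thread: that the scores are stored multiplied by
100, and that the 26 columns follow the NCBIstdaa letter order. The names
NCBISTDAA, SCALE, SENTINEL, and decode_row are made up for the
illustration.

```python
# Assumption: 26 columns in NCBIstdaa order (not confirmed by the thread).
NCBISTDAA = "-ABCDEFGHIKLMNPQRSTVWXYZU*"
SCALE = 100        # assumption: scores are stored multiplied by 100
SENTINEL = -32768  # smallest signed 16-bit integer, read here as "no score"

# The two rows quoted in the message, with the stray spaces after "-" removed.
ROW_1 = ("-32768 -291 -32768 -341 -521 -407 -185 -480 -359 -64 -338 80 980 "
         "-425 -457 -238 -338 -351 -261 -113 -343 -100 -296 -32768 -32768 -397")
ROW_2 = ("-32768 153 -32768 -361 -298 -178 -500 423 -364 -495 -291 -502 -408 "
         "-216 -352 -288 -360 477 -70 -412 -504 -100 -448 -32768 -32768 -397")


def decode_row(text):
    """Map each column letter to a rescaled score, or None for the sentinel."""
    values = [int(token) for token in text.split()]
    if len(values) != len(NCBISTDAA):
        raise ValueError(f"expected {len(NCBISTDAA)} columns, got {len(values)}")
    return {
        letter: (None if value == SENTINEL else value / SCALE)
        for letter, value in zip(NCBISTDAA, values)
    }


row1 = decode_row(ROW_1)
row2 = decode_row(ROW_2)

# Columns that carry the placeholder instead of a score.
unscored = [letter for letter, score in row1.items() if score is None]
# Highest-scoring letter among the columns that do carry a score.
best = max((l for l in row1 if row1[l] is not None), key=lambda l: row1[l])

print(unscored)          # ['-', 'B', 'Z', 'U']
print(best, row1[best])  # M 9.8
```

Under these assumptions the first row gives 9.80 for M, the first
residue of the query, and the placeholder falls only on the gap and on
the letters B, Z, and U, which would suggest the matrix is not obviously
broken. Whether the assumptions hold for a particular makemat version
should be checked against its own documentation.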
I would really appreciate it if someone could tell me whether there is
anything wrong with this kind of result.

Best regards,
-Jiankuan

From B.A.T.Svensson at lumc.nl Mon Mar 1 20:13:28 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: Tue, 2 Mar 2004 02:13:28 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
Message-ID:

> Normalization is the process of designing a good data model.

Please explain this statement.

From jrambla at hotmail.com Tue Mar 2 04:28:30 2004
From: jrambla at hotmail.com (JRambla)
Date: Tue, 2 Mar 2004 10:28:30 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To:
Message-ID:

Hi all,

Normalization is a concept from relational database theory, which was
created by the recently deceased Dr. Edgar F. Codd, a mathematician at
IBM. That theory is the basis of the whole SQL world.

Normalization is a group of rules (5, if my memory is right) applied to
table design in order, basically, to eliminate redundancy in the data.
Such redundancy shows up as embarrassing, and sometimes hard to find,
consistency problems in the data stored in the database.

As was suggested, following those rules is a good starting point for
designing a database.

You can find a good introduction at

http://www.sequoia.be/consult/method/english.htm

Hope this helps,

Jordi Rambla
Barcelona (Spain)

-----Original Message-----
From: bio_bulletin_board-admin at bioinformatics.org
[mailto:bio_bulletin_board-admin at bioinformatics.org] On behalf of
Svensson, B.A.T. (HKG)
Sent: Tuesday, March 2, 2004 2:13
To: 'bio_bulletin_board at bioinformatics.org '
Subject: [BiO BB] RE: Normalization WAS: database design question

> Normalization is the process of designing a good data model.

Please, explain this statement.
_______________________________________________
BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bio_bulletin_board

From B.A.T.Svensson at lumc.nl Tue Mar 2 05:46:01 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: Tue, 2 Mar 2004 11:46:01 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
Message-ID:

Thank you for the suggested reading, but I was seeking an elaboration on
why (high/er?) normalization should be regarded as good design.

From jrambla at hotmail.com Tue Mar 2 08:21:01 2004
From: jrambla at hotmail.com (JRambla)
Date: Tue, 2 Mar 2004 14:21:01 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To:
Message-ID:

Hi,

In my experience (nearly 20 years now) of designing and consulting on
enterprise databases:

- Normalization is good/desirable in all online systems (those where
  several users can be reading and updating data simultaneously),
  usually called OLTP systems. The exceptions are not significant.
- De-normalization is good (indeed mandatory) for data warehouse and
  data mining systems, where grouping, sorting, and summarized data are
  the real interest. This is for performance reasons associated with
  intensive calculations. We also apply de-normalization to history
  files or logs, where you actually need a snapshot of the relationships
  and data at the moment of the entry.

The database that started this thread, as I remember it, is a sequence
database. I would classify it in the first group, although I have little
experience in that field nowadays.

As I mentioned in the previous e-mail, normalization (usually only the
higher levels count as normalized) means not allowing repeated data to
live in the system: for example, not copying the customer's address data
into every invoice in the Invoices table.

That way any change to the data is made only in the "master" record, and
you don't need to keep track of all the places the data may have been
copied to. Keeping track of copies of data is usually a tricky and
error-prone affair, so you stay out of it as much as you can.
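
The invoice example above can be sketched with Python's built-in sqlite3
module. This is only an illustration under assumed names: the customers
and invoices tables, their columns, and the sample values are invented
here and do not come from the thread. The address is stored once, each
invoice points at its customer, and a change to the address shows up on
every invoice without touching the invoice rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL          -- the single "master" copy
    );
    CREATE TABLE invoices (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL      -- no address column here
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme Lab', 'Old Street 1')")
conn.executemany(
    "INSERT INTO invoices (customer_id, amount) VALUES (?, ?)",
    [(1, 100.0), (1, 250.0)],
)

# One update in one place; no invoice row needs to change.
conn.execute("UPDATE customers SET address = 'New Street 9' WHERE id = 1")

rows = conn.execute("""
    SELECT i.id, c.address
    FROM invoices AS i
    JOIN customers AS c ON c.id = i.customer_id
    ORDER BY i.id
""").fetchall()
print(rows)  # [(1, 'New Street 9'), (2, 'New Street 9')]

# The engine rejects an invoice that points at a customer that does not exist.
try:
    conn.execute("INSERT INTO invoices (customer_id, amount) VALUES (99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The price of storing the address once is the join needed to read it back
alongside an invoice.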

In contrast, using record keys (primary and foreign keys) is good,
because you define the relationships at database design time, and the
database engine helps enforce those relationships when data is entered.

Further details or more concrete questions would let me be more
specific.

Hope this clarifies things a bit more.

Jordi Rambla
Barcelona (Spain)

From B.A.T.Svensson at lumc.nl Tue Mar 2 08:57:02 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 02 Mar 2004 14:57:02 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To:
References:
Message-ID: <1078235822.26146.24.camel@ander>

The only problem with (de)normalization theory is that it is not very
useful for everyday practical purposes.

From jrambla at hotmail.com Tue Mar 2 09:29:40 2004
From: jrambla at hotmail.com (JRambla)
Date: Tue, 2 Mar 2004 15:29:40 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078235822.26146.24.camel@ander>
Message-ID:

Although I'm
not sure I understand your comment, normalization must eventually become
an attitude, an ingrained practice, essential for a good design with a
long lifetime. It is like good laboratory practice: you can obtain
results without it, but you are a lot more likely to get good results if
you follow it by default.

The rules can look a bit abstract, but once understood they're very
practical, close to a methodology. However, you're right that they're
not a checklist or a step-by-step guide.

Jordi Rambla
Barcelona (Spain)

From B.A.T.Svensson at lumc.nl Tue Mar 2 10:29:23 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 02 Mar 2004 16:29:23 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To:
References:
Message-ID: <1078241362.26146.117.camel@ander>

Second-order tensor calculus in fluid mechanics is also "easy" once you
get used to the concept. However, the difference we are talking about
here is that tensor calculus simplifies the task of understanding the
problem, while normalization theory does not; rather the opposite.

Any student of computer science who has been forced to do formal
deduction with Horn clauses knows that this is the most rigorous and
silly way to conclude the most trivial and obvious facts, and as a
matter of fact normalization theory is based on this kind of logical
reasoning.

P.S. Speaking of Barcelona, I returned from there just yesterday after a
four-day visit. It is a very fine city, and I was blessed with a very
nice Monday, though the snow last Sunday was less appreciated. ;) D.S.

From mgruenb at gmx.net Tue Mar 2 10:50:41 2004
From: mgruenb at gmx.net (Michael Gruenberger)
Date: Tue, 02 Mar 2004 15:50:41 +0000
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078241362.26146.117.camel@ander>
References: <1078241362.26146.117.camel@ander>
Message-ID: <1078242641.3392.67.camel@vogel>

I don't quite understand your reasoning: you claimed that normalization
has no practical use, and that is because it's hard to understand?!

Normalization is not meant to simplify database design, but rather to
formalise it and to give guidelines as to what works and what doesn't.
I completely agree with Jordi: normalization just becomes part of your
'inner feeling' of what works and what doesn't after you've done it a
couple of times.
Of course it isn't applicable in all cases, but it surely is better than
no guidelines at all, especially for people who have never designed a
database before and who are looking for some guidelines to point them in
the right direction. And there are ways to explain normalization in
simple terms and with good examples.

So if you are saying normalization hasn't got any practical use, what
would you suggest to a newcomer to database design?

From B.A.T.Svensson at lumc.nl Tue Mar 2 12:00:45 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 02 Mar 2004 18:00:45 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078242641.3392.67.camel@vogel>
References: <1078242641.3392.67.camel@vogel>
Message-ID: <1078246845.26152.259.camel@ander>

On Tue, 2004-03-02 at 16:50, Michael Gruenberger wrote:
> I don't quite understand your reasoning: You claimed that normalization
> has no practical use and that is because it's hard to understand?!

That's not what I tried to say. Applying the theory makes things more
complicated than they are. In short: it's a waste of time in everyday
work.

Also, my apologies for forgetting to stress one point: I don't suggest
throwing the baby out with the bathwater. /Learning/ the theory is by no
means a waste of time; however, trying to /apply/ the theory in everyday
work is a) a waste of time and b) a misunderstanding of how the concept
is meant to be used. It might be a bit too tough for a beginner to start
out with these subjects; there are other things that are better learned
first (e.g. principles of programming), and then these issues will, in
time, resolve themselves.

However, the original question was about why normalization equals good
design. As Jordi already stressed, this can't be held as a simple truth.
To add to the answers already provided: high normalization optimizes
storage efficiency and insert/update speed, while low normalization
optimizes read speed but makes inserts and updates a more complex task.

Related to this, either extreme of normal form creates difficulties of
its own. Too low, and updates become a nightmare for the database
programmer; too high, and even the simplest query approaches a monstrous
construction. In most cases one needs to find a balance between these
two factors, because pulling toward one end only creates problems at the
other. I think BCNF (Boyce-Codd Normal Form) tried to address these
issues, but I can't tell for sure. In any case, this balance is found
through experience (system stress testing) and common-sense judgment; no
data decomposition course will teach you how to do this for a particular
real system.

One way to work around this is to enforce a list of constraints,
protected by a set of additional triggers in the database, but even this
has a tendency to turn an otherwise simple data model into a dinosaur in
its implementation. In addition, nothing comes for free: a performance
hit comes on top of this. Sometimes it might be worth it, other times
not.
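
To make the trigger idea concrete, here is a minimal sketch using
Python's built-in sqlite3 module. It assumes a deliberately denormalized
invoices table that carries its own copy of the customer's city for fast
reads, plus a trigger that keeps that copy in step with the master
record. The schema, the sample values, and the trigger name
sync_customer_city are invented for the illustration and are not taken
from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        city TEXT NOT NULL
    );
    -- Denormalized on purpose: customer_city duplicates customers.city
    -- so that reading an invoice needs no join.
    CREATE TABLE invoices (
        id            INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL,
        customer_city TEXT NOT NULL,
        amount        REAL NOT NULL
    );
    -- The trigger pays for the duplication: every city update also
    -- rewrites the matching invoice rows.
    CREATE TRIGGER sync_customer_city
    AFTER UPDATE OF city ON customers
    BEGIN
        UPDATE invoices
        SET customer_city = NEW.city
        WHERE customer_id = NEW.id;
    END;
""")

conn.execute("INSERT INTO customers VALUES (1, 'Leiden')")
conn.executemany(
    "INSERT INTO invoices (customer_id, customer_city, amount) VALUES (?, ?, ?)",
    [(1, "Leiden", 100.0), (1, "Leiden", 250.0)],
)

# Updating the master record fires the trigger, which fixes the copies.
conn.execute("UPDATE customers SET city = 'Barcelona' WHERE id = 1")

cities = [row[0] for row in
          conn.execute("SELECT customer_city FROM invoices ORDER BY id")]
print(cities)  # ['Barcelona', 'Barcelona']
```

Reads stay a single-table query, but every update to customers now does
extra work inside the trigger, which is the kind of performance cost
described above.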
There is no way to establish what is meant by a good design. It depends on the project requirements, and in the end it is up to the designer and the users of the system to judge it. In any case, normalization theory won't tell you how to deal with that, and at the end of the day it won't even tell you whether you made any good decisions whatsoever.

> Normalization is not meant to simplify database design, but rather to
> formalise it and to give guidelines as to what works and what doesn't.
> I completely agree with Jordi, normalization just becomes part of your
> 'inner feeling' of what works and what doesn't after you've done it a
> couple of times. Of course it isn't applicable in all cases, but it
> surely is better than no guidelines at all, especially for people who
> have never designed a database before and who are looking for some
> guidelines to point them in the right direction. And there are ways to
> explain normalization in simple terms and with good examples.

I do not object to this.

> So if you are saying normalization hasn't got any practical use,
> what would you suggest to a newcomer to database design?

To learn the theory, and then forget it.

Data decomposition can be taught from books; however, there is no book that tells you how to decompose a real problem in general, since decomposition can be done in several ways for the very same data set, i.e. how to decompose depends on the questions you want to ask of the data set.

For a newcomer, I do not think learning abstract things like relational algebra and normalization theory is the best place to start. Programming and design are difficult enough as it is, mainly because they do not reflect the way we humans normally solve a problem (which is why, for instance, programming is tricky to learn in the first place).

But in any case, design and programming are learned skills, and they take many years to acquire.
Nobody would suggest that anybody can be a surgeon just because everybody can walk into the closest store and buy a knife - but it seems that most people believe anybody can do database development just because the platforms are available. That is simply not the case.

So what do I suggest? Well, we all seem to forget the difficulties and troubles we went through when we once had to learn to walk. One may get oneself a good education, either through self-study or through formal education, and then add a few years of practical experience, or just buy the knowledge from somebody who knows how to do it. That's my advice, because there is no simple way to learn these things in "24 hours" or "7 days" courses; that's just an illusion.

> On Tue, 2004-03-02 at 15:29, Svensson, B.A.T. (HKG) wrote:
> > 2nd order tensors calculus in fluid mechanics is also "easy"
> > once you get used with the concept. However the difference
> > we talking about here is that tensor calculus is simplifying
> > the task of understanding the problem, while normalization
> > theory does not, rather the opposite.
> >
> > Any student of computer science who has been forced to do
> > formal deduction with horn clauses know that this is the
> > most rigorous and silly way to conclude the most trivial
> > and obvious facts, and as a matter of fact normalization
> > theory is based on this kind of logical reasoning.
> >
>

From mgruenb at gmx.net Tue Mar 2 12:50:21 2004
From: mgruenb at gmx.net (Michael Gruenberger)
Date: Tue, 02 Mar 2004 17:50:21 +0000
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078246845.26152.259.camel@ander>
References: <1078242641.3392.67.camel@vogel> <1078246845.26152.259.camel@ander>
Message-ID: <1078249821.3394.105.camel@vogel>

Thanks for the clarifications! I pretty much agree with everything you said.
Of course you need a lot of experience to be able to design good data models, and once you have the experience you probably don't have to go through the formal process of normalizing a database, because you just instinctively know what works best.

But still, in order to get the experience you have to start somewhere, and you have to give a beginner an entry point. Showing them normalization with some good examples has worked for me in most cases...

This thread was started by someone asking for some guidance on how to design his database, and I still think that pointing that guy to normalization was a good idea. It would be interesting to get some feedback though, and to know whether reading about normalization helped that guy!

Cheers,

Michael.

On Tue, 2004-03-02 at 17:00, Svensson, B.A.T. (HKG) wrote:
> That's not what I tried to say; Applying the theory makes things
> more complicated than they are. In short: it's a waste of time
> in everyday work.

From jrambla at hotmail.com Tue Mar 2 13:36:28 2004
From: jrambla at hotmail.com (JRambla)
Date: Tue, 2 Mar 2004 19:36:28 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To: <1078249821.3394.105.camel@vogel>
Message-ID:

Svensson,

For me, learning normalization is like learning to drive. When you're experienced, you can apply the rules of driving comfortably: traffic-light timing, how to judge the speed of oncoming cars, the safe distance between cars at a given speed, whether your car will fit in a parking space, and so on.

However, if you're a beginner, you had better stick to the strict rules and your driving instructor's rules of thumb.

I will advise any beginner to stay with normalization except if the problem domain mandates otherwise.
From my experience, when designers try to be smarter than that, they fail 99% of the time. This means having to redesign the database after a lot of pain with data consistency. Most of the time you will also need to edit some of the data.

Some of your issues with normalization (too many joins, etc.) can be addressed by database objects such as views. That's a principle for me: if you stay normalized, the database engine can probably help you with your troubles; if you stray from it, you're usually on your own. You're a "desperado".

When we are not talking to beginners, and they can't hear us, we can use another principle: "queries drive database design". You must design your database (or a version of it) to optimize for the queries you will run most often or that are most critical.

In my humble opinion,

Jordi Rambla
Barcelona (Spain)

-----Original Message-----
From: bio_bulletin_board-admin at bioinformatics.org [mailto:bio_bulletin_board-admin at bioinformatics.org] On Behalf Of Michael Gruenberger
Sent: Tuesday, 02 March 2004 18:50
To: bio_bulletin_board at bioinformatics.org
Subject: RE: [BiO BB] RE: Normalization WAS: database design question

Thanks for the clarifications! I pretty much agree with everything you said. Of course you need a lot of experience to be able to design good data models and once you have the experience you probably don't have to go through the formal process of normalizing a database, because you just instinctively know what works best.

But still in order to get the experience you have to start somewhere and you have to give a beginner an entry point and showing them normalization with some good examples has worked for me in most cases...

This thread started by someone asking for some guidance on how to design his database and I still think that pointing that guy to normalization was a good idea. It would be interesting to get some feedback though and to know whether reading about normalization helped that guy!

Cheers,

Michael.
On Tue, 2004-03-02 at 17:00, Svensson, B.A.T. (HKG) wrote:
> That's not what I tried to say; Applying the theory makes things
> more complicated than they are. In short: it's a waste of time
> in everyday work.

From B.A.T.Svensson at lumc.nl Wed Mar 3 07:23:07 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 03 Mar 2004 13:23:07 +0100
Subject: [BiO BB] RE: Normalization WAS: database design question
In-Reply-To:
References:
Message-ID: <1078316587.26146.383.camel@ander>

Dear Jordi,

It seems to me that your "beginner" is quite an advanced fellow(?). I still argue that one needs to crawl before one can start to walk.

Even in advanced textbooks on database systems, normalization (i.e. a database design issue) is not described until the basics of database systems have been covered: conceptual data models, schemas, keys, attributes, relationships, roles, indexes, domains, anomalies, data definition/manipulation languages, etc, etc, etc...

In my opinion, starting with normalization without learning the basic concepts above will result in nothing, since you won't know what it is you want to decompose anyway. On the other hand, learning the above will implicitly teach you how to do normalization - namely with the common-sense approach. Normalization theory is just a formal way to do (read: a way to make a machine able to do) what we humans already know by heart and gut.

But even before learning these concepts you still need another foundation to stand on. In my opinion it is simply not enough to buy a desktop DBMS and think everything will solve itself once one knows normalization theory - it is a little bit trickier than that.

Put another way: let's say the novice knows how to do normalization. Further assume the novice has normalized some data set. And now what? What to do next? How does the novice get the raw data into the right format to load it into the RDBMS? And what means should be used to load the data?
Does the data need to be updated or replaced frequently? If we are talking about genomic databases, then we are in for a nightmare. But even if these things are achieved, how will the novice know how to query the data? Etc, etc, etc...

Yes, there are many roads to Rome, but they are all long...

* * *

You also claim there is a 1 in 100 success ratio when deviating from normalization. As you might understand, referring to your own experience can't be regarded as solid proof of your claim, especially when there are unanswered questions as to which normalization you are referring to, what is being normalized, the expertise level of the designer, a comparison with the unknown, etc, etc...

The first and primary goal of /any/ software development is:

*** MAKE IT WORK! ***

This is the primary goal for the beginner to achieve, because it doesn't matter how fancy your design is if it does not work at the end of the day. By creating things that work, the novice will learn from experience what works smoothly and what does not, and eventually become a pro.

If the novice just gets the data into an RDBMS - in whatever normal form - he will eventually be able to construct a query that solves a specific problem for him, simply because humans are creative beings. An expert may, however, say that "the solution was a bit unusual".

In fact, quite a lot of novices create, or use, a single tab-delimited file (in God knows what normal form), dump it into some kind of database manager (Excel, MS Access), and then happily ignore any database design issues and just use a "flat" query to get whatever data they need. And this works for the novice in most simple cases. This way there is never a need to change the layout: they just pick what they need, and if something is missing, another column will fix that. I assume you never counted these guys in your 99% failure ratio?
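
The flat-file habit described above can be sketched as follows. This is only an illustration: Python's built-in `csv` and `sqlite3` modules stand in for the desktop tools mentioned, and the table name, column names and all values are invented placeholders rather than real data.

```python
import csv
import io
import sqlite3

# Invented tab-delimited data, as it might come out of a spreadsheet export.
raw = (
    "gene\torganism\tlength\n"
    "geneA\torg1\t3000\n"
    "geneB\torg1\t1000\n"
    "geneC\torg2\t1500\n"
)

# Each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flat (gene TEXT, organism TEXT, length INTEGER)")
con.executemany(
    "INSERT INTO flat VALUES (:gene, :organism, :length)", rows)

# "If something is missing, another column will fix that."
con.execute("ALTER TABLE flat ADD COLUMN note TEXT")

# A "flat" query: no joins, just pick the rows that are needed.
long_genes = [
    row[0]
    for row in con.execute(
        "SELECT gene FROM flat WHERE length > 1200 ORDER BY gene")
]
print(long_genes)
```

Nothing here is normalized (the organism is repeated on every row), yet the query does what the novice needs, which is the point being made.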
* * *

On database views: as you know, views are only predefined queries in the system, and they are of only limited help (a database system changes over time), since they won't address, in the longer term, the issues I have already mentioned.

* * *

I am satisfied with your last remark, that queries drive the design - that is, they identify the mini-world we want to model. And when one designs like this, a "good design" is no longer measured by the normalization level of the system, but by the real usability of the system, i.e. according to the primary goal: make it work!

Kind regards,

//Anders

On Tue, 2004-03-02 at 19:36, JRambla wrote:
> Svensson,
>
> For me, learning normalization is like learning to drive.
> When you're experienced, you can make a comfortable use of driving rules:
> semaphore timing, how to measure incoming cars speed, safe car to car
> distance according to speed, when your car will fit in a parking space and
> so on.
>
> However, if you're a beginner, you better stay with strict rules and driving
> professor rules-of-thumb.
>
> I will advise any beginner to stay with normalization except if the problem
> domain mandates otherwise. From my experience, when the designer wants to be
> smarter than that, they fail, 99% of times. This means having to redesign
> the database after lots of data coherence pains. Most of times, you will
> need to edit some data, also.
>
> Some of your issues with normalization (too much joins, etc.) can be
> addressed by database objects like views. That's a principle for me: if you
> stay normalized, database engine can probably help you with your troubles,
> if you stay apart, you're usually on your own. You're a "desperado".
>
> When not talking to beginners, and they can't hear us, we can use another
> principle: "queries drives database design". You must design your database
> (or a version of it) optimizing against the queries you will perform the
> most usually or the most critical.
>
> In my humble opinion,
>
> Jordi Rambla
> Barcelona (Spain)
>
> -----Original Message-----
> From: bio_bulletin_board-admin at bioinformatics.org
> [mailto:bio_bulletin_board-admin at bioinformatics.org] On Behalf Of Michael
> Gruenberger
> Sent: Tuesday, 02 March 2004 18:50
> To: bio_bulletin_board at bioinformatics.org
> Subject: RE: [BiO BB] RE: Normalization WAS: database design question
>
> Thanks for the clarifications! I pretty much agree with everything you
> said. Of course you need a lot of experience to be able to design good
> data models and once you have the experience you probably don't have to
> go through the formal process of normalizing a database, because you
> just instinctively know what works best.
>
> But still in order to get the experience you have to start somewhere and
> you have to give a beginner an entry point and showing them
> normalization with some good examples has worked for me in most cases...
>
> This thread started by someone asking for some guidance on how to design
> his database and I still think that pointing that guy to normalization
> was a good idea. It would be interesting to get some feedback though and
> to know whether reading about normalization helped that guy!
>
> Cheers,
>
> Michael.
> On Tue, 2004-03-02 at 17:00, Svensson, B.A.T. (HKG) wrote:
>
> > That's not what I tried to say; Applying the theory makes things
> > more complicated than they are. In short: it's a waste of time
> > in everyday work.
> > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From roy at colibase.bham.ac.uk Wed Mar 3 07:49:12 2004 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Wed, 03 Mar 2004 12:49:12 +0000 Subject: [BiO BB] clustalw In-Reply-To: <20040303122417.78F24D1F1D@www.bioinformatics.org> References: <20040303122417.78F24D1F1D@www.bioinformatics.org> Message-ID: <4045D448.2050405@colibase.bham.ac.uk> > Has anyone used Clustalw in the profile alignment mode? I tried to and found > that none of the menu items except the first one work. I downloaded the latest > version (1.83 for DOS) but it seems to have the same problem. If anyone has > successfully used Clustalw in the profile alignment mode could you please let > me know how you went about it? A problem with ClustalW is that error messages sometimes get hidden, as the menu is redisplayed. I'm guessing there is some problem with your second profile/sequence set (such as a duplicated sequence name). This would prevent the second set from loading correctly, and hence disable the other options. Try scrolling up in your DOS window after attempting to load the second set, and you should be able to see the error message and fix the problem. It should say "(loaded)" next to options 1 and 2 if you have been successful. Roy. -- Dr. 
Roy Chaudhuri
Bioinformatics Research Fellow,
Division of Immunity and Infection,
University of Birmingham, UK
http://colibase.bham.ac.uk

From john_abraham_bio at yahoo.com Mon Mar 8 10:04:38 2004
From: john_abraham_bio at yahoo.com (John Abraham)
Date: Mon, 8 Mar 2004 07:04:38 -0800 (PST)
Subject: [BiO BB] Universal Primer
Message-ID: <20040308150438.56831.qmail@web60805.mail.yahoo.com>

Hi,

I am looking at universal primer design (using either a single rRNA gene or a cocktail of rRNA genes, for both prokaryotes and eukaryotes) with traditional bioinformatics tools.

Does anyone in the group have experience with this? Any suggestions or references that would help me proceed in this direction would be very helpful.

I look forward to valuable input from members and experts.

Thanks,
John

From kalita at pikespeak.uccs.edu Tue Mar 9 22:27:55 2004
From: kalita at pikespeak.uccs.edu (J Kalita)
Date: Tue, 9 Mar 2004 20:27:55 -0700 (MST)
Subject: [BiO BB] A question about BLOSUM matrices
Message-ID: <52031.128.198.168.105.1078889275.squirrel@pikespeak.uccs.edu>

Hello,

I have a question about BLOSUM matrices such as the one you see on the Web page at http://eta.embl-heidelberg.de:8000/misc/mat/blosum80.html. Can anyone please explain to me what the last four columns and last four rows labeled "B", "Z", "X" and "*" are?

Thank you!
Jugal Kalita
Computer Science Department
University of Colorado at Colorado Springs

From pankaj at nii.res.in Wed Mar 10 00:06:12 2004
From: pankaj at nii.res.in (Pankaj)
Date: Wed, 10 Mar 2004 10:36:12 +0530 (IST)
Subject: [BiO BB] structural bioinformatics
Message-ID:

Hi all,

I would like to know of any good structural bioinformatics tutorial or book that covers the basics of structural bioinformatics (from the preliminaries to the details), such as:
what is molecular dynamics
what is molecular mechanics
the various software packages for them
etc.

If anyone knows of one, kindly tell me.

Thanks in advance,
Pankaj

From idoerg at burnham.org Wed Mar 10 02:56:11 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Tue, 9 Mar 2004 23:56:11 -0800
Subject: [BiO BB] A question about BLOSUM matrices
In-Reply-To: <52031.128.198.168.105.1078889275.squirrel@pikespeak.uccs.edu>
Message-ID:

Hi Jugal,

B: aspartic acid / asparagine
Z: glutamic acid / glutamine
X: unknown residue
*: anything which is not in {ABCDEFGHIKLMNPQRSTVWXYZ}. Alternatively, it might be the "translation" of a stop codon.

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

On Tue, 9 Mar 2004, J Kalita wrote:

> Hello,
>
> I have a question about BLOSUM matrices such as the one you see
> on the Web page at
> http://eta.embl-heidelberg.de:8000/misc/mat/blosum80.html.
> Can anyone please explain to me what the last four columns and last four
> rows labeled "B", "Z", "X" and "*" are?
>
> Thank you!
> > Jugal Kalita > Computer Science Department > University of Colorado at Colorado Springs > > > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From idoerg at burnham.org Wed Mar 10 03:04:22 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Wed, 10 Mar 2004 00:04:22 -0800 Subject: [BiO BB] structural bioinformatics In-Reply-To: Message-ID: Try: Structural Bioinformatics H. Weissig & P. Bourne eds. http://www.amazon.com/exec/obidos/tg/detail/-/0471201995/qid=1078905457/sr=1-1/ref=sr_1_1/102-4375680-0112137?v=glance&s=books Good basic coverage. Each chapter written by a different researcher in the field. If you would like to learn more about protein physics (as implied by your question): Protein Physics (A course of lectures) A. Finkelstein and O. Ptitsyn Not much bioinformatics per-se, but a treasure trove for learning about the physical principles of folding, structure & function. http://www.amazon.com/exec/obidos/tg/detail/-/0122567811/qid=1078905680/sr=1-1/ref=sr_1_1/102-4375680-0112137?v=glance&s=books Hope this helps, ./I -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. 
La Jolla, CA 92037, USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 646 3171 http://ffas.ljcrf.edu/~iddo On Wed, 10 Mar 2004, Pankaj wrote: > Hi all, > i want to know about any good strcutural bioinformatics tutorial/book > which talks (from preliminary to detail) about the basics of > structural bioinformatics like: > what is molecular dynamics > what is molecular mechanics > various softwares for them > etc > if any1 knows about about it kindly tell me > thanks in advance > Pankaj > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > From lon at bio-code.com Wed Mar 10 03:32:58 2004 From: lon at bio-code.com (L. James) Date: Wed, 10 Mar 2004 00:32:58 -0800 Subject: [BiO BB] open source bioinformatics project | Caltech In-Reply-To: Message-ID: Anybody using Cartwheel - an open source bioinformatics project out of Caltech? lon james managing director postgres, inc. san francisco, ca 94110 415-573-9192 lon at efcodd.org lon at bio-code.com bizscience at jabber.com -----Original Message----- From: bio_bulletin_board-admin at bioinformatics.org [mailto:bio_bulletin_board-admin at bioinformatics.org]On Behalf Of Iddo Friedberg Sent: Wednesday, March 10, 2004 12:04 AM To: bio_bulletin_board at bioinformatics.org Subject: Re: [BiO BB] structural bioinformatics Try: Structural Bioinformatics H. Weissig & P. Bourne eds. http://www.amazon.com/exec/obidos/tg/detail/-/0471201995/qid=1078905457/sr=1 -1/ref=sr_1_1/102-4375680-0112137?v=glance&s=books Good basic coverage. Each chapter written by a different researcher in the field. If you would like to learn more about protein physics (as implied by your question): Protein Physics (A course of lectures) A. Finkelstein and O. Ptitsyn Not much bioinformatics per-se, but a treasure trove for learning about the physical principles of folding, structure & function. 
http://www.amazon.com/exec/obidos/tg/detail/-/0122567811/qid=1078905680/sr=1 -1/ref=sr_1_1/102-4375680-0112137?v=glance&s=books Hope this helps, ./I -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037, USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 646 3171 http://ffas.ljcrf.edu/~iddo On Wed, 10 Mar 2004, Pankaj wrote: > Hi all, > i want to know about any good strcutural bioinformatics tutorial/book > which talks (from preliminary to detail) about the basics of > structural bioinformatics like: > what is molecular dynamics > what is molecular mechanics > various softwares for them > etc > if any1 knows about about it kindly tell me > thanks in advance > Pankaj > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > _______________________________________________ BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From dana.reichmann at weizmann.ac.il Wed Mar 10 12:20:37 2004 From: dana.reichmann at weizmann.ac.il (Dana Reichmann) Date: Wed, 10 Mar 2004 19:20:37 +0200 Subject: [BiO BB] computational mutagenesis Message-ID: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il> Hi all, I am looking for good programs and algorithms for computational mutagenesis that can define hot spots in protein-protein interactions. I am interesting in different approach such as thermodynamical, structural etc. I want to understand what kind of approach are better, what is maximal prediction % of each. Thanks a lot, Dana -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: text/enriched Size: 354 bytes Desc: not available URL: From idoerg at burnham.org Wed Mar 10 12:50:36 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Wed, 10 Mar 2004 09:50:36 -0800 Subject: [BiO BB] computational mutagenesis In-Reply-To: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il> References: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il> Message-ID: <404F556C.10600@burnham.org> Dana, Have you looked at the CAPRI site (http://capri.ebi.ac.uk/) and papers in Proteins? (http://www3.interscience.wiley.com/cgi-bin/jissue/104531597) Iddo Dana Reichmann wrote: > Hi all, > > I am looking for good programs and algorithms for computational > mutagenesis that can define hot spots in *protein-protein *interactions. > I am interesting in different approach such as thermodynamical, > structural etc. I want to understand what kind of approach are better, > what is maximal prediction % of each. > > Thanks a lot, > Dana -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9930 http://ffas.ljcrf.edu/~iddo From dana.reichmann at weizmann.ac.il Wed Mar 10 13:30:30 2004 From: dana.reichmann at weizmann.ac.il (Dana Reichmann) Date: Wed, 10 Mar 2004 20:30:30 +0200 Subject: [BiO BB] computational mutagenesis In-Reply-To: <404F556C.10600@burnham.org> References: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il> <404F556C.10600@burnham.org> Message-ID: <02963A84-72C1-11D8-8B14-000393BB411E@weizmann.ac.il> Iddo hi, thanks for the suggestion, i'll check it, I am more interesting on evaluation of changes upon mutation (structure and thermodynamic changes) when the structure of a wild type complex is known. Something similar to FOLDEFF program (from L.Serrano group) or Rosetta from D. Baker. Thanks, Dana On Mar 10, 2004, at 7:50 PM, Iddo Friedberg wrote: > Dana, > > Have you looked at the CAPRI site (http://capri.ebi.ac.uk/) > and papers in Proteins? 
> > (http://www3.interscience.wiley.com/cgi-bin/jissue/104531597) > > Iddo > > Dana Reichmann wrote: >> Hi all, >> I am looking for good programs and algorithms for computational >> mutagenesis that can define hot spots in *protein-protein >> *interactions. I am interesting in different approach such as >> thermodynamical, structural etc. I want to understand what kind of >> approach are better, what is maximal prediction % of each. >> Thanks a lot, >> Dana > > -- > Iddo Friedberg, Ph.D. > The Burnham Institute > 10901 N. Torrey Pines Rd. > La Jolla, CA 92037 > USA > Tel: +1 (858) 646 3100 x3516 > Fax: +1 (858) 713 9930 > http://ffas.ljcrf.edu/~iddo > From idoerg at burnham.org Wed Mar 10 13:45:59 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Wed, 10 Mar 2004 10:45:59 -0800 Subject: [BiO BB] computational mutagenesis In-Reply-To: <02963A84-72C1-11D8-8B14-000393BB411E@weizmann.ac.il> References: <3FA66142-72B7-11D8-8B14-000393BB411E@weizmann.ac.il> <404F556C.10600@burnham.org> <02963A84-72C1-11D8-8B14-000393BB411E@weizmann.ac.il> Message-ID: <404F6267.5040806@burnham.org> Dana, Andrei Sali has been doing some work with exending MODELLER to include protein complexes. You might want to check his site to see what's new on this front. ./I Dana Reichmann wrote: > Iddo hi, > > thanks for the suggestion, i'll check it, > > I am more interesting on evaluation of changes upon mutation > (structure and thermodynamic changes) when the structure of a wild > type complex is known. Something similar to FOLDEFF program (from > L.Serrano group) or Rosetta from D. Baker. > > Thanks, > Dana > > > On Mar 10, 2004, at 7:50 PM, Iddo Friedberg wrote: > >> Dana, >> >> Have you looked at the CAPRI site (http://capri.ebi.ac.uk/) >> and papers in Proteins? 
>> >> (http://www3.interscience.wiley.com/cgi-bin/jissue/104531597) >> >> Iddo >> >> Dana Reichmann wrote: >> >>> Hi all, >>> I am looking for good programs and algorithms for computational >>> mutagenesis that can define hot spots in *protein-protein >>> *interactions. I am interesting in different approach such as >>> thermodynamical, structural etc. I want to understand what kind of >>> approach are better, what is maximal prediction % of each. >>> Thanks a lot, >>> Dana >> >> >> -- >> Iddo Friedberg, Ph.D. >> The Burnham Institute >> 10901 N. Torrey Pines Rd. >> La Jolla, CA 92037 >> USA >> Tel: +1 (858) 646 3100 x3516 >> Fax: +1 (858) 713 9930 >> http://ffas.ljcrf.edu/~iddo >> > > _______________________________________________ > BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > -- -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 Tel: (858) 646 3100 x3516 Fax: (858) 646 3171 http://ffas.ljcrf.edu/~iddo From dmb at mrc-dunn.cam.ac.uk Wed Mar 10 18:38:47 2004 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Wed, 10 Mar 2004 23:38:47 +0000 (GMT) Subject: [BiO BB] computational mutagenesis In-Reply-To: <02963A84-72C1-11D8-8B14-000393BB411E@weizmann.ac.il> Message-ID: I heard about a technique called 'computational alanine scanning' which simulates an experimental technique to probe protein-protein interaction (I think). Sounds like a good thing to play with / compare with other measures. Dan On Wed, 10 Mar 2004, Dana Reichmann wrote: > Iddo hi, > > thanks for the suggestion, i'll check it, > > I am more interesting on evaluation of changes upon mutation (structure > and thermodynamic changes) when the structure of a wild type complex is > known. Something similar to FOLDEFF program (from L.Serrano group) or > Rosetta from D. Baker. 
>
> Thanks,
> Dana
>
>
> On Mar 10, 2004, at 7:50 PM, Iddo Friedberg wrote:
>
> > Dana,
> >
> > Have you looked at the CAPRI site (http://capri.ebi.ac.uk/)
> > and papers in Proteins?
> >
> > (http://www3.interscience.wiley.com/cgi-bin/jissue/104531597)
> >
> > Iddo
> >
> > Dana Reichmann wrote:
> >> Hi all,
> >> I am looking for good programs and algorithms for computational
> >> mutagenesis that can define hot spots in *protein-protein
> >> *interactions. I am interesting in different approach such as
> >> thermodynamical, structural etc. I want to understand what kind of
> >> approach are better, what is maximal prediction % of each.
> >> Thanks a lot,
> >> Dana
> >
> > --
> > Iddo Friedberg, Ph.D.
> > The Burnham Institute
> > 10901 N. Torrey Pines Rd.
> > La Jolla, CA 92037
> > USA
> > Tel: +1 (858) 646 3100 x3516
> > Fax: +1 (858) 713 9930
> > http://ffas.ljcrf.edu/~iddo
> >
>
> _______________________________________________
> BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>

From prathibha_562 at yahoo.co.in Thu Mar 11 04:23:36 2004
From: prathibha_562 at yahoo.co.in (prathibha bharathi)
Date: Thu, 11 Mar 2004 09:23:36 +0000 (GMT)
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of time to complete a single database similarity search
In-Reply-To: <404F6267.5040806@burnham.org>
Message-ID: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>

Hi all,

My protein sequence analysis tool is taking a lot of time to complete a single request for a database similarity search. My database is a relational MySQL database which contains 16 tables and 283,366 sequence entries.

My sequence analysis tool is currently running on a local intranet server with a 1.9 GHz processor and 256 MB RAM.
A single pairwise alignment takes around 10 ms, depending on the length of the query sequence. A single request used to take more than 24 hours with 4 threads working on 4 partitions. By keeping only 2 threads alive at a time, working on 2 partitions (I partitioned my database into 8 based on sequence checksum), it now takes around 9 hours to complete a single request for a database similarity search.

Is it really possible to reduce the time further with a hardware configuration of 1.9 GHz and 256 MB RAM, or do I have to go for a more powerful hardware configuration? I am now using the MySQL database server and the Apache HTTP server with the JRun application server. Do I have to go for a more powerful application server than JRun?

My implementation platform is Java, and the algorithm being used is the Smith-Waterman local alignment algorithm.

Thanking you,
Prathibha.

From mgollery at unr.edu Thu Mar 11 09:22:12 2004
From: mgollery at unr.edu (Martin Gollery)
Date: Thu, 11 Mar 2004 06:22:12 -0800
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of time to complete a single database similarity search
In-Reply-To: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
References: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
Message-ID: <1079014932.40507614190a0@webmail.unr.edu>

The Smith-Waterman algorithm is quite sensitive, but yes, it is slow. Try FASTA or BLAST on some of the sequences that you have already run and see what you miss with your particular data. In most cases you will miss several percent of the hits, but it might be worth it if it allows you to get the job done in a reasonable amount of time.
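[Editor's sketch] The figures quoted in the original question can be checked with a quick back-of-envelope calculation. The snippet below is only an illustration, not anything posted to the list: it assumes that one request aligns the query against every one of the 2,83,366 entries exactly once and that the quoted 10 ms per alignment is accurate. The constant and function names are made up for this example.

```python
# Back-of-envelope estimate using only the figures quoted in the thread.
N_SEQUENCES = 283_366    # "2,83,366 sequence entries" (Indian digit grouping)
ALIGN_SECONDS = 0.010    # "around 10msecs" per pairwise alignment
OBSERVED_HOURS = 9.0     # reported time for one request after tuning


def alignment_hours(n_sequences, seconds_per_alignment):
    """Hours needed if the pairwise alignments were the only cost."""
    return n_sequences * seconds_per_alignment / 3600.0


single = alignment_hours(N_SEQUENCES, ALIGN_SECONDS)
print(f"alignment only: {single:.2f} h")
print(f"share of the {OBSERVED_HOURS:.0f} h run not explained by alignment: "
      f"{1 - single / OBSERVED_HOURS:.0%}")
```

Under those assumptions the alignments alone account for well under one hour, so most of the reported 9 hours would have to be spent somewhere else (fetching sequences from the database, swapping, garbage collection). If the 10 ms figure is an underestimate, the conclusion changes accordingly.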
Cheers, Marty Quoting prathibha bharathi : > Hai all, > > My protein sequence analysis tool is taking a lot of time to > complete a single request for database similarity search.My database is a > relational database for MySQL which contains 16 tables and 2,83,366 sequence > entries. > > My Sequence analysis tool is currently running on a Local intranet server > with 1.9GHz processor and 256MB RAM. > > For a single pairwise alignment it is taking around 10msecs depending on the > length of query sequence and was taking more than 24 hours to complete > single request with 4 threads working on 4 partitions .By making only 2 > threads to be alive at a time working on 2 partitions(I partitioned my > Database in to 8 based on sequence chesk sum) ,now it is taking around 9 > hours to complete a single request for database similarity search. > > Is it really possible to reduce the time further with hardware configuration > of 1.9Ghz and 256MB RAM. > Or have I to go for more more powerful hardware configuration. > Now i'm using MySQL database server and Apache HTTP server with JRun > application server.Have i to go for more powerful application server than > JRun . > My implementation platform is Java and algorithm being used is" > SMITH-WATERMAN LOCAL ALIGNMENT" algorithm. > Thanking You, > Prathibha. > > > > Yahoo! India Insurance Special: Be informed on the best policies, services, > tools and more. Martin Gollery Associate Director Center For Bioinformatics University of Nevada at Reno Dept. 
of Biochemistry / MS330
775-784-7042

From hamid at ibb.ut.ac.ir Thu Mar 11 09:44:46 2004
From: hamid at ibb.ut.ac.ir (hamid)
Date: Thu, 11 Mar 2004 19:14:46 +0430
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of time to complete a single database similarity search
In-Reply-To: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
References: <404F6267.5040806@burnham.org> <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
Message-ID:

Why do you use 16 tables? The relations among these tables reduce the speed! I think it's better to use fewer tables. I think most of the time is consumed searching for sequences in the tables!

/*
Hamid Nikbakht,
M.Sc of Cell and Molecular Biology,
Laboratory of Biophysics and Molecular Biology,
Bioinformatics Department,
Institute of Biochemistry and Biophysics(IBB),
University of Tehran,
Tehran,Iran.
Tel: +98-21-611-3322
Fax: +98-21-640-4680
E-Mail: hamid at ibb.ut.ac.ir
Alt. E-mail: nikbakht at ibb.ut.ac.ir
*/

-----Original Message-----
From: prathibha bharathi
To: bio_bulletin_board at bioinformatics.org
Date: Thu, 11 Mar 2004 09:23:36 +0000 (GMT)
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of time to complete a single database similarity search

> Hai all,
>
> My protein sequence analysis tool is taking a lot of time to
> complete a single request for database similarity search.My database is
> a relational database for MySQL which contains 16 tables and 2,83,366
> sequence entries.
>
> My Sequence analysis tool is currently running on a Local intranet
> server
> with 1.9GHz processor and 256MB RAM.
> > For a single pairwise alignment it is taking around 10msecs depending > on the length of query sequence and was taking more than 24 hours to > complete single request with 4 threads working on 4 partitions .By > making only 2 threads to be alive at a time working on 2 partitions(I > partitioned my Database in to 8 based on sequence chesk sum) ,now it is > taking around 9 hours to complete a single request for database > similarity search. > > Is it really possible to reduce the time further with hardware > configuration of 1.9Ghz and 256MB RAM. > Or have I to go for more more powerful hardware configuration. > Now i'm using MySQL database server and Apache HTTP server with JRun > application server.Have i to go for more powerful application server > than JRun . > My implementation platform is Java and algorithm being used is" > SMITH-WATERMAN LOCAL ALIGNMENT" algorithm. > Thanking You, > Prathibha. > > > > Yahoo! India Insurance Special: Be informed on the best policies, > services, tools and more. From lxyiwc at yahoo.com Thu Mar 11 12:03:33 2004 From: lxyiwc at yahoo.com (l x yi) Date: Thu, 11 Mar 2004 09:03:33 -0800 (PST) Subject: [BiO BB] protein database Message-ID: <20040311170333.96300.qmail@web21207.mail.yahoo.com> Hi, for my research, I need to download about 800 protein sequences each with length more than 1000 amino acids, could anyone tell me what is the best way to do this? I've looked at some batch retrieval options on the internet, but they only allow text searching using ids instead of specifying length.. Thanks very much for your help. Lily --------------------------------- Do you Yahoo!? Yahoo! Search - Find what you?re looking for faster. -------------- next part -------------- An HTML attachment was scrubbed... 
From schlitt at ebi.ac.uk Thu Mar 11 12:57:15 2004
From: schlitt at ebi.ac.uk (Thomas Schlitt)
Date: Thu, 11 Mar 2004 17:57:15 +0000 (GMT)
Subject: [BiO BB] protein database
In-Reply-To: <20040311170333.96300.qmail@web21207.mail.yahoo.com>
Message-ID:

Hi Lily,

Where do you want to download the data from? SRS lets you do something like this (I think) for the databases at EBI. Try

www.ebi.ac.uk/srs

Select a protein database (UniProt?), go to query, and click on the field to choose Sequence length. You might want to define a "view" which contains the data fields you want...

Don't know if this helps...

Cheers
Thomas

On Thu, 11 Mar 2004, l x yi wrote:

> Hi,
> for my research, I need to download about 800 protein sequences each with length more than 1000 amino acids, could anyone tell me what is the best way to do this? I've looked at some batch retrieval options on the internet, but they only allow text searching using ids instead of specifying length..
>
> Thanks very much for your help.
>
> Lily

_____________________________________________________________

Thomas Schlitt - Bioinformatics Research Fellow

EMBL-EBI, Hinxton                British Antarctic Survey
Wellcome Trust Genome Campus     High Cross, Madingley Road
Cambridge CB10 1SD, UK           Cambridge CB3 0ET, UK
Tel. ++44-1223-494651            Tel. ++44-1223-221656
eMail schlitt at ebi.ac.uk       tsc at bas.ac.uk

From logan at cacs.louisiana.edu Thu Mar 11 14:01:25 2004
From: logan at cacs.louisiana.edu (Rasaiah Loganantharaj)
Date: Thu, 11 Mar 2004 13:01:25 -0600
Subject: [BiO BB] Day symposium on Bioinformatics at University of Louisiana on April 8
Message-ID: <1079031685.3056.94.camel@logan.cacs.louisiana.edu>

We are organizing a day symposium on Bioinformatics at the University of Louisiana, Lafayette. The keynote presenters at the symposium include Dr. David Mount from University of Arizona, Dr. Peter Good from NIH, Dr.
Luxmy Parida from IBM T. J. Watson Lab. and Dr. Mark Borodovsky from Georgia Institute of Technology.

The objective of the symposium is to help new and existing bioinformatics researchers and to facilitate networking for future collaborative research and funding opportunities.

There is no registration fee to attend or present at the symposium, but you must register to attend. Those who want to present must submit their abstract before March 19th, 2004.

The details of the symposium are given at http://www.cacs.louisiana.edu/bioinformatics/index.html

Best regards from
Raja Loganantharaj

From B.A.T.Svensson at lumc.nl Fri Mar 12 04:57:32 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: 12 Mar 2004 10:57:32 +0100
Subject: [BiO BB] My Protein Sequence analysis tool is taking a lot of time to complete a single database similarity search
In-Reply-To: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
References: <20040311092336.50395.qmail@web8202.mail.in.yahoo.com>
Message-ID: <1079024916.2958.259.camel@ander>

Is this 10 ms the true execution time, or just the measured time between call and return? ~2 GHz means that the CPU executes instructions on a nanosecond time scale. So any single operation that consumes 1 ms of CPU time has done quite a lot of things, or has just idled for a while.

It might be that your machine is swapping and spending its time paging (disk I/O /IS/ slow). The standard solution for this is to put more RAM in the machine.

You mention that 2 threads run faster than 4. This might just be a coincidence (depending on current load, whether you use lightweight threads, available system resources, etc.), but it might also suggest that you are low on memory and the operating system is forced to swap memory pages when doing context switches. (This can be true for lightweight threads if they use a lot of private memory for data.)

But you say you are using Java.
And for me that explains it all. :)

In my experience (from watching Java processes execute), Java handles garbage collection really badly when it gets stressed. This is because the decision on when to collect garbage is left to the machine, so memory may not be released at the most opportune time. That might very well be your case; you can find out by watching your application's memory allocation/deallocation statistics.

In any case, you may be able to handle this in two ways: write your application in C++ and take control of the memory handling, or just buy enough memory that the Java app never runs out.

On Thu, 2004-03-11 at 10:23, prathibha bharathi wrote:
> Hai all,
>
> My protein sequence analysis tool is taking a lot of time to
> complete a single request for database similarity search.My database
> is a relational database for MySQL which contains 16 tables and
> 2,83,366 sequence entries.
>
> My Sequence analysis tool is currently running on a Local intranet
> server with 1.9GHz processor and 256MB RAM.
>
> For a single pairwise alignment it is taking around 10msecs depending
> on the length of query sequence and was taking more than 24 hours to
> complete single request with 4 threads working on 4 partitions .By
> making only 2 threads to be alive at a time working on 2 partitions(I
> partitioned my Database in to 8 based on sequence chesk sum) ,now it
> is taking around 9 hours to complete a single request for database
> similarity search.
>
> Is it really possible to reduce the time further with hardware
> configuration of 1.9Ghz and 256MB RAM.
> Or have I to go for more more powerful hardware configuration.
> Now i'm using MySQL database server and Apache HTTP server with JRun
> application server.Have i to go for more powerful application server
> than JRun .
> My implementation platform is Java and algorithm being used is" > SMITH-WATERMAN LOCAL ALIGNMENT" algorithm. > Thanking You, > Prathibha. > > > > Yahoo! India Insurance Special: Be informed on the best policies, > services, tools and more. From ssaikumar at yahoo.co.uk Sun Mar 14 21:49:38 2004 From: ssaikumar at yahoo.co.uk (sadhu saikumar) Date: Sun, 14 Mar 2004 18:49:38 -0800 (PST) Subject: [BiO BB] problems encountered while using tools Message-ID: <20040315024938.18805.qmail@web9608.mail.yahoo.com> Hi all, We are doing a survey in which we are trying to talk to Bioinformatics or biology people regarding 1. The problems that they are facing while using the web based tools. Ex: Filtering of records is not available based on species for BLAST. 2. You might want to have particular tool online instead of installing it on your local system. Ex: Sequence Viewer which is not hosted on the web whereas BLAST is hosted by NCBI. 3. Features such as results from multiple databases is missing. And so on. I know this community uses the Tools very much in their daily life. So It would be glad for us to know such problems and solve it for you. thanks, Sai. __________________________________ Do you Yahoo!? Yahoo! Mail - More reliable, more storage, less spam http://mail.yahoo.com From landman at scalableinformatics.com Mon Mar 15 07:14:59 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 15 Mar 2004 07:14:59 -0500 Subject: [BiO BB] NCBI BLAST 2.2.8 RPMs available Message-ID: <40559E43.7010004@scalableinformatics.com> Folks: Redid packaging of the NCBI toolkit RPMs. The binaries are better captured by the newer system, and the sizes are somewhat larger. Packages built for P4 (labeled i686), Opteron (x86_64), Athlon, and source (src). The packages are located at http://downloads.scalableinformatics.com/downloads/ncbi/ and labeled 2.2.8-1 . The x86_64 was built under Fedora Core for AMD64, working on getting a SUSE load on the machine as well. 
The athlon and p4 packages were built under RH9.0. If you try to install a binary and it fails to work (crashes and dumps core), pull the source and run (as root)

    rpmbuild --rebuild NCBI-2.2.8-1.src.rpm

on your machine. It will eventually generate the RPM, barring compilation/permissions problems. You might need to do this if you are using a pre-RedHat 9.0 machine (the NPTL issue).

Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615

From t.fiedler at umiami.edu Thu Mar 18 13:16:10 2004
From: t.fiedler at umiami.edu (Tristan Fiedler)
Date: Thu, 18 Mar 2004 13:16:10 -0500 (EST)
Subject: [BiO BB] blastall in php
Message-ID: <53089.129.171.111.22.1079633770.squirrel@webmail.rsmas.miami.edu>

Greetings to All!

If possible, please comment on

http://www.phphelp.com/phpBB2/viewtopic.php?t=6040

Thanks

--
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow - Walsh Laboratory
NIEHS Marine & Freshwater Biomedical Sciences Center
Rosenstiel School of Marine & Atmospheric Sciences
University of Miami
tfiedler at rsmas.miami.edu
t.fiedler at umiami.edu (alias)
305-361-4626

From B.A.T.Svensson at lumc.nl Thu Mar 18 13:28:06 2004
From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG))
Date: Thu, 18 Mar 2004 19:28:06 +0100
Subject: [BiO BB] blastall in php
Message-ID:

If you want to execute an external program from PHP in general, you might like to have a look at http://www.php.net/popen. I see no general reason that would prevent you from starting a BLAST run from PHP.

As for that guy's question about his program, it does not make sense at all.

-----Original Message-----
From: Tristan Fiedler
To: bio_bulletin_board at bioinformatics.org
Sent: 18-3-2004 19:16
Subject: [BiO BB] blastall in php

Greetings to All!

If possible, please comment on

http://www.phphelp.com/phpBB2/viewtopic.php?t=6040

Thanks

--
Tristan J. Fiedler, Ph.D.
Postdoctoral Research Fellow - Walsh Laboratory NIEHS Marine & Freshwater Biomedical Sciences Center Rosenstiel School of Marine & Atmospheric Sciences University of Miami tfiedler at rsmas.miami.edu t.fiedler at umiami.edu (alias) 305-361-4626 _______________________________________________ BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From idoerg at burnham.org Thu Mar 18 13:53:11 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Thu, 18 Mar 2004 10:53:11 -0800 Subject: [BiO BB] blastall in php In-Reply-To: <53089.129.171.111.22.1079633770.squirrel@webmail.rsmas.miami.edu> References: <53089.129.171.111.22.1079633770.squirrel@webmail.rsmas.miami.edu> Message-ID: <4059F017.5020609@burnham.org> Tristan, Could it have something to do with the .ncbirc file not being properly read when called from php? Just a thought. ./I Tristan Fiedler wrote: > Greetings to All! > > If possible, please comment on > > http://www.phphelp.com/phpBB2/viewtopic.php?t=6040 > > Thanks > > > -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9930 http://ffas.ljcrf.edu/~iddo From pculpep at hotmail.com Thu Mar 18 14:30:15 2004 From: pculpep at hotmail.com (Pamela Culpepper) Date: Thu, 18 Mar 2004 19:30:15 +0000 Subject: [BiO BB] blastall in php Message-ID: Tristan, You will need to fork another process to run the blastall program. Pam LifeFormulae, L.L.C >From: Iddo Friedberg >Reply-To: bio_bulletin_board at bioinformatics.org >To: bio_bulletin_board at bioinformatics.org >Subject: Re: [BiO BB] blastall in php >Date: Thu, 18 Mar 2004 10:53:11 -0800 > >Tristan, > >Could it have something to do with the .ncbirc file not being properly read >when called from php? Just a thought. > >./I > > >Tristan Fiedler wrote: >>Greetings to All! 
>>
>>If possible, please comment on
>>
>>http://www.phphelp.com/phpBB2/viewtopic.php?t=6040
>>
>>Thanks
>
>--
>Iddo Friedberg, Ph.D.
>The Burnham Institute
>10901 N. Torrey Pines Rd.
>La Jolla, CA 92037
>USA
>Tel: +1 (858) 646 3100 x3516
>Fax: +1 (858) 713 9930
>http://ffas.ljcrf.edu/~iddo

From rwang at bccancer.bc.ca Thu Mar 18 18:25:35 2004
From: rwang at bccancer.bc.ca (Renxue Wang)
Date: Thu, 18 Mar 2004 15:25:35 -0800
Subject: [BiO BB] Error message with nrdb
Message-ID: <6BAF4D075F07D411B30900508B94CBA00D846708@SERVER20>

Hi there,

I am using NRDB, as implemented in WUBLAST, to eliminate the duplicate entries in my custom sequence database. While processing one of my sequence files (about 50 MB), NRDB gave me an error message that reads like this:

    $ nrdb test.fa >testnr
    FATAL: Report: fwrite error: Success
    $

and it appears that the operation stopped running at this point. I tried the same thing both on the server and on my Linux machine; both stopped at the same sequence. (There does not seem to be anything wrong with this sequence: when I move it to the end of the sequence file, the program stops somewhere else.) The last line of the output file is:

    >gi|28574491:CDS(1), original length:2213. ORF(+) length:465,1..465,
    FATAL: Report: fwrite error: Success

Does anyone know what the error message means and how to deal with it?

Thanks a lot.

Renxue
From brent at scihelp.com Tue Mar 9 23:52:47 2004
From: brent at scihelp.com (Brent Silver)
Date: Tue, 9 Mar 2004 20:52:47 -0800 (PST)
Subject: [BiO BB] request for info
Message-ID: <20040310045247.92605.qmail@web109.biz.mail.yahoo.com>

Hi,

I am a Bay Area biotech researcher who is interested in career opportunities in the Bay Area (bioinformatics specifically). Can you please advise?

Regards,
Brent Silver
www.scihelp.com/cv.htm

From straberry_fizza at yahoo.co.in Mon Mar 15 23:59:29 2004
From: straberry_fizza at yahoo.co.in (sunil kumar)
Date: Tue, 16 Mar 2004 04:59:29 +0000 (GMT)
Subject: [BiO BB] about bioinformatics
Message-ID: <20040316045929.20908.qmail@web8304.mail.in.yahoo.com>

Respected sir,

I am studying for an M.Sc. at Madras University. Nobody here can give good suggestions for bioinformatics projects or about the course. I will complete my course this April. Please advise me on what jobs are available after this course, which projects are good, and which companies are good. I will always be grateful to you, sir.

Thanking you, sir,

SUNIL

From mourad12345678 at yahoo.com Wed Mar 17 21:08:15 2004
From: mourad12345678 at yahoo.com (Mourad Elloumi)
Date: Wed, 17 Mar 2004 18:08:15 -0800 (PST)
Subject: [BiO BB] CfP : Algorithms in Molecular Biology (ALBIO'04)
Message-ID: <20040318020815.90890.qmail@web12304.mail.yahoo.com>

CALL FOR PAPERS

Algorithms in Molecular Biology (ALBIO'04)

Session of 6th Int.Conf.
on Algorithms, Scientific Computing, Modelling and Simulation (ASCOMS'04)

Cancun, Mexico, May 12-15, 2004

http://www.worldses.org/conferences/2004/mexico/ascoms/index.html

Computational Molecular Biology has emerged from the Human Genome Project as an important discipline for academic research and industrial application. The growing size of biological databases, the complexity of biological problems and the necessity to deal with errors in biological sequences all result in large run time and memory requirements. Biological sequence databases are growing at an exponential rate. All of these factors will make the development of fast, low-memory, high-performance algorithms increasingly important in Computational Molecular Biology.

In our session, we are interested in papers that deal with all aspects of algorithms in Molecular Biology. We are particularly interested in algorithms that address fundamental and/or applied problems in Molecular Biology, that are computationally efficient, that have been implemented and tested on simulated and/or real biological sequences, and that provide interesting new results. The submitted papers should present recent research results and identify and explore directions for future research.

Topics include, but are not limited to: (i) string processing, (ii) biological sequence comparison, (iii) structure prediction, (iv) phylogeny reconstruction, (v) DNA sequence assembly, clustering, and mapping, (vi) molecular evolution, (vii) gene prediction/recognition, (viii) gene expression

INSTRUCTIONS TO AUTHORS

You are invited to submit a hardcopy or a PDF version of a draft paper, about 4 to 5 pages including figures and references, before April 2, 2004 to the Session Chair:

Dr. Mourad Elloumi
Mailing Address: Cité Intilak bloc 6, app. 7, El Menzah 6, 2091 Tunis, Tunisia.
E.Mail: Mourad.Elloumi at fsegt.rnu.tn

How to send your paper?
please, fill in the form http://www.worldses.org/conferences/2004/mexico/internet.htm and declare that your paper belong to this session __________________________________ Do you Yahoo!? Yahoo! Mail - More reliable, more storage, less spam http://mail.yahoo.com From mourad12345678 at yahoo.com Thu Mar 18 16:38:06 2004 From: mourad12345678 at yahoo.com (Mourad Elloumi) Date: Thu, 18 Mar 2004 13:38:06 -0800 (PST) Subject: [BiO BB] 2004 RECOMB Satellite Meeting on DNA Sequencing Technologies and Computation Message-ID: <20040318213806.40829.qmail@web12308.mail.yahoo.com> Dear colleagues, The fourth annual RECOMB Satellite meeting on DNA Sequencing Technologies and Computation (http://recomb-satellite.stanford.edu/) will take place on May 22-23, at Stanford University. This year we will have another exciting, focussed meeting, with emphasis on new sequencing technologies and on the future of sequencing efforts. Confirmed speakers include Robert Waterston (Washington University School of Medicine), Jeff Schloss (NIH), Bjorn Andersson (Karolinska Institute, Sweden), Lene V. Hau (Harvard University), Tony Smith (Solexa), Jonathan Rothberg (CuraGen), Mostafa Ronaghi (Stanford Genome Technology Center), Paul Havlak (Baylor College of Medicine), Serafim Batzoglou (Stanford University), James Galagan (Whitehead Institute/MIT Center for Genome Research). CALL FOR ABSTRACTS Fourth Annual RECOMB Satellite meeting on DNA Sequencing Technologies and Computation May 22-23, Stanford University, Stanford, CA Abstract Submission Deadline: April 4; Notification of acceptance: April 20, 2004. Genome sequencing has been truly flourishing the past several years. Recent achievements include the completion of the human, mouse, rat, fugu, mosquito, malaria, and several other genomes. April 2003 marked the completion of the finished version of the human genome. 
The recent sequencing achievements have been possible because of advances in both lab techniques and computational methods and capabilities. Notably, computational assembly has been an essential part of sequencing since the conception of the sequencing technology, and recent advances to computational assembly systems and algorithms were instrumental in recent sequencing successes.

Despite the success of recent sequencing projects, genome sequencing is still extremely costly, time-consuming, and error-prone. Some efforts in making sequencing vastly easier, potentially reducing time and cost by several orders of magnitude, are starting to emerge. Novel sequencing methods hold great potential for the future, and developing such technologies will be a focus of NIH for the next 5 to 10 years. The ultimate goal is to sequence or re-sequence a mammalian-size genome for as little as $1,000.

Once the genome is at hand, the next step is analysis. The first steps in analyzing genomes are to annotate genes, common repeats, and other biologically important elements, and to compare genomes of related organisms. High-throughput pipelines and servers for that purpose are instrumental to making the genomic data useful to the research community.

The purpose of this meeting is to bring together many of the people working on algorithms and software for large-scale sequencing and analysis of genomes, and novel technologies for genome determination. The main themes will be:

* Whole Genome Sequencing and Assembly.
* New and exotic sequencing technologies.
* Whole-genome analysis.
Topics of interest include: whole-genome sequencing and assembly, novel sequencing technologies and the computational assembly problems they motivate, improved methods for sequencing and finishing, comparison and reconciliation of whole genome assemblies, high-throughput experimental techniques for genome analysis, pipelines for whole-genome annotation, comparison, and analysis Successful submissions will be invited for a 15-minute presentation, and a 1-2 page abstract will be printed on the conference proceedings, to be distributed to the meeting attendees. Abstracts should be 1 to 2 pages, and submitted in plain text or WORD format. Abstract Submission Deadline: April 4; Notification of acceptance: April 20, 2004. http://recomb-satellite.stanford.edu/ __________________________________ Do you Yahoo!? Yahoo! Mail - More reliable, more storage, less spam http://mail.yahoo.com From biotelerock at yahoo.com Mon Mar 22 12:29:36 2004 From: biotelerock at yahoo.com (Samantha Austin) Date: Mon, 22 Mar 2004 09:29:36 -0800 (PST) Subject: [BiO BB] Bay Area Bioinformatic Startup Message-ID: <20040322172936.31541.qmail@web41410.mail.yahoo.com> Hello Folks, I'm looking for bench trained, bay area molecular biologists (PhD) who would be interested in starting up a bioinformatic support company for life scientists. Skills should include Java/HTML/MySQL, extensive hands on experience with molecular biology techniques, and a bioentrepreneurial spirit. This isn't a job offer, just a request for folks interested in starting up something that might grow into a great day job. __________________________________ Do you Yahoo!? Yahoo! Finance Tax Center - File online. File on time. http://taxes.yahoo.com/filing.html From lon at bio-code.com Mon Mar 22 12:52:42 2004 From: lon at bio-code.com (L. James) Date: Mon, 22 Mar 2004 09:52:42 -0800 Subject: [BiO BB] Bay Area Bioinformatic Startup In-Reply-To: <20040322172936.31541.qmail@web41410.mail.yahoo.com> Message-ID: i'm interested. 
415-573-9192 lon -----Original Message----- From: bio_bulletin_board-admin at bioinformatics.org [mailto:bio_bulletin_board-admin at bioinformatics.org]On Behalf Of Samantha Austin Sent: Monday, March 22, 2004 9:30 AM To: bio_bulletin_board at bioinformatics.org Subject: [BiO BB] Bay Area Bioinformatic Startup Hello Folks, I'm looking for bench trained, bay area molecular biologists (PhD) who would be interested in starting up a bioinformatic support company for life scientists. Skills should include Java/HTML/MySQL, extensive hands on experience with molecular biology techniques, and a bioentrepreneurial spirit. This isn't a job offer, just a request for folks interested in starting up something that might grow into a great day job. __________________________________ Do you Yahoo!? Yahoo! Finance Tax Center - File online. File on time. http://taxes.yahoo.com/filing.html _______________________________________________ BiO_Bulletin_Board maillist - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From monty_afs at yahoo.com Wed Mar 24 05:54:15 2004 From: monty_afs at yahoo.com (monty monty) Date: Wed, 24 Mar 2004 02:54:15 -0800 (PST) Subject: hello MR sunil ----Re: [BiO BB] about bioinformatics In-Reply-To: <20040316045929.20908.qmail@web8304.mail.in.yahoo.com> Message-ID: <20040324105415.86367.qmail@web40707.mail.yahoo.com> hello Sunil So regarding the projects in bioinformatics -presently there is no good research going on in it- and most of the bioinfo companies placing students of computer science only. i suggest you to apply to all big companies concerning lifesciences,biomedicine,bioinformatics -even though u dont have work experience,i am sure if you try hard u will be placed in good position. all the best. 
sridhar.M

--- sunil kumar wrote:
> respected sir,
> I am studying msc in madras university.Here
> nobody is there to give good suggetions for
> bioinformatics projects as well as about the course
> also.This april I am going to complete my
> course.Just give your advices after this cource what
> jobs avialable and which is the good project.Please
> inform me sir which companies are good For this I
> will always greatefull to you sir.
> Thanking you sir
>
> SUNIL

From sean_s_sun at hotmail.com Thu Mar 25 11:26:42 2004
From: sean_s_sun at hotmail.com (S S)
Date: Thu, 25 Mar 2004 16:26:42 +0000
Subject: [BiO BB] Bay Area Bioinformatic Startup
Message-ID:

An HTML attachment was scrubbed...

From pkerrwall at psu.edu Thu Mar 25 13:51:38 2004
From: pkerrwall at psu.edu (Kerr Wall)
Date: Thu, 25 Mar 2004 13:51:38 -0500
Subject: [BiO BB] BLAST problem: limiting # of HSPs
Message-ID:

We are running tBLASTx locally against our own data set. We're looking for ways to reduce the size of the BLAST output and have set the alignment view to tabular (-m 8). The problem we've come across is that a query can have multiple hits to the same sequence, one for each HSP. We need BLAST to retain only one result per sequence instead of filling the report with multiple E-values for HSPs from the same gene.

In the default BLAST output, there are summary statistics for the overall hit. Is there an option for the tab-delimited output that would give us this overall hit statistic instead of one for each HSP?

If not, is there an option to limit the number of HSPs returned in the tab-delimited output?
Thanks, Kerr Wall From dmb at mrc-dunn.cam.ac.uk Thu Mar 25 15:03:48 2004 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Thu, 25 Mar 2004 20:03:48 +0000 (GMT) Subject: [BiO BB] BLAST problem: limiting # of HSPs In-Reply-To: Message-ID: > In the default blast output, there are summary statistics for the overall > hit, is there an option for the tab-delimited BLAST output that would give > us this overall hit statistic instead of one for each HSP? I think you can simply sum the e-values for each non-overlapping HSP (I think they shouldn't overlap). Does anybody know the correct formula? > If not, is there an option to limit the number of HSPs returned in the > tab-delimited output? I am sure there is a way to do this, but I can't find any mention of this option in the ncbi/doc/blast.txt file. Hmm.... Not sure if these have anything to do with it... -K N (blastall, blastcl3, blastpgp) Number of best hits from a region to keep (off by default, if used a value of 100 is recommended) -P N (blastall, blastpgp, rpsblast) Set to 1 for single-hit mode or 0 for multiple-hit mode (default) -b N (blastall, blastcl3, blastpgp, impala, megablast, rpsblast, seedtop) Number of database sequences to show alignments for (B) (default is 250) If you get an answer from blast-help at ncbi.nlm.nih.gov can you please post it up? (these emails get archived). Cheers, Dan.
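One way to end up with a single line per hit, without relying on any BLAST option, is to post-filter the tabular output. What follows is only a rough sketch in Python. It assumes the usual 12-column -m 8 layout (query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score) and keeps the HSP with the lowest e-value for each query/subject pair, using the bit score to break ties. The function name best_hsp_per_pair and the handling of e-values written without a leading digit are assumptions made for illustration, not anything provided by BLAST.

```python
def parse_evalue(text):
    """Convert an e-value string to float.

    Assumption: some reports abbreviate 1e-100 as 'e-100', so a leading
    'e' is treated as '1e'. Ordinary values such as '0.0' or '2e-31'
    pass straight through.
    """
    text = text.strip()
    if text.lower().startswith("e"):
        text = "1" + text
    return float(text)


def best_hsp_per_pair(lines):
    """Keep one line (the best HSP) per (query, subject) pair.

    'Best' here means lowest e-value, with the higher bit score
    breaking ties. Comment lines, blank lines and lines with fewer
    than 12 tab-separated fields are skipped.
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue
        key = (fields[0], fields[1])
        rank = (parse_evalue(fields[10]), -float(fields[11]))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, line)
    # dicts keep insertion order, so pairs come out in first-seen order
    return [line for _rank, line in best.values()]


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python best_hsp.py < blast_m8.txt > filtered.txt
    for kept in best_hsp_per_pair(sys.stdin):
        print(kept)
```

For a very large all-against-all run the dictionary could get big; as far as I know the tabular report lists all HSPs of one query together, so the same idea can be applied one query at a time.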
> > Thanks, > > Kerr Wall From pkerrwall at psu.edu Fri Mar 26 17:18:42 2004 From: pkerrwall at psu.edu (Kerr Wall) Date: Fri, 26 Mar 2004 17:18:42 -0500 Subject: [BiO BB] BLAST problem: limiting # of HSPs In-Reply-To: <20040326170107.D2A93D1F11@www.bioinformatics.org> Message-ID: On 3/26/04 12:01 PM, "Dan Bolser " wrote: >> In the default blast output, there are summary statistics for the overall >> hit, is there an option for the tab-delimited BLAST output that would give >> us this overall hit statistic instead of one for each HSP? > > > I think you can simply sum the e-values for each non-overlapping HSP (I > think they shouldn't overlap). Does anybody know the correct formula? I can handle non-overlapping HSPs because I would only be parsing out the best E-value from each hit. I'm just trying to avoid it if at all possible. I'm running a tblastx of ~1,000,000 cDNAs against themselves to produce a similarity matrix. Therefore, I'm more worried about the size of the output files and about making sure that I don't lose similarities between more distantly related genes, which might get left out of the output when the maximum number of hits is reached (for some of the larger gene families). I need to make sure the matrix is as symmetrical as possible. >> If not, is there an option to limit the number of HSPs returned in the >> tab-delimited output? > > I am sure there is a way to do this, but I can't find any mention of this > option in the > > ncbi/doc/blast.txt Yes, I know. They don't even discuss all of the options in that file. You would think that the documentation for blast would be complete considering how long it has been around. > Hmm.... Not sure if these have anything to do with it...
> > -K N (blastall, blastcl3, blastpgp) > Number of best hits from a region to keep (off by default, if > used a value of 100 is recommended) > > -P N (blastall, blastpgp, rpsblast) > Set to 1 for single-hit mode or 0 for multiple-hit > mode (default) > > -b N (blastall, blastcl3, blastpgp, impala, megablast, rpsblast, seedtop) > Number of database sequences to show alignments for (B) (default > is 250) Thanks. Those are the parameters I've been working with so far. I did find a paragraph in the documentation that might be on this same track, specifically #4 in the section "Notes for 2.0.6 release": ############################################################################ Notes for 2.0.6 release: Enhancements: ... 4.) BLAST has been changed to reduce the number of redundant hits that a user may see. This is achieved by keeping track of the number of hits completely contained in a certain region and eliminating those lower scoring hits that are redundant with others. This behavior may be controlled with the -K and -L options: -K Number of best hits from a region to keep [Integer] default = 50 -L Length of region used to judge hits [Integer] default = 20 Setting -K to zero turns off this feature. This is the default only on blastall. ############################################################################ Of course, when you get a list of all the options with 'blastall -', the L option is labeled as '-L Location on query sequence [String] Optional'. Not sure what to make of that. I wonder if they have changed parameter names between 2.0.6 and 2.2.8? It looks as if setting K = 1 and using L > 100 (or much larger) would help me reduce the amount of output. I think also using P = 1, as you stated above, would probably help out the most. > If you get an answer from blast-help at ncbi.nlm.nih.gov can you please post > it up? (these emails get archived). I will. I sent them an email yesterday afternoon so I won't be expecting anything back until sometime next week.
I have usually solved the problem by the time they get back to me. Thanks for the help, Kerr From dmb at mrc-dunn.cam.ac.uk Sat Mar 27 07:47:15 2004 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Sat, 27 Mar 2004 12:47:15 +0000 (GMT) Subject: [BiO BB] BLAST problem: limiting # of HSPs In-Reply-To: Message-ID: On Fri, 26 Mar 2004, Kerr Wall wrote: > On 3/26/04 12:01 PM, "Dan Bolser " wrote: > > >> In the default blast output, there are summary statistics for the overall > >> hit, is there an option for the tab-delimited BLAST output that would give > >> us this overall hit statistic instead of one for each HSP? > > > > > > I think you can simply sum the e-values for each non-overlapping HSP (I > > think they shouldn't overlap). Does anybody know the correct formula? > > I can handle non-overlapping HSPs because I would only be parsing out the > best E-value from each hit. I'm just trying to avoid it if at all possible. > I'm running a tblastx of ~1,000,000 cDNAs against themselves to produce a > similarity matrix. Therefore, I'm more worried about the size of the output > files and about making sure that I don't lose similarities between more > distantly related genes, which might get left out of the output when the > maximum number of hits is reached (for some of the larger gene families). I > need to make sure the matrix is as symmetrical as possible. Have you seen http://www.ebi.ac.uk/research/cgg/tribe/ and http://micans.org/mcl/ ?
They provide tools to make a symmetrical all-vs-all similarity matrix (I think it is an interface to blastall). > >> If not, is there an option to limit the number of HSPs returned in the > >> tab-delimited output? > > > > I am sure there is a way to do this, but I can't find any mention of this > > option in the > > > > ncbi/doc/blast.txt > > Yes, I know. They don't even discuss all of the options in that file. You > would think that the documentation for blast would be complete considering > how long it has been around. :) Have you tried the man pages? ncbi/doc/man/ > > Hmm.... Not sure if these have anything to do with it... > > > > -K N (blastall, blastcl3, blastpgp) > > Number of best hits from a region to keep (off by default, if > > used a value of 100 is recommended) > > > > -P N (blastall, blastpgp, rpsblast) > > Set to 1 for single-hit mode or 0 for multiple-hit > > mode (default) > > > > -b N (blastall, blastcl3, blastpgp, impala, megablast, rpsblast, seedtop) > > Number of database sequences to show alignments for (B) (default > > is 250) > > Thanks. Those are the parameters I've been working with so far. I did find > a paragraph in the documentation that might be on this same track, > specifically #4 in the section "Notes for 2.0.6 release": > > > ############################################################################ > Notes for 2.0.6 release: > > Enhancements: > > ... > > 4.) BLAST has been changed to reduce the number of redundant hits that a > user may see. This is achieved by keeping track of the number of hits > completely contained in a certain region and eliminating those lower scoring > hits that are redundant with others. This behavior may be controlled with > the -K and -L options: > > -K Number of best hits from a region to keep [Integer] > default = 50 > -L Length of region used to judge hits [Integer] > default = 20 > > Setting -K to zero turns off this feature. This is the default only on > blastall.
> ############################################################################ Cheers. > Of course, when you get a list of all the options with 'blastall -', the L option > is labeled as '-L Location on query sequence [String] Optional'. Not sure > what to make of that. I wonder if they have changed parameter names between > 2.0.6 and 2.2.8? Typical problem! blast.1 -L start,stop (blastall, blastcl3, megablast, rpsblast) Location on query sequence (for rpsblast, only valid in blastp mode) blastclust.1 -L X Length coverage threshold (default = 0.9) ? > It looks as if setting K = 1 and using L > 100 (or much larger) would help > me reduce the amount of output. I think also using P = 1, as you stated > above, would probably help out the most. > > > If you get an answer from blast-help at ncbi.nlm.nih.gov can you please post > > it up? (these emails get archived). > > I will. I sent them an email yesterday afternoon so I won't be expecting > anything back until sometime next week. I have usually solved the problem > by the time they get back to me. They are very busy I guess. Best of luck! Dan. From jeff at bioinformatics.org Sun Mar 28 21:24:44 2004 From: jeff at bioinformatics.org (J.W.
Bizzaro) Date: Sun, 28 Mar 2004 21:24:44 -0500 Subject: [BiO BB] Agenda for the 4th Annual Meeting of Bioinformatics.Org Message-ID: <406788EC.5000700@bioinformatics.org> Greetings. The following is the agenda for the 4th Annual Meeting of Bioinformatics.Org, taking place this week at the Bio-IT World Conference + Expo in Boston, Mass. Tuesday, March 30 ----------------- Bioinformatics.Org is co-organizing the Bioclusters Workshop 2004, which developed out of our very own Bioclusters mailing list: 10:00 am to 5:00 pm We also have a booth (#523) in the exhibit hall, which opens briefly in the evening. We will be giving away BioBrew Linux DVDs while they last (50 total). Booth personnel include J.W. Bizzaro, Organization president, and Glen Otero, BioBrew Linux Brewmeister: 6:00 pm to 7:15 pm Wednesday, March 31 ------------------- Bioinformatics.Org is presenting its 2004 Benjamin Franklin Award in Bioinformatics to Lincoln Stein of Cold Spring Harbor Laboratory. Lincoln will give a 20 minute talk after receiving the Award from J.W. Bizzaro: 8:30 am to 9:00 am Stop by and see us at our booth (#523). We will be giving away BioBrew Linux DVDs while they last. Booth personnel include J.W. Bizzaro and Gene Ioffe, Organization treasurer: 10:00 am to 6:00 pm Thursday, April 1 ----------------- Bioinformatics.Org will have banquet room #204 for the entire day. We will try to have coffee and other things available as we can afford them. At 3:00 pm (tentative), J.W. Bizzaro will give a talk about the Organization and future directions: 8:00 am to 5:00 pm Stop by and see us at our booth (#523). We will be giving away BioBrew Linux DVDs while they last: 10:00 am to 2:00 pm Registration ------------ The free guest pass will give you access to everything Bioinformatics.Org is involved in, except for the Workshop. 
Just print out this pass (PDF) and bring it to the registration counter: http://bioinformatics.org/events/2004/guest_pass.pdf Registration for the full expo and conference will give you access to other events (and will make Bio-IT World want to invite us back next year): http://www.bioitworldexpo.com/ Cheers. Jeff -- J.W. Bizzaro jeff at bioinformatics.org President, Bioinformatics.Org http://bioinformatics.org/~jeff "As we enjoy great advantages from the inventions of others, we should be glad of an opportunity to serve others by any invention of ours; and this we should do freely and generously." -- Benjamin Franklin -- From nidhi.jain at sbcglobal.net Mon Mar 29 12:37:27 2004 From: nidhi.jain at sbcglobal.net (Nidhi Jain) Date: Mon, 29 Mar 2004 09:37:27 -0800 Subject: [BiO BB] Numerical libraries Message-ID: Hi, I am working with a software company which is building the platform for bioinformatics. I was wondering if there are good, high-performance numerical libraries like LAPACK available on the desktop. I would really appreciate it if you could share this knowledge with me. Thanks Nidhi From B.A.T.Svensson at lumc.nl Mon Mar 29 15:40:16 2004 From: B.A.T.Svensson at lumc.nl (Svensson, B.A.T. (HKG)) Date: Mon, 29 Mar 2004 22:40:16 +0200 Subject: [BiO BB] Numerical libraries Message-ID: A Google search for "numerical libraries C high performance" gave these as the first two hits: Java: http://hoschek.home.cern.ch/hoschek/colt/ C/Fortran: http://www.nag.com/numeric/numerical_libraries.asp -----Original Message----- From: Nidhi Jain To: bio_bulletin_board at bioinformatics.org Sent: 29-3-2004 19:37 Subject: [BiO BB] Numerical libraries Hi, I am working with a software company which is building the platform for bioinformatics. I was wondering if there are good, high-performance numerical libraries like LAPACK available on the desktop. I would really appreciate it if you could share this knowledge with me.
Thanks Nidhi From Austin.Tanney at arragen.com Tue Mar 30 08:19:56 2004 From: Austin.Tanney at arragen.com (Austin Tanney) Date: Tue, 30 Mar 2004 14:19:56 +0100 Subject: [BiO BB] Bay Area Bioinformatic Startup Message-ID: Hi Samantha How are your plans going for this startup? Austin Dr. Austin Tanney Senior Scientist Arragen Ltd. E-mail: Austin.Tanney at Arragen.com Phone: +44 283839 5750 Fax: +44 283839 8676 Mobile: +44 7968 013939 -----Original Message----- From: Samantha Austin [mailto:biotelerock at yahoo.com] Sent: 22 March 2004 17:30 To: bio_bulletin_board at bioinformatics.org Subject: [BiO BB] Bay Area Bioinformatic Startup Hello Folks, I'm looking for bench trained, bay area molecular biologists (PhD) who would be interested in starting up a bioinformatic support company for life scientists. Skills should include Java/HTML/MySQL, extensive hands on experience with molecular biology techniques, and a bioentrepreneurial spirit. This isn't a job offer, just a request for folks interested in starting up something that might grow into a great day job.
From rfsouza at citri.iq.usp.br Fri Mar 26 12:48:18 2004 From: rfsouza at citri.iq.usp.br (Robson Francisco de Souza) Date: Fri, 26 Mar 2004 14:48:18 -0300 Subject: [BiO BB] GI numbers Message-ID: <20040326174818.GH23629@genoma4.iq.usp.br> Hi, I'm analyzing a set of sequences with regard to their classification as homologs in both the COG and KEGG databases of orthologs. Although both COG and KEGG provide tables relating gene names to GI (PID) numbers, I have so far been unable to map GIs from one dataset to the other in order to check the classifications of genes in both catalogs. GIs from COG appear to be from RefSeq and those from KEGG seem to be from GenPept. How can I map GI numbers from KEGG to GI numbers from the COG database? Is there any query I can make to download such information for the 185904 proteins in COG and their equivalents in the KEGG Orthologs database? Here is an example: Sequence 14600509 is the protein coded by gene APE0180 from the Aeropyrum pernix complete genome, as described in COG's table myva=gb. The same sequence is identified by GI 5103570 in KEGG. In this case, I was able to map COG's GI to KEGG's GI by using the gene identifier and annotation, a procedure that is not easily automated. How can I retrieve equivalent IDs for the whole COG gene set? Thanks in advance for any help. Robson From idh at poulet.org Tue Mar 30 10:04:33 2004 From: idh at poulet.org (Yannick Wurm) Date: Tue, 30 Mar 2004 10:04:33 -0500 Subject: [BiO BB] DNA Strider Message-ID: <8D9C0586-825B-11D8-AF58-000393CAA04A@poulet.org> Hi, I'm a student in Bioinformatics and Modeling at a French engineering school in Lyon, France (http://biosciences.insa-lyon.fr).
I am in my last year and am currently doing a six-month internship in a C. elegans lab at McGill University in Montreal. The lab's computers are Macs, and besides standard browsing, word processing and image processing, lab members also use them to aid their molecular biology work. One of the programs they use is called DNA Strider. This piece of software has not been updated in a long time (probably since Apple's System 6.x - window sizes are fixed to the small old Mac screen size!) and could use a face-lift. In the lab, it is mainly used for managing and manipulating sequences of genes, primers and constructs. The main features of interest here are: - Sequence management - Graphical (circular or linear) restriction maps of a given sequence (or part of it), showing restriction site data for the part or whole sequence (for each enzyme, you get the number of restriction sites and the resulting fragment sizes) - Reverse complementary sequence - Quick and simple alignment between two sequences I've searched the web and could not find an all-in-one package that seemed as user-friendly and coherent as DNA Strider. Individual web sites and software tools do offer these features, but - the internet is slow (you click and need to wait before getting your result) - having everything in one place is nice Sequence Analysis (for Mac OS X) http://informagen.com/SA/ seems to be aiming to do what DNA Strider does, but is still very young (and closed-source, but that's a different debate). http://www.mekentosj.com/ has some very nice tools as well, but they're very problem-specific. Have I missed something? Is there a really cool Java app or web software (that I could install locally for speed) that would replace DNA Strider? What does your molecular biology lab use for its day-to-day work? Oh, and buying something expensive is not a solution. Thanks for any leads, Yannick.
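For the two simplest features in the list above (the reverse complementary sequence, and the number of restriction sites with the resulting fragment sizes), a few lines of script already go a long way. The sketch below is in Python and rests on several assumptions: the sequence is linear and uses plain A/C/G/T letters, only the given strand is searched (which is enough for palindromic sites such as EcoRI's GAATTC, cut after the G), and the helper names are made up for illustration. It is in no way a replacement for DNA Strider's graphical maps.

```python
# Translation table for complementing A/C/G/T in either case;
# any other character (e.g. N) is left unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of a
    recognition site on the given strand (case-insensitive)."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def linear_fragment_sizes(seq, site, cut_offset):
    """Fragment sizes after cutting a linear sequence at every site.

    cut_offset is the number of bases from the start of the site to
    the cut on this strand (1 for EcoRI, G^AATTC).
    """
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    example = "AAGAATTCTTTTGAATTCGG"  # made-up sequence with two EcoRI sites
    print(reverse_complement(example))
    print(len(find_sites(example, "GAATTC")), "EcoRI sites")
    print(linear_fragment_sizes(example, "GAATTC", 1))
```

A circular map would need the first and last fragments joined, and non-palindromic sites would also need a search on the reverse complement.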
\\\\\\\\\\\\\\\\\\\ \\ http://yannick.poulet.org icq: 22044361 \\ idh at poulet.org tel: ++33.6.16.41.71.92 From ryangolhar at hotmail.com Tue Mar 30 18:30:36 2004 From: ryangolhar at hotmail.com (Ryan Golhar) Date: Tue, 30 Mar 2004 18:30:36 -0500 Subject: [BiO BB] DNA Strider In-Reply-To: <8D9C0586-825B-11D8-AF58-000393CAA04A@poulet.org> Message-ID: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1> You know, I'm constantly finding different programs to perform different tasks, either client applications or web-based. Some run on Linux, others on Windows. I would like to see one application for multiple platforms that performs DNA sequence analysis. I started writing something in Java to do this but haven't touched it in a while. I'm wondering how many people would be interested in helping to develop a platform-independent application to perform all sorts of sequence analysis - alignments, SNP analysis, assembly, etc. Sort of like GCG, but free and actually user-friendly and useful. If people are interested, I think we should talk about a framework and start building something as needed. Any comments? ----- Ryan Golhar Computational Biologist The Informatics Institute at The University of Medicine & Dentistry of NJ Phone: 973-972-5034 Fax: 973-972-7412 Email: golharam at umdnj.edu
From mgollery at unr.edu Tue Mar 30 18:38:33 2004 From: mgollery at unr.edu (Martin Gollery) Date: Tue, 30 Mar 2004 15:38:33 -0800 Subject: [BiO BB] DNA Strider In-Reply-To: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1> References: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1> Message-ID: <406A04F9.6030902@unr.edu> Sounds like EMBOSS... Ryan Golhar wrote: > You know, I'm constantly finding different programs to perform different > tasks, either client applications or web-based. Some run on Linux, > others on Windows. > > I would like to see one application for multiple platforms that performs DNA > sequence analysis. I started writing something in Java to do this but > haven't touched it in a while. > > I'm wondering how many people would be interested in helping to develop > a platform-independent application to perform all sorts of sequence > analysis - alignments, SNP analysis, assembly, etc. Sort of like GCG, > but free and actually user-friendly and useful. If people are > interested, I think we should talk about a framework and start building > something as needed. > > Any comments?
-- Martin Gollery Associate Director Center For Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS330 775-784-7042 'Don't worry that the world will end today- it's already tomorrow in Australia...' -Charles Schulz From idoerg at burnham.org Tue Mar 30 18:49:38 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Tue, 30 Mar 2004 15:49:38 -0800 Subject: [BiO BB] DNA Strider In-Reply-To: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1> References: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1> Message-ID: <406A0792.3000407@burnham.org> Hi Ryan, EMBOSS works on Un*x, Linux and FreeBSD machines, and on Mac OS-X, so that covers everything but Windows.
http://emboss.org Apparently you can get full functionality using Cygwin in Windows, or there is an EMBOSS for Windows project going on, http://perso.wanadoo.fr/ablavier/embosswin/embosswin.html Seems to be alive, their latest update is from 02/04. They claim to have 158 programs from the original EMBOSS suite implemented. JEMBOSS is the Java-based point-and-click interface, works on Linux, Mac OS-X, AND Windows. Cheers, Iddo Ryan Golhar wrote: > You know, I'm constantly finding different programs to perform different > tasks. Either client applications, or web-based. Some run on Linux, > others Windows. > > I would like to see 1 application for multiple platforms to performs dna > sequence analysis. I started writing something in Java to do this but > haven't touched in awhile. > > I'm wondering how many people would be interested in helping to develop > a platform-independent application to perform all sorts of sequence > analysis - alignments, snp analysis, assembly, etc. Sort of like GCG, > but free and actually user-friendly and useful. If people are > interested, I think we should talk about a framework and start building > something as needed. > > Any comments? > > ----- > Ryan Golhar > Computational Biologist > The Informatics Institute at > The University of Medicine & Dentistry of NJ > > Phone: 973-972-5034 > Fax: 973-972-7412 > Email: golharam at umdnj.edu > > -----Original Message----- > From: bio_bulletin_board-admin at bioinformatics.org > [mailto:bio_bulletin_board-admin at bioinformatics.org] On Behalf Of > Yannick Wurm > Sent: Tuesday, March 30, 2004 10:05 AM > To: bio_bulletin_board at bioinformatics.org > Subject: [BiO BB] DNA Strider > > > Hi, > I'm a student in Bioinformatics and Modeling at a French engineering > school in Lyon, France (http://biosciences.insa-lyon.fr). Currently in > my last year, I'm currently doing a six month internship in a C. > elegans lab at McGill University in Montreal. 
> The lab's computer are Macs, and besides standard browsing, word > processing and image processing, lab members also use them to aid them > in their molecular biology work. > One of the programs they use is called DNA Strider. This piece of > software has not been updated in a long time (probably since Apple's > System 6.x - window sizes are fixed to the small old mac screen size!) > and could require a face-lift. > > In the lab, it is mainly used for managing and manipulating sequences > of genes, primers and constructs. The main features of interest here > are: > - Sequence management > - Graphical (circular or linear) restriction maps of a given > sequence (or part of it), showing restriction site data concerning the > part or whole sequence (for each enzyme, you get the number of > restriction sites, and the obtained fragement sizes) > - Reverse complementary sequence > - Quick and simple alignment between two sequences > > I've searched the web and could not find an all-in-one package that > seemed as user friendly and coherent as DNA Strider. Individual web > sites and software tools do offer these features, but > - the internet is slow (you click and need to wait before > getting your > result) > - having everything in one place is nice > > Sequence Analysis (for Mac OS X) http://informagen.com/SA/ seems to be > aiming to do what DNA Strider does, but is still very young (and > closed-source, but thats a different debate). > > http://www.mekentosj.com/ has some very nice tools as well, but they're > very problem-specific. > > Have I missed something? Is there a really cool java app or web > software (that I could install locally for speed) that would replace > DNA Strider? What does your molecular biology lab use in for it's day > to day work? > Oh and buying something expensive is not a solution. > > Thanks for any leads, > > Yannick. 
>
> \\\\\\\\\\\\\\\\\\\
> \\ http://yannick.poulet.org icq: 22044361
> \\ idh at poulet.org tel: ++33.6.16.41.71.92

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From gary at www.bioinformatics.org Tue Mar 30 19:14:25 2004
From: gary at www.bioinformatics.org (Gary Van Domselaar)
Date: Tue, 30 Mar 2004 19:14:25 -0500 (EST)
Subject: [BiO BB] DNA Strider
In-Reply-To: <406A0792.3000407@burnham.org>
Message-ID:

Hey Gang,

For platform-independent sequence _manipulation_ there is the Sequence
Manipulation Suite:

http://bioinformatics.org/sms2

SMS2 and EMBOSS together cover most of what you can expect to get out of
an old sequence analysis package like DNA Strider.

Decidedly,

g.

From idh at poulet.org Tue Mar 30 19:47:35 2004
From: idh at poulet.org (Yannick Wurm)
Date: Tue, 30 Mar 2004 19:47:35 -0500
Subject: [BiO BB] DNA Strider
In-Reply-To:
References:
Message-ID: <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>

Hi,
Thanks for all the feedback!
I have checked out both SMS2 and EMBOSS, and both seem very powerful.
From what I gather, though, both only seem to output text!
For restriction enzyme cleavage sites, getting an overview with a
schema such as this one can be a great help:
http://bcr.musc.edu/images/dnastrider.gif

I know I won't be able to replace DNA Strider before I find something
that makes nice visual maps as well...

Did I miss something?

yannick.

On 30-Mar-04, at 7:14 PM, Gary Van Domselaar wrote:
> Hey Gang,
>
> For platform-independent sequence _manipulation_ there is the Sequence
> Manipulation Suite:
>
> http://bioinformatics.org/sms2
>
> SMS2 and EMBOSS together cover most of what you can expect to get out
> of an old sequence analysis package like DNA Strider.
>
> Decidedly,
>
> g.
\\\\\\\\\\\\\\\\\\\
\\ http://yannick.poulet.org icq: 22044361
\\ idh at poulet.org tel: ++33.6.16.41.71.92

From idoerg at burnham.org Tue Mar 30 20:19:11 2004
From: idoerg at burnham.org (Iddo Friedberg)
Date: Tue, 30 Mar 2004 17:19:11 -0800
Subject: [BiO BB] DNA Strider
In-Reply-To: <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>
References: <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>
Message-ID: <406A1C8F.2080701@burnham.org>

Yannick Wurm wrote:
> Hi,
>
> I know I won't be able to replace DNA Strider before I find something
> that makes nice visual maps as well...
>
> Did I miss something?
>

Yes. EMBOSS does have graphics. Which commands were you looking at?

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From pagarwal at linus.ornl.gov Tue Mar 30 20:20:23 2004
From: pagarwal at linus.ornl.gov (Pratul K. Agarwal)
Date: Tue, 30 Mar 2004 20:20:23 -0500 (EST)
Subject: [BiO BB] Announcing live CD for bio/chemical modeling
Message-ID:

http://www.vigyaancd.org/

Vigyaan is an electronic workbench for computational biology and
computational chemistry. It has been designed to meet the needs of both
beginners and experts. VigyaanCD is a Linux live CD containing all the
software required to boot the computer with ready-to-use modeling
software.

From gary at www.bioinformatics.org Tue Mar 30 23:34:05 2004
From: gary at www.bioinformatics.org (Gary Van Domselaar)
Date: Tue, 30 Mar 2004 23:34:05 -0500 (EST)
Subject: [BiO BB] DNA Strider
In-Reply-To: <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>
Message-ID:

Hi Yannick,

The guy who made SMS2 also makes a map viewer called CGView. It has been
implemented in the web server 'PlasMapper':
http://wishart.biology.ualberta.ca/PlasMapper/index.html

You can contact him to see about the availability of CGView itself:
stothard at ualberta.ca

Regards,

g.

From stefanielager at fastmail.ca Wed Mar 31 00:11:57 2004
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Wed, 31 Mar 2004 05:11:57 +0000 (UTC)
Subject: [BiO BB] DNA Strider
In-Reply-To: <0034D92B-82AD-11D8-AF58-000393CAA04A@poulet.org>
Message-ID: <20040331051157.37FDF863566@mail.interchange.ca>

The TACG program does restriction cleavage really nicely, and complements
EMBOSS for some functions missing there: http://tacg.sourceforge.net/
For plasmid maps I think you would have to buy a commercial software
package.

Stefanie

From idh at poulet.org Wed Mar 31 00:22:19 2004
From: idh at poulet.org (Yannick Wurm)
Date: Wed, 31 Mar 2004 00:22:19 -0500
Subject: [BiO BB] DNA Strider
In-Reply-To: <20040331051157.37FDF863566@mail.interchange.ca>
References: <20040331051157.37FDF863566@mail.interchange.ca>
Message-ID: <61C5E606-82D3-11D8-AF58-000393CAA04A@poulet.org>

Thanks so much!
The generated maps look great!
Maybe I will be able to convince my biologists to trash DNA Strider
after all :)

Cheers,
Yannick.

On 31-Mar-04, at 12:11 AM, Stefanie Lager wrote:
> The TACG program does restriction cleavage really nicely, and complements
> EMBOSS for some functions missing there: http://tacg.sourceforge.net/
> For plasmid maps I think you would have to buy a commercial software
> package.
>
> Stefanie

and

On 30-Mar-04, at 11:34 PM, Gary Van Domselaar wrote:
> Hi Yannick,
>
> The guy who made SMS2 also makes a map viewer called CGView. It has been
> implemented in the web server 'PlasMapper':
> http://wishart.biology.ualberta.ca/PlasMapper/index.html
>
> You can contact him to see about the availability of CGView itself:
> stothard at ualberta.ca
>
> Regards,
>
> g.

\\\\\\\\\\\\\\\\\\\
\\ http://yannick.poulet.org icq: 22044361
\\ idh at poulet.org tel: ++33.6.16.41.71.92

From stefanielager at fastmail.ca Wed Mar 31 05:44:26 2004
From: stefanielager at fastmail.ca (Stefanie Lager)
Date: Wed, 31 Mar 2004 10:44:26 +0000 (UTC)
Subject: [BiO BB] GI numbers
In-Reply-To: <20040326174818.GH23629@genoma4.iq.usp.br>
Message-ID: <20040331104426.AB5DF86258A@mail.interchange.ca>

Try linking them through LocusLink, either using one of the mapping
tables found at ftp://ftp.ncbi.nih.gov/refseq/LocusLink/ or (a bit more
complicated) using a system like OpenBNS: http://openbns.sourceforge.net/

Stefanie

> Hi,
>
> I'm analyzing a set of sequences with regard to their classifications
> as homologs from both the COG and KEGG databases of orthologs. Although
> both COG and KEGG provide tables relating gene names to GI (PID)
> numbers, I have so far been unable to map GIs from one dataset to
> the other in order to check classifications for genes in both
> catalogs.
>
> GIs from COG appear to be from RefSeq and those from KEGG seem to be
> from GenPept. How can I map GI numbers from KEGG to GI numbers from
> the COG database? Is there any query I can make to download such info
> for the 185904 proteins in COG and their equivalents in the KEGG
> Orthologs database?
>
> Here is an example:
>
> Sequence 14600509 is the protein coded by gene APE0180 from the
> Aeropyrum pernix complete genome, as described in COG's table myva=gb.
> The same sequence is identified by GI 5103570 in KEGG. In this case, I
> was able to map COG's GI to KEGG's GI by using the gene identifier and
> annotation, a procedure that is not easily automated.
>
> How can I retrieve equivalent IDs for the whole COG gene set?
>
> Thanks in advance for any help.
> Robson

From ml at mb.au.dk Wed Mar 31 06:38:20 2004
From: ml at mb.au.dk (Martin Luetzelberger)
Date: Wed, 31 Mar 2004 13:38:20 +0200 (CEST)
Subject: [BiO BB] Seqio and fmtseq
In-Reply-To: <20040331112503.B6D20D1F06@www.bioinformatics.org>
References: <20040331112503.B6D20D1F06@www.bioinformatics.org>
Message-ID:

Hi,

I recently saw a message on this board about James Knight's Seqio
package. There seem to be some problems compiling it under Linux with
gcc-3.2. Has anybody solved this problem yet? Is there an official
website where the package is maintained?

Martin

From micheld at mshri.on.ca Wed Mar 31 10:40:37 2004
From: micheld at mshri.on.ca (Michel Dumontier)
Date: Wed, 31 Mar 2004 10:40:37 -0500
Subject: [BiO BB] GI numbers
References: <20040326174818.GH23629@genoma4.iq.usp.br>
Message-ID: <002601c41736$849a3cf0$6400000a@moose>

Hi Robson,

Since 14600509 and 5103570 are identifiers for identical sequences from
different sources, they can be found in the same definition line in the
non-redundant FASTA file that NCBI provides on its FTP site (as a BLAST
database distribution - nr.gz). This file and each definition line entry
have been imported into SeqHound (http://seqhound.mshri.on.ca), and are
searchable under the redundant group module with a variety of
programming interfaces (C/C++/Perl/Java).

-=Michel=-

From pculpep at hotmail.com Wed Mar 31 11:44:06 2004
From: pculpep at hotmail.com (Pamela Culpepper)
Date: Wed, 31 Mar 2004 16:44:06 +0000
Subject: [BiO BB] GI numbers
Message-ID:

A package, the Integrated Genomics Data System, has been submitted to the
Baylor College of Medicine Office of Technology. A description of the
system follows.

Integrated Genomics Data System (IGDS)

OVERVIEW

The Integrated Genomics Data System (IGDS) integrates data from multiple
publicly available genomic databases into a relational database format.
The core of the IGDS system is a C/C++ program that mines National
Center for Biotechnology Information (NCBI) binary ASN1 files for
sequence data. This data is integrated by means of Perl scripts with
data from LocusLink, UniGene, Gene Ontology Association, Protein Data
Bank, and other sites. The resulting information is uploaded by a Java
program into a relational database defined by the Integrated Genomics
Database System schema.

FEATURES

- The IGDS is a computational tool for gathering and interpreting
genomic data, which saves time and reduces repetition of rote processes.
- A methodology using tested relationships among various pieces of data
in different files reduces the need to accrue and process massive
amounts of data.
- The data mining tactic is based on the NCBI toolkit, thereby using
code that has been approved for interpretation of NCBI ASN1.
- Native representation of pertinent data elements is maintained, as are
the nesting levels inherent in the ASN1 structure.
- Data download and interpretation are performed on the most compact
representation of NCBI data: ASN1 binary files.
- Less computer disk space is required to store data files when data
mining processes are invoked.
- Processing of NCBI data in binary format provides optimal computer
performance with quick results.
- A configurable interface affords various levels of processing
granularity.
- Processing may be allocated among many processes on one computer or
across several computers.
- The final relational representation of genomic data provides dynamic
inference not possible with flat file or ASN1 data representation.
- The system is fully configurable and will download and interpret the
entire NCBI ASN1 sequence library or a few select sequence sets.
- A separate series of Perl scripts cross-references the NCBI
LocusLink/UniGene libraries, providing Accession and GI Number, Gene
Names, Alias Gene Names, Preferred Gene Names, Clone, Lib, UniGene Id,
Tissue, Vector, Organ, Cyto_Genetic_Loc, and relevant Gene Ontology
information such as GO Id, categories, etc. This data can be merged with
the ASN1 data to create a fully integrated DB system of genomic
information.
- A Rational Rose UML Data Model is provided as well as relevant SQL
tables.

SYSTEM REQUIREMENTS

C/C++, Perl, a compiled version of the NCBI toolkit, and a relational
database management system.

Contact information for the Baylor College of Medicine Office of
Technology is as follows:

Larry Hope
Baylor College of Medicine
Office of Technology Administration (i.e. Baylor Licensing)
One Baylor Plaza
Mail Stop: BCM210 600D
Houston, TX 77030
P (713) 798-6821
F (713) 798-1252
lhope at bcm.tmc.edu
http://research.bcm.tmc.edu/OTA/index.htm

Sincerely,
Pam Culpepper

From ulimard at yahoo.com.br Wed Mar 31 12:50:41 2004
From: ulimard at yahoo.com.br (Ulisses)
Date: Wed, 31 Mar 2004 14:50:41 -0300
Subject: [BiO BB] Distributed System and Bioinformatic
References:
Message-ID: <00d001c41748$af3ed030$8902100a@clonline>

I'm graduating in computer science and I'm writing a paper about
distributed systems, and I would like to tie it in with some concepts
from bioinformatics. So I'd like to receive some links to any work in
this area.

Thank you for your attention.
Ulisses Dias

From boris.steipe at utoronto.ca Wed Mar 31 17:43:11 2004
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Wed, 31 Mar 2004 17:43:11 -0500
Subject: [BiO BB] Distributed System and Bioinformatic
In-Reply-To: <00d001c41748$af3ed030$8902100a@clonline>
Message-ID:

Google for "mygrid".

best,
Boris

On Wednesday, Mar 31, 2004, at 12:50 Canada/Eastern, Ulisses wrote:
> I'm graduating in computer science and I'm writing a paper about
> distributed systems, and I would like to tie it in with some concepts
> from bioinformatics. So I'd like to receive some links to any work in
> this area.
>
> Thank you for your attention.
> Ulisses Dias

From operon at www.ufsm.br Tue Mar 30 19:21:28 2004
From: operon at www.ufsm.br (Marcos Oliveira de Carvalho)
Date: Tue, 30 Mar 2004 21:21:28 -0300
Subject: [BiO BB] DNA Strider
In-Reply-To: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
References: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
Message-ID:

It would be nice to have a bioinformatics IDRE (Integrated Development
and Research Environment) with some basic features (sequence
visualization, editing, annotation, data management, etc.) and a
well-designed plug-in API for easy extension.

One idea would be to build it on top of the NetBeans platform; with Java
we get platform-independent software (other languages would work too,
but Java does the job).

One could also add bindings for Python/Perl/Ruby/Java and their Bio*
libraries, and interfaces to bioinformatics tools, including output
capture (EMBOSS, phred, phrap, BLAST, FASTA, Clustal, R/Bioconductor,
MUMmer, ...), as well as interfaces to web services and data retrieval
from major databases.

Well, there are lots of possibilities.

cheers

Marcos

--
"Science knows no country, because knowledge belongs to humanity, and is
the torch which illuminates the world."
Louis Pasteur

From anthony.boureux at crbm.cnrs-mop.fr Wed Mar 31 03:46:37 2004
From: anthony.boureux at crbm.cnrs-mop.fr (Anthony Boureux)
Date: Wed, 31 Mar 2004 10:46:37 +0200
Subject: [BiO BB] DNA Strider
In-Reply-To: <61C5E606-82D3-11D8-AF58-000393CAA04A@poulet.org>
References: <20040331051157.37FDF863566@mail.interchange.ca> <61C5E606-82D3-11D8-AF58-000393CAA04A@poulet.org>
Message-ID: <406A856D.3090604@crbm.cnrs-mop.fr>

If you want, you can still use DNA Strider on Mac OS X: there is a new
version (1.4). It is a Carbon version, so it runs natively under Mac OS X.
To get it, contact the author (his e-mail is in the About box; he works
at CEA Saclay, France).

Anthony

--
Anthony Boureux
Tyrosine kinase Lab.
CRBM, CNRS FRE2593
1919, route de Mende
34293 Montpellier Cedex 5
+33 (0)467613373
Anthony.Boureux at crbm.cnrs-mop.fr

From lj at bio-code.com Wed Mar 31 16:23:02 2004
From: lj at bio-code.com (LJ)
Date: Wed, 31 Mar 2004 13:23:02 -0800
Subject: [BiO BB] DNA Strider
In-Reply-To: <016b01c416af$00fcec80$4322db82@GOLHARMOBILE1>
Message-ID:

I like this idea quite a bit. I know several
programmers/bioinformaticists who might want to help out. They are
really impressed with Python these days.

lon james
postgres, inc
1 fair oaks
san francisco, ca 94110
415-573-9192
lon at efcodd.org

-----Original Message-----
From: bio_bulletin_board-admin at bioinformatics.org
[mailto:bio_bulletin_board-admin at bioinformatics.org]On Behalf Of Ryan
Golhar
Sent: Tuesday, March 30, 2004 3:31 PM
To: bio_bulletin_board at bioinformatics.org
Subject: RE: [BiO BB] DNA Strider

You know, I'm constantly finding different programs to perform different
tasks, either client applications or web-based. Some run on Linux,
others on Windows.

I would like to see one application for multiple platforms that performs
DNA sequence analysis. I started writing something in Java to do this
but haven't touched it in a while.

I'm wondering how many people would be interested in helping to develop
a platform-independent application to perform all sorts of sequence
analysis - alignments, SNP analysis, assembly, etc. Sort of like GCG,
but free and actually user-friendly and useful. If people are
interested, I think we should talk about a framework and start building
something as needed.

Any comments?
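
Two items from Yannick's original feature list in this thread, the reverse complement and the per-enzyme site counts with fragment sizes, are simple enough to sketch in a few lines of Python. This is only an illustrative sketch and was not posted to the list: the function names are hypothetical, the enzyme table is a three-entry sample typed in by hand, and only palindromic recognition sites are handled, so searching one strand is enough. Drawing the map itself is left to the tools mentioned in the thread.

```python
# Illustrative sketch only: helper names and the tiny enzyme table are
# assumptions made for this example, not part of any tool in the thread.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Recognition site and top-strand cut offset (bases from the site start).
# All three sites are palindromic, so one strand search is sufficient;
# non-palindromic sites would also need a reverse-complement search.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_positions(seq, site, offset, circular=False):
    """Return sorted 0-based top-strand cut positions for one enzyme."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return []
    # On a circular molecule a site may span the origin, so search a copy
    # extended by the first len(site) - 1 bases.
    text = seq + seq[:len(site) - 1] if circular else seq
    cuts = set()
    start = text.find(site)
    while start != -1:
        cuts.add((start + offset) % n)
        start = text.find(site, start + 1)
    return sorted(cuts)


def fragment_sizes(length, cuts, circular=False):
    """Return fragment lengths produced by the given cut positions."""
    if not cuts:
        return [length]
    if circular:
        # n cuts on a circle give n fragments; the last wraps past the origin.
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(length - cuts[-1] + cuts[0])
        return sizes
    bounds = [0] + list(cuts) + [length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    demo = "AAGAATTCTTTTGGATCCAA"
    for name, (site, offset) in ENZYMES.items():
        cuts = cut_positions(demo, site, offset)
        print(name, len(cuts), "site(s), fragments:",
              fragment_sizes(len(demo), cuts))
```

The same functions accept circular=True for plasmids, in which case a single cut simply linearizes the molecule and one full-length fragment is reported.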