From golharam at umdnj.edu  Mon Aug  3 13:45:57 2009
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 03 Aug 2009 13:45:57 -0400
Subject: [BiO BB] time efficient global alignment algorithm
Message-ID: <4A772255.4030301@umdnj.edu>

I'm trying to perform a large number of sequence alignments of long DNA 
sequences, some over 163,000 bp in length.  I was trying to use the 
standard Needleman-Wunsch algorithm, but the full matrix requires about 
100 GB of memory.  This obviously won't work.
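
The 100 GB figure can be checked with a quick calculation. The sketch
below assumes one 4-byte int per cell and a full (m+1) x (n+1) matrix;
both are assumptions about the implementation, and the function name is
made up for illustration.

```python
def nw_matrix_bytes(m: int, n: int, bytes_per_cell: int = 4) -> int:
    """Bytes needed to hold a full (m+1) x (n+1) dynamic programming matrix."""
    return (m + 1) * (n + 1) * bytes_per_cell


if __name__ == "__main__":
    # Two sequences of 163,000 bp each, 4-byte cells (assumed).
    size = nw_matrix_bytes(163_000, 163_000)
    print(f"{size / 1e9:.1f} GB")
```

The cost grows with the product of the two lengths, so halving both
sequences cuts the memory to roughly a quarter.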

I tried using stretcher from the EMBOSS package, but it takes way too 
long to align each pair of sequences.  I'm looking for something that 
can perform alignments fast using a reasonable amount of memory.

I found one tool, called AVID, but have been unsuccessful in getting it 
to run on the sequence set I have.

Before I go and try to develop a new solution to this, does anyone have 
or recommend a program to perform a large number of global pairwise 
alignments for long sequences?

Ideally, something with the speed similar to BLAST.

Ryan


From marchywka at hotmail.com  Mon Aug  3 16:29:02 2009
From: marchywka at hotmail.com (Mike Marchywka)
Date: Mon, 3 Aug 2009 16:29:02 -0400
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A772255.4030301@umdnj.edu>
References: <4A772255.4030301@umdnj.edu>
Message-ID: <BLU113-W120CD5AE5E34036D8E1A32BE0F0@phx.gbl>


----------------------------------------
> Date: Mon, 3 Aug 2009 13:45:57 -0400
> From: golhara
> To: bbb at bioinformatics.org
> Subject: [BiO BB] time efficient global alignment algorithm
>
> I'm trying to perform a large amount of sequence alignments of long DNA
> sequences, some up to 163,000+ bp in length. I was trying to use the
> standard Needleman-Wunsch algorithm, but the matrix used requires a
> large amount of memory...about 100 GB of memory. This obviously won't work.

How many were you trying to align? Do you mean 163 kb or 163 Mb?
Some time ago I was looking for tests or comparisons for some alignment
code I had that indexed the target sequences. I don't recall the
suggestions from that discussion, but I was able to align simple genomes
reasonably well (I think I used two strains of E. coli, or something
about 5 Mb long) on a desktop. If you can find the responses to my
request from a few years ago, they may (or may not) help. I'd offer my
code, and indeed I think I have it on a website, but I stopped
development and I'm not sure it is useful as-is unless you just want a
coarse alignment of two similar sequences.

Many implementations of just about anything are bad at memory
management; sometimes just blocking, sorting, or compacting the internal
representation can make a big improvement. I'm not sure what exists
along these lines, but often some simplifications don't change the
results and still cut the time and memory spent on futile possibilities.


>
> I tried using stretcher from the EMBOSS package, but it takes way too
> long to align each pair of sequences. I'm looking for something that
> can perform alignments fast using a reasonable amount of memory.
>
> I found one tool, called AVID, but have been unsuccessful in getting it
> to run to the sequence set I have.
>
> Before I go an try to develop a new solution to this, does anyone have
> or recommend a program to perform a large number of global pairwise
> alignments for long sequences?

Are all of these nominally the same or are you trying to align
noise to noise? 

>
> Ideally, something with the speed similar to BLAST.

I guess in an odd way my approach could get there, as it essentially
queries each string for "interesting" short sequences, but I'd have to
check the order (how many of these it uses, etc.). Last time I checked
the academic literature, IIRC, this exact-string matching was an open
research area; maybe there have been advancements in the last few years
that are trivial to code or that exist in an academic's lab.

>
> Ryan
>
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb



From dan.bolser at gmail.com  Tue Aug  4 04:41:23 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 4 Aug 2009 09:41:23 +0100
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A772255.4030301@umdnj.edu>
References: <4A772255.4030301@umdnj.edu>
Message-ID: <2c8757af0908040141p116e87e7rbfa9caab5006feec@mail.gmail.com>

2009/8/3 Ryan Golhar <golharam at umdnj.edu>:
> I'm trying to perform a large amount of sequence alignments of long DNA
> sequences, some up to 163,000+ bp in length.  I was trying to use the
> standard Needleman-Wunsch algorithm, but the matrix used requires a large
> amount of memory...about 100 GB of memory.  This obviously won't work.

For two sequences in the region of > 85% similarity, MUMmer [1] works
very well.

For example, aligning two strains of E. coli on my desktop, both in
the region of 4.6 Mb:

* U00096 (Escherichia coli str. K-12 substr. MG1655)
* CP000948 (Escherichia coli str. K-12 substr. DH10B)


time nucmer U00096.fasta CP000948.fasta

real    0m14.035s
user    0m11.370s
sys     0m0.400s


It finds exact matches between the two sequences and then clusters and
extends them, which makes it very quick and efficient.
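
The general seed-based idea can be sketched as follows: index the
fixed-length words of one sequence, then look up the words of the other
to get candidate anchor positions. This is only a toy illustration with
a hypothetical word length k and a made-up function name; it is not
MUMmer's actual algorithm, which as far as I know finds maximal exact
matches using a suffix tree.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_seeds(ref: str, query: str, k: int = 12) -> List[Tuple[int, int]]:
    """Return (ref_pos, query_pos) pairs where a k-mer matches exactly."""
    # Index every overlapping k-mer of the reference by its start position.
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)

    # Look up every overlapping k-mer of the query in that index.
    seeds: List[Tuple[int, int]] = []
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            seeds.append((i, j))
    return seeds
```

Seeds that fall on the same diagonal (ref_pos - query_pos) can then be
merged and extended, so the expensive alignment work is limited to the
regions between them.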


HTH,
Dan.

[1] http://mummer.sourceforge.net/




From nuin at genedrift.org  Mon Aug  3 16:14:01 2009
From: nuin at genedrift.org (Paulo Nuin)
Date: Mon, 3 Aug 2009 16:14:01 -0400
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A772255.4030301@umdnj.edu>
References: <4A772255.4030301@umdnj.edu>
Message-ID: <6EA37FCD-3E26-4571-895A-ABA50BD97F49@genedrift.org>

Hi


MUMmer has never failed me. Check it out at

http://mummer.sourceforge.net/

HTH

Paulo



From golharam at umdnj.edu  Tue Aug  4 10:26:20 2009
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 04 Aug 2009 10:26:20 -0400
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <BLU113-W120CD5AE5E34036D8E1A32BE0F0@phx.gbl>
References: <4A772255.4030301@umdnj.edu>
	<BLU113-W120CD5AE5E34036D8E1A32BE0F0@phx.gbl>
Message-ID: <4A78450C.2030700@umdnj.edu>

>> I'm trying to perform a large amount of sequence alignments of long DNA
>> sequences, some up to 163,000+ bp in length. I was trying to use the
>> standard Needleman-Wunsch algorithm, but the matrix used requires a
>> large amount of memory...about 100 GB of memory. This obviously won't work.
> 
> How many were you trying to align? You mean 163kb or 163Mb?
> I was looking for test or comparisons for some alignment code I 
> had which indexed the target sequences, don't recall the suggestions
> for that discussion but I was able to do simple genomes reasonably well 
> ( I think I used 2 strains of e coli or something about 5 megs long)
> on a desktop. If you can find responses to my request from a few years 
> ago that may ( or may not ) help. I'd offer my code, and indeed I think
> I have it on a website, but I stopped development and not sure
> it is nearly useful as-is unless you just want coarse alignment on
> two similar sequences. 

Hundreds of thousands.  I'm trying to eliminate duplicates or near 
duplicates (>90% similarity).  I'm using the methodology from 
cd-hit-est.  However, I haven't been able to get that application to 
run on the number of sequences I have.  Right now I'm trying to cluster 
the nt database; later I would like to cluster sequences from other 
sources.

> Many implementations of just about anything are bad with
> memory management- sometimes just blocking or sorting or
> compacting the internal representation can make a big improvement.
> Not sure what exists along these lines but often some simplifcations
> don't change results but decrease time/memory on futile possibilities. 

Agreed.  However, to fill in the dynamic programming matrix you still 
need to allocate an m x n matrix of ints.  With sequences of 163,000 bp 
in length, you need about 100 GB of RAM, unless there is a way to use a 
compact representation of the DP matrix that I'm not aware of.
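
One compact representation does exist for the score: each row of the
matrix depends only on the row above it, so two rows are enough. The
sketch below computes only the optimal global score, with made-up
match/mismatch/gap values and a linear gap penalty (all assumptions).
Recovering the alignment itself in linear space needs Hirschberg's
divide-and-conquer method on top of this, and the running time is still
proportional to m x n.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1,
             gap: int = -2) -> int:
    """Optimal Needleman-Wunsch score using O(min(m, n)) memory."""
    if len(b) > len(a):
        a, b = b, a  # keep the stored row as short as possible

    # Row 0: aligning a prefix of b against nothing costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr  # the older row is no longer needed
    return prev[-1]
```

For two 163,000 bp sequences this keeps two lists of about 163,000
Python ints rather than about 26.6 billion cells.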

> Are all of these nominally the same or are you trying to align
> noise to noise? 

Yes, they are nominally the same...they share at least 50% of the 
non-overlapping words of the shorter of the two sequences.
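
A word-sharing filter of this kind might look like the sketch below.
The word length k, the function name, and the choice to look the words
up among all overlapping words of the longer sequence are assumptions
made for illustration; this is not the cd-hit-est code.

```python
def shared_word_fraction(seq_a: str, seq_b: str, k: int = 8) -> float:
    """Fraction of the shorter sequence's non-overlapping k-mers that
    occur anywhere in the longer sequence."""
    shorter, longer = sorted((seq_a, seq_b), key=len)

    # Non-overlapping words of the shorter sequence (step size k).
    words = [shorter[i:i + k] for i in range(0, len(shorter) - k + 1, k)]
    if not words:
        return 0.0

    # All overlapping words of the longer sequence, for O(1) lookup.
    index = {longer[i:i + k] for i in range(len(longer) - k + 1)}
    hits = sum(1 for w in words if w in index)
    return hits / len(words)
```

A pair would go on to the expensive alignment step only if this
fraction is at least 0.5.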

>> Ideally, something with the speed similar to BLAST.
> 
> I guess in an odd way my approach could get there as it essentially 
> queries each string for "interesting" short sequences but I'd have to
> check order ( howmany of these does it use etc). Last time I checked the
> academic lit, IIRC this exact-string matching was an open research area maybe there have been advancements
> in last few years that are trivial to code or exist in an academic's lab.

If there are, I haven't heard of any.  My thought was to run a BLAST 
alignment on the two sequences using bl2seq.  Then string together the 
non-overlapping HSPs and perform a global alignment on the regions in 
between the HSPs.  This is easy enough, but I want to see if there is a 
solution already out there first.
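
The chaining step could be sketched like this. It assumes the HSPs have
already been parsed into 0-based, half-open coordinates on the plus
strand (BLAST itself reports 1-based inclusive positions), and the HSP
tuple and function names are made up for illustration. It picks the
highest-scoring set of non-overlapping, collinear HSPs with a simple
O(n^2) dynamic program, then lists the regions left over between them
for global alignment.

```python
from typing import List, NamedTuple, Tuple


class HSP(NamedTuple):
    q_start: int  # 0-based, inclusive
    q_end: int    # exclusive
    s_start: int
    s_end: int
    score: float


def chain_hsps(hsps: List[HSP]) -> List[HSP]:
    """Best-scoring chain of HSPs that do not overlap in either sequence."""
    ordered = sorted(hsps, key=lambda h: (h.q_start, h.s_start))
    if not ordered:
        return []
    best = [h.score for h in ordered]  # best chain score ending at each HSP
    back = [-1] * len(ordered)         # predecessor in that chain
    for i, cur in enumerate(ordered):
        for j in range(i):
            prev = ordered[j]
            if prev.q_end <= cur.q_start and prev.s_end <= cur.s_start:
                if best[j] + cur.score > best[i]:
                    best[i] = best[j] + cur.score
                    back[i] = j
    # Trace back from the best chain end.
    i = max(range(len(ordered)), key=best.__getitem__)
    chain: List[HSP] = []
    while i != -1:
        chain.append(ordered[i])
        i = back[i]
    return chain[::-1]


Region = Tuple[int, int]


def gaps_between(chain: List[HSP], q_len: int,
                 s_len: int) -> List[Tuple[Region, Region]]:
    """(query, subject) regions before, between, and after chained HSPs."""
    gaps = []
    q_pos = s_pos = 0
    for h in chain:
        gaps.append(((q_pos, h.q_start), (s_pos, h.s_start)))
        q_pos, s_pos = h.q_end, h.s_end
    gaps.append(((q_pos, q_len), (s_pos, s_len)))
    return gaps
```

Each gap region is much smaller than the full sequences, so a standard
global alignment on it needs far less memory than the whole matrix.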

Ryan


From marty.gollery at gmail.com  Tue Aug  4 16:54:07 2009
From: marty.gollery at gmail.com (Martin Gollery)
Date: Tue, 4 Aug 2009 13:54:07 -0700
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A772255.4030301@umdnj.edu>
References: <4A772255.4030301@umdnj.edu>
Message-ID: <bdd10c2a0908041354g4b8f5897h651cf187bec17480@mail.gmail.com>

Ryan, are you trying to do lots of one-to-one alignments, or one very large
multiple sequence alignment?
Marty



-- 
-- 
Martin Gollery
Senior Bioinformatics Scientist
Tahoe Informatics
www.bioinformaticist.biz
www.hiddenmarkovmodels.com


From golharam at umdnj.edu  Tue Aug  4 17:00:47 2009
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 04 Aug 2009 17:00:47 -0400
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <bdd10c2a0908041354g4b8f5897h651cf187bec17480@mail.gmail.com>
References: <4A772255.4030301@umdnj.edu>
	<bdd10c2a0908041354g4b8f5897h651cf187bec17480@mail.gmail.com>
Message-ID: <4A78A17F.8050902@umdnj.edu>

One-to-one alignments

Martin Gollery wrote:
> Ryan, are you trying to do lots of one-to-one alignments, or one very large
> multiple sequence alignment?
> Marty


From u4113344 at dcsmail.anu.edu.au  Tue Aug  4 19:33:47 2009
From: u4113344 at dcsmail.anu.edu.au (Luke Nguyen-Hoan)
Date: Wed, 05 Aug 2009 09:33:47 +1000
Subject: [BiO BB] Short Survey of Scientific Software Development
Message-ID: <4A78C55B.3000207@dcsmail.anu.edu.au>

My name is Luke Nguyen-Hoan, and I am a PhD candidate in the Department 
of Computer Science at the Australian National University in the area of 
software intensive systems engineering. I am running a survey on current 
practices in scientific software development, and would like to invite 
you to take part.

The survey will take approximately 10 minutes to complete, and is 
intended for people who have had experience in developing scientific 
software applications. If this does not apply to you, please accept my 
apologies.

If you would like to participate, the survey is available online at
https://apollo.anu.edu.au/default.asp?pid=3900

Thank you for your help with this research,

Luke Nguyen-Hoan

http://cs.anu.edu.au/~Luke.Nguyen-Hoan


From dan.bolser at gmail.com  Wed Aug  5 03:36:14 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 5 Aug 2009 08:36:14 +0100
Subject: [BiO BB] time efficient global alignment algorithm
In-Reply-To: <4A78450C.2030700@umdnj.edu>
References: <4A772255.4030301@umdnj.edu>
	<BLU113-W120CD5AE5E34036D8E1A32BE0F0@phx.gbl>
	<4A78450C.2030700@umdnj.edu>
Message-ID: <2c8757af0908050036l122ce716vb8457e5bd80fe536@mail.gmail.com>

2009/8/4 Ryan Golhar <golharam at umdnj.edu>:
>>> I'm trying to perform a large amount of sequence alignments of long DNA
>>> sequences, some up to 163,000+ bp in length. I was trying to use the
>>> standard Needleman-Wunsch algorithm, but the matrix used requires a
>>> large amount of memory...about 100 GB of memory. This obviously won't
>>> work.
>>
>> How many were you trying to align? You mean 163kb or 163Mb?
>> I was looking for test or comparisons for some alignment code I had which
>> indexed the target sequences, don't recall the suggestions
>> for that discussion but I was able to do simple genomes reasonably well (
>> I think I used 2 strains of e coli or something about 5 megs long)
>> on a desktop. If you can find responses to my request from a few years ago
>> that may ( or may not ) help. I'd offer my code, and indeed I think
>> I have it on a website, but I stopped development and not sure
>> it is nearly useful as-is unless you just want coarse alignment on
>> two similar sequences.
>
> Hundreds of thousands.  I'm trying to eliminate duplicates or near
> duplicates (>90% similarity).  I'm using the methodology from cd-hit-est.
> However I'm not successful in getting that application to run on the number
> of sequences I have.  Right now, I'm trying to cluster the nt database,
> however later I would like to cluster other sequences from other sources.

First thing that came to mind when I read the above was cd-hit. What
is cd-hit-est and how come it fails?

I'm curious because I'm maintaining (or was) the cd-hit website for
the project on bioinformatics.org:

http://www.bioinformatics.org/cd-hit/


I'm planning to move that over into the wiki where it can (hopefully)
stay more up to date.

Dan.


From paolo.romano at istge.it  Thu Aug  6 03:35:37 2009
From: paolo.romano at istge.it (Paolo Romano)
Date: Thu, 06 Aug 2009 09:35:37 +0200
Subject: [BiO BB] CFP: SWAT4LS 2009 Semantic Web Applications and Tools for
 Life Sciences
Message-ID: <200908060736.n767Zoq7010845@clus2.istge.it>

Apologies for possible multiple posts
-----------------------------------------------------

First CFP: SWAT4LS Semantic Web Applications and Tools for Life Sciences 2009

***Location and date
Amsterdam, Science Park, November 20th 2009
(http://www.swat4ls.org/2009/)

***Rationale

The adoption of semantic-enabled applications and 
collaborative social environments is ever more 
common in the Life Sciences. The Semantic Web 
provides a set of technologies and standards that 
are key to support semantic markup, ontology 
development, distributed information resources 
and collaborative social environments. Altogether 
the adoption of the Semantic Web in the Life 
Sciences has potential impact on the future of 
publishing, biological research and medicine. 
This workshop will provide a venue to present and 
discuss benefits and limits of the adoption of 
these technologies and tools in biomedical 
informatics and computational biology. It will 
showcase experiences, information resources, 
tools development and applications. It will bring 
together researchers, both developers and users, 
from the various fields of Biology, 
Bioinformatics and Computer Science, to discuss 
goals, current limits and some real use cases for 
Semantic Web technologies in Life Sciences.

***Topics

Topics of interest include, but are not limited to:

* Standards, Technologies, Tools for the Semantic Web
      o Semantic Web standards and new proposals (RDF, OWL, SKOS,... )
      o Biomedical Ontologies and related tools
      o Formal approaches to large biomedical knowledge bases
* Systems for a Semantic Web for Bioinformatics
      o RDF stores, Reasoners, query and visualization systems for life sciences
      o Semantic biomedical Web Services
      o Semantics aware Biological Data Integration Systems
* Existing and prospective applications of the Semantic Web for Bioinformatics
      o Semantics aware application tools
      o Semantic collaborative research environments
      o Case studies, use cases, and scenarios

***Scientific Committee (committed so far)

* Christopher J. O. Baker, Department of Computer 
Science and Applied Statistics, University of New Brunswick, Saint John, Canada
* Pedro Barahona, Department of Informatics, New 
University of Lisboa, Lisboa, Portugal
* Liliana Barrio-Alvers, Transinsight GmbH, Dresden, Germany
* Olivier Bodenreider, National Library of 
Medicine, Bethesda, United States of America
* Matt-Mouley Bouamrane, School of Computer 
Science, University of Manchester, Manchester, United Kingdom
* Werner Ceusters, NY CoE in Bioinformatics and 
Life Sciences, University at Buffalo, Buffalo, United States of America
* Kei Cheung, Center for Medical Informatics, 
Yale University School of Medicine, New Haven, United States of America
* Tim Clark, Center for Innovative Computing, 
Harvard University, United States of America
* Marie-Dominique Devignes, LORIA, Vandoeuvre les Nancy, France
* Olivier Dameron, INSERM U936, University of Rennes 1, Rennes, France
* Michel Dumontier, Carleton University, Ottawa, Ontario, Canada
* Huajun Chen, Zhejiang University, Hangzhou, China
* Duncan Hull, School of Chemistry, University of 
Manchester, Manchester, United Kingdom
* C. Maria Keet, Faculty of Computer Science, 
Free University of Bozen-Bolzano, Bolzano, Italy
* Graham Kemp, Chalmers University of Technology, Sweden
* Jacob Tilman Koehler, Department of Molecular 
Biotechnology, Institute of Medical Biology, 
University of Tromsø, Tromsø, Norway
* Michael Krauthammer, Department of Pathology, 
Yale University School of Medicine, United States of America
* Martin Kuiper, Department of Pathology, Systems 
Biology group, Department of Biology, Norwegian 
University of Science and Technology, Trondheim, Norway
* Patrick Lambrix, Department of Computer and 
Information Science, Linköping University, Linköping, Sweden
* Phillip Lord, School of Computing Science, 
Newcastle University, Newcastle-upon-Tyne, United Kingdom
* M. Scott Marshall, Leiden University Medical 
Center / University of Amsterdam, Amsterdam, The Netherlands
* Chris Mungall, Lawrence Berkeley National 
Laboratories, United States of America
* Stephan Philippi, Institute for Software 
Technology, University of Koblenz-Landau, Koblenz, Germany
* Marco Roos, Instituut voor Informatica, 
University of Amsterdam, Amsterdam, The Netherlands
* Alan Ruttenberg, Science Commons, Cambridge, United States of America
* Matthias Samwald, DERI, Galway, Ireland, and 
Konrad Lorenz Institute for Evolution and 
Cognition Research, Altenberg, Austria
* Nigam Shah, Center for Biomedical Informatics 
Research, Stanford, United States of America
* Michael Schröder, Biotechnology Centre, TU Dresden, Dresden, Germany
* Robert Stevens, School of Computer Science, 
University of Manchester, Manchester, United Kingdom
* Tetsuro Toyoda, Genomic Sciences Center, RIKEN, Yokohama, Japan
* Mark D. Wilkinson, iCAPTURE Center, St. Paul Hospital, Vancouver, Canada

and the organizers

***Type of contributions

The following possible contributions are sought:

* Oral communications (regular papers)
* Posters
* Software demos

All accepted oral communications and posters will be published with
CEUR-WS.

***Deadlines

* Submission opening: 1 September 2009
* Submission of oral communications: 28 September 2009
* Submission for posters and demos: 15 October 2009
* Communication of acceptance: 23 October 2009
* Camera ready: 6 November 2009

***Instructions

All papers and posters must be in English and 
must be submitted through the EasyChair review 
system at http://www.easychair.org/conferences/?conf=swat4ls-09 .
Please upload all submissions as PDF files in 
LNCS format (see http://www.springer.de/comp/lncs/authors.html).
To ensure high quality, submitted papers will be 
carefully peer-reviewed by at least three members of the Scientific Committee.

* Submissions for Oral communications should be between 10 and 15 pages.
* Posters submissions should be between 4 and 8 pages.
* Software demo proposals should also be between 4 and 8 pages.

***Proceedings

All accepted oral communications and posters will 
be published with the CEUR-WS.org Workshop 
Proceedings service (see http://ceur-ws.org/).
We are in the process of negotiating the 
possibility to have a special issue of a major 
bioinformatics journal related to the 2009 
edition of swat4ls. To this end, a special Call 
will be launched shortly after the workshop, for 
extended and revised versions of contributions 
submitted to the workshop and accepted either as oral communication or poster.

***Organization

* M. Scott Marshall, Leiden University Medical 
Center / University of Amsterdam, The Netherlands
* Albert Burger, School of Mathematical and 
Computer Sciences, Heriot-Watt University, and 
Human Genetics Unit, Medical Research Council, 
Edinburgh, Scotland, United Kingdom
* Adrian Paschke, Corporate Semantic Web, Freie Universitaet Berlin, Germany
* Paolo Romano, Bioinformatics, National Cancer 
Research Institute, Genova, Italy
* Andrea Splendiani, Biomathematics and 
Bioinformatics dept., Rothamsted Research, UK

-----------
For any further information or clarification, 
please visit the website at 
http://www.swat4ls.org/2009 or contact the 
organization by email at info @ swat4ls.org


Paolo Romano (paolo.romano at istge.it)
Bioinformatics
National Cancer Research Institute (IST)


From Lambert at Chatham.edu  Mon Aug 10 17:01:11 2009
From: Lambert at Chatham.edu (Lambert, Lisa)
Date: Mon, 10 Aug 2009 17:01:11 -0400
Subject: [BiO BB] What's happened to Softberry?
Message-ID: <370F994DA14AF6449AA6A1FFCBDF6D93F146E97EE9@MAILBOX.chatham.local>

  	
Does anyone know what's happened to Softberry.com? I use their FGENESH software on a regular basis, but I haven't been able to access their site at all for several days now.

Lisa Lambert
Chatham University


From sariego9 at yahoo.com  Tue Aug 11 15:11:07 2009
From: sariego9 at yahoo.com (Diego Martinez)
Date: Tue, 11 Aug 2009 12:11:07 -0700 (PDT)
Subject: [BiO BB] What's happened to Softberry?
In-Reply-To: <370F994DA14AF6449AA6A1FFCBDF6D93F146E97EE9@MAILBOX.chatham.local>
References: <370F994DA14AF6449AA6A1FFCBDF6D93F146E97EE9@MAILBOX.chatham.local>
Message-ID: <96197.65709.qm@web32501.mail.mud.yahoo.com>

http://www.softberry.ru/

The Russian site is up; I don't know why the .com doesn't resolve. If this goes down I will ask them;
I used to work for one of the owners.

Diego




From rsachdev at usc.edu  Sun Aug 23 01:03:46 2009
From: rsachdev at usc.edu (Rohan Sachdeva)
Date: Sat, 22 Aug 2009 22:03:46 -0700
Subject: [BiO BB] Creating animation from data?
Message-ID: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>

Hi all,
I have time series data. TRFLP from environmental bacterial samples to be
exact. The outputs look like this
http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg

I would like to animate the data from month to month, using the actual data
and not just the images, so that it looks like a smooth animation through
time. Is there anything out there that can do this?

Thanks,
Rohan


From mbhd.pha at gmail.com  Mon Aug 24 19:10:24 2009
From: mbhd.pha at gmail.com (BH)
Date: Mon, 24 Aug 2009 19:10:24 -0400
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com>
Message-ID: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>

Hi,

Does anyone know how to find the conserved genes among the genomes (virus or
phage genomes in particular)? Are there Bioinformatic tools/methods
available for this?

Will appreciate your suggestions. Thanks.


From kanagasa at i2r.a-star.edu.sg  Tue Aug 25 00:35:14 2009
From: kanagasa at i2r.a-star.edu.sg (Kanagasabai Rajaraman)
Date: Tue, 25 Aug 2009 12:35:14 +0800
Subject: [BiO BB] CFP: Bioinformatics Track @ ACM Symposium on Applied
	Computing (SAC) 2010 - due Sep 8, 2009
Message-ID: <162B8AFBFBBB2148A9A1B8F9C5753428053B5AA8@mailbe01.teak.local.net>


CALL FOR PAPERS - SAC BIO 2010

Bioinformatics and Computational Systems Biology Track
The 25th ACM Symposium on Applied Computing
22 - 26 March 2010
Sierre, Switzerland 
http://www.nrcbioinformatics.ca/acmsac2010/

*** Papers Due Sep 8, 2009

Track description and motivations

The publishing of the draft of the human genome and the recent advancements in high throughput sequencing and functional genomics technologies have ushered in a new era of rapid and exponential growth of data related to how organisms function at the molecular level. A major part of the information to support this understanding is available in a large number of heterogeneous databases in both structured and unstructured formats. One challenge is to obtain information and knowledge from these databases and integrate them in a semantically consistent way, in order to be able to analyze them using novel quantitative conceptual and computational approaches smoothly connecting models and experiments. This can offer life scientists a deeper system-level understanding of fundamental biological principles. Examples of computational challenges in this new research paradigm, called systems biology, include identification of biological pathways, structure annotation of proteins, inference of biochemical networks and pathways using experimental data, information, and knowledge scattered over heterogeneous databases.
   
The convergence of computer science and biology is a new science, both data- and model-driven, that necessitates the development of mathematical/computational models and data mining algorithms that can enable scientists and bio-engineers to analyze, with predictive ability, the biological information that guides the development of therapeutic and biotechnology solutions.
   
This track is motivated by the rapidly growing importance of the informatics vision for novel levels of understanding in complex biological and biomedical systems, and will address research issues related to the whole spectrum of bioinformatics with a particular focus on integrative, inferential and translational bioinformatics.


List of topics

Papers are solicited in, but not limited to the following areas:

* Algebraic biology
* Bio imaging
* Bioinformatics for drug design & discovery
* Biological databases, warehousing and management
* Biomedical data integration, metadata & ontologies
* Biomarker identification and annotation
* Biomedical text mining
* Computational and Comparative genomics
* Data visualization and visual analytics
* Disease informatics
* Evolution and phylogenetics
* Gene expression/regulation & microarrays
* Healthcare applications
* High-performance bio-computing
* Inference of biochemical network models from experimental data
* Integrative bioinformatics
* Laboratory information management systems in biology
* Model driven analysis of biological systems
* Modeling, analysis and Inference of gene and protein networks
* Molecular modeling and simulation
* Molecular sequence analysis
* Pathways identification
* Population genetics
* Proteomics
* Protein & RNA structure and function
* Protein structure prediction and modeling
* Recognition of genes and regulatory elements
* Semantic technologies for life sciences
* Sequence analysis & alignment
* SNPs, mutations and haplotyping
* Structural bioinformatics
* Tool integration, web services and workflow systems


Papers submission

All submissions should represent original and previously unpublished works that are currently not under review in any conference or journal. Both basic and applied research papers are welcome. The author(s) name(s) and address(es) must NOT appear in the body of the submitted paper, and self-references should be in the third person. This is to facilitate blind review required by ACM. All submitted papers must include, on the front page above the title, the paper identification number provided to you by the eCMS when you register your paper.

All enquiries and questions should be directed to the Track Chairs. Additional details are available at the track home page at http://www.nrcbioinformatics.ca/acmsac2010/ . 


Important dates

Paper submission: September 8, 2009
Notification of paper acceptance/rejection: October 19, 2009
Camera ready: November 2, 2009


Conference Paper Publication 

All papers will be fully refereed and undergo a blind review process by at least three referees. The conference proceedings will be published by ACM. Hence, all accepted papers should be submitted in ACM 2-column camera-ready format for publication in the symposium proceedings. The final version of the paper should not be more than 5 pages long. An additional 3 pages are allowed at a charge of USD 80 per extra page. Final camera-ready submissions must follow the template available at: http://www.acm.org/conferences/sac/sac2010/.


Publication in Journal/Book

Expanded versions of selected papers will be published as a special IGI Global book volume.  Authors will be contacted after the presentation of these papers at the SAC Conference.

Poster Publication of Selected Papers 

A set of selected papers will be accepted as poster papers by invitation only and will be published as short papers in the symposium proceedings.


Track chairs

Paola Lecca, Ph.D.
Microsoft Research Center
University of Trento, Italy.
Email: lecca at cosbi.eu

Kanagasabai Rajaraman, Ph.D.
Institute for Infocomm Research, Singapore.
E-mail: kanagasa at i2r.a-star.edu.sg

Dan Tulpan, Ph.D.
Institute of Information Technology	
National Research Council of Canada, Canada
E-mail: dan.tulpan at nrc-cnrc.gc.ca



From dan.bolser at gmail.com  Tue Aug 25 02:32:32 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 25 Aug 2009 07:32:32 +0100
Subject: [BiO BB] Creating animation from data?
In-Reply-To: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>
References: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>
Message-ID: <2c8757af0908242332t2d7b2ca5wa52f88f714ed5239@mail.gmail.com>

2009/8/23 Rohan Sachdeva <rsachdev at usc.edu>:
> Hi all,
> I have time series data. TRFLP from environmental bacterial samples to be
> exact. The outputs look like this
> http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg
>
> I would like to animate the data from month to month, but using the actual data and
> not just images so it looks like a smooth animation through time. Is there
> anything out there that can do this?

You can do this with imagemagick or ffmpeg (or both!)... Probably lots
of other tools too!


From dan.bolser at gmail.com  Tue Aug 25 02:34:50 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Tue, 25 Aug 2009 07:34:50 +0100
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com>
	<8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID: <2c8757af0908242334vb5a93cai29f84915b3a639ea@mail.gmail.com>

2009/8/25 BH <mbhd.pha at gmail.com>:
> Hi,
>
> Does anyone know how to find the conserved genes among the genomes (virus or
> phage genomes in particular)? Are there Bioinformatic tools/methods
> available for this?

The very broad method that is directly applicable is simply 'sequence
alignment'.


> Will appreciate your suggestions. Thanks.
> _______________________________________________
> BBB mailing list
> BBB at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bbb
>


From marchywka at hotmail.com  Tue Aug 25 07:40:25 2009
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 25 Aug 2009 07:40:25 -0400
Subject: [BiO BB] Creating animation from data?
In-Reply-To: <2c8757af0908242332t2d7b2ca5wa52f88f714ed5239@mail.gmail.com>
References: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>
	<2c8757af0908242332t2d7b2ca5wa52f88f714ed5239@mail.gmail.com>
Message-ID: <BLU113-W33BAEBBC4978911515D59BEF80@phx.gbl>


----------------------------------------
> Date: Tue, 25 Aug 2009 07:32:32 +0100
> From:
> To: bbb at bioinformatics.org
> Subject: Re: [BiO BB] Creating animation from data?
>
> 2009/8/23 Rohan Sachdeva :
>> Hi all,
>> I have time series data. TRFLP from environmental bacterial samples to be
>> exact. The outputs look like this
>> http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg
>>
>> I would like to animate the data from month to month, but using the actual data and
>> not just images so it looks like a smooth animation through time. Is there
>> anything out there that can do this?
>
> You can do this with imagemagick or ffmpeg (or both!)... Probably lots
> of other tools too!

What do you mean by "using the data"? Do you want to end up with a video
file and need some way to generate the intermediate frames? Are you
looking for a tool to interpolate or annotate spectra? Certainly there
are tools for composing video from snapshots, but it isn't clear that
that is the problem. Do you really want some interactive data viewer,
or just a video?
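
If the goal is intermediate frames computed from the data itself, one
simple option is linear interpolation between consecutive time points.
The following is only a minimal Python sketch, assuming each month's
TRFLP profile has already been binned onto the same fragment-length
axis. The function name, the `steps` parameter and the toy numbers are
made up for illustration.

```python
def interpolate_profiles(profiles, steps):
    """Linearly interpolate between consecutive profiles.

    profiles: list of equal-length lists of peak heights, one per time point
              (assumed to share the same fragment-length bins).
    steps:    number of in-between frames inserted per pair of time points
              (a hypothetical knob controlling smoothness).
    """
    if not profiles:
        return []
    frames = []
    for start, end in zip(profiles, profiles[1:]):
        for k in range(steps + 1):
            t = k / (steps + 1)  # 0 at the earlier time point, <1 before the later one
            frames.append([a + t * (b - a) for a, b in zip(start, end)])
    frames.append(list(profiles[-1]))  # finish on the last measured profile
    return frames


# Toy data: three monthly profiles with three peaks each.
monthly = [[0.0, 10.0, 4.0], [2.0, 6.0, 4.0], [4.0, 0.0, 8.0]]
frames = interpolate_profiles(monthly, steps=1)
```

Each resulting frame could then be plotted to an image and the images
assembled into a video with the snapshot-composing tools already
mentioned in this thread.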



From skhadar at gmail.com  Tue Aug 25 01:00:24 2009
From: skhadar at gmail.com (Shameer Khadar)
Date: Tue, 25 Aug 2009 10:30:24 +0530
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com>
	<8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID: <b6ff81950908242200i39a07feej3e19d1676d2c05fc@mail.gmail.com>

Hello BH,

Can you explain the concept of conservation you are interested in? Is it
conserved genes within a single genome or across all genomes?

Thanks,
K. Shameer
NCBS - TIFR

On Tue, Aug 25, 2009 at 4:40 AM, BH <mbhd.pha at gmail.com> wrote:

> Hi,
>
> Does anyone know how to find the conserved genes among the genomes (virus
> or
> phage genomes in particular)? Are there Bioinformatic tools/methods
> available for this?
>
> Will appreciate your suggestions. Thanks.


From rsmsamal at gmail.com  Tue Aug 25 01:01:07 2009
From: rsmsamal at gmail.com (rasmiprava samal)
Date: Tue, 25 Aug 2009 10:31:07 +0530
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com>
	<8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID: <c4c689ef0908242201o596f999cmd6ba157c3c3e3e95@mail.gmail.com>

The conserved patterns cannot be found in the genes. Rather, we can determine
them in the corresponding protein.

The conserved patterns can be determined by BOXSHADE/TEXSHADE in CLC Biology
workbench. BOXSHADE is for local and TEXSHADE for global alignment.
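
For illustration, the kind of per-column conservation that shading
tools highlight can be computed directly from an existing multiple
alignment. This is a minimal Python sketch, not how BOXSHADE or
TeXshade are implemented. The function name, the `min_fraction`
threshold and the toy sequences are assumptions.

```python
def conserved_columns(aligned, min_fraction=1.0):
    """Return 0-based indices of alignment columns in which the most common
    residue (gaps excluded) occurs in at least min_fraction of all sequences."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for i in range(length):
        column = [seq[i] for seq in aligned if seq[i] != "-"]
        if not column:
            continue  # column made only of gaps
        top = max(column.count(residue) for residue in set(column))
        if top / len(aligned) >= min_fraction:
            hits.append(i)
    return hits


# Toy pre-aligned protein fragments ("-" marks a gap).
aligned = ["MKT-LLV", "MKTALLI", "MRTALLV"]
strict = conserved_columns(aligned)                    # fully conserved columns
relaxed = conserved_columns(aligned, min_fraction=0.6)  # majority-conserved columns
```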

regs,
Rashmi.

On Tue, Aug 25, 2009 at 4:40 AM, BH <mbhd.pha at gmail.com> wrote:

> Hi,
>
> Does anyone know how to find the conserved genes among the genomes (virus
> or
> phage genomes in particular)? Are there Bioinformatic tools/methods
> available for this?
>
> Will appreciate your suggestions. Thanks.


From liviu.vladutu at gmail.com  Tue Aug 25 02:10:44 2009
From: liviu.vladutu at gmail.com (Liviu Vladutu)
Date: Tue, 25 Aug 2009 09:10:44 +0300
Subject: [BiO BB] Creating animation from data?
In-Reply-To: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>
References: <25b698b90908222203q5868ff16p74d64354663d20c0@mail.gmail.com>
Message-ID: <7d76d3450908242310i29553d41m17412cfd8dbc8c0e@mail.gmail.com>

Hi all,
I have created a small script in Matlab (from Mathworks) that creates the
movie from 10 images (.bmp) in this case.
Image names are:     'ima1.bmp' ,..., 'ima10.bmp' . So you basically have
to chop that chart (from http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg)
into smaller time series and plot each one (as I did, but in line 2 of
CreateAviFromFrames replace the 'ima*.bmp' in the fuf command with the right
image extension).
The 2 necessary files are attached.
Hope that helps,
Liviu
===
Dr. Liviu Vladutu

On Sun, Aug 23, 2009 at 8:03 AM, Rohan Sachdeva <rsachdev at usc.edu> wrote:

> Hi all,
> I have time series data. TRFLP from environmental bacterial samples to be
> exact. The outputs look like this
> http://fuhrmanlab.usc.edu:60000/images/5mtrflp.jpg
>
> I would like to animate the data from month to month, but using the actual data and
> not just images so it looks like a smooth animation through time. Is there
> anything out there that can do this?
>
> Thanks,
> Rohan


-- 
Liviu Vladutu

From marty.gollery at gmail.com  Tue Aug 25 09:37:16 2009
From: marty.gollery at gmail.com (Martin Gollery)
Date: Tue, 25 Aug 2009 06:37:16 -0700
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com>
	<8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
Message-ID: <bdd10c2a0908250637i4aa0bebbmf0d36ba606f050a9@mail.gmail.com>

Try algorithms like BLAST, BLAT, Smith-Waterman, etc. etc.
Marty
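
For short sequences the Smith-Waterman recurrence is small enough to
sketch directly; BLAST and BLAT rely on heuristics and are far faster
on real genomes. The following is a minimal Python sketch that returns
only the best local alignment score. The scoring values (match +2,
mismatch -1, linear gap -2) and the function name are assumptions
chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    Only two rows of the dynamic-programming matrix are kept, so memory
    grows with len(b) rather than len(a) * len(b). No traceback is done.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy example: the shared core "ACGT" gives 4 matches at +2 each.
score = smith_waterman_score("TTACGTTT", "GGACGTGG")
```

A high local score between regions of two genomes is only a hint of
conservation; whether the region is a gene still has to come from
annotation or gene prediction.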


On Mon, Aug 24, 2009 at 4:10 PM, BH <mbhd.pha at gmail.com> wrote:

> Hi,
>
> Does anyone know how to find the conserved genes among the genomes (virus
> or
> phage genomes in particular)? Are there Bioinformatic tools/methods
> available for this?
>
> Will appreciate your suggestions. Thanks.


-- 
Martin Gollery
Senior Bioinformatics Scientist
Tahoe Informatics
www.bioinformaticist.biz
www.hiddenmarkovmodels.com


From marchywka at hotmail.com  Tue Aug 25 13:37:14 2009
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 25 Aug 2009 13:37:14 -0400
Subject: [BiO BB] how to find conserved genes among viral genomes?
In-Reply-To: <c4c689ef0908242201o596f999cmd6ba157c3c3e3e95@mail.gmail.com>
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com>
	<8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com> 
	<c4c689ef0908242201o596f999cmd6ba157c3c3e3e95@mail.gmail.com>
Message-ID: <BLU113-W11418D5B50278DAD0C3F5EBEF80@phx.gbl>


----------------------------------------
> Date: Tue, 25 Aug 2009 10:31:07 +0530
> From: 
> To: bbb at bioinformatics.org
> Subject: Re: [BiO BB] how to find conserved genes among viral genomes?
>
> The conserved patterns cannot be found in the genes. Rather we can determine
> them in the corresponding protein.


Is this a sweeping statement on science or your technology? That is, AFAIK
there is know indication that all sinonimuous kodons are Handeled tHe
same- between kemical interactions and interactions with rybOsum and
nasscent protein, etc. Certainly you expect "synonymuous" codons
to be generally more interchangeable than things that change the protein
but I'm still not sure what your point is here. 
[ I don't have a thesaurus for cinnamons and it was easier to mis-type
to make my point LOL]
And all that intron and regulatory stuff, what about that? Is this
a virus specific statement in sum weigh? 
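
To make the distinction concrete, nucleotide differences between two
coding sequences can be split into synonymous and nonsynonymous codon
changes; a protein-only comparison sees just the latter. The following
is a minimal Python sketch, assuming the standard genetic code and two
in-frame, gap-free sequences of equal length. The function name and
the toy sequences are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(seq_a, seq_b):
    """Count identical, synonymous and nonsynonymous codon pairs between two
    in-frame, gap-free DNA coding sequences of equal length."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("need equal-length sequences whose length is a multiple of 3")
    counts = {"identical": 0, "synonymous": 0, "nonsynonymous": 0}
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            counts["identical"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1      # DNA differs, protein does not
        else:
            counts["nonsynonymous"] += 1   # protein differs as well
    return counts


# Toy example: Met-Leu-Lys versus Met-Leu-Arg, with a different Leu codon.
counts = classify_codon_changes("ATGCTGAAA", "ATGTTAAGA")
```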


>
> The conserved patterns can be determined by BOXSHADE/TEXSHADE in CLC Biology
> workbench. BOXSHADE is for local and TEXSHADE for global alignment.
>
> regs,
> Rashmi.
>
> On Tue, Aug 25, 2009 at 4:40 AM, BH  wrote:
>
>> Hi,
>>
>> Does anyone know how to find the conserved genes among the genomes (virus
>> or
>> phage genomes in particular)? Are there Bioinformatic tools/methods
>> available for this?
>>
>> Will appreciate your suggestions. Thanks.



From mahef111 at link.net  Tue Aug 25 18:51:00 2009
From: mahef111 at link.net (Mahmoud ElHefnawi)
Date: Wed, 26 Aug 2009 01:51:00 +0300
Subject: [BiO BB] how to find conserved genes among viral genomes?
References: <8352923d0908241609i1ffd91chff766b5a834df4eb@mail.gmail.com><8352923d0908241610n36d996el6dd7d537f3bd44c@mail.gmail.com>
	<c4c689ef0908242201o596f999cmd6ba157c3c3e3e95@mail.gmail.com>
	<BLU113-W11418D5B50278DAD0C3F5EBEF80@phx.gbl>
Message-ID: <90441A9EAF294FFEBFD21DE65A867559@compaqf52e788f>

Hello,

I think you can also use motif prediction tools like the well-known MEME,
Weeder, etc. You can email me personally if you need more help. Attached is
one of my works on influenza for similar purposes. I would also appreciate
comments.

Best,
Mahmoud
----- Original Message ----- 
From: "Mike Marchywka" <marchywka at hotmail.com>
To: <bbb at bioinformatics.org>
Sent: Tuesday, August 25, 2009 8:37 PM
Subject: Re: [BiO BB] how to find conserved genes among viral genomes?


----------------------------------------
> Date: Tue, 25 Aug 2009 10:31:07 +0530
> From:
> To: bbb at bioinformatics.org
> Subject: Re: [BiO BB] how to find conserved genes among viral genomes?
>
> The conserved patterns cannot be found in the genes. Rather we can
> determine
> them in the corresponding protein.


Is this a sweeping statement on science or your technology? That is, AFAIK
there is know indication that all sinonimuous kodons are Handeled tHe
same- between kemical interactions and interactions with rybOsum and
nasscent protein, etc. Certainly you expect "synonymuous" codons
to be generally more interchangeable than things that change the protein
but I'm still not sure what your point is here.
[ I don't have a thesaurus for cinnamons and it was easier to mis-type
to make my point LOL]
And all that intron and regulatory stuff, what about that? Is this
a virus specific statement in sum weigh?


>
> The conserved patterns can be determined by BOXSHADE/TEXSHADE in CLC
> Biology
> workbench. BOXSHADE is for local and TEXSHADE for global alignment.
>
> regs,
> Rashmi.
>
> On Tue, Aug 25, 2009 at 4:40 AM, BH  wrote:
>
>> Hi,
>>
>> Does anyone know how to find the conserved genes among the genomes (virus
>> or
>> phage genomes in particular)? Are there Bioinformatic tools/methods
>> available for this?
>>
>> Will appreciate your suggestions. Thanks.