[BiO BB] Looking for a DNA search engine that includes length as a parameter

Lambert, Lisa Lambert at Chatham.edu
Tue Jul 29 09:08:04 EDT 2008

Thanks, Mike

I've used Perl (I'll definitely take a look at BioPerl), and could write something that locally searches downloaded FASTA results or whole-genome files. I was just hoping I had missed something faster, since this is more of a side project than my main area of interest. It might be a good student project for the fall!

It also occurred to me (after sleeping on it) that, given the size, these might be SINEs, so I'm going to check RepeatMasker and see if I can narrow things down that way.


From: bbb-bounces at bioinformatics.org [bbb-bounces at bioinformatics.org] On Behalf Of Mike Marchywka [marchywka at hotmail.com]
Sent: Tuesday, July 29, 2008 3:08 AM
To: General Forum at Bioinformatics.Org
Subject: Re: [BiO BB] Looking for a DNA search engine that includes length as a parameter

> From: Lambert at Chatham.edu
> To: bbb at bioinformatics.org
> Date: Mon, 28 Jul 2008 15:36:35 -0400
> Subject: [BiO BB] Looking for a DNA search engine that includes length as a parameter

>[...]What I really want is an engine where I can specify that only hits longer than 200 bp with an identity of at least 50% be returned. Does anyone know of a tool that will do this?

In the past, I've written scripts to download search results non-selectively and then
sort through them locally with custom criteria. I'm a programmer, so this is my approach to everything, but I think it makes sense whenever you expect to do a lot of ad hoc or exploratory processing.
One of the well-known packages (I think people keep mentioning BioPerl) may be worth considering
rather than hoping to find a specific search engine that does exactly what you want.
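
As a rough illustration of the "download everything, filter locally" approach, here is a minimal Python sketch. It assumes the search results were saved in the standard 12-column BLAST tabular format (legacy `-m 8`, or `-outfmt 6` in BLAST+). The function name `filter_hits`, the thresholds as keyword arguments, and the sample rows are all made up for illustration; they are not from any particular tool.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_length=200, min_identity=50.0):
    """Yield hits longer than min_length bp with identity >= min_identity %."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if int(hit["length"]) > min_length and float(hit["pident"]) >= min_identity:
            yield hit


# Invented example rows (hypothetical query and subject names).
SAMPLE = (
    "query1\tchr1\t62.50\t240\t90\t0\t1\t240\t1000\t1239\t1e-20\t95.0\n"
    "query1\tchr2\t91.00\t120\t11\t0\t1\t120\t500\t619\t1e-30\t150.0\n"
    "query1\tchr3\t45.00\t310\t170\t0\t1\t310\t700\t1009\t1e-05\t40.0\n"
    "query1\tchr4\t50.00\t201\t100\t0\t1\t201\t300\t500\t1e-10\t60.0\n"
)

if __name__ == "__main__":
    for hit in filter_hits(io.StringIO(SAMPLE)):
        print(hit["sseqid"], hit["length"], hit["pident"])
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` wrapper. Note that the length column is the alignment length, which may or may not be the "length" you care about.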

In my case, I was looking for 25 bp long DNA sequences and wanted to find "what is close by"
in various species. IIRC, the things I hoped to find "close by" were sequences close to different
25 bp probes in my query list. So, I did BLAST searches on genomes and then extracted the hit locations,
requested expanded versions of the appropriate chromosomes, and fiddled with the results
using text processing scripts.

I guess effectively I had a query that looked like, "find areas that contain ten 25 bp sequences in
any order, with matches being better than 20/25 for each key sequence, and not more than about 300 bp in total span."
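
A query like that can be approximated with a small post-processing script over the hit locations. The Python sketch below is only one way to do it, not the scripts actually used: the hit tuple layout `(probe_id, chrom, start, end, matches)`, the function name `find_clusters`, and the example coordinates are hypothetical, and "better than 20/25" is treated here as at least 20 matching bases.

```python
from collections import defaultdict


def find_clusters(hits, n_probes=10, min_matches=20, max_span=300):
    """Return (chrom, start, end) windows of at most max_span bp that
    contain good hits from at least n_probes distinct probes, in any order.

    hits: iterable of (probe_id, chrom, start, end, matches) tuples.
    """
    by_chrom = defaultdict(list)
    for probe, chrom, start, end, matches in hits:
        if matches >= min_matches:
            lo, hi = sorted((start, end))  # minus-strand hits have start > end
            by_chrom[chrom].append((lo, hi, probe))

    clusters = []
    for chrom, items in by_chrom.items():
        items.sort()
        prev_end = -1
        for i, (anchor, _, _) in enumerate(items):
            limit = anchor + max_span - 1  # last base allowed in this window
            probes = set()
            window_end = anchor
            for lo, hi, probe in items[i:]:
                if lo > limit:
                    break  # sorted by start, so nothing later can fit
                if hi <= limit:
                    probes.add(probe)
                    window_end = max(window_end, hi)
            # Skip windows contained in the one reported just before.
            if len(probes) >= n_probes and window_end > prev_end:
                clusters.append((chrom, anchor, window_end))
                prev_end = window_end
    return clusters


# Invented hits, using 3 probes instead of 10 to keep the example short.
EXAMPLE_HITS = [
    ("p1", "chr1", 100, 124, 25),
    ("p2", "chr1", 180, 204, 22),
    ("p3", "chr1", 330, 354, 21),
    ("p1", "chr2", 100, 124, 25),
    ("p2", "chr2", 150, 174, 24),
    ("p3", "chr2", 900, 924, 25),  # too far from the other two
    ("p3", "chr2", 200, 224, 18),  # too few matching bases
]

if __name__ == "__main__":
    print(find_clusters(EXAMPLE_HITS, n_probes=3))
```

The scan is quadratic in the number of hits per chromosome, which should be fine for exploratory work on a modest hit list; a proper sliding window would be the next step for larger inputs.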

Mike Marchywka
