[BiO BB] Looking for researcher, to assist on blast-like invention
theoriste at gmail.com
Tue Feb 12 11:49:02 EST 2008
One more thing --
If you write a homologous recombination function, I would also include an
additional mutator function to mimic genetic drift. It can be
sophisticated, weighing mutations against the codon table, and the mutations
can be distributed by a known function of percent drift/difference. You can
then adjust that and catch originating sequences not only by domains but also by
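
A minimal sketch of the mutator idea described above, in Python. It is only an illustration under stated assumptions: the names `mutate`, `translate` and `drift` are hypothetical, each base is substituted independently with probability `drift` (a uniform model, not a fitted drift distribution), and "vs the codon table" is interpreted here as an option that allows only synonymous substitutions under the standard genetic code.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order (NCBI translation table 1).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def mutate(dna, drift, rng, synonymous_only=False):
    """Substitute each base independently with probability `drift`.

    Assumes an uppercase A/C/G/T sequence. With synonymous_only=True,
    a substitution is kept only if the codon still encodes the same
    amino acid (an assumed reading of "mutations vs the codon table").
    """
    seq = list(dna)
    for i, base in enumerate(seq):
        if rng.random() >= drift:
            continue
        choices = [b for b in BASES if b != base]
        if synonymous_only:
            start = i - i % 3
            codon = seq[start:start + 3]
            if len(codon) < 3:
                continue  # leave a trailing partial codon untouched
            amino = CODON_TABLE["".join(codon)]
            offset = i - start
            choices = [
                b for b in choices
                if CODON_TABLE["".join(codon[:offset] + [b] + codon[offset + 1:])] == amino
            ]
            if not choices:
                continue  # no synonymous substitution exists at this position
        seq[i] = rng.choice(choices)
    return "".join(seq)
```

Calling `mutate(seq, 0.05, random.Random(42))` would give a copy with roughly 5% of bases changed; raising `drift` step by step gives a series of increasingly diverged queries to run against the database.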
On Feb 12, 2008 11:46 AM, DT <theoriste at gmail.com> wrote:
> By the way, in case you didn't know, nr is a protein database and can be
> downloaded by FTP from NCBI.
> On Feb 12, 2008 11:44 AM, DT <theoriste at gmail.com> wrote:
> > On Feb 11, 2008 6:56 PM, Theodore H. Smith <delete at elfdata.com> wrote:
> > >
> > > On 11 Feb 2008, at 22:28, Ryan Golhar wrote:
> > >
> > > > Why don't you write up a paper describing the algorithm in detail and
> > > > submit it to a bioinformatics journal? And why not make the executable
> > > > available with documentation so that people can download it and try it
> > > > out for themselves?
> > > >
> > > > Do you have any test cases that show it runs faster/better than BLAST?
> > > > Describe them and make them available.
> > >
> > > The first thing I'd need to do is make a good test. I'm not sure what
> > > constitutes "a good test", in this case.
> > NR ALL VS ALL: This will test speed and, to some extent, performance. The
> > nr (non-redundant) database from NCBI is a good template database to start
> > testing with. Run BLAST all-against-all on nr, then run your algorithm for
> > each entry in nr versus all of nr, and compare performance. You can
> > generate a ROC plot for BLAST vs your algorithm against a known set of
> > homologs and distant homologs, based on a p-value or significance level
> > cutoff.
> >
> > A real randomization test of sensitivity and specificity would be this:
> > take known sequences in nr (all or some of them) and scramble them by
> > "homologous recombination". That is, create chimeras of known sequences
> > using different randomization criteria: by domain (based on domain
> > annotation) or by individual sequence (based on a known randomization
> > function). Then test the sensitivity and specificity of BLAST vs your
> > algorithm in detecting the originating sequences that created the chimeras.
> >
> > You will also need to check the performance of your algorithm against
> > nucleotide sequences. There are already test cases in BLAST for
> > mouse-vs-human; that would be a good test case.
> > Deanne Taylor
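
The chimera test quoted above can be sketched roughly as follows, in Python. This is an illustration, not anyone's actual benchmark: `make_chimera` and `sensitivity_specificity` are hypothetical names, the "recombination" is simplified to concatenating one random segment from each chosen parent (no domain annotation is used), and the set of hits is assumed to come from whichever search tool is being evaluated.

```python
import random


def make_chimera(db, n_parents, rng):
    """Build a chimera from one random segment of each chosen parent.

    db maps sequence IDs to non-empty sequences. Returns the chimera
    and the set of parent IDs that a search should recover.
    """
    parent_ids = rng.sample(sorted(db), n_parents)
    pieces = []
    for pid in parent_ids:
        seq = db[pid]
        start = rng.randrange(len(seq))
        end = rng.randrange(start + 1, len(seq) + 1)  # segment is never empty
        pieces.append(seq[start:end])
    return "".join(pieces), set(parent_ids)


def sensitivity_specificity(true_ids, hit_ids, all_ids):
    """Score one query: true parents vs. the IDs a tool reported as hits."""
    tp = len(true_ids & hit_ids)
    fn = len(true_ids - hit_ids)
    fp = len(hit_ids - true_ids)
    tn = len(all_ids - true_ids - hit_ids)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

Repeating this over many chimeras, and over a range of significance cutoffs for each tool, gives the (sensitivity, 1 - specificity) pairs needed for the ROC comparison.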