[BiO BB] Looking for researcher, to assist on blast-like invention
theoriste at gmail.com
Tue Feb 12 11:44:34 EST 2008
On Feb 11, 2008 6:56 PM, Theodore H. Smith <delete at elfdata.com> wrote:
> On 11 Feb 2008, at 22:28, Ryan Golhar wrote:
> > Why don't you write up a paper describing the algorithm in detail and
> > submit it to a bioinformatics journal? And, why not make the executable
> > available with documentation so that people can download it and try it
> > out for themselves.
> > Do you have any test cases that show it runs faster/better than BLAST?
> > Describe them and make them available.
> The first thing I'd need to do is make a good test. I'm not sure what
> constitutes "a good test", in this case.
NR ALL VS ALL: This will test speed and, to some extent, search quality. The
nr (non-redundant) database from NCBI is a good place to start as a
template database. Run BLAST with each entry in nr as a query against all
of nr, do the same with your algorithm, and then compare running time and
the hits reported. You can then generate a ROC plot for BLAST vs your
algorithm against a known set of homologs and distant homologs, based on a
p-value or significance level cutoff.
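To make the ROC comparison concrete, here is a minimal Python sketch of the bookkeeping. It assumes each reported hit has already been reduced to a significance score (lower is more significant, as with a p-value) and a true/false label taken from the known homolog set. The function names `roc_points` and `auc` are made up for illustration and are not part of BLAST or any library.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    scores: one significance value per hit; lower means more significant.
    labels: True if the hit is a known homolog, False otherwise.
    The cutoff is swept from the strictest score to the loosest.
    """
    pos = sum(1 for is_homolog in labels if is_homolog)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative hit")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Consume every hit tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Running this once on the BLAST hits and once on your algorithm's hits, against the same labelled set, gives two curves (and two areas) that can be compared directly.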
A real randomization test of sensitivity and specificity would be this:
take known sequences in nr (all or some of them) and scramble them by
"homologous recombination", that is, create chimeras of known sequences
using different randomization criteria: by domain (based on domain
annotation) or by individual sequence using a known randomization
function. Then test the sensitivity and specificity of BLAST vs your
algorithm at detecting the originating sequences that created the chimeras.
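A rough Python sketch of the chimera test follows. It covers only the simpler "individual sequence" variant, splicing random segments from randomly chosen parents; a domain-based version would cut at annotated domain boundaries instead. The names `make_chimera` and `sensitivity_specificity`, the `min_len` parameter, and the dict-of-sequences input are all assumptions for illustration.

```python
import random


def make_chimera(parents, n_segments, rng, min_len=20):
    """Splice random segments from randomly chosen parent sequences.

    parents: dict mapping sequence id -> sequence string.
    rng: a random.Random instance, so the randomization is reproducible.
    Returns (chimera, provenance), where provenance lists
    (parent_id, start, end) for each segment in order.
    """
    eligible = [(pid, seq) for pid, seq in parents.items()
                if len(seq) >= min_len]
    if not eligible:
        raise ValueError("no parent sequence is at least min_len long")

    pieces, provenance = [], []
    for _ in range(n_segments):
        pid, seq = rng.choice(eligible)
        length = rng.randint(min_len, len(seq))
        start = rng.randint(0, len(seq) - length)
        pieces.append(seq[start:start + length])
        provenance.append((pid, start, start + length))
    return "".join(pieces), provenance


def sensitivity_specificity(reported, true_parents, all_ids):
    """Score one search: did it report the chimera's true parents?"""
    reported, true_parents = set(reported), set(true_parents)
    tp = len(reported & true_parents)
    fn = len(true_parents - reported)
    fp = len(reported - true_parents)
    tn = len(set(all_ids) - reported - true_parents)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

Each chimera is then used as a query with both programs, and the ids each one reports are scored against the parent ids recorded in `provenance`.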
You will also need to check the performance of your algorithm against
nucleotide sequences. There are already test cases in BLAST for
mouse-vs-human; those would be a good place to start.