[Bioclusters] Versioning databases

Joe Landman landman at scalableinformatics.com
Sun Jun 4 22:33:40 EDT 2006

Sounds nice.  I had also thought of (somehow) saving diffs in a db so 
you could regenerate the test db you used previously.  I don't know if 
there is interest in this, but we had a prototype of it a few years ago.
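
To make the diff idea concrete, here is a minimal sketch.  It is not the
prototype mentioned above, just an illustration under stated assumptions:
each release is held as a list of lines, and the names reverse_delta and
apply_delta are hypothetical.  Only the lines that the older release does
not share with the newer one are stored, so the older release can be
rebuilt from the current one plus the delta.

```python
from difflib import SequenceMatcher


def reverse_delta(old, new):
    """Build ops that turn `new` back into `old`.

    Only lines that `old` does not share with `new` are stored;
    shared runs are recorded as index ranges into `new`.
    """
    matcher = SequenceMatcher(a=new, b=old, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared run: copy new[i1:i2] when rebuilding.
            delta.append(("copy", i1, i2))
        else:
            # replace / delete / insert: emit the old-side lines
            # (an empty list when the lines exist only in `new`).
            delta.append(("emit", old[j1:j2]))
    return delta


def apply_delta(new, delta):
    """Rebuild the older release from `new` and a stored delta."""
    rebuilt = []
    for op in delta:
        if op[0] == "copy":
            rebuilt.extend(new[op[1]:op[2]])
        else:
            rebuilt.extend(op[1])
    return rebuilt


# Toy data standing in for two releases of a FASTA file.
old_release = [">seq1", "ACGT", ">seq2", "GGCC"]
new_release = [">seq1", "ACGT", ">seq2", "GGCA", ">seq3", "TTTT"]

delta = reverse_delta(old_release, new_release)
restored = apply_delta(new_release, delta)
```

SequenceMatcher can take quadratic time in the worst case, so for
something the size of nr a dedicated binary delta tool would be the more
realistic choice; the sketch only shows the bookkeeping.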


Michael James wrote:
> Some biological databases actually come in versions;
>  for example, we are up to the TIGR4 rice genome and
>  UniProtKB/Swiss-Prot Release 50.0 of 30-May-2006.
> Others just change daily: NCBI:nr, NCBI:nt, etc.
> All this updating creates a problem for repeatability:
>  the blast results you get next week
>  won't quite be the ones you got today.
> It seems to me that the situation would be improved
>  by tagging results, e.g. "BLAST against ncbi.nih.gov nr 2006-06-05 000".
> This means we need to come up with a versioning scheme
>  and for anything without, I'd suggest something as simple as
>    issuing_authority  database  date    3_digit_release_number
> eg  ncbi.nih.gov           nr  2006-06-05          000
> For uniqueness, use the internet name for issuing_authority.
> The database is the filename stripped of all qualifiers:
>  remove things like  .gz  .00.tar.gz
> The date is in ISO format (YYYY-MM-DD)!
> 3 more digits ensure uniqueness.
> Such a scheme would also be
>  a big win for us database administrators.
> We could start to weave it through the tangled web
>  of different providers and formats
>  so we actually know the original issuing authority
>  for the file we are downloading.
> What do you think?
> michaelj
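
The tagging scheme quoted above could be sketched roughly as follows.
This is only an illustration: the function names database_name and
version_tag and the QUALIFIERS set are assumptions made here, not part
of the proposal, and which suffixes count as "qualifiers" is a guess
that would need to be agreed on.

```python
from datetime import date

# Assumed set of suffixes treated as qualifiers; extend as needed.
QUALIFIERS = {"gz", "bz2", "Z", "zip", "tar"}


def database_name(filename):
    """Strip trailing qualifiers (.gz, .tar, numeric volumes like .00)."""
    parts = filename.split(".")
    while len(parts) > 1 and (parts[-1] in QUALIFIERS or parts[-1].isdigit()):
        parts.pop()
    return ".".join(parts)


def version_tag(issuing_authority, filename, release_date, release=0):
    """Return 'issuing_authority database date 3_digit_release_number'."""
    if not 0 <= release <= 999:
        raise ValueError("release number must fit in 3 digits")
    return "{} {} {} {:03d}".format(
        issuing_authority,
        database_name(filename),
        release_date.isoformat(),  # ISO format, e.g. 2006-06-05
        release,
    )


tag = version_tag("ncbi.nih.gov", "nr.00.tar.gz", date(2006, 6, 5))
print("BLAST against " + tag)
```

A result file could then carry the printed line in its header, so that
a later rerun can be compared against the same database release.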

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
