On Mon, 2002-12-16 at 20:40, Elia Stupka wrote:
> > Generally this is not so hard. You can even incorporate the update
> > into a queuing system, as long as you use an O(1) data distribution
> > system, such as the old ccp I had architected, or some newer stuff.
> > Use a priority based mechanism to schedule the update to occur between
> > computing runs. This requires some tuning/tweaking of the queuing
> > system, but it is generally not that hard to do.
>
> That was quite enlightening to a non-systems person like me. One thing I
> would still say, though, is that I personally wouldn't want to implement
> an automatic update, mainly because one would like to be able to
> reproduce a whole bioinformatics protocol (especially if it is for
> publication), and in order to do that one should know which version of a
> database the process was run against.

I would think that repeatability of the computational pipeline is an
important factor, so the automatic update scheme might not fit this model.
Either that, or you need some sort of tagging, at a metadata level, of the
database that was used (I had been using things like date, time, and MD5
sum in a small XML structure to identify the DB).

I think this is the best of all worlds, though it requires lots of disk
space on a server somewhere, and clever data distribution mechanisms. But
it allows you to version your pipeline protocols. This could be quite
interesting from a data quality perspective.

> Nonetheless very very interesting, thanks! We might tweak it to the
> fullest, by automating as you suggest and then storing the information in
> our pipeline, to be able to track it...

I am working on a better distribution mechanism. I'll let you know when it
is nearly ready for prime time. It should fit in with this scheme (the
scheduler priority bubble).

Joe

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615
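
P.S. For the curious, the database tagging I mentioned above (date, time,
and MD5 sum in a small XML structure) can be sketched in a few lines of
Python. The element and function names here are illustrative, not from any
actual pipeline code:

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def db_metadata(path):
    """Build a small XML record identifying a database file by
    timestamp and MD5 sum, so a pipeline run can record exactly
    which version of the DB it was executed against."""
    md5 = hashlib.md5()
    # Hash in chunks so large sequence databases don't need to fit in RAM.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            md5.update(chunk)
    root = ET.Element("database", name=path)
    ET.SubElement(root, "timestamp").text = datetime.now(timezone.utc).isoformat()
    ET.SubElement(root, "md5").text = md5.hexdigest()
    return ET.tostring(root, encoding="unicode")
```

Storing this record alongside each pipeline run is what makes the protocol
reproducible: if the MD5 of the current database no longer matches the
recorded one, you know the data has been updated underneath you.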