[Bioclusters] bio data mirroring
David.Allouche at toulouse.inra.fr
Tue Jan 15 12:20:48 EST 2008
Concerning the concept:
BioMAJ (BIOlogie Mises A Jour) is a workflow engine dedicated to data
synchronization and processing. The software automates the update cycle
and the supervision of the locally mirrored databank repository.
BioMAJ is designed to carry out update cycles for a data source. Each
cycle has five stages.
The engine loads the properties file containing the workflow description
and looks at the current status of the bank by running through the
associated status file. After determining the bank's status, the
application opens the full cycle or, if necessary, tries to finish the
previously incomplete cycle (in the event of an error correction).
This is a sub-workflow run before the data update. It has the same
properties as the post-processing part explained below. Its purpose is
simple: to start tasks, controls and alerts prior to the rest of the
During this stage, the engine connects to the source and checks for new
data compared to data already present locally. It determines the list of
files to be downloaded and assigns a version name. It then carries out
the download followed by extraction. Finally, it consolidates the data
by producing a full version of the bank, adhering to the restrictions
defined by the properties file.
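
The comparison step described above can be sketched as follows. This is only an illustration, not BioMAJ's actual code: the function name `files_to_download`, the file names and the use of a date string as the comparison key are all assumptions made for the example.

```python
from typing import Dict, List


def files_to_download(remote: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Return the remote files that are missing locally or whose stamp differs.

    Both arguments map a file name to a stamp (here a date string; a real
    tool might compare sizes or checksums instead).
    """
    return sorted(
        name for name, stamp in remote.items()
        if local.get(name) != stamp
    )


# Hypothetical listings, invented for the example.
remote_listing = {
    "seq.dat.gz": "2008-01-14",
    "README": "2007-11-02",
    "new.dat.gz": "2008-01-15",
}
local_listing = {
    "seq.dat.gz": "2008-01-10",
    "README": "2007-11-02",
}

print(files_to_download(remote_listing, local_listing))
# ['new.dat.gz', 'seq.dat.gz']
```

Only the changed and the new file are selected; the unchanged README is left alone.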
During this stage, the engine runs the post-processing sub-workflow. The
form can describe relatively complex workflows. It is a succession of
task blocks that contain one or several sub-collections of meta-tasks
that can themselves be made up of several processes. Each block is run
in sequence. In a given block, the meta-tasks that make it up are run in
parallel. In a meta-task, processes are run in sequence. If there is an
error in a process, only the branch that it belongs to is stopped. This
creates a Directed Acyclic Graph (DAG).
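
To make the block / meta-task / process layout more concrete, here is a rough sketch of the scheduling rule. It is not taken from BioMAJ: the names `run_meta_task` and `run_blocks` are hypothetical, processes are modelled as plain Python callables, and the sketch assumes that later blocks still run when a branch of an earlier block has failed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Process = Callable[[], None]
MetaTask = List[Process]      # processes of a meta-task run in sequence
Block = Dict[str, MetaTask]   # meta-tasks of a block run in parallel


def run_meta_task(processes: MetaTask) -> bool:
    """Run the processes in order; stop this branch at the first failure."""
    for process in processes:
        try:
            process()
        except Exception:
            return False
    return True


def run_blocks(blocks: List[Block]) -> Dict[str, bool]:
    """Run blocks one after another, and each block's meta-tasks in parallel."""
    results: Dict[str, bool] = {}
    for block in blocks:
        with ThreadPoolExecutor() as pool:
            futures = {
                name: pool.submit(run_meta_task, procs)
                for name, procs in block.items()
            }
            # Wait for every meta-task of this block before the next block.
            for name, future in futures.items():
                results[name] = future.result()
    return results


log: List[str] = []


def step(name: str) -> Process:
    """Build a process that records its name when it runs."""
    def run() -> None:
        log.append(name)
    return run


def failing() -> None:
    raise RuntimeError("process failed")


results = run_blocks([
    {"index": [step("format"), step("index")], "broken": [failing, step("never")]},
    {"report": [step("report")]},
])
print(results)
```

In this run the "broken" branch stops at its first process, so "never" is not executed, while the "index" branch and the second block complete normally.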
If the previous stages have completed without error, the application
puts the new version of the source online. Then it deletes the obsolete
versions and the temporary files produced when running the post-processes.
All stages in a session are written up into the status file. If there is
an error, during the following session, the application will try to
continue the session from the first erroneous stage of the previous
session to complete the cycle. One cycle is associated with a data
source. One or more sessions may be necessary to complete a cycle.
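
The resume behaviour can be pictured with a small sketch. Everything here is an assumption made for illustration: the stage names, the `run_session` function and the in-memory dictionary standing in for the status file (the real status file is kept on disk between sessions).

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_session(stages: List[Stage], status: Dict[str, str]) -> bool:
    """Run the stages in order, skipping those already marked "ok".

    Returns True when the whole cycle is complete, False when the
    session stopped on an error.
    """
    for name, action in stages:
        if status.get(name) == "ok":
            continue  # completed in an earlier session
        try:
            action()
        except Exception:
            status[name] = "error"
            return False
        status[name] = "ok"
    return True


calls: List[str] = []
attempts = {"sync": 0}


def ok(name: str) -> Callable[[], None]:
    """Build a stage action that records its name when it runs."""
    def run() -> None:
        calls.append(name)
    return run


def flaky_sync() -> None:
    """Fail on the first attempt only, like a temporary network error."""
    attempts["sync"] += 1
    if attempts["sync"] == 1:
        raise IOError("network down")
    calls.append("sync")


stages = [("pre", ok("pre")), ("sync", flaky_sync), ("deploy", ok("deploy"))]
status: Dict[str, str] = {}

first = run_session(stages, status)    # stops at "sync"
second = run_session(stages, status)   # resumes at "sync"
print(first, second, calls)
```

Two sessions are needed to complete this one cycle, and the "pre" stage is not repeated in the second session.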
If you want more details, use the following URL:
Be careful: we made a mistake in the application packaging.
The manual included in the download is in French!
The full English documentation is available on the web site, in the
Support pull-down menu.
Let me know if you have questions.
PS: more properties files (i.e. bank update cycle descriptions) are
available in the web presentation (Resources pull-down menu).
Examples of application results can be browsed at the following URL:
The application is in production at the 3 partner sites.
No major problems have been reported.
The history of data processing (daily updates) is available on the following
They are generated daily with XSLT from the XML state files produced by
Tony Travis wrote:
>David Allouche wrote:
>>We are looking for more beta-testers for feedback (a publication is going
>>to be submitted very soon).
>I'm interested in beta-testing your software. Please let me know where I
>can download it?