[Bioclusters] Re: BLAST job on SGE
dag at sonsorol.org
Fri Dec 3 15:11:46 EST 2004
The approach suggested below will solve the "how do I submit one job
that will get 100% of an SMP machine" problem, but it will not stop the
SGE scheduler from filling all the available job slots on a single box
before moving on to a different compute node.
The "smp" trick below works by setting up a Parallel Environment called
"smp" within grid engine.
You would create the PE by running the command "qconf -ap smp" and
filling in the values listed below.
Once that is done, you can run jobs that take 100% of the available
job slots on any given node. The end result is that your job gets sole
use of the machine while it runs.
On a cluster of dual-cpu boxes you would submit your job like this:
$ qsub -pe smp 2 ./my-job-script
In effect you are asking for 2 parallel job slots, and since this
happens to match the total number of slots available on a 2-way system,
you end up getting sole use of the machine while your job runs. You are
never really doing any parallel work; you are just using the PE
mechanism to claim more than one job slot for your job.
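To make the trick concrete, here is a minimal sketch of what the job script itself might look like. The database path, query file, and script name are made-up placeholders; the one real piece of SGE machinery is $NSLOTS, the variable grid engine sets to the number of PE slots granted to the job, which you can pass to blastall's -a (processors) flag:

```shell
#!/bin/sh
# my-job-script: hypothetical BLAST wrapper, submitted with "qsub -pe smp 2"
# $NSLOTS is set by SGE to the number of PE slots this job was granted.

# Use every granted slot as a BLAST processor; database and query
# file names below are placeholders for your own data.
blastall -p blastp -a $NSLOTS -d /db/nr -i query.fasta -o query.out
```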
There are lots of other approaches that may be better for particular people:
1. If this is your standard use case and you *never* want more than one
job to run on a node at any time, then the simplest solution is just to
edit your SGE queue configuration and set the "slots" value to "1" on
each compute node. That will force the scheduler to allow only one job
per node at any time.
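A sketch of how that edit might look from the command line; the queue name "all.q" is an assumption about your site's setup, and the non-interactive form depends on your grid engine version supporting "qconf -mattr":

```shell
# Interactively edit the queue configuration and change "slots" to 1:
qconf -mq all.q

# Or, on versions that support it, change the attribute directly:
qconf -mattr queue slots 1 all.q
```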
2. Another (weird) method is to assign a numerical value to "seq_no"
within each of your queues and then set the SGE scheduler's
"queue_sort_method" parameter so that "queue_sort_method=seqno". If you
do that, SGE will attempt to farm out jobs according to the order in
which queues have set their seq_no values. This is probably not optimal
for this particular problem, though.
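A sketch of the settings involved, assuming one queue per node (the sequence numbers shown are arbitrary examples):

```shell
# In each queue's configuration (qconf -mq <queue>), give the queues
# distinct sequence numbers, e.g.:
#   seq_no    1     (on the first node's queue)
#   seq_no    2     (on the second node's queue)

# Then in the scheduler configuration, sort queues by those numbers
# instead of by load:
#   queue_sort_method    seqno
qconf -msconf
```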
3. The most interesting approach is detailed in the manpage for
"sched_conf" where the description for the '$job_load_adjustments'
parameter says this:
> If your load_formula simply consists of the CPU load average parameter
> load_avg and if your jobs are very compute intensive, you might want to
> set the job_load_adjustments list to load_avg=100, which means that
> every new job dispatched to a host will require 100 % CPU time and thus
> the machine's load is instantly raised by 100.
This could be a way of getting round-robin allocation done outside of a
parallel environment. In effect you artificially boost the internal load
value that SGE "sees" right after the job starts, which should have the
effect of causing the SGE scheduler to move on to the *next* machine
rather than packing more jobs into the remaining job slots. Setting
load_avg to 100 should be fine because the normalized np_load_avg value
is aware of multiple-CPU SMP systems.
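A sketch of the scheduler settings the manpage is describing; the exact values (and whether load_avg or np_load_avg suits your load_formula) are site-dependent:

```shell
# Edit the scheduler configuration:
qconf -msconf

# ...and set something along these lines:
#   load_formula                load_avg
#   job_load_adjustments        load_avg=100
#   load_adjustment_decay_time  0:7:30
#
# Each newly dispatched job then inflates the host's apparent load by
# 100 for the decay period, nudging the scheduler to the next machine.
```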
Juan Carlos Perin wrote:
> I'm also interested in this, and have been playing with the configuration.
> Our friends at Penn have fixed this and created a queue configuration that
> allows single jobs to execute on all available nodes, as opposed to running
> two jobs per node, which isn't desirable. It seems to me that the idea was
> to run one blast job on each CPU, thus two jobs on each machine, but the
> architecture doesn't necessarily work like that, and instead waits for one
> to finish, or re-queues the job.
> This is a configuration that was suggested by the very kind people at Penn,
> that seems to work for this situation:
> # qconf -sp smp
> pe_name smp
> queue_list all
> slots 999
> user_lists NONE
> xuser_lists NONE
> start_proc_args NONE
> stop_proc_args NONE
> allocation_rule $pe_slots
> control_slaves FALSE
> job_is_first_task FALSE
> I have yet to test it myself.
> Juan Perin
> Bioinformatics Core
> Children's Hospital of Philadelphia
Chris Dagdigian, <dag at sonsorol.org>
BioTeam - Independent life science IT & informatics consulting
Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E iChat/AIM: bioteamdag Web: http://bioteam.net