[Bioclusters] Business-ish stuff: monitoring cluster usage, and how to pay for it all

Wed, 4 Sep 2002 14:22:13 +1200

Hi all,

We are planning to purchase 64 dual-processor nodes and assemble them into a
Beowulf cluster.  ("We" is the Allan Wilson Centre, an inter-university
group hosted by Massey University in New Zealand.  Paul Gardner posted
earlier asking about technical issues for this cluster.)  Since a government
grant, specifically intended to help bioinformatics research, will fund
roughly half of the project, we need to establish some system for charging
users.

As we see it, there are three categories of potential users.  In decreasing
order of priority:
1.  Researchers affiliated to the AWC.
2.  Other people or groups doing work in the field of bioinformatics (e.g.
academics at other universities, people working for government research
institutes).
3.  Companies looking to use high-performance computing for purposes
unrelated to bioinformatics.

The priorities are set this way because of the government grant.  In
particular, we feel it is not fair to compete openly with companies offering
high-performance computing.  We therefore intend to offer category 3 users
only the surplus CPU time (and other resources) not used by category 1 and 2
users.

The problem we have is demonstrating to the tenders board (i.e. the Powers
That Be) that we can actually pay for the other 50%.  Since we have not run
any similar project of this scale, we lack the kind of direct evidence they
will be looking for.  So I would very much like to
hear from anybody who has experience working with a cluster where (some part
of) the costs have to be recovered:

1.  What charging scheme do you use?  Options range from a one-off "lifetime
membership" charge for a whole company or university to charging by the
wallclock or CPU minute.
2.  How much interest do you get from the commercial sector in using spare
clock cycles?  Is this a useful way of meeting costs?
3.  How do you prioritise these users fairly?
4.  Do you have a way of deciding how many nodes should be allocated to a
particular batch task, based on the number and size of other batch requests
that have occurred or are likely to occur?
5.  Are there particular usage patterns you have discovered (e.g. length and
frequency of batch jobs, number of nodes requested or allocated etc.), which
are important to take into account?
6.  (More technical)  Is there any software you would recommend for
collecting this information automatically?

At the moment we are planning an initial 6-month period of free access for
any user.  This should let us gauge the level of interest in such a system,
observe typical usage patterns, and build up an idea of how to manage the
system as we go along.  Even so, it would be really beneficial to hear from
others who have been there.

Please let me know if you need any further details.  I
look forward to your comments!

Thanks in advance,

Tim White