Hello,

I run a small PC farm to handle annotation of DNA sequence (primarily ESTs, about 550,000 at present). The current model for this farm is to use "whatever is available" to us within our Institute in respect of compute capacity. Consequently the farm has a few (19) 24x7 nodes and 50-odd nodes that join the farm after normal laboratory hours. The nodes within this farm are Intel PCs of varying types (512 MB RAM, PIII/800 MHz to P4/2.8 GHz CPUs, mostly 40-80 GB local disks). We have 600-odd PCs in our Institute, but I can no longer use this pool to expand capacity, as our new CIO is dead against using this distributed model. That seems a pity, since 80 new Dell 2.8 GHz PCs arrived on our site this month. The disadvantages of the current model are the necessity to insert a second hard disk to support Linux (the primary OS is Windows XP) and the Unix system administration overhead of managing OS updates to "transient" nodes (all bioinformatic database updates, script changes, etc. are rolled out automatically from the master node when a node comes online, or as required while it is online in the farm).

Our CIO favours a fully dedicated system, which would be great for us except that his goals may not be identical to ours: he has cost drivers, we have performance drivers. To this end I have recently compared our existing farm's output against 1U test machines from two major vendors (single-CPU and dual-CPU). The single-CPU machine was a 2.66 GHz P4 with 1 GB RAM. The dual Xeon 2.8 GHz machine was trialled initially with 512 MB per processor, then 1 GB RAM, and then 2 GB RAM per processor. Comparisons to existing farm nodes showed the single-CPU test system performed similarly to a 2.8 GHz P4 (512 MB RAM), with little benefit from the additional RAM. The dual Xeon likewise showed little difference from this when the RAM was less than 2 GB per processor.
When increased to 2 GB per processor, 2-6 fold increases in output were seen in blastn vs "nt", blastx vs "nrdb90", and InterProScan (dependent on task). The trial used our live production pipeline, so each node does not receive the same jobs; however, this is compensated by the fact that the runs were in the range of 8,000-16,000 jobs per node. Currently we are not splitting the large databases for BLAST (hence the performance gain seen for the 2 GB per processor model). We are getting other test models in, but it really seems sensible to tap into the wealth of knowledge that is already within the BioCluster community.

I have been scanning this newsgroup in an attempt to gain a better idea of what others are implementing as solutions (1 CPU vs 2 CPUs, memory per processor, etc.) and would welcome any input that you wish to give. In particular, what is the minimum memory configuration per processor being used for blastn vs "nt", both where the database is being split and where it is not?

Also, our existing farm uses "node pull". That is, as nodes come online, a process on each node requests from a MySQL configuration database the types of jobs that the node is capable of undertaking, then requests a chunk of jobs from a MySQL database functioning as a job queue. The nodes process their chunk of jobs and post parsed results directly back to the appropriate MySQL database. All BLAST searches are performed by piping from the control script to blast and piping results back in for parsing. No physical sequence/report files are read from or written to local disk (except for InterProScan). I used to use NFS and have the nodes send result files back to an NFS server, where they were parsed into the database, but that was incredibly slow compared to the system I now operate.
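For anyone curious, the "node pull" chunk claim can be sketched roughly as below. This is only an illustration, not our production code: I am using Python with sqlite3 as a stand-in for the MySQL job queue, the `jobs` table and its columns are hypothetical names, and the `blastall` invocation assumes the legacy NCBI toolkit flags (`-p` program, `-d` database, `-m 8` tabular output, query on stdin).

```python
import sqlite3
import subprocess

def claim_chunk(conn, node_id, job_type, chunk_size=100):
    """Claim a chunk of pending jobs of a type this node can run.

    Hypothetical schema: jobs(id, type, seq, status, node). The UPDATE
    marks the chunk as owned by this node so other nodes skip it.
    """
    cur = conn.cursor()
    cur.execute(
        """UPDATE jobs SET status = 'running', node = ?
           WHERE id IN (SELECT id FROM jobs
                        WHERE status = 'pending' AND type = ?
                        LIMIT ?)""",
        (node_id, job_type, chunk_size))
    conn.commit()
    cur.execute(
        "SELECT id, seq FROM jobs WHERE status = 'running' AND node = ?",
        (node_id,))
    return cur.fetchall()

def run_blastn(seq, db="nt"):
    """Pipe one FASTA sequence to legacy blastall; no files hit local disk.

    Returns the raw report text for the control script to parse and post
    back to the results database.
    """
    proc = subprocess.run(
        ["blastall", "-p", "blastn", "-d", db, "-m", "8"],
        input=seq.encode(), capture_output=True)
    return proc.stdout.decode()
```

On MySQL you would want the claim wrapped in a transaction (or use `SELECT ... FOR UPDATE`) so two nodes coming online together cannot grab the same chunk.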
The "node pull" system seems ideal for our current environment, but if we move to a farm/cluster that is available 24x7 there may be a better way to do it (use SGE or another standard cluster queuing system, etc.). If I move to splitting databases, then it seems I am back to using NFS, generating physical reports, and parsing these on one or more servers (parsing itself could be a new job type, with merged BLAST reports redistributed to the cluster to parse?). Is there a consensus on the best or most appropriate way to tackle this in a dedicated cluster environment? I would welcome input on this as well.

Apologies if this is "old hat" to many of you.

Ross