[Bioclusters] Urgent advice on RAID design requested

Sat Jan 20 18:07:14 EST 2007

George N. White III wrote:

> MTBF is a statistical measure based on failure rates for a large
> number of fresh units.  You may have a component with 10 year MTBF

Absolutely.

> whose mechanical bits will wear out in 5 years.  Vendors have

USB2 disks are a great example.  Even if the disk MTBF is in the 0.5M
hour region, the other components in the product don't come close to
that, especially the (very cheap) brick power supplies.
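
To put rough numbers on that, here is a minimal sketch, assuming a
constant failure rate (the exponential model behind most MTBF figures)
and a unit that dies when any one component dies.  The 50,000 hour
figure for the power brick is a hypothetical placeholder, not a
measured value, and the function names are mine.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760

def annual_failure_probability(mtbf_hours):
    """Chance of at least one failure in a year under a constant-rate model."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

def series_mtbf(*component_mtbfs):
    """MTBF of a unit that fails as soon as any one component fails.

    Failure rates (1/MTBF) add, so the weakest part dominates.
    """
    return 1.0 / sum(1.0 / m for m in component_mtbfs)

disk_mtbf = 500_000   # hours, the 0.5M figure mentioned above
brick_mtbf = 50_000   # hours, HYPOTHETICAL value for a cheap power brick

unit_mtbf = series_mtbf(disk_mtbf, brick_mtbf)

print(f"disk alone: {annual_failure_probability(disk_mtbf):.1%} per year")
print(f"whole unit: {annual_failure_probability(unit_mtbf):.1%} per year "
      f"(unit MTBF about {unit_mtbf:,.0f} hours)")
```

Under these assumptions the unit's MTBF lands close to that of the
brick, not the disk.  The model ignores wear-out, so it says nothing
about parts that age badly.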

> become very adept at designing hardware that wears out a couple days
> after the warranty expires.  I hear horror stories about the 2nd disk

Hmmm...  I find that statement suspect at best.

> failing while rebuilding a RAID, but how many sites have a schedule to
> replace drives before they actually fail in service rather than
> waiting until the first one fails.  I've experienced too many cases

This is actually not a bad idea.  It is done with machine tools, and CNC 
parts.  Each moving/cutting part has a known statistical lifetime, and 
they are usually arranged so that you are "constantly" (in some 
definition of this word) replacing the cutting parts.

> where a number of identical parts (disks, power supplies, fans) in
> workstations purchased at the same time all fail at roughly the same
> time.  Sometimes there is a trigger event (A/C failure) that stresses
> systems within limits that they would handle when new, but after 2-3
> years cooling fans are less effective due to dust buildup, added
> components have increased heat production in the machine room, etc. so
> you get a cluster of failures.  Rebuilding a RAID is also a stressor.

Agreed.  Many things go into the MTBF estimation.  Figures quoted for
24x7 load often don't take power cycling, or other stresses, into account.
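
As a rough illustration of the rebuild risk, here is a sketch that
estimates the chance of a further drive failing while a rebuild is in
progress.  It assumes independent drives with a constant failure rate,
and it uses a hypothetical "stress" multiplier to stand in for aging,
heat and rebuild load.  The array size, rebuild time and multiplier
are illustrative values only.

```python
import math

def p_failure_during_rebuild(n_remaining, mtbf_hours, rebuild_hours, stress=1.0):
    """Probability that at least one of the remaining drives fails
    before the rebuild finishes.

    stress is a HYPOTHETICAL multiplier on the nominal failure rate
    (1.0 means the drives behave exactly as the MTBF figure claims).
    """
    rate_per_drive = stress / mtbf_hours
    return 1.0 - math.exp(-n_remaining * rate_per_drive * rebuild_hours)

# 8-drive array with one drive already dead, 24 hour rebuild (illustrative).
fresh = p_failure_during_rebuild(7, 500_000, 24)
aged = p_failure_during_rebuild(7, 500_000, 24, stress=10.0)

print(f"fresh drives:            {fresh:.4%}")
print(f"aged drives (10x rate):  {aged:.4%}")
```

Drives bought in one batch and run in one room tend to fail together,
as described above, so the independence assumption likely understates
the real risk.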

> 
>> The point being, if you are going to bet your life, or your data on
>> something, it makes sense to go with hard data as compared to 
>> speculation.
>>
>> The cheapest drives around, Maxtors and their ilk have seen failure
>> rates higher than 3-4% in desktop and other apps.  Sure, you will save a
>> buck or two on the front end (acquisition).  Unless you can tolerate
>> data loss, do you want to deal with the impact on the back end?  Without
>> trying to FUD here, how much precisely is your data worth, how many
>> thousands or millions of dollars (or euros, or ...) have been spent
>> collecting it?  Once you frame the question in terms of how much risk
>> you can afford, you start looking at how to ameliorate the risk.
>>
>> There are simple, (relatively) inexpensive methods.  N+1 supplies add
>> *marginal* additional cost to a unit.  Using better drives (notice I
>> didn't say FC/SCSI/SATA) adds minute costs to the unit.  Using
>> intelligent redundancy (RAID6 with hot spares, mirrored,...) reduces
>> risk at an increase in cost.
> 
> So does a sensible schedule to replace older units before they fail.

In combination with a RAID system, yes, this could be quite beneficial.
But as with everything, you need to balance the cost of the solution
against the benefit it gives.
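
One way to frame that balance is expected yearly cost: what you spend
on the hardware plus the probability-weighted cost of losing the data.
The sketch below uses entirely hypothetical figures for hardware
spend, loss probability and the cost of a loss event, so plug in your
own before drawing any conclusion.

```python
def expected_annual_cost(hardware_per_year, p_loss_per_year, cost_of_loss):
    """Yearly hardware spend plus the probability-weighted cost of a loss event."""
    return hardware_per_year + p_loss_per_year * cost_of_loss

# All figures below are HYPOTHETICAL placeholders, not measurements.
COST_OF_LOSS = 200_000  # what re-collecting or recovering the data would cost

cheap = expected_annual_cost(2_000, 0.05, COST_OF_LOSS)
redundant = expected_annual_cost(3_500, 0.005, COST_OF_LOSS)

print(f"cheap box:     ${cheap:,.0f} per year")
print(f"redundant box: ${redundant:,.0f} per year")
```

With these made-up inputs the more expensive box comes out cheaper
overall, but the result flips if the data is cheap to regenerate.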

> For organizations where unscheduled downtime is expensive, the
> benefits include being able to schedule replacements to minimize
> disruptions.

Again, see machine tool operators.  They have been doing this for a while.

> 
>> We are not talking about EMC costs here.  Or NetApp.  If you are
>> spending north of $2.5/GB of space you are probably overspending, though
>> this is a function of what it is and what technology you are buying.
>>
>> > separate machines with cheap components (cheapest SATA drives with
>> > single power supply) is better than one expensive machine (higher
>> > quality hard drives, redundant power supply). What do you gurus say?
> 
> You have given an ill-posed question.  The answer is very sensitive to

The question had been asked in other posts.  What you buy should be a 
function of what you need to do with it as much as it is constrained by 
your budget.

> the I/O profile of your workload. There can be a big performance hit
> for the I/O it takes to replicate the data between the boxes.  Some

This is a function of box design as much as it is IO workload.

> workloads will have low I/O windows where replication can be done.
> How robust is your processing if the separate machines get out of
> sync? One approach is to keep the filesystem metadata on a small
> highly reliable machine.

  ...  so if the metadata machine, through some terrible ABC
(administration before coffee) accident, goes offline ...

Aside from that, this design is a single point of information flow, 
which you wish to strenuously avoid if at all possible in high 
performance systems.

> 
>> I believe that you can save money at the most appropriate places to do
>> so.  I'm not sure this is it.  It's your data, and you have to deal
>> with/answer for what happens if a disk or machine demise makes it
>> unrecoverable.  People who have not had a loss event usually don't get
>> this (i.e. it hasn't bitten them personally).  If you have ever lost data
>> due to a failure, and it cost you lots of time/energy/sweat/money to
>> recover or replicate this, you quickly realize that the "added" cost is
>> a steal, a bargain in comparison with your time.  Which you should value
>> highly (your employer does, and rarely do they want you spending time on
>> data recovery, unless this is your job, as compared to what you are paid
>> to do).
> 
> There are usually people (who won't be around when the problems
> appear) telling management "cheap, secure, and reliable? -- no
> problem!".  In large organizations, the time/energy/sweat includes

Yup.

> sitting in the committees that make the recommendations to management.
> Many large organizations have people running spreadsheets to look at
> the cost of data storage/processing in various sites. The results are
> then used to require every site to use the approach that looks
> cheapest -- often without appropriate consideration of the risks or of
> differences in workloads.

Yup.  At the end of the day it is $$$ (or insert your favorite currency).

> 

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615