[TriLUG] software raid failures?
aaron at joyner.ws
Tue Jan 13 17:23:16 EST 2009
I've personally supported software raid in Linux for well over 6
years, in large-scale production usage. I've never lost data. I know
that sounds broad, but I've never been in a situation where more than
one drive failed faster than I could do a timely backup, shut down
the box, replace the drive, and bring it back online. I've had a box
with unstable hardware (a flaky but irreplaceable PCI card) which
would lock the hardware, cause kernel panics, and all manner of other
generally unfriendly shutdowns on a weekly basis, for months on end,
and even in that near-worst-case scenario I didn't lose data on the
relevant Linux md software raid-5 array.
I agree with others that if you're staking your production system on
it, you should compile as large a set of anecdotes as possible, but
that's about as close as you're going to come to realistic data. Far
more important than the droves of "it works great for me" responses
you're likely to get is looking for the telltale few "I lost data in
this situation" responses. Those will be the anecdotes that trouble
your skeptics. The best you can do, should you be able to find anyone
willing to say that, is to try to determine either (a) that their
particular, very unusual situation wouldn't apply to you, or (b) that
they're not really a trustworthy source. If you can't convince
yourself of either (a) or (b) for anyone you can find who *does*
report a failure, then you might be on to something, and perhaps you
should listen to your skeptics.
Generally, though, I'm of the opinion that if it were a real problem,
you'd hear a lot more grumbling about it on public lists, and it'd be
readily discoverable via Google. Because of my positive experiences,
I've never really gone looking for that type of report, so it might
be out there, but I'd be very surprised.
Aaron S. Joyner
On Tue, Jan 13, 2009 at 3:13 PM, Cristóbal Palmer <cmp at cmpalmer.org> wrote:
> Anybody here ever lost data because of a problem with Linux software
> raid? If so, please describe the circumstances. Now, of course you had
> separate, off-site backups, so you probably only lost a day's worth of
> data when this happened, but...
> I have two people who are software raid skeptics and need convincing.
> An official-looking document that basically says, "This is why you
> won't lose data due to a kernel panic or drive failure or..." would be
> Thanks in advance,
> Cristóbal M. Palmer
> "Small acts of humanity amid the chaos of inhumanity provide hope. But
> small acts are insufficient."
> -- Paul Rusesabagina