Re: [OT] 19"/2U Cases
From: Michael Loftis <mloftis(at)modwest.com>
Date: Wed Aug 29 2007 - 18:37:28 EDT

--On August 29, 2007 2:54:31 PM -0700 Mike Bird <mgb-debian@yosemite.net> wrote:

> On Wednesday 29 August 2007 13:45, Michael Loftis wrote:

On any hardware raid (at least with a hotswap chassis) you can remove a drive and insert a new one, live, with no intervention, and the RAID takes care of starting the rebuild and re-adding the drive. If you don't have hotswap you can remove and add a new drive, and on the next power up the RAID takes care of it.

Now I might be wrong, but Linux AFAIK does not support SATA hotswap on most controllers. I've seen it mostly work on SCSI systems (you usually have to manually rescan the SCSI bus to get the kernel to update its list of drives). But on FibreChannel, when a loop has an issue, the kernel will tend to mark the loop down, and no amount of coaxing short of a reboot will get that loop back into the up state.

As recently as this week or last week, on a 2.6.18 kernel, MD RAID flipped on a mirror and marked both drives bad when neither had any detectable issue. This caused the machine to OOPS/panic and stop. Neither drive was faulty.

The fact still remains that MD RAID handles errors badly. It doesn't retry reads. Instead it assumes that any failed read means the whole partition is dead. It then retries on the partner drive. On a SCSI bus, if you had a momentary issue, this would likely also fail, and then two drives/partitions are marked bad. Any hardware raid will give it another go before marking something as bad (and most will log the soft error).

>

Many hardware raid controllers support partitioning like this, but it's an advanced option found only on higher end cards. I haven't seen any consumer level RAIDs support this, so you have that one for sure.

As far as hardware raid cards failing, in the hundreds of installations, I've seen it once. And that was because of improper handling of the card causing a hairline fracture in the PCB.
Motherboards may be a different story, especially with the huge numbers of bad electrolytic caps out there.

> I'd love to see any documentation on any of this.

The problems I've described are real world. Despite that, we still have a lot of MD software mirrors in production, since as long as they're working they're cheaper. They take a lot more effort to make work right though. We use persistent superblocks, and that doesn't alleviate any of these issues.

The installations are of about four major 'flavors': RedHat 9, FC3, Debian 3.0 and 3.1, and Debian 4.0. None are immune to the issues. Debian 3 was pretty bad, sometimes not making it past the initrd when a drive failed. The MD setups were all done using the normal TUI tools (anaconda, Debian's system installer) during installation.

And grub-installing on both drives isn't as simple as it sounds, because it only works right if their geometry matches. The grub installer isn't smart enough to figure out when things don't match. Typing grub-install to install a boot block on hd1 (sdb, hdb, whatever it really is) won't necessarily give you a bootable hd1, because if your grub config references hd0 partitions, and they're different from hd1 in some way, it won't make it to stage 2/2.5, and thus you get no command prompt. This at least has always been my experience, even with etch.

The other issue is that no BIOS I know of will cope if the boot drive fails in some way that doesn't leave it simply not showing up. And most of the time drives tend to fail in ways that leave them showing up to the BIOS while actually being unusable. A hardware raid solves this.

A related issue is that most PC BIOSes have pretty sad serial console support. This means that failures will more often require onsite visits if a reboot (for whatever reason) happens after a boot drive failure but before you can get a tech on site. This is a definite issue if you're deploying systems in locations remote to your own, or with difficult access.
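The difference in read-error handling described earlier (failing a mirror member on its first failed read, versus retrying before marking it bad) can be sketched with a toy Python model. This is only an illustration under stated assumptions: `GlitchyBus` and `mirrored_read` are hypothetical names, the "glitch" is simply a fixed number of consecutive failed read attempts on a shared bus, and nothing here is taken from the MD driver or from any controller firmware.

```python
class GlitchyBus:
    """Hypothetical shared bus: the next `glitch_len` reads fail, then it recovers."""

    def __init__(self, glitch_len):
        self.failures_left = glitch_len

    def read(self, drive):
        # Both mirror members sit on the same bus, so a momentary
        # problem affects whichever drive is tried next.
        if self.failures_left > 0:
            self.failures_left -= 1
            return False
        return True


def mirrored_read(bus, drives, attempts_per_drive):
    """Attempt one read against a mirror.

    Returns (ok, faulty, failed_attempts). A drive is marked faulty once
    it has used up all of its attempts without a successful read.
    """
    faulty = []
    failed_attempts = 0
    for drive in drives:
        for _ in range(attempts_per_drive):
            if bus.read(drive):
                return True, faulty, failed_attempts
            failed_attempts += 1
        faulty.append(drive)
    return False, faulty, failed_attempts


# Policy 1: a single attempt per drive (no retry).
no_retry = mirrored_read(GlitchyBus(2), ["sda", "sdb"], attempts_per_drive=1)

# Policy 2: up to three attempts per drive before marking it bad.
with_retry = mirrored_read(GlitchyBus(2), ["sda", "sdb"], attempts_per_drive=3)

print("no retry:  ", no_retry)
print("with retry:", with_retry)
```

In this model a glitch lasting two read attempts takes out both healthy mirror members when there is no retry, while the retrying policy rides it out and only records two failed attempts. Real drivers and controllers are far more involved; the sketch only shows why a retry budget matters when the fault is on the shared path and not in the drives.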
Software RAID has caveats, it's not perfect. Hardware RAID has caveats, it's not perfect. Having seen far more issues in the real world with software RAIDs than with hardware RAIDs puts me pretty squarely in the hardware RAID camp.

Software RAID is undoubtedly cheaper in initial investment cost. But in our experience (Modwest) it can cost significantly more when it fails due to undetected errors or poor error recovery behavior (sometimes not the fault of software RAID; many IDE, SATA, and even some SCSI drivers and controllers just do not behave very well when a drive isn't responding properly). It requires significantly more experience and know-how to properly manage and recover from an error.

Hardware RAID boils down to 'which drive failed?' and 'replace with same or larger drive', and you're done. This can be done by someone with only minimal experience and no unix experience at all.

Where hardware RAID does lose is cost and flexibility. You do have less choice as to exactly how to manage/maintain your data with hardware RAID. Many people do not need that much flexibility.

I am not FUDing as you put it, I am making known my objections to and experience with software RAID. Many people don't have any issues with software RAID. And a software RAID compared to a cheap bottom-of-the-barrel hardware RAID will usually be faster, especially when you want to do RAID5 and RAID6, and possibly more reliable, and will certainly have more bells and whistles.

Received on Wed Aug 29 18:38:06 2007