Re: Filesystem corruption on md (Software) RAID

From: <michael(at)estone.ca>
Date: Mon Aug 06 2007 - 12:33:33 EDT


Quoting Sebastian Flothow <flothow@gip.com>:

> Hi,
>
> I'm getting massive filesystem corruption on an md RAID comprising 4
> SATA disks. I tried ext3, xfs and reiserfs on RAID level 5 as well as
> ext3 on RAID level 1 (using only 2 disks); all can be crashed reliably
> by running bonnie++ for just a few minutes. In the case of ext3, I
> usually get dmesg output like this:
>
> [...]
> md0: rw=1, want=1482184800, limit=490223232
> attempt to access beyond end of device
> md0: rw=1, want=1482184800, limit=490223232
> attempt to access beyond end of device
> md0: rw=1, want=1482184800, limit=490223232
> Buffer I/O error on device md0, logical block 185273099
> lost page write due to I/O error on md0
> Aborting journal on device md0.
> EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
> EXT3-fs error (device md0) in ext3_new_blocks: Journal has aborted
> ext3_abort called.
> EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
>
> The filesystems are impossible to repair afterwards, e2fsck in
> particular will run for ages, and eventually segfault.
>
> By contrast, ext3 directly on the physical disk partition works fine and
> withstood days of continuous bonnieing.
>
> This is with Etch, kernel 2.6.18-4-686-bigmem. FWIW, the machine used to
> run Sarge with a 2.4 kernel, where the RAID worked fine.
>
> Now, it seems quite unlikely that RAID is completely broken in 2.6, so I
> suppose it might be related to the hardware: it's a Pentium 4 @ 2.8 GHz,
> 1.5 GiB RAM, the SATA Controller is a Promise S150 SX4 using the
> sata_sx4 kernel module.
>
>
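
As a rough sanity check on the quoted dmesg output, the numbers can be worked through by hand. This sketch assumes that "want" and "limit" are counted in 512-byte sectors and that the filesystem uses 4 KiB blocks; neither is stated in the log itself, so treat both as assumptions.

```python
# Values copied from the quoted dmesg output.
WANT_SECTORS = 1482184800   # "want=": end of the rejected request
LIMIT_SECTORS = 490223232   # "limit=": size of md0
LOGICAL_BLOCK = 185273099   # from the "Buffer I/O error" line

SECTOR_BYTES = 512          # assumption: want/limit are in 512-byte sectors
BLOCK_BYTES = 4096          # assumption: 4 KiB filesystem blocks

sectors_per_block = BLOCK_BYTES // SECTOR_BYTES

# Sector at which a write of LOGICAL_BLOCK would end.
request_end = (LOGICAL_BLOCK + 1) * sectors_per_block

# Size of md0 in GiB, and how far past the end the request reaches.
device_gib = LIMIT_SECTORS * SECTOR_BYTES / 2**30
overshoot = WANT_SECTORS / LIMIT_SECTORS

print("block number matches want:", request_end == WANT_SECTORS)
print(f"md0 size: {device_gib:.1f} GiB")
print(f"request ends at {overshoot:.2f}x the device size")
```

Under those assumptions the logged block number lines up exactly with "want", and the write ends at roughly three times the size of md0. That would suggest the block number was already bogus when the write was issued, though it doesn't say where it got corrupted.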

Definitely sounds like failing hardware. You could try installing smartmontools and using it to scan your drives. It might tell you whether you have bad sectors or some other failing component.
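
Something along these lines would run the check on each drive. It is only a sketch: the device names /dev/sda through /dev/sdd are a guess at how the four disks show up, the helper smartctl_cmd is made up for illustration, and it has to run as root. Depending on the controller and driver, smartctl may also need a -d device-type option to reach the drives at all.

```python
import shutil
import subprocess

# Assumption: the four array members appear as /dev/sda .. /dev/sdd.
DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]


def smartctl_cmd(device, long_test=False):
    """Build a smartctl command line for one drive (hypothetical helper)."""
    if long_test:
        # -t long starts an extended self-test on the drive.
        return ["smartctl", "-t", "long", device]
    # -H prints the overall health verdict, -A the attribute table.
    return ["smartctl", "-H", "-A", device]


def main():
    if shutil.which("smartctl") is None:
        print("smartctl not found; install smartmontools first")
        return
    for dev in DEVICES:
        result = subprocess.run(smartctl_cmd(dev), capture_output=True, text=True)
        print(f"== {dev} (exit status {result.returncode}) ==")
        print(result.stdout)


if __name__ == "__main__":
    main()
```

In the attribute table, the reallocated and pending sector counts are the usual things to look at first. An extended self-test runs on the drive in the background, so its result has to be read back later with another smartctl call.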
Also, try not using the bigmem kernel. AFAIK, it's designed for 32-bit systems with more than 4 GB of RAM, although I would guess that shouldn't make a difference.

Received on Mon Aug 6 12:35:34 2007

This archive was generated by hypermail 2.1.8 : Thu Aug 09 2007 - 18:49:41 EDT

