Strange console-related hang at boot-time
From: Andrew Reid <reidac(at)bellatlantic.net>
Date: Sun Nov 04 2007 - 15:25:31 EST

Hi all --

Please let me know if this is the wrong forum for this query -- as you'll see by reading it, I don't yet know if it should be a bug report, but on the other hand, it's not really an installer question.

I have a weird problem with a Debian "etch" file server. I've googled around and searched list archives, but I'm not finding anything helpful.

The system is a dual Opteron 242 system with 4G of RAM, two 250G hard-drives in some RAID1 arrays (there are two partitions on each drive) with the "boot" and root filesystems on it, and a 3ware 9000 controller with eight more drives on it in RAID5 for a 2.6 TB-capacity array. The array holds files which are served by NFS by this server, mainly user accounts on our system.

A few days ago, one of the RAID1 disks failed, and apparently took down the system, which then attempted to reboot, but the reboot hung part way through the process -- it seemed to be in /etc/rc2.d/S20 somewhere, the last message was from knfsd, which looked like it might be hung.

I replaced the failed disk and sync'd up the RAID1, and it's still got the same problem. It's mounting the root filesystem OK, but it's not finishing the start-up scripts.

The weird part is, under some circumstances (see below), I can boot it to single-user, and if I then run all the services in /etc/rc2.d/ manually, they all start up just fine. That's the condition it's in now, as a work-around to this problem, but I'm not happy with it, of course.

My first guess was some kind of file system damage from the crash, corrupting one or more of the start-up files, but since manual start-up works fine, this seems unlikely.

However, there's a strange connection with the console devices. The system normally talks to a console server via ttyS0, but it won't even boot to single user unless I leave the "console=ttyS0,115200" argument out of the boot string. That's not the whole story either, though, because even if I leave the "console=ttyS0,115200" argument out of the boot string, it also won't boot to run level 2. It hangs in roughly the same place, with some but not all of the /etc/rc2.d/S20* scripts giving messages on the console.

Of course, run-level 2 starts up a bunch of virtual consoles. So, at the end of the process, it seems as though there's some kind of problem with starting consoles, virtual or ttyS0, that's making the start-up process hang. This may include corrupted files somehow, of course, but I'm not sure where to look for problems.

Are there start-up-sequence experts who can help me with this?

Thanks.
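
For reference, the manual start-up described above (running everything in /etc/rc2.d/ by hand from single-user mode) might look roughly like the sketch below. This is only an illustration, not the poster's actual procedure: the function name run_rc_dir is made up here, and it assumes the usual SysV layout in which each S* entry is an executable script that accepts "start". Announcing each script before it runs means that, if one hangs, its name is the last thing printed.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): run the start scripts
# of one runlevel directory one at a time, in glob (sorted) order, and
# announce each before it runs so a hang can be tied to a specific script.
run_rc_dir() {
    rc_dir=$1
    for script in "$rc_dir"/S*; do
        # Skip an unmatched glob and any non-executable entries.
        [ -x "$script" ] || continue
        name=$(basename "$script")
        echo "starting: $name"
        if "$script" start; then
            echo "ok: $name"
        else
            echo "failed: $name (exit $?)"
        fi
    done
}

# Example (as root, from single-user mode):
#   run_rc_dir /etc/rc2.d
```

Running the scripts one by one in this way does not reproduce the console set-up of a normal boot, so a clean run here would be consistent with the observation that manual start-up works while the automatic sequence hangs.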
-- This archive was generated by hypermail 2.1.8 : Wed Mar 19 2008 - 03:17:05 EDT