3.2-stable: xl(4) driver (3C509B) system freeze?
From: Ewen McNeill <ewen(at)naos.co.nz>
Date: Tue Dec 10 2002 - 16:14:41 EST

The system was very stable under OpenBSD 3.1-stable, running for months at a time between manual reboots (for patching, etc).

The same system froze (completely, no response in 2+ minutes, requiring a reset), repeatedly, during the upgrade to OpenBSD 3.2 (ie, booted off floppy32.fs), between 10% and 50% into FTPing base32.tgz off a local server. (It hung every time I tried to do an FTP upgrade. I eventually finished the upgrade by FTPing all the files when booted under 3.1, and then doing a mounted-disk upgrade.)

When the system froze, it was locked up solid -- no response to ping on the network, no response to the keyboard, no error messages generated, no kernel panic. Even waiting a couple of minutes for a timeout yielded no response. Just completely frozen, requiring the reset button.

After the upgrade to 3.2-stable (I built and installed all the 3.2-errata patches), the system froze in a similar fashion (locked up hard, no error messages) 5 times in the next 36 hours. (The same system ran 3.1-stable for 114 days since its previous reboot, so I don't think it is a hardware issue per se.)

During the various reboots I tried:

Most recently (about 36 hours ago now), having seen a number of xl(4) driver updates between 3.1 and 3.2 in the changes list, including some 3C509B-specific changes, I pulled out the two 3C509B cards and replaced them with two other spare cards I had (a de(4) card and a dc(4) card).

With no xl(4) cards in the system, just the de(4) and dc(4) cards, the system has run continuously for 36 hours without any apparent issues, doing the same things that were causing it to freeze with the xl(4) cards in use. Which strongly points at the xl(4) driver changes being the issue.

Has anyone else seen anything similar? Ie, issues with either the xl(4) driver or 3C509B cards under OpenBSD 3.2? Or have any suggestions for what to try next? There don't appear to be any flags for the xl(4) driver to selectively disable some of the newer functionality (TCP/IP checksum offloading, DMA to the board, etc).

Based on the times when the system froze, the issue appears to be triggered by "high" network traffic, perhaps triggering some sort of race condition, possibly made worse by being a Pentium 100. (Most of the freeze times are very close to my machine-to-machine rsync backup times. And of course the freezes during the install were during the FTP download phase.)

The most reliable way to reproduce the issue appears to be an FTP download with the floppy32.fs kernel from a local server -- it froze _every_ time during that download, within 1-2 minutes of starting the download.

I've sent two bug reports about this (the second because I didn't get an acknowledgement to the first in the sort of time I was expecting; both were eventually acknowledged by gnats about 24 hours later):

http://www.sigmasoft.com/~openbsd/archive/openbsd-bugs/200212/msg00069.html
http://www.sigmasoft.com/~openbsd/archive/openbsd-bugs/200212/msg00070.html

but have not seen any response to them (they had less information than this post, because I hadn't got as far as pulling the xl(4) cards out at that point).

Finally the dmesg output of a 3.2-stable boot when the xl(4)/3C509B cards were in the system:
-=- cut here -=-
ewen@vm-openbsd.em.naos.co.nz:/home/ewen/kernel/build
cpu0: F00F bug workaround installed
cpu0: Intel Pentium (P54C) ("GenuineIntel" 586-class) 99 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,MCE,CX8
real mem = 49922048 (48752K)
avail mem = 40730624 (39776K)
using 635 buffers containing 2600960 bytes (2540K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(bd) BIOS, date 05/05/98, BIOS32 rev. 0 @ 0xfcd20
pcibios0 at bios0: rev. 2.1 @ 0xf0000/0x69c
pcibios0: PCI BIOS has 5 Interrupt Routing table entries
pcibios0: PCI Interrupt Router at 000:01:0 ("SIS 85C503 ISA" rev 0x00)
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xc0000/0x8000
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "SIS 5511" rev 0x00
pcib0 at pci0 dev 1 function 0 "SIS 85C503 ISA" rev 0x01
pciide0 at pci0 dev 1 function 1 "SIS 5513 EIDE" rev 0x07: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <WDC AC2540F>
wd0: 16-sector PIO, LBA, 515MB, 1048 cyl, 16 head, 63 sec, 1056384 sectors
wd0(pciide0:0:0): using PIO mode 3
xl0 at pci0 dev 10 function 0 "3Com 3c905B 100Base-TX" rev 0x30: irq 12 address 00:04:76:73:04:75
exphy0 at xl0 phy 24: 3Com internal media interface
xl1 at pci0 dev 11 function 0 "3Com 3c905B 100Base-TX" rev 0x30: irq 10 address 00:04:76:73:4e:2a
exphy1 at xl1 phy 24: 3Com internal media interface
vga1 at pci0 dev 12 function 0 "S3 Trio64V2/DX" rev 0x14
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
isa0 at pcib0
isadma0 at isa0
ast0 at isa0 port 0x1a0/32 irq 5
pccom3 at ast0 slave 0: ns16550a, 16 byte fifo
pccom4 at ast0 slave 1: ns16550a, 16 byte fifo
pccom5 at ast0 slave 2: ns16550a, 16 byte fifo
pccom6 at ast0 slave 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask 4040 netmask 5440 ttymask 54e2
pctr: 586-class performance counters and user-level cycle counter enabled
dkcsum: wd0 matched BIOS disk 80
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
WARNING: / was not properly unmounted
lpt0: offline
lpt0: output error
-=- cut here -=-

Ewen

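When comparing a dmesg like the one above against a boot with different cards installed, it can help to pull out just the PCI network interfaces with their IRQs and MAC addresses. What follows is only a rough sketch and not part of the original report: the names `NIC_RE`, `find_nics`, `shared_irqs` and `SAMPLE` are made up for illustration, and the pattern assumes the PCI attach-line layout seen in this particular dmesg (other drivers or OpenBSD releases may print something different).

```python
import re

# Hypothetical helper: matches PCI attach lines of the form seen above, e.g.
#   xl0 at pci0 dev 10 function 0 "3Com 3c905B 100Base-TX" rev 0x30: irq 12 address 00:04:76:73:04:75
# The layout is an assumption taken from this one dmesg, not a general format.
NIC_RE = re.compile(
    r'\b(?P<dev>[a-z]+\d+) at pci\d+ dev (?P<slot>\d+) function \d+ '
    r'"(?P<model>[^"]+)" rev 0x[0-9a-f]+: '
    r'irq (?P<irq>\d+) address (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})'
)

def find_nics(dmesg_text):
    """Return one dict per PCI device that reports both an IRQ and a MAC address."""
    return [m.groupdict() for m in NIC_RE.finditer(dmesg_text)]

def shared_irqs(nics):
    """Map IRQ -> device names, keeping only IRQs used by more than one NIC.

    Only the NICs found above are considered; ISA devices are ignored.
    """
    by_irq = {}
    for nic in nics:
        by_irq.setdefault(nic["irq"], []).append(nic["dev"])
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}

# The two xl(4) attach lines, copied from the dmesg in this post.
SAMPLE = (
    'xl0 at pci0 dev 10 function 0 "3Com 3c905B 100Base-TX" rev 0x30: '
    'irq 12 address 00:04:76:73:04:75\n'
    'xl1 at pci0 dev 11 function 0 "3Com 3c905B 100Base-TX" rev 0x30: '
    'irq 10 address 00:04:76:73:4e:2a\n'
)

if __name__ == "__main__":
    nics = find_nics(SAMPLE)
    for nic in nics:
        print(nic["dev"], nic["model"], "irq", nic["irq"], nic["mac"])
    print("NICs sharing an IRQ:", shared_irqs(nics))
```

In the dmesg above the two xl(4) cards sit on different IRQs (12 and 10), so this check finds no IRQ shared between them.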
Received on Tue Dec 10 16:17:11 2002
This archive was generated by hypermail 2.1.8 : Wed Aug 23 2006 - 13:48:27 EDT