kernel/2897: hang/panic with send() in post-Aug 8th kernels
From: Rukh <openbsd(at)rukh.net>
Date: Thu Aug 22 2002 - 11:42:26 EDT
System      : OpenBSD 3.1
Architecture: OpenBSD.i386
Machine     : i386
>Description:
(Note: I was having problems sending this via sendbug directly.)

The Problem:
Repeatable kernel panic or system hang.

Background:
I've been trying to get nessusd (from the ports) to work on my machine for a few days, and I believe I've tracked the problem down to the point where the parent of a parent/child fork() does a send(socket,buffer,0,0); to the child. The 3rd argument is the killer: a length of 0 bytes to send.

I quickly made up a simpler program to test this theory, and sure enough, on my system it either hangs or panics every time. (This is running my simpler program as a normal user. I have been running nessusd as root because it needs access to raw IP packets etc.)

Nessusd usually panicked via the sys_recvfrom() system call, though it sometimes panicked in sys_write() and once or twice in some other system call.
My own program either hangs the system or gives the panic that I've reproduced below.

Systems Tried:
I've tried my program on the following kernels (all GENERIC, btw):
1) 3.1-release
2) 3.1-current dated 3rd August 2002, built from cvs.
3) 3.1-current dated 20th August 2002, built from cvs (the one I normally use)
4) 3.1-current dated 21st August 2002 3:42:00PM (downloaded from snapshots)

My machine is a Pentium 3 (thus i386). A dmesg is included below.

My Test:
I wrote and compiled a simple C program to test send().
$ gcc -o s5 s5.c
After booting my system, I log in, as a normal user (also works as root).
I then run my program:
If it returns to the prompt after around 10 seconds, it passes. Otherwise it will either hang or panic. There are no command line arguments and no special chain of events.
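For comparison, here is a minimal sketch of what a zero-length send() is expected to do on an unaffected kernel: return 0 and leave nothing for the peer to read. The function name zero_length_send_check() and the 100 ms poll timeout are assumptions made for illustration and are not part of s5.c. It should only be run on a kernel believed to be unaffected, since on the affected kernels closing the socket afterwards may be enough to trigger the panic.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of s5.c): returns 0 if a zero-length
 * send() on a stream socket pair behaves as expected, -1 otherwise.
 */
int
zero_length_send_check(void)
{
    int sv[2];
    char buf[1] = { 0 };
    struct pollfd pfd;
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Same call as in s5.c: a length of 0 bytes. */
    if (send(sv[0], buf, 0, 0) == 0) {
        /* Nothing was queued, so the peer should not become readable. */
        pfd.fd = sv[1];
        pfd.events = POLLIN;
        pfd.revents = 0;
        if (poll(&pfd, 1, 100) == 0)    /* 100 ms timeout (assumed) */
            result = 0;
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

This matches what s5.c shows on the unaffected kernels, where the child's select() times out after 5 seconds because no data ever arrives.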
I also tried running nessusd on kernels 2) and 3).
On kernel 2) there were no problems.
After making a small change to a send() call to send 1 byte rather than 0 bytes, nessusd also worked on kernel 3).

Results of my tests:
Kernel 1) had no problems
Kernel 2) had no problems
Kernel 3) either hung or panicked
Kernel 4) either hung or panicked

My theory:
As you can see from the panic message below, the affected file is uipc_socket2.c. On August 8th a number of changes were made to this file, all of them regarding some sort of buffer speedup on sockets or the like.
I believe that these changes are likely to have caused the problem I've encountered.
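The 1-byte change to nessusd mentioned above can be sketched as a small wrapper around send(). This is only an illustration under stated assumptions: the names send_nonempty() and send_nonempty_demo() are hypothetical and do not come from nessusd, and the choice of a NUL byte as padding is arbitrary.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical wrapper (name is an assumption, not from nessusd):
 * forwards to send(), but replaces a zero-length request with a
 * single NUL byte so the kernel never sees a zero-length send.
 */
ssize_t
send_nonempty(int fd, const void *buf, size_t len, int flags)
{
    static const char pad = '\0';

    if (len == 0)
        return send(fd, &pad, 1, flags);
    return send(fd, buf, len, flags);
}

/* Self-check: returns 0 if the padding byte arrives at the peer. */
int
send_nonempty_demo(void)
{
    int sv[2];
    char c = 'x';
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    ok = send_nonempty(sv[0], NULL, 0, 0) == 1 &&
        recv(sv[1], &c, 1, 0) == 1 && c == '\0';

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

Note that the receiving side then sees one extra byte, so this only suits callers whose peer ignores or expects that byte. It works around the symptom and does not address the kernel problem itself.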
Contact Details:
I can be contacted at: openbsd@rukh.net
(Attachments below)

KERNEL PANIC MESSAGES

panic: kernel diagnostic assertion "sb->sb_lastrecord == NULL" failed: file "/usr/src/sys/arch/i386/compile/GENERIC/../../../../kern/uipc_socket2.c", line 812
Stopped at      _Debugger+0x4:  leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> trace /u
_Debugger(e39a6da8,d0b8a148,e39a6dd8,d01d9fdb,d0aae800) at _Debugger+0x4
_panic(d01cdaa4,d01e23db,d01e29ee,d01e2394,32c) at _panic+0x81
___assert(d01e23db,d01e2394,32c,d01e29ee,e39a6da8) at ___assert_0x1f
_sbflush(e39a6da8,d0b8a118,e39a6d78,d01e21ef,d0b8a118) at _sbflush+0xbf
_sbrelease(e39a6da8,30,e39a6da8,d01e1e0f,d0b8a0cc) at _sbrelease+0x13
_sorflush(d0b8a0cc,6,e39a6e18,d01dfb64,d0b8a148,d0b8a0cc) at _sorflush+0x102
_sofree(d0b8a0cc,300042,e39a7008,d02b86f1) at _sofree+0x61
_soclose(d0b8a0cc,0,e39a7008,e386f000,e38c59c0) at _soclose+0x12f
_soo_close(e38c59c0,e39a7008,e39a6ef8,d01d1f35,0) at _soo_close+0x1c
_closef(e38c59c0,e39a7008,0,0,4) at _closef+0x148
_fdrelease(e39a7008,4,e39a6f80,0,e39a7008) at _fdrelease+0xcd
_sys_close(e39a7008,e39a6f88,e39a6f80,d02fccef,0) at _sys_close+0x2d
_syscall() at _syscall+0x25d
--- syscall (number 6) ---
(null)(1,cfbfd974,cfbfd97c,cfbfd9ac,2f) at 0xb673
(null)(cfbfd9ac,0,cfbfd9b1,cfbfd9bc,cfbfd9d6) at 0x109c
ddb> ps
   PID   PPID   PGRP    UID  S     FLAGS  WAIT       COMMAND
*24513  17641  17641   1001  2       0x6             s5
 17641    469  17641   1001  3    0x4086  nanosleep  s5
 18834      1  18834      0  3    0x4086  ttyin      getty
 24654      1  24654      0  3    0x4086  ttyin      getty
  7716      1   7716      0  3    0x4086  ttyin      getty
  2964      1   2964      0  3    0x4086  ttyin      getty
   469      1    469   1001  3    0x4086  wait       bash
 15316      1  15316      0  3      0x84  poll       wsmoused
  1785      1   1785      0  3      0x84  select     cron
 23658      1  23658      0  3      0x84  pause      ntpd
 27323      1  27323      0  3      0x84  select     sshd
  2089      1   2089      0  3   0x40184  select     sendmail
 22872      1  22872      0  3     0x184  select     inetd
 29311      1   6235      0  3      0x86  poll       identd
  4356      1   4356      0  3      0x84  netcon     ftpd
 25519      1  25519      0  3      0x84  bpf        pflogd
 23153      1  23153      0  2      0x84             syslogd
     8      0      0      0  3  0x100204  apmev      apm0
     7      0      0      0  3  0x100204  crypto_wa  crypto
     6      0      0      0  3  0x100204  aiodoned   aiodoned
     5      0      0      0  3  0x100204  syncer     update
     4      0      0      0  3  0x100204  cleaner    cleaner
     3      0      0      0  3  0x100204  reaper     reaper
     2      0      0      0  3  0x100204  pgdaemon   pagedaemon
     1      0      0      0  3    0x4084  wait       init
     0     -1      0      0  3   0x80204  scheduler  swapper
ddb> show registers
es          0xe39a0010
ds          0xd0aa0010  _end+0x4be5a0
edi         0xd01cdaa4  gcc2_compiled.+0x20
esi         0xe39a6ce4
ebp         0xe39a6cb8
ebx         0
edx         0xd01cdb2f  _tablefull+0x23
ecx         0x70f
eax         0x1
eip         0xd02eea68  _Debugger+0x4
cs          0x8
eflags      0x202
esp         0xe39a6cb8
ss          0xe39a0010
_Debugger+0x4:  leave
ddb>

dmesg (from the August 20th -current kernel)
(dmesg and /var/run/dmesg.boot are the same)

OpenBSD 3.1-current (GENERIC) #14: Tue Aug 20 15:40:05 EST 2002
root@rukhserv.rukh.lan:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel Pentium III (Coppermine) ("GenuineIntel" 686-class) 707 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SYS,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SIMD
pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xf13a0/208 (11 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371FB PCI-ISA" rev 0x00)
pcibios0: PCI bus #2 is the last bus
bios0: ROM list: 0xc0000/0x9c00 0xcc000/0x800
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel 82815 Hub" rev 0x02
ppb0 at pci0 dev 1 function 0 "Intel 82815 AGP" rev 0x02
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "Nvidia Riva TNT2" rev 0x15
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb1 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0x05
pci2 at ppb1 bus 2
siop0 at pci2 dev 10 function 0 "Symbios Logic 53c810" rev 0x02: irq 7
scsibus0 at siop0: 8 targets
siop0: target 6 now using 8 bit 10 MHz 8 REQ/ACK offset xfers
cd0 at scsibus0 targ 6 lun 0: <MATSHITA, CD-R CW-7502, 4.17> SCSI2 5/cdrom removable
rl0 at pci2 dev 13 function 0 "Accton Technology MPX 5030/5038" rev 0x10: irq 3 address 00:04:e2:22:45:f4
rlphy0 at rl0 phy 0: RTL internal phy
pcib0 at pci0 dev 31 function 0 "Intel 82801BA LPC" rev 0x05
pciide0 at pci0 dev 31 function 1 "Intel 82801BA IDE" rev 0x05: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: <QUANTUM FIREBALLP KA18.2>
wd0: 16-sector PIO, LBA, 17624MB, 16383 cyl, 16 head, 63 sec, 36094464 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
"Intel 82801BA SMBus" rev 0x05 at pci0 dev 31 function 3 not configured
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0/16: using exception 16
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask 40c0 netmask 40c8 ttymask 50ca
pctr: 686-class user-level performance counters enabled
mtrr: Pentium Pro MTRR support
dkcsum: wd0 matched BIOS disk 80
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
WARNING: / was not properly unmounted
>How-To-Repeat:
Step 1: Compile my test program
s5.c is included here and is also available from http://www.rukh.net/s5.c

s5.c (my test code)

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <sys/time.h>
#include <stdlib.h>     /* exit() */
#include <unistd.h>     /* fork(), close(), sleep(), _exit() */

int
main(void)
{
    int sockets[2], child;
    char buf[1];
    int status;
    int n;
    fd_set fds;
    struct timeval tv;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sockets) < 0) {
        perror("opening stream socket pair");
        exit(1);
    }

    if ((child = fork()) == -1)
        perror("fork");
    else if (child)
    {
        /* This is the parent. */
        close(sockets[1]);
        printf("Parent: sockets[0] = %d\n",sockets[0]);
        /* The zero-length send that triggers the hang/panic. */
        n = send(sockets[0],buf,0,0);
        sleep(10);
    }
    else
    {
        /* This is the child. */
        close(sockets[0]);
        printf("Child: sockets[1] = %d\n",sockets[1]);
        FD_ZERO(&fds);
        FD_SET(sockets[1],&fds);
        /* Wait up to 5 seconds for the socket to become readable. */
        tv.tv_sec = 5;
        tv.tv_usec = 0;
        if (select(sockets[1]+1,&fds,NULL,NULL,&tv)<=0)
        {
            printf("Child: read select failed\n");
            close(sockets[1]);
            _exit(1);
        }
        printf("Child select worked\n");
        close(sockets[1]);
        _exit(0);
    }
    return 0;
}
>Fix:
I haven't had a look at the uipc_socket2.c file, so I can't suggest a fix at present.

In terms of nessusd: changing the send() calls which send 0 bytes to instead send 1 byte seems to fix it, but this should not be considered tested. Considering that it works with older kernels, the onus is on the kernel to change rather than nessusd.

Also, the fact that it can be triggered from my test code, that I doubt it's a resource starvation problem which could be limited with ulimit, and that it can be run by anyone with a local account and can bring down the machine...well...:)
>Release-Note:
This archive was generated by hypermail 2.1.8 : Wed Aug 23 2006 - 13:29:36 EDT