
kernel/2897: hang/panic with send() in post-Aug 8th kernels

From: Rukh <openbsd(at)rukh.net>
Date: Thu Aug 22 2002 - 11:42:26 EDT


>Number: 2897
>Category: kernel
>Synopsis: hang/panic with send() in post-Aug 8th kernels
>Confidential: no
net
>Environment:

	System      : OpenBSD 3.1
	Architecture: OpenBSD.i386
	Machine     : i386

>Description:

(Note: I was having problems sending this via sendbug directly.)

The Problem:

Repeatable kernel panic or system hang.

Background:



I've been trying to get nessusd (from ports) to work on my machine for a few days, and I believe I've tracked the problem down to the point where the parent of a parent/child fork() does a send(socket,buffer,0,0); to the child. The 3rd argument is the killer: a length of 0 bytes to send.

I quickly wrote a simpler program to test this theory, and sure enough, on my system it either hangs or panics every time. (This is running my simpler program as a normal user. I have been running nessusd as root because it needs access to raw IP packets etc.)

Nessusd usually panicked via the sys_recvfrom() system call, though it sometimes panicked in sys_write() and once or twice in some other system call. My own program either hangs the system or gives the panic that I've reproduced below.

Systems Tried:



I've tried my program on the following kernels (all GENERIC btw):
1) 3.1-release
2) 3.1-current dated 3rd August 2002 built from cvs.
3) 3.1-current dated 20th August 2002 built from cvs (the one I normally use)
4) 3.1-current dated 21st August 2002 3:42:00PM (downloaded from snapshots)

My machine is a Pentium 3 (thus i386). A dmesg is included below.

My Test:


I wrote and compiled a simple C program to test send().
$ gcc -o s5 s5.c
The source is included below.

After booting my system, I log in, as a normal user (also works as root). I then run my program:
$ ./s5

If after around 10 seconds it returns to the prompt, it passes. Otherwise it will either hang or panic.


There are no command line arguments or special chain of events.

I also tried running nessusd on kernels 2) and 3). On kernel 2) there were no problems.
On kernel 3), as soon as I begin a scan from a remote client (a Windows box in this case, running NessusWX), it panics.

After I changed the send() call in nessusd to send 1 byte rather than 0 bytes, it worked on kernel 3).

Results of my tests:


Kernel 1) had no problems.
Kernel 2) had no problems.
Kernel 3) either hung or panicked.
Kernel 4) either hung or panicked.

My theory:



As you can see from the panic message below the affected file is uipc_socket2.c.

On August 8th a number of changes were made to this file, all of them related to some sort of buffer speedup on sockets or the like.

I believe that these changes are likely to have caused the problem I've encountered.
Kernels from before then work fine; kernels from after then don't.
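
To make the assertion in the panic message concrete, here is a toy user-space model of the invariant it appears to check: once a socket buffer has been flushed, its cached last-record pointer must be NULL. This is only a sketch under assumptions. The names toy_record, toy_sockbuf, toy_append and toy_flush are hypothetical and are not the structures or functions in uipc_socket2.c, and the sketch does not claim to show the actual cause of the bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins (hypothetical); NOT the kernel's sockbuf or mbuf types. */
struct toy_record {
	struct toy_record *next;
	size_t len;                     /* payload length, may be 0 */
};

struct toy_sockbuf {
	struct toy_record *head;        /* first queued record */
	struct toy_record *lastrecord;  /* cached pointer to the final record */
	size_t cc;                      /* total bytes queued */
};

/* Append a record of len bytes. Returns 0 on success, -1 if malloc fails. */
int
toy_append(struct toy_sockbuf *sb, size_t len)
{
	struct toy_record *r = malloc(sizeof(*r));

	if (r == NULL)
		return -1;
	r->next = NULL;
	r->len = len;
	if (sb->lastrecord != NULL)
		sb->lastrecord->next = r;
	else
		sb->head = r;
	sb->lastrecord = r;
	sb->cc += len;
	return 0;
}

/* Drop every record, keeping the cached tail pointer in step with the list. */
void
toy_flush(struct toy_sockbuf *sb)
{
	while (sb->head != NULL) {
		struct toy_record *r = sb->head;

		sb->head = r->next;
		if (sb->head == NULL)
			sb->lastrecord = NULL;  /* list is now empty */
		sb->cc -= r->len;
		free(r);
	}
	/* The same kind of check as the one named in the panic message. */
	assert(sb->cc == 0);
	assert(sb->lastrecord == NULL);
}
```

In this model the flush loop walks the list itself, so a zero-length record is freed like any other. A loop driven only by the byte count would stop early when only zero-length records remain, and the final assertion would fire; whether that is what happens in the kernel has not been verified here.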

Contact Details:



I can be contacted at: openbsd@rukh.net
My name is Alistair Kerr (aka. Rukh)

(Attachments below)




KERNEL PANIC MESSAGES


panic: kernel diagnostic assertion "sb->sb_lastrecord == NULL" failed: file "/usr/src/sys/arch/i386/compile/GENERIC/../../../../kern/uipc_socket2.c", line 812
Stopped at _Debugger+0x4: leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> trace /u
_Debugger(e39a6da8,d0b8a148,e39a6dd8,d01d9fdb,d0aae800) at _Debugger+0x4
_panic(d01cdaa4,d01e23db,d01e29ee,d01e2394,32c) at _panic+0x81
___assert(d01e23db,d01e2394,32c,d01e29ee,e39a6da8) at ___assert_0x1f
_sbflush(e39a6da8,d0b8a118,e39a6d78,d01e21ef,d0b8a118) at _sbflush+0xbf
_sbrelease(e39a6da8,30,e39a6da8,d01e1e0f,d0b8a0cc) at _sbrelease+0x13
_sorflush(d0b8a0cc,6,e39a6e18,d01dfb64,d0b8a148,d0b8a0cc) at _sorflush+0x102
_sofree(d0b8a0cc,300042,e39a7008,d02b86f1) at _sofree+0x61
_soclose(d0b8a0cc,0,e39a7008,e386f000,e38c59c0) at _soclose+0x12f
_soo_close(e38c59c0,e39a7008,e39a6ef8,d01d1f35,0) at _soo_close+0x1c
_closef(e38c59c0,e39a7008,0,0,4) at _closef+0x148
_fdrelease(e39a7008,4,e39a6f80,0,e39a7008) at _fdrelease+0xcd
_sys_close(e39a7008,e39a6f88,e39a6f80,d02fccef,0) at _sys_close+0x2d
_syscall() at _syscall+0x25d

--- syscall (number 6) ---
(null)(1,cfbfd974,cfbfd97c,cfbfd9ac,2f) at 0xb673
(null)(cfbfd9ac,0,cfbfd9b1,cfbfd9bc,cfbfd9d6) at 0x109c
ddb> ps
   PID   PPID    PGRP    UID    S     FLAGS     WAIT      COMMAND
*24513  17641   17641   1001    2       0x6               s5
 17641    469   17641   1001    3    0x4086     nanosleep s5
 18834      1   18834      0    3    0x4086     ttyin     getty
 24654      1   24654      0    3    0x4086     ttyin     getty
  7716      1    7716      0    3    0x4086     ttyin     getty
  2964      1    2964      0    3    0x4086     ttyin     getty
   469      1     469   1001    3    0x4086     wait      bash
 15316      1   15316      0    3      0x84     poll      wsmoused
  1785      1    1785      0    3      0x84     select    cron
 23658      1   23658      0    3      0x84     pause     ntpd
 27323      1   27323      0    3      0x84     select    sshd
  2089      1    2089      0    3   0x40184     select    sendmail
 22872      1   22872      0    3     0x184     select    inetd
 29311      1    6235      0    3      0x86     poll      identd
  4356      1    4356      0    3      0x84     netcon    ftpd
 25519      1   25519      0    3      0x84     bpf       pflogd
 23153      1   23153      0    2      0x84               syslogd
     8      0       0      0    3  0x100204     apmev     apm0
     7      0       0      0    3  0x100204     crypto_wa crypto
     6      0       0      0    3  0x100204     aiodoned  aiodoned
     5      0       0      0    3  0x100204     syncer    update
     4      0       0      0    3  0x100204     cleaner   cleaner
     3      0       0      0    3  0x100204     reaper    reaper
     2      0       0      0    3  0x100204     pgdaemon  pagedaemon
     1      0       0      0    3    0x4084     wait      init
     0     -1       0      0    3   0x80204     scheduler swapper
ddb> show registers
es          0xe39a0010
ds          0xd0aa0010      _end+0x4be5a0
edi         0xd01cdaa4      gcc2_compiled.+0x20
esi         0xe39a6ce4
ebp         0xe39a6cb8
ebx         0
edx         0xd01cdb2f      _tablefull+0x23
ecx         0x70f
eax         0x1
eip         0xd02eea68      _Debugger+0x4
cs          0x8
eflags      0x202
esp         0xe39a6cb8
ss          0xe39a0010

_Debugger+0x4: leave
ddb>


dmesg (from the August 20th -current kernel) (dmesg and /var/run/dmesg.boot are the same)

OpenBSD 3.1-current (GENERIC) #14: Tue Aug 20 15:40:05 EST 2002

    root@rukhserv.rukh.lan:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel Pentium III (Coppermine) ("GenuineIntel" 686-class) 707 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SYS,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SIMD
real mem = 267935744 (261656K)
avail mem = 242442240 (236760K)
using 3296 buffers containing 13500416 bytes (13184K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(a1) BIOS, date 01/04/02, BIOS32 rev. 0 @ 0xf0c30
apm0 at bios0: Power Management spec V1.2 (BIOS mgmt disabled)
apm0: AC on, battery charge unknown
pcibios0 at bios0: rev. 2.1 @ 0xf0000/0x1472
pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xf13a0/208 (11 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371FB PCI-ISA" rev 0x00)
pcibios0: PCI bus #2 is the last bus
bios0: ROM list: 0xc0000/0x9c00 0xcc000/0x800
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel 82815 Hub" rev 0x02
ppb0 at pci0 dev 1 function 0 "Intel 82815 AGP" rev 0x02
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "Nvidia Riva TNT2" rev 0x15
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb1 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0x05
pci2 at ppb1 bus 2
siop0 at pci2 dev 10 function 0 "Symbios Logic 53c810" rev 0x02: irq 7
scsibus0 at siop0: 8 targets
siop0: target 6 now using 8 bit 10 MHz 8 REQ/ACK offset xfers
cd0 at scsibus0 targ 6 lun 0: <MATSHITA, CD-R CW-7502, 4.17> SCSI2 5/cdrom removable
rl0 at pci2 dev 13 function 0 "Accton Technology MPX 5030/5038" rev 0x10: irq 3 address 00:04:e2:22:45:f4
rlphy0 at rl0 phy 0: RTL internal phy
pcib0 at pci0 dev 31 function 0 "Intel 82801BA LPC" rev 0x05
pciide0 at pci0 dev 31 function 1 "Intel 82801BA IDE" rev 0x05: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: <QUANTUM FIREBALLP KA18.2>
wd0: 16-sector PIO, LBA, 17624MB, 16383 cyl, 16 head, 63 sec, 36094464 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
"Intel 82801BA SMBus" rev 0x05 at pci0 dev 31 function 3 not configured
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0/16: using exception 16
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask 40c0 netmask 40c8 ttymask 50ca
pctr: 686-class user-level performance counters enabled
mtrr: Pentium Pro MTRR support
dkcsum: wd0 matched BIOS disk 80
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
WARNING: / was not properly unmounted

>How-To-Repeat:

Step 1: Compile my test program
$ gcc -o s5 s5.c
Step 2: Run it.
$ ./s5
Step 3: wait for about 10 seconds...it will either return to the command prompt (in which case it's ok), or it will hang or panic.

s5.c is included here and is also available from http://www.rukh.net/s5.c



s5.c (my test code, also available from http://www.rukh.net/s5.c):
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <sys/time.h>
#include <stdlib.h>     /* exit() */
#include <unistd.h>     /* fork(), close(), sleep(), _exit() */

int
main(void)
{

        int sockets[2], child;
        char buf[1];
        int status;
        int n;
        fd_set fds;
        struct timeval tv;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sockets) < 0) {
                perror("opening stream socket pair");
                exit(1);
        }

        if ((child = fork()) == -1)
                perror("fork");
        else if (child)
        {
          /* This is the parent: send 0 bytes, then stay alive for 10 seconds. */
          close(sockets[1]);
          printf("Parent: sockets[0] = %d\n",sockets[0]);
          n = send(sockets[0],buf,0,0);   /* the zero-length send under test */
          sleep(10);
        }
        else
        {
          /* This is the child: wait up to 5 seconds for the socket to become readable. */
          close(sockets[0]);
          printf("Child: sockets[1] = %d\n",sockets[1]);
          FD_ZERO(&fds);
          FD_SET(sockets[1],&fds);
          tv.tv_sec = 5;
          tv.tv_usec = 0;
          if (select(sockets[1]+1,&fds,NULL,NULL,&tv)<=0)
          {
            printf("Child: read select failed\n");
            close(sockets[1]);
            _exit(1);
          }
          printf("Child select worked\n");
          close(sockets[1]);
          _exit(0);
        }

        return 0;
}

>Fix:

        I haven't had a look at the uipc_socket2.c file, so I can't suggest a fix at present. In terms of nessusd...changing the send()'s which send 0 bytes to instead send 1 byte seems to fix it, but should not be considered tested. Considering that it works with older kernels, the onus is on the kernel to change rather than nessusd. Also, the fact that it works in my test code, and that I doubt it's a resource starvation problem which could be limited with ulimit and that it can be run by anyone with a local account and can bring down the machine...well...:)
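
The nessusd workaround described above can be sketched as a small wrapper around send(). This is an illustration only: send_at_least_one is a hypothetical helper name, not something found in nessusd or libc, and the single filler byte it substitutes for an empty send is an assumption about what the receiving side will tolerate.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <stddef.h>

/*
 * Hypothetical helper: behaves like send(), except that a zero-length
 * request is replaced by a one-byte send of a filler value, so the
 * kernel never sees a length of 0.
 */
ssize_t
send_at_least_one(int fd, const void *buf, size_t len, int flags)
{
	static const char filler = 0;   /* sent only when len == 0 */

	if (len == 0)
		return send(fd, &filler, 1, flags);
	return send(fd, buf, len, flags);
}
```

With this approach the receiver sees one extra byte for every empty send, so the peer has to be prepared to read and discard it. That is a protocol change, which is one more reason to treat it as a stopgap rather than a fix.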

>Release-Note:
 To: gnats@openbsd.org
 Subject: hang/panic with send() in post-Aug 8th kernels
 From: openbsd@rukh.net
 Cc:
 Reply-To: openbsd@rukh.net
 X-sendbug-version: 3.97

Received on Thu Nov 7 16:16:13 2002

This archive was generated by hypermail 2.1.8 : Wed Aug 23 2006 - 13:29:36 EDT

