Pantek Library

"le0: dropping chained buffer", OpenBSD 3.1 on Sparc 5 (sun4m)

From: Rich Kulawiec <rsk(at)gsp.org>
Date: Sat Aug 16 2003 - 15:46:57 EDT


I have two 170 MHz, 256M Sparc 5s running OpenBSD 3.1 on a small network connected via T-1 to the Internet. They have identical configurations. (They were built on the same day from the same checklist.) They run a handful of services: SSH, DNS, SMTP and POP. They've been running smoothly without a reboot for about 4 months. The load average seldom exceeds 0.5, as they really aren't asked to do very much.

Yesterday afternoon around 13:00, both of them simultaneously became unresponsive to all network traffic -- TCP, UDP, ARP, ICMP, everything.

Some minutes later (I estimate 10, but could easily be off) the problem went away; but it soon came back. Each time, there was at least one entry of the form "le0: dropping chained buffer" and a number of syslog entries indicating that the message was repeated.

I rebooted one machine at around 14:50; while that temporarily cleared the problem, it did not appear to have any lasting effect, as that system showed the same symptom again within 20 minutes.

However, neither system has shown any sign of this problem since about 16:00 yesterday. I have no explanation for that.

Sniffing network traffic with another system didn't show much beyond normal traffic and various systems around the 'net infected with the M$ RPC DCOM worm and looking for more.

This error is apparently rare: a search of the last several years' worth of archives of the OpenBSD, NetBSD, and FreeBSD mailing lists turned up only a few mentions of it, and all of them were of the form "What the hell is this?". A Google search was similarly fruitless.

So I went to the source. I have traced this to the following bit of code in am7990.c (in sys/dev/ic):

	} else if ((rmd.rmd1_bits & (LE_R1_STP | LE_R1_ENP)) !=
	    (LE_R1_STP | LE_R1_ENP)) {
		printf("%s: dropping chained buffer\n",
		    sc->sc_dev.dv_xname);
		ifp->if_ierrors++;

This is inside am7990_rint(), which handles data receive interrupts. It's after code which looks for framing errors and crc errors, so I think those can be ruled out.

I managed to find the manufacturer's spec sheet for the LANCE chip (which is what this is about) on AMD's web site, thanks to a URL given in am7990reg.h. (It's 17881.pdf, if you want to find it on AMD's site.)

Between reading the spec sheet and the driver source (wow, it's been a LONG time since I've done this) my impression is that the reason the interface shut down was that ifp->if_ierrors became large enough to merit taking it offline for a bit, then resetting it and trying again. In other words, I think that was a symptom, not the cause.

Trying to find the cause brings me back to that bit of code above and what conditions can trigger it. In am7990reg.h, we find:

#define     LE_R1_STP       0x02            /* start of packet */
#define     LE_R1_ENP       0x01            /* end of packet */

so I believe the test above is checking to see if the corresponding bits (0x03) in rmd.rmd1_bits are both set.

This seems to match up with page 28 of the 7990 ("LANCE") data sheet, which says:

	STP	START OF PACKET indicates that this is the first buffer
		used by the C-LANCE for this packet.  It is used for
		data chaining buffers.

	ENP	END OF PACKET indicates that this is the last buffer used
		by the C-LANCE for this packet.  It is used for data chaining
		buffers.  If both STP and ENP are set, the packet fits in one
		buffer and there is no data chaining.

It's that last sentence that has me confused. I think the piece of code above is testing for exactly that condition, so I would expect that condition to be true if data was not being chained (across multiple buffers).

(Aside: the LANCE data sheet goes on about this for a while, and diagrams
can be found on page 32.)


So the best I can come up with at the moment is that something has gone wrong while receiving a packet, and it's gone wrong at a pretty low level, i.e. this doesn't seem to have anything to do with higher network layers. And it seems to have something to do with the driver's method of storing the packet, i.e. it doesn't look like a malformed packet on the wire.

I think I'll stop here, because I think one explanation for my confusion is that I've misread something or made another kind of mistake. If I have, I'd appreciate it if someone could point it out. But whether I have or haven't, any guidance on what might be causing this (and of course, how I can fix it) would be most welcome.

Thanks,
---Rsk

Received on Sat Aug 16 16:13:08 2003

This archive was generated by hypermail 2.1.8 : Wed Aug 23 2006 - 13:48:43 EDT

