
Re: driver memory debugging

From: Trevor Talbot <quension(at)mac.com>
Date: Fri Jul 18 2003 - 19:00:16 EDT


Thanks for the tips; unfortunately most of these are already taken care of.

On Friday, Jul 18, 2003, at 14:42 US/Pacific, Erblichs wrote:

> 0) Recompile everything first. This verifies that some

Done several times, with different kernel options.

> 1) Enlarge the struct and place the element part that

>            is being re-written after the current location.

> (everytime any value in the struct is modified, print

Done. The struct currently looks like this:

{
    struct device dv;
    struct arpcom deadbeef;
    u_int32_t pad;
    ...my stuff...
}

Previously, the pad wasn't there, and deadbeef had another name that was used by my code. I first noticed the problem when the ifnet part of that arpcom struct had a pointer overwritten, which of course tripped ddb. The real arpcom is now below in my stuff, and I added the pad just to move everything else in the struct. The exact same locations in the deadbeef space are still being written. The only thing that hasn't moved is the device struct, and it can't be moved due to system code. My code doesn't access anything in it.
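For anyone following along, here is a rough sketch of how a raw offset read out of ddb can be mapped back to the struct member that contains it. Everything named demo_* is a stand-in invented for illustration: the 0x30 size for the device part comes from the original message quoted below, but the arpcom size and the "my stuff" array are placeholders, not the real kernel layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types. Sizes are placeholders, NOT the real kernel structs. */
struct demo_device { unsigned char raw[0x30]; };
struct demo_arpcom { unsigned char raw[0x60]; };

struct demo_softc {
    struct demo_device dv;        /* fixed at the start by system code */
    struct demo_arpcom deadbeef;  /* unused space being overwritten */
    uint32_t pad;
    unsigned char my_stuff[32];   /* placeholder for "...my stuff..." */
};

struct member_span {
    const char *name;
    size_t offset;
    size_t size;
};

#define SPAN(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

const struct member_span demo_spans[] = {
    SPAN(struct demo_softc, dv),
    SPAN(struct demo_softc, deadbeef),
    SPAN(struct demo_softc, pad),
    SPAN(struct demo_softc, my_stuff),
};

#define DEMO_NSPANS (sizeof(demo_spans) / sizeof(demo_spans[0]))

/* Sanity check on the table itself: spans are ordered and do not overlap. */
void check_spans(void)
{
    size_t i;

    for (i = 1; i < DEMO_NSPANS; i++)
        assert(demo_spans[i].offset >=
               demo_spans[i - 1].offset + demo_spans[i - 1].size);
}

/* Return the member covering a byte offset, or NULL if none does. */
const struct member_span *member_at(size_t offset)
{
    size_t i;

    for (i = 0; i < DEMO_NSPANS; i++) {
        const struct member_span *s = &demo_spans[i];

        if (offset >= s->offset && offset - s->offset < s->size)
            return s;
    }
    return NULL;
}
```

With the real headers in place of the stand-ins, member_at(0x3f) would name whichever member actually holds the first overwritten byte.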

> 2) Everytime you write to that location, write out the

I don't touch the deadbeef space at all now, except with one memset() call to make any changes visible. The only way I'm watching the contents is with ddb. The change only happens once -- I can reset the changes with ddb and they aren't changed again. This suggests some kind of initialization, but it's both delayed and inconsistent.
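The memset() approach can be sketched roughly as follows: fill the unused region with a known byte, then scan it later and report which byte ranges no longer hold that value. The names guard_fill, guard_scan and GUARD_BYTE are made up for this example, and the fill value is an arbitrary choice. It is written as plain userland C, not as actual driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD_BYTE 0xdb  /* arbitrary fill value, chosen for this sketch */

struct changed_run {
    size_t start;  /* offset of the first changed byte */
    size_t len;    /* number of consecutive changed bytes */
};

/* Fill the unused region with a known byte so later writes stand out. */
void guard_fill(unsigned char *region, size_t len)
{
    assert(region != NULL);
    memset(region, GUARD_BYTE, len);
}

/*
 * Find runs of consecutive bytes that no longer hold GUARD_BYTE.
 * Returns the total number of runs; at most max_runs are stored in runs.
 */
size_t guard_scan(const unsigned char *region, size_t len,
                  struct changed_run *runs, size_t max_runs)
{
    size_t found = 0;
    size_t i = 0;

    assert(region != NULL);
    while (i < len) {
        size_t start;

        if (region[i] == GUARD_BYTE) {
            i++;
            continue;
        }
        start = i;
        while (i < len && region[i] != GUARD_BYTE)
            i++;
        if (found < max_runs) {
            runs[found].start = start;
            runs[found].len = i - start;
        }
        found++;
    }
    return found;
}
```

One limitation: a stray write that happens to store GUARD_BYTE itself goes unnoticed, so a fill value unlikely to occur in normal data is the safer pick.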

> 3) Place a softlock around it that is checked to verify

> 4) Use a temporary counter. Before each access decrement

No threads here. The only thing my code has to worry about is interrupt context, and that should be handled (it runs at IPL_NET). But my code doesn't do anything near the part of the struct that's changing, and as far as I can tell, isn't even being called at all. The major entry points are hardware interrupts from the device, status queries from tools like ifconfig, and a timer to check on device status. The timer is the only thing in use.

> 5) Anytime you free the struct write something like "abacad" into

This is a fixed struct allocated by the system and never freed. It's always in the same place in kernel memory, which makes monitoring and debugging easier. It's probably also why the problem happens :)

> Hope something of this helps you in the right direction..

I appreciate it; these are useful suggestions. Unfortunately, I think I'm still stuck. As far as I've been able to tell, my code isn't doing the writes, but since the problem only occurs with my driver, the driver is obviously the cause somehow.

> Trevor Talbot wrote:
>>
>> I'm working with a network driver, and have encountered a strange
>> problem that I can't figure out.  I'm looking for tips on how to go
>> about debugging it.
>>
>> The basic problem is that something is writing over a specific part of
>> my softc struct.  Moving the important parts of the struct elsewhere
>> and leaving that space unused hasn't changed anything.  Since the
>> driver struct at the beginning is the only thing that hasn't moved, I'm
>> assuming whatever is causing the problem is using it as a reference
>> point.  So far I only know that it happens during some normal interrupt
>> operation shortly after boot.  Just leaving the machine otherwise idle
>> for a few minutes and breaking into ddb periodically shows the change.
>> I also know it's not due to interrupts handled by the driver itself
>> (associated hardware has interrupts disabled).
>>
>> The driver struct occupies offsets 0x0 - 0x30.  Byte 3f is being
>> changed to 01, bytes 48-4b become 00, and bytes 78-7f become 3e 32 16
>> 3f d0 b9 0c 00 (or as a pair of i386 longwords, 3f16323e 000cb9d0).
>>
>> If this looks like a pattern to someone, cool.  If not, any suggestions
>> on how I can figure out what is writing to those locations?  Since it's
>> wired kernel memory, ddb's watch doesn't work too well.
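
The byte-to-longword conversion mentioned in the quoted message can be checked with a few lines of C. This is only a sketch of little-endian (i386) decoding; the helper name le32_decode is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode four bytes stored in little-endian order (lowest address holds
 * the least significant byte, as on i386) into a 32-bit value.
 */
uint32_t le32_decode(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Fed the eight bytes seen at offsets 0x78-0x7f (3e 32 16 3f d0 b9 0c 00), the first four decode to 0x3f16323e and the last four to 0x000cb9d0, matching the pair of longwords given above.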
Received on Fri Jul 18 19:22:55 2003

This archive was generated by hypermail 2.1.8 : Wed Aug 23 2006 - 13:48:43 EDT
