Re: odd failure
From: B. Keith Murphy <kmurphy(at)icontact.com>
Date: Thu Sep 20 2007 - 13:22:05 EDT
I have set up a development cluster for our developers. It consists of two physical servers, each running the SQL daemon and a data node, with management running on another server. About an hour and a half ago the SQL node on one of the two servers stopped responding. The data node part was still responding and showing up in the ndb_mgm console. As you can see, node 4 started missing heartbeats at 1:06 pm.

2007-09-19 13:06:59 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2007-09-19 13:08:57 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2007-09-19 13:14:43 [MgmSrvr] INFO -- Node 2: Local checkpoint 134 started. Keep GCI = 207358 oldest restorable GCI = 207369
2007-09-19 13:33:44 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2
2007-09-19 13:33:46 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2007-09-19 13:33:48 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 3
2007-09-19 13:33:49 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2
2007-09-19 13:33:50 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 4
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 3: Node 4 declared dead due to missed heartbeat
2007-09-19 13:33:50 [MgmSrvr] INFO -- Node 3: Communication to Node 4 closed
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 2: Node 4 Disconnected
2007-09-19 13:33:50 [MgmSrvr] INFO -- Node 2: Communication to Node 4 closed
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 3: Node 4 Disconnected
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 2: Node 4 Disconnected
2007-09-19 13:33:53 [MgmSrvr] INFO -- Node 3: Communication to Node 4 opened
2007-09-19 13:33:54 [MgmSrvr] INFO -- Node 3: Node 4 Connected
2007-09-19 13:33:55 [MgmSrvr] INFO -- Node 2: Communication to Node 4 opened
2007-09-19 13:33:56 [MgmSrvr] INFO -- Node 2: Node 4 Connected

I could log into the MySQL server node as normal and was able to switch databases and list tables. Any query against a table (select * from users, for instance) would give an error 157.
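For anyone wanting to tally warnings like these across a longer cluster log, here is a minimal sketch in Python. It assumes the line layout seen in the excerpt above (timestamp, [MgmSrvr], level, reporting node, message). The function name and regular expressions are my own illustration, not part of any MySQL tool.

```python
import re
from collections import Counter

# Line layout assumed from the excerpt in this message only:
# "<date> <time> [MgmSrvr] <LEVEL> -- Node <n>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] "
    r"(?P<level>\w+) -- Node (?P<reporter>\d+): (?P<msg>.*)$"
)
MISSED_RE = re.compile(r"Node (?P<target>\d+) missed heartbeat (?P<count>\d+)")


def summarize_missed_heartbeats(lines):
    """Count missed-heartbeat warnings per (reporting node, target node).

    Hypothetical helper for illustration; lines that do not match the
    assumed layout are skipped silently.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        hb = MISSED_RE.search(m.group("msg"))
        if hb is not None:
            key = (int(m.group("reporter")), int(hb.group("target")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2007-09-19 13:33:44 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2",
        "2007-09-19 13:33:46 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2",
        "2007-09-19 13:33:48 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 3",
    ]
    for (reporter, target), n in sorted(summarize_missed_heartbeats(sample).items()):
        print(f"node {reporter} saw node {target} miss {n} heartbeat(s)")
```

Grouping by reporting node shows whether both data nodes lost contact with the same peer or only one did, which may help separate a network problem from a stalled process.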
The two servers I have set up (each running a SQL node and a data node) are running in virtual machines on the same server, so I can't figure out why the heartbeat failed. The management node is on another server, but it is on the same network.

To get things going I ended up shutting everything down and restarting. I couldn't get the mysql processes on the SQL nodes to shut down normally (/etc/init.d/mysql stop) and had to kill the processes on one server. On the second server I ended up rebooting just to shut it down.

Once everything was reset it looked fine. I can start and stop the mysql nodes, etc., and everything looks normal.

Oh, I am running 5.1.20 all around on 64-bit Debian etch.

Any suggestions?

thanks,
Keith

--
B. Keith Murphy
Database Administrator
iContact
2635 Meridian Parkway, 2nd Floor
Durham, North Carolina 27713
blog: http://blog.paragon-cs.com

Received on Thu Sep 20 13:22:41 2007