Re: odd failure

From: B. Keith Murphy <kmurphy(at)icontact.com>
Date: Thu Sep 20 2007 - 13:22:05 EDT


I have been doing some work on this. I am fairly certain that the issue is related to latency in the virtual machines. I have allocated more memory to the two servers and am monitoring them with Ganglia to see what happens. It just might not be possible to have a reasonable environment with these VMs.

thanks,

Keith
----- Original Message -----
From: "B. Keith Murphy" <kmurphy@icontact.com>
To: "cluster" <cluster@lists.mysql.com>
Sent: Wednesday, September 19, 2007 3:15:41 PM (GMT-0500) America/New_York
Subject: odd failure

I have set up a development cluster for our developers. It consists of two physical servers, each running the SQL daemon and a data node, with management running on another server.

About an hour and a half ago the SQL node on one of the two servers stopped responding. The data node was still responding and showing up in the ndb_mgm console. As you can see, node 4 started missing heartbeats at 1:06 pm.

2007-09-19 13:06:59 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2 
2007-09-19 13:08:57 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2 
2007-09-19 13:14:43 [MgmSrvr] INFO -- Node 2: Local checkpoint 134 started. Keep GCI = 207358 oldest restorable GCI = 207369 
2007-09-19 13:33:44 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2 
2007-09-19 13:33:46 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2 
2007-09-19 13:33:48 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 3 
2007-09-19 13:33:49 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2 
2007-09-19 13:33:50 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 4 
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 3: Node 4 declared dead due to missed heartbeat 
2007-09-19 13:33:50 [MgmSrvr] INFO -- Node 3: Communication to Node 4 closed 
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 2: Node 4 Disconnected 
2007-09-19 13:33:50 [MgmSrvr] INFO -- Node 2: Communication to Node 4 closed 
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 3: Node 4 Disconnected 
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 2: Node 4 Disconnected 
2007-09-19 13:33:53 [MgmSrvr] INFO -- Node 3: Communication to Node 4 opened 
2007-09-19 13:33:54 [MgmSrvr] INFO -- Node 3: Node 4 Connected 
2007-09-19 13:33:55 [MgmSrvr] INFO -- Node 2: Communication to Node 4 opened 
2007-09-19 13:33:56 [MgmSrvr] INFO -- Node 2: Node 4 Connected 
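
When a cluster log gets longer than this excerpt, it can help to tally the heartbeat warnings per node pair instead of reading them by eye. The following is only a rough sketch in Python, assuming the log lines have exactly the format shown above; the function name summarize_heartbeats and the SAMPLE text are made up for illustration and are not part of MySQL Cluster.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above, e.g.:
# 2007-09-19 13:33:44 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2
MISSED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] WARNING -- "
    r"Node (?P<reporter>\d+): Node (?P<target>\d+) missed heartbeat (?P<count>\d+)"
)
DEAD = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] ALERT -- "
    r"Node (?P<reporter>\d+): Node (?P<target>\d+) declared dead due to missed heartbeat"
)


def summarize_heartbeats(lines):
    """Count missed-heartbeat warnings per (reporter, target) pair and
    collect every 'declared dead' event as (timestamp, reporter, target)."""
    missed = Counter()
    dead = []
    for line in lines:
        line = line.strip()
        m = MISSED.match(line)
        if m:
            missed[(int(m["reporter"]), int(m["target"]))] += 1
            continue
        d = DEAD.match(line)
        if d:
            dead.append((d["ts"], int(d["reporter"]), int(d["target"])))
    return missed, dead


# Hypothetical sample: a few lines copied from the log excerpt above.
SAMPLE = """
2007-09-19 13:33:44 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2
2007-09-19 13:33:46 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2007-09-19 13:33:48 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 3
2007-09-19 13:33:49 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2
2007-09-19 13:33:50 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 4
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 3: Node 4 declared dead due to missed heartbeat
"""

missed, dead = summarize_heartbeats(SAMPLE.splitlines())
for (reporter, target), n in sorted(missed.items()):
    print(f"node {reporter} logged {n} missed-heartbeat warnings for node {target}")
for ts, reporter, target in dead:
    print(f"{ts}: node {reporter} declared node {target} dead")
```

To run it against a real log, pass an open file object instead of SAMPLE.splitlines(); the path to the cluster log depends on the installation and is not assumed here.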

I could log into the MySQL server node as normal and was able to switch databases and list tables. Any query against a table (select * from users, for instance) would return error 157.

The two servers I have set up (each running a sql node and a data node) are running in virtual machines on the same server. So I can't figure out why the heartbeat failed. The management node is on another server, but it is on the same network.

To get things going I ended up shutting everything down and restarting. I couldn't get the mysql processes on the SQL nodes to shut down normally (/etc/init.d/mysql stop), so I had to kill the processes on one server. On the second server I ended up rebooting just to shut it down. Once everything was reset it looked fine. I can start and stop the mysql nodes, and everything looks normal.

Oh, I am running 5.1.20 all around on 64-bit Debian etch.

Any suggestions?

thanks,

Keith

-- 
B. Keith Murphy 
Database Administrator 
iContact 
2635 Meridian Parkway, 2nd Floor 
Durham, North Carolina 27713 
blog: 
http://blog.paragon-cs.com 

(o) 919-433-0786
(c) 850-637-3877
Received on Thu Sep 20 13:22:41 2007

This archive was generated by hypermail 2.1.8 : Sun Oct 07 2007 - 10:15:12 EDT

