Pantek Library

Crash cluster after crash server.

From: Fabien FAYE <ffaye(at)dclux.com>
Date: Tue Aug 28 2007 - 07:28:07 EDT


Hi,

We have a cluster of 2 servers running MySQL 5.0.27. Each server runs a management node (manager), a mysqld and an ndbd.

I know MySQL does not recommend running the manager, mysqld and ndbd on the same server, but in this case it is required for architectural reasons.

Node 1 : Manager 1
Node 2 : Manager 2

Node 10 : NDBD1
Node 11 : NDBD2
Node 20 : Mysql Api 1
Node 21 : Mysql Api 2

On server 1 we have: Node 1, Node 10, Node 20
On server 2 we have: Node 2, Node 11, Node 21

We have run some crash tests on Node 10 and Node 11.

But during the crash test of Node 10, Node 11 shuts down a few minutes afterwards, and this failure is reproducible. I checked the other log files to find something, and I found an error during arbitration.

In the configuration file we have defined the following for each manager:

ArbitrationRank=1 on manager 1
ArbitrationRank=2 on manager 2
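
A rough sketch of how the layout and ranks described above might appear in config.ini (not the actual file; the host names server1/server2 and NoOfReplicas=2 are assumptions for illustration):

```ini
# Sketch only: HostName values and NoOfReplicas are assumed, not taken from the real setup.
[ndbd default]
NoOfReplicas=2          # assumption: the two data nodes hold copies of the same data

[ndb_mgmd]
Id=1                    # Manager 1
HostName=server1        # hypothetical host name
ArbitrationRank=1       # 1 = high priority as arbitrator

[ndb_mgmd]
Id=2                    # Manager 2
HostName=server2        # hypothetical host name
ArbitrationRank=2       # 2 = low priority as arbitrator

[ndbd]
Id=10                   # NDBD1
HostName=server1

[ndbd]
Id=11                   # NDBD2
HostName=server2

[mysqld]
Id=20                   # Mysql Api 1
HostName=server1

[mysqld]
Id=21                   # Mysql Api 2
HostName=server2
```

With these ranks, Manager 1 is preferred as arbitrator, and Manager 2 is considered only when no rank 1 node is available.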

My questions:

Have you already seen this problem? (I have found some similar bugs reported for MySQL, but not in the same situation.)
Could this problem be caused by the ArbitrationRank?

Thanks for your help!!

Manager 1 log :

2007-08-23 15:40:39 [MgmSrvr] INFO     -- Node 10: Local checkpoint 186 started. Keep GCI = 69786 oldest restorable GCI = 26677
2007-08-23 15:55:04 [MgmSrvr] INFO     -- Node 10: Local checkpoint 187 started. Keep GCI = 70173 oldest restorable GCI = 26677
2007-08-23 16:03:39 [MgmSrvr] INFO     -- Node 10: Local checkpoint 188 started. Keep GCI = 70555 oldest restorable GCI = 70244
2007-08-23 16:17:25 [MgmSrvr] ALERT    -- Node 10: Node 20 Disconnected
2007-08-23 16:17:25 [MgmSrvr] INFO     -- Node 10: Communication to Node 20 closed
2007-08-23 16:17:25 [MgmSrvr] ALERT    -- Node 11: Node 20 Disconnected
2007-08-23 16:17:25 [MgmSrvr] INFO     -- Node 11: Communication to Node 20 closed
2007-08-23 16:17:26 [MgmSrvr] INFO     -- Mgmt server state: nodeid 20 freed, m_reserved_nodes 0000000000200002.
2007-08-23 16:17:26 [MgmSrvr] INFO     -- Node 10: Node shutdown initiated
2007-08-23 16:17:29 [MgmSrvr] INFO     -- Node 11: Communication to Node 20 opened
2007-08-23 16:17:29 [MgmSrvr] INFO     -- Node 10: Communication to Node 20 opened
2007-08-23 16:17:35 [MgmSrvr] INFO     -- Node 1: Node 10 Connected
2007-08-23 16:17:35 [MgmSrvr] INFO     -- Node 1: Node 11 Connected

Manager 2 Log :

2007-08-23 16:17:25 [MgmSrvr] ALERT    -- Node 11: Node 20 Disconnected
2007-08-23 16:17:25 [MgmSrvr] INFO     -- Node 11: Communication to Node 20 closed
2007-08-23 16:17:25 [MgmSrvr] ALERT    -- Node 10: Node 20 Disconnected
2007-08-23 16:17:25 [MgmSrvr] INFO     -- Node 10: Communication to Node 20 closed
2007-08-23 16:17:26 [MgmSrvr] INFO     -- Node 10: Node shutdown initiated
2007-08-23 16:17:29 [MgmSrvr] INFO     -- Node 11: Communication to Node 20 opened
2007-08-23 16:17:29 [MgmSrvr] INFO     -- Node 10: Communication to Node 20 opened
2007-08-23 16:17:32 [MgmSrvr] WARNING  -- Node 11: Node 1 missed heartbeat 2
2007-08-23 16:17:33 [MgmSrvr] WARNING  -- Node 11: Node 10 missed heartbeat 2
2007-08-23 16:17:33 [MgmSrvr] WARNING  -- Node 11: Node 1 missed heartbeat 3
2007-08-23 16:17:35 [MgmSrvr] WARNING  -- Node 11: Node 10 missed heartbeat 3
2007-08-23 16:17:35 [MgmSrvr] WARNING  -- Node 11: Node 1 missed heartbeat 4
2007-08-23 16:17:35 [MgmSrvr] ALERT    -- Node 11: Node 1 declared dead due to missed heartbeat
2007-08-23 16:17:35 [MgmSrvr] INFO     -- Node 11: Lost arbitrator node 1 - process failure [state=6]
2007-08-23 16:17:35 [MgmSrvr] INFO     -- Node 11: Communication to Node 1 closed
2007-08-23 16:17:35 [MgmSrvr] ALERT    -- Node 11: Node 1 Disconnected
2007-08-23 16:17:35 [MgmSrvr] INFO     -- Node 2: Node 10 Connected
2007-08-23 16:17:38 [MgmSrvr] INFO     -- Node 2: Node 11 Connected
2007-08-23 16:23:57 [MgmSrvr] ALERT    -- Node 11: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary er
________________________________
--------------------------------------------------------

This e-mail and any attached files are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail by mistake, please notify the sender immediately and delete it from your system. You must not copy the message or disclose its contents to anyone.


Received on Tue Aug 28 07:32:19 2007

This archive was generated by hypermail 2.1.8 : Sun Oct 07 2007 - 10:15:04 EDT
