RE: Crash cluster after crash server.
Hi,
When the arbitrator goes down, the cluster has to hold an election to
pick a new one. The problem here is that you are not just losing an
arbitrator but half of the cluster as well, so there is no time to hold
an election.
The cluster sees this as a potential split brain and acts properly by
shutting itself down.
This is why it is recommended that arbitrators not run on the same hosts
as your data nodes.
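
To make the reasoning above concrete, here is a small sketch of the decision a surviving partition faces. It is a simplified majority-style model and not the actual NDB code (the real rules are expressed in terms of node groups); the function name and its parameters are made up for illustration.

```python
def partition_survives(alive_data_nodes, total_data_nodes, arbitrator_reachable):
    """Simplified model: may this partition of data nodes keep running?

    Hypothetical helper for illustration only, not part of MySQL Cluster.
    """
    if not 0 <= alive_data_nodes <= total_data_nodes:
        raise ValueError("alive_data_nodes must be between 0 and total_data_nodes")

    # A clear majority cannot be mirrored by another partition.
    if 2 * alive_data_nodes > total_data_nodes:
        return True

    # Exactly half: the other half might also be alive (split brain),
    # so only the side the arbitrator answers is allowed to continue.
    if alive_data_nodes > 0 and 2 * alive_data_nodes == total_data_nodes:
        return arbitrator_reachable

    # A minority (or nothing left) shuts down.
    return False


# Server 1 crashes and takes data node 10 AND the arbitrator (manager 1) with it:
print(partition_survives(1, 2, arbitrator_reachable=False))  # False

# Same crash, but with the arbitrator running on a separate third host:
print(partition_survives(1, 2, arbitrator_reachable=True))   # True
```

In this model, node 11 is exactly half of a two-node cluster, so it needs an arbitrator to confirm it may continue; with the arbitrator lost in the same crash, it has no choice but to shut down.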
Best wishes,
/Jeb
Jonathan Miller
Austin, Texas USA
Senior Lead Quality Assurance Developer
MySQL AB www.mysql.com
  __  ___     ___ ____  __
 /  |/  /_ __/ __/ __ \/ /
/ /|_/ / // /\ \/ /_/ / /__
/_/  /_/\_, /___/\___\_\___/
       <___/   www.mysql.com
Jumpstart your cluster!
http://www.mysql.com/consulting/packaged/cluster.html
Get training on clusters
http://www.mysql.com/training/courses/mysql_cluster.html
All-in-one Enterprise-grade Database, Support and Services
http://www.mysql.com/network/

> -----Original Message-----
> From: Fabien FAYE [mailto:ffaye@dclux.com]
> Sent: Tuesday, August 28, 2007 6:28 AM
> To: 'cluster@lists.mysql.com'
> Subject: Crash cluster after crash server.
>
> Hi,
>
> We have a cluster based on 2 servers with mysql 5.0.27.
> On each server, we have manager, mysqlD and NDBD.
>
> I know it is not recommended by mysql to have manager, mysqld and ndbd on
> the same server, but in this case it is architecture reason.
>
> Node 1 : Manager 1
> Node 2 : Manager 2
> Node 10 : NDBD1
> Node 11 : NDBD2
> Node 20 : Mysql Api 1
> Node 21 : Mysql Api 2
>
> On server 1 we have: Node 1,Node 10,Node 20
> On server 2 we have: Node 2,Node 11,Node 21
>
> We have tested on the Node 10 or Node 11 some crash test.
>
> But during the crash test of Node 10, we have a shutdown few minutes after
> of Node 11, and this mistake could be reproducible.
> I have check on other log file to find something and I have read, error
> during arbitration
>
> On the configuration file we have define of each manager this things :
>
> ArbitrationRank=1 on manager 1
> ArbitrationRank=2 on manager 2
>
> My questions :
>
> Do you have already seen this problem ? (I have found some similar bugs on
> MYSQL but not in the same case)
> This problem could be generate by the arbitration Rank ?
>
> Thanks for your help!!
>
> Manager 1 log :
>
> 2007-08-23 15:40:39 [MgmSrvr] INFO -- Node 10: Local checkpoint 186 started. Keep GCI = 69786 oldest restorable GCI = 26677
> 2007-08-23 15:55:04 [MgmSrvr] INFO -- Node 10: Local checkpoint 187 started. Keep GCI = 70173 oldest restorable GCI = 26677
> 2007-08-23 16:03:39 [MgmSrvr] INFO -- Node 10: Local checkpoint 188 started. Keep GCI = 70555 oldest restorable GCI = 70244
> 2007-08-23 16:17:25 [MgmSrvr] ALERT -- Node 10: Node 20 Disconnected
> 2007-08-23 16:17:25 [MgmSrvr] INFO -- Node 10: Communication to Node 20 closed
> 2007-08-23 16:17:25 [MgmSrvr] ALERT -- Node 11: Node 20 Disconnected
> 2007-08-23 16:17:25 [MgmSrvr] INFO -- Node 11: Communication to Node 20 closed
> 2007-08-23 16:17:26 [MgmSrvr] INFO -- Mgmt server state: nodeid 20 freed, m_reserved_nodes 0000000000200002.
> 2007-08-23 16:17:26 [MgmSrvr] INFO -- Node 10: Node shutdown initiated
> 2007-08-23 16:17:29 [MgmSrvr] INFO -- Node 11: Communication to Node 20 opened
> 2007-08-23 16:17:29 [MgmSrvr] INFO -- Node 10: Communication to Node 20 opened
> 2007-08-23 16:17:35 [MgmSrvr] INFO -- Node 1: Node 10 Connected
> 2007-08-23 16:17:35 [MgmSrvr] INFO -- Node 1: Node 11 Connected
>
> Manager 2 Log :
>
> 2007-08-23 16:17:25 [MgmSrvr] ALERT -- Node 11: Node 20 Disconnected
> 2007-08-23 16:17:25 [MgmSrvr] INFO -- Node 11: Communication to Node 20 closed
> 2007-08-23 16:17:25 [MgmSrvr] ALERT -- Node 10: Node 20 Disconnected
> 2007-08-23 16:17:25 [MgmSrvr] INFO -- Node 10: Communication to Node 20 closed
> 2007-08-23 16:17:26 [MgmSrvr] INFO -- Node 10: Node shutdown initiated
> 2007-08-23 16:17:29 [MgmSrvr] INFO -- Node 11: Communication to Node 20 opened
> 2007-08-23 16:17:29 [MgmSrvr] INFO -- Node 10: Communication to Node 20 opened
> 2007-08-23 16:17:32 [MgmSrvr] WARNING -- Node 11: Node 1 missed heartbeat 2
> 2007-08-23 16:17:33 [MgmSrvr] WARNING -- Node 11: Node 10 missed heartbeat 2
> 2007-08-23 16:17:33 [MgmSrvr] WARNING -- Node 11: Node 1 missed heartbeat 3
> 2007-08-23 16:17:35 [MgmSrvr] WARNING -- Node 11: Node 10 missed heartbeat 3
> 2007-08-23 16:17:35 [MgmSrvr] WARNING -- Node 11: Node 1 missed heartbeat 4
> 2007-08-23 16:17:35 [MgmSrvr] ALERT -- Node 11: Node 1 declared dead due to missed heartbeat
> 2007-08-23 16:17:35 [MgmSrvr] INFO -- Node 11: Lost arbitrator node 1 - process failure [state=6]
> 2007-08-23 16:17:35 [MgmSrvr] INFO -- Node 11: Communication to Node 1 closed
> 2007-08-23 16:17:35 [MgmSrvr] ALERT -- Node 11: Node 1 Disconnected
> 2007-08-23 16:17:35 [MgmSrvr] INFO -- Node 2: Node 10 Connected
> 2007-08-23 16:17:38 [MgmSrvr] INFO -- Node 2: Node 11 Connected
> 2007-08-23 16:23:57 [MgmSrvr] ALERT -- Node 11: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary er
>
>
> ________________________________
> --------------------------------------------------------
>
> This e-mail and any attached files are confidential and intended solely
> for the use of the individual or entity to whom they are addressed. If you
> have received this e-mail by mistake, please notify the sender immediately
> and delete it from your system. You must not copy the message or disclose
> its contents to anyone.
>
> --------------------------------------------------------
--
MySQL Cluster Mailing List
For list archives:
http://lists.mysql.com/cluster
Received on Tue Aug 28 13:16:52 2007
This archive was generated by hypermail 2.1.8
: Sun Oct 07 2007 - 10:15:04 EDT