Re: nodes on two separate lans hangs
Hi Vincent.
> My cluster, with two data nodes, works fine when all nodes are on the
> same network. But when configuring the cluster exactly the same way but
> replacing a data node with a node on a remote network, it hang up in
> starting phase (I did the --initial start after modifying the conf).
> Maybe you can help or give me some hints
If I were you, I would use lsof and tcpdump/ngrep to find out exactly
which interfaces/IP addresses and ports the ndbd processes are listening
on, and which addresses and ports they try to connect to. I would also
check whether I can connect to those IP addresses and ports using telnet.
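If telnet is not installed on the hosts, the same check can be scripted. Here is a minimal sketch in Python, not a definitive tool: the function name can_connect and the target list are my own placeholders, and the only port taken from this thread is the management port 1186. Replace the targets with whatever addresses and ports lsof reports for ndbd on each data node, and run it from the other data node.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    # Hypothetical targets: substitute the addresses/ports that lsof shows.
    targets = [("127.0.0.1", 1186)]
    for host, port in targets:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

A "NOT reachable" result between the two data nodes, while both can reach the manager, would point at routing or name resolution between the two networks rather than at the cluster configuration itself.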
> manager is on host NET1.5
>
> -- NDB Cluster -- Management Client --
> ndb_mgm> show ;
> Connected to Management Server at: localhost:1186
> Cluster Configuration
> ---------------------
> [ndbd(NDB)] 2 node(s)
> id=3 @NET1.6 (Version: 5.1.20, starting, Nodegroup: 0)
> id=7 @NET2.178 (Version: 5.1.20, starting, Nodegroup: 0)
>
> [ndb_mgmd(MGM)] 1 node(s)
> id=1 (Version: 5.1.20)
>
> [mysqld(API)] 3 node(s)
> id=4 (not connected, accepting connect from NET1.6)
> id=6 (not connected, accepting connect from NET1.7)
> id=8 (not connected, accepting connect from NET2.178)
>
>
> Remark: I am surprised here I have no master ? When all machines are on
> the same lan NET1.0/24, node 3 becomes master and API nodes connect.
I think that is because the data nodes cannot contact each other. Once
you fix the communication problem, they should elect a master.
> I get messages in the ndb_mgmd log I don't understand:
>
> 2007-07-25 14:09:30 [MgmSrvr] INFO -- Node 3: Initial start, waiting
> for 0000000000000080 to connect, nodes [ all: 0000000000000088
> connected: 0000000000000008 no-wait: 0000000000000000 ]
> 2007-07-25 14:09:32 [MgmSrvr] INFO -- Node 7: Initial start, waiting
> for 0000000000000008 to connect, nodes [ all: 0000000000000088
> connected: 0000000000000080 no-wait: 0000000000000000 ]
>
> How do I interpret "0000000000000080" and "0000000000000008" as machine
> identifiers ? Can you give details on these identifiers ?
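It is something like this. They are not machine identifiers; they are
hexadecimal bitmasks of node IDs, i.e. nodes and node sets. Judging by
your log, bit N stands for node N: Node 1 is encoded as 0000000000000002,
Node 2 as 0000000000000004, Node 3 as 0000000000000008, Node 4 as
0000000000000010 and so on, up to Node 7 as 0000000000000080. So node 3
is waiting for node 7, node 7 is waiting for node 3, and "all"
(0000000000000088) is the set of nodes 3 and 7.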
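To read such masks without counting bits by hand, here is a small Python sketch. It assumes that bit N of the mask corresponds to node ID N, which is the reading that matches data nodes 3 and 7 in the "show" output quoted above. The function names are my own, not part of any MySQL tool.

```python
def decode_node_mask(hex_mask):
    """Return the sorted node IDs whose bits are set in a hex bitmask string.

    Assumption: bit N of the mask stands for node ID N.
    """
    value = int(hex_mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def encode_node_mask(node_ids, width=16):
    """Build the zero-padded hex mask for a collection of node IDs."""
    value = 0
    for node_id in node_ids:
        value |= 1 << node_id
    return f"{value:0{width}x}"


if __name__ == "__main__":
    # Masks taken from the ndb_mgmd log lines quoted in this thread.
    for label, mask in [("waiting for", "0000000000000080"),
                        ("all", "0000000000000088"),
                        ("connected", "0000000000000008")]:
        print(label, mask, "->", decode_node_mask(mask))
```

For the "Node 3" log line this decodes "waiting for" as [7], "all" as [3, 7] and "connected" as [3], i.e. each data node sees only itself and is waiting for the other one.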
> NB: firewall is wide open between all these hosts (nothing is blocked,
> tcpdump report a lot of traffic between manager and nodes, but no
> traffic at all between the 2 data nodes).
> NB: If I revert to a "all nodes on the same lan" config, everything
> restart normally.
> NB: I checked the configuration for possible stupid errors, but this
> looks nice right now.
> NB: hosts on separate lans have 100Mb/s connections between them.
>
> My final question is: "Even if not recommended, is it possible to have
> such a configuration with data nodes on separate networks ?"
If the connection between the networks is really 100 Mb/s then it should
be possible.
Regards,
Anatoly.
Received on Wed Jul 25 19:34:31 2007