I don't understand how SHOW SLAVE HOSTS works
From: Baron Schwartz <baron(at)xaprb.com>
Date: Tue Jul 17 2007 - 09:50:51 EDT
I'm sure I am committing several sins here, including looking at 5.1 code while running 5.0.40, but the code I'm looking at is the Doxygen-ized 5.1 code of sql/repl_failsafe.cc at
So far, I see there is a hash table called slave_list, which is populated by register_slave() and read by the function (whose name I have now forgotten and can't find) called by SHOW SLAVE HOSTS. It looks to me like the command doesn't work quite as it is supposed to. It looks like each slave is supposed to know at all times which other slaves are connected, because each slave reports to and reads from its master, and the master updates the slaves whenever another slave connects or disconnects (I think; I am not very good at reading the source). Yet on my 5.0.40 setup, I have the following replication topology:

    portland
    => fries
    => nepal

On these servers, SHOW SLAVE HOSTS shows the following:

on portland:
+-----------+--------+------+-------------------+-----------+
| Server_id | Host   | Port | Rpl_recovery_rank | Master_id |
+-----------+--------+------+-------------------+-----------+

on fries:
+-----------+----------+------+-------------------+-----------+

on fresno:
+-----------+----------+------+-------------------+-----------+

on nepal:
+-----------+----------+------+-------------------+-----------+

I'm sure you have guessed that some of these servers have swapped roles at various times. For example, portland used to be a slave of usa, which it replaced (after an OS rebuild) and which is no longer in use. Likewise, I think nepal used to be a slave of portland a long time ago, probably six months ago. But all of these servers have surely been restarted, if not given a new OS, during the swapping.

Why is there an obsolete entry for portland (currently server_id 21) on nepal? What should this command really show in my setup? Should each of the four machines show the same thing? (I think they are meant to.) Should a server unregister itself when it is stopped, and is the old entry for portland on nepal therefore a bug?

Finally, a question on the code itself, from the file linked above:

> Asks the master for the list of its other connected slaves.
> This is for failsafe replication:
> in order for failsafe replication to work, the servers involved in
> replication must know of each other. We accomplish this by having each
> slave report to the master how to reach it, and on connection, each
> slave receives information about where the other slaves are.

Shouldn't each slave also receive information about the other slaves whenever a new slave connects?
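To make that question concrete, here is a toy model in Python. It is purely illustrative: the Server class and the connect_snapshot, connect_broadcast and disconnect functions are hypothetical names, and the sketch does not claim to reproduce register_slave() or the real slave_list code. It assumes only the behaviour described in the quoted comment (a slave registers and receives a one-time copy of its master's list on connection) and compares it with pushing each newcomer to every slave already registered.

```python
"""Toy model of slave registration; hypothetical, not MySQL's code."""
from dataclasses import dataclass, field


@dataclass
class Server:
    server_id: int
    host: str
    # Stands in for the slave_list hash table: server_id -> host.
    slave_list: dict[int, str] = field(default_factory=dict)


def connect_snapshot(master: Server, slave: Server) -> None:
    """Assumed behaviour from the quoted comment: the slave registers with
    its master and copies the master's list once, at connection time."""
    master.slave_list[slave.server_id] = slave.host
    for sid, host in master.slave_list.items():
        if sid != slave.server_id:
            slave.slave_list[sid] = host


def connect_broadcast(master: Server, slave: Server,
                      slaves: list[Server]) -> int:
    """Hypothetical alternative: after registering, also push the newcomer
    to every slave already registered. Returns notifications sent."""
    connect_snapshot(master, slave)
    sent = 0
    for other in slaves:
        if other is not slave and other.server_id in master.slave_list:
            other.slave_list[slave.server_id] = slave.host
            sent += 1
    return sent


def disconnect(master: Server, slave: Server) -> None:
    """The master forgets the slave. Copies cached on other slaves are not
    touched in this model, so they keep an obsolete entry."""
    master.slave_list.pop(slave.server_id, None)


if __name__ == "__main__":
    m = Server(1, "master")
    a, b = Server(2, "a"), Server(3, "b")
    connect_snapshot(m, a)
    connect_snapshot(m, b)
    print(a.slave_list)  # {} : a never hears about b
    print(b.slave_list)  # {2: 'a'}
```

In this model, with n slaves connecting one after another, the broadcast variant sends 0 + 1 + ... + (n-1) = n(n-1)/2 notifications and ends with n(n-1) entries spread across the slaves, which is the quadratic cost at issue. Neither variant ever clears entries a slave has cached, so a stale row survives a disconnect.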
I realize this becomes one of those O(n(n-1)) kinds of problems, but it seems like the only way to get correct behavior, unless only one server (the topmost master in the replication tree) ever stores any information about which slaves are connected. But then I imagine this isn't exactly failsafe.

Thanks for reading my disjointed thoughts and questions!

Baron

-- 
Baron Schwartz
http://www.xaprb.com/

-- 
MySQL Internals Mailing List
For list archives: http://lists.mysql.com/internals
To unsubscribe: http://lists.mysql.com/internals?unsub=lists@pantek.com

Received on Tue Jul 17 09:51:49 2007
This archive was generated by hypermail 2.1.8 : Thu Aug 09 2007 - 19:06:17 EDT