Many log errors, but no problems observed?

From: Casey Maloney Rosales Muller <mysql(at)nullterminated.com>
Date: Fri Jun 01 2007 - 11:15:43 EDT


Hi, I'm seeing some strange replication log messages and am wondering if anybody else has seen something similar.

We run master-master replication, and everything was working fine for weeks until the much busier of the two DBs started constantly logging these kinds of errors on May 29th:

Jun 1 06:50:37 ec2 mysqld[8667]: 070601 6:50:36 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000294' position 24674
Jun 1 06:50:37 ec2 mysqld[8667]: 070601 6:50:36 [Note] Slave: connected to master 'replication@10.0.0.5:3306',replication resumed in log 'mysql-bin.000294' at position 24674
Jun 1 06:50:37 ec2 mysqld[8667]: 070601 6:50:37 [Note] Slave: received end packet from server, apparent master shutdown:
Jun 1 06:50:37 ec2 mysqld[8667]: 070601 6:50:37 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000294' position 25029
Jun 1 06:50:37 ec2 mysqld[8667]: 070601 6:50:37 [Note] Slave: connected to master 'replication@10.0.0.5:3306',replication resumed in log 'mysql-bin.000294' at position 25029
Jun 1 06:50:38 ec2 mysqld[8667]: 070601 6:50:38 [Note] Slave: received end packet from server, apparent master shutdown:
Jun 1 06:50:38 ec2 mysqld[8667]: 070601 6:50:38 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000294' position 25029
Jun 1 06:50:38 ec2 mysqld[8667]: 070601 6:50:38 [Note] Slave: connected to master 'replication@10.0.0.5:3306',replication resumed in log 'mysql-bin.000294' at position 25029
Jun 1 06:50:39 ec2 mysqld[8667]: 070601 6:50:39 [Note] Slave: received end packet from server, apparent master shutdown:

As you can see, the master log position is advancing, and queries do appear to be replicating in both directions quickly (although again, the direction this slave is reading from is the one where updates are infrequent). No errors on the other DB server.
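For anyone who wants to put numbers on a log like the one above, here is a rough Python sketch that counts the retry, resume, and end-packet messages and reports whether the position moved between retries. The regular expressions are written against the exact message wording quoted in this post; the function name summarize and the keys of the returned dict are made up for illustration, and other MySQL versions may word these messages differently.

```python
import re

# Patterns follow the message text quoted in this post. \s+ tolerates the
# line wrap that sometimes appears before the binlog file name.
RETRY_RE = re.compile(
    r"Failed reading log event, reconnecting to retry, log\s+'([^']+)'\s+position\s+(\d+)"
)
RESUME_RE = re.compile(
    r"replication resumed in log\s+'([^']+)'\s+at position\s+(\d+)"
)
END_PACKET_RE = re.compile(r"received end packet from server")


def summarize(log_text):
    """Count reconnect-cycle messages and track the retry positions."""
    retries = [(name, int(pos)) for name, pos in RETRY_RE.findall(log_text)]
    resumes = [(name, int(pos)) for name, pos in RESUME_RE.findall(log_text)]
    positions = [pos for _, pos in retries]
    return {
        "retries": len(retries),
        "resumes": len(resumes),
        "end_packets": len(END_PACKET_RE.findall(log_text)),
        "first_position": positions[0] if positions else None,
        "last_position": positions[-1] if positions else None,
        # Consecutive retries at the same file and position: no progress
        # was made between those two reconnects.
        "stalled_retries": sum(
            1 for prev, cur in zip(retries, retries[1:]) if prev == cur
        ),
    }
```

Feeding it the syslog text (for example the contents of the file the mysqld messages land in) gives a quick view of how many reconnect cycles happened and how many of them made no progress.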

While watching SHOW SLAVE STATUS\G on the erroring server, I see these same errors showing up, and Seconds_Behind_Master is either zero, NULL, or a fairly high number (thousands of seconds at times). Seconds_Behind_Master seems to reset to zero when a new master binlog file starts being used. Read_Master_Log_Pos increases normally despite the high Seconds_Behind_Master. Slave_IO_Running is No during several of the error states; Slave_SQL_Running is always Yes.
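To make that kind of watching less manual, the vertical output of SHOW SLAVE STATUS\G can be parsed and bucketed into the states described above. This is only a sketch: parse_slave_status, classify, the state labels, and the 60-second lag_threshold default are all assumptions chosen for illustration; only the field names (Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master) come from MySQL itself.

```python
def parse_slave_status(text):
    # Turn the vertical (\G) output into a dict of field name -> string value.
    status = {}
    for line in text.splitlines():
        # Skip the "*** 1. row ***" banner and anything without a field separator.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        # Split on the first colon only, so values containing ':' stay intact.
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def classify(status, lag_threshold=60):
    # lag_threshold (seconds) is an arbitrary illustrative default.
    if status.get("Slave_IO_Running") != "Yes":
        return "io-thread-down"
    if status.get("Slave_SQL_Running") != "Yes":
        return "sql-thread-down"
    lag = status.get("Seconds_Behind_Master")
    if lag in (None, "", "NULL"):
        return "lag-unknown"
    if int(lag) > lag_threshold:
        return "lagging"
    return "ok"
```

Running this against output captured every second or so, and logging the label next to a timestamp, would show how the No / NULL / high-lag states line up with the reconnect messages in the error log.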

Auto-increment offsets are being used, and there have been no recent changes to the configs.

One table has been getting large (between 1 and 2 million rows), and certain rare (once-a-day) operations on it can take a minute, in case that contributes. Since these errors are being seen constantly, though, that seems unlikely.

Any help would be much appreciated. Is there anything else I can send that would reveal more? Config files, SHOW SLAVE STATUS output, binlog snippets?

The second user comment at
http://dev.mysql.com/doc/refman/4.1/en/slave-io-thread-states.html sounds similar, except that updates are working for us and I can't identify any failing queries in our case (also, we're running 5.0.32, not 4.1).

Thanks for any help or thoughts!
Casey

-- 
MySQL Replication Mailing List
For list archives: 
http://lists.mysql.com/replication
To unsubscribe:    
http://lists.mysql.com/replication?unsub=lists@pantek.com
Received on Fri Jun 1 11:41:06 2007

This archive was generated by hypermail 2.1.8 : Fri Jun 01 2007 - 11:50:02 EDT
