
Re: Synchronize bayes databases

From: <salist(at)nationalnet.com>
Date: Sun Oct 28 2007 - 18:10:38 EDT


On Sun, 28 Oct 2007 22:51:51 +0100
Lukas Garberg <lukas@spritelink.net> wrote:

> Dear list,
>
> I'm developing a spam filter solution where we'll distribute the load
> between a number of machines running SpamAssassin (together with
> MailScanner and postfix).
> We currently use the Bayes self-learning feature, and would like to
> keep doing so.
>
> However, since the machines get different sets of mail fed to them,
> their Bayes databases will differ quite a bit, and it would be great if
> all the self-learned tokens from every server were distributed to all
> the others, along with the manually learned ones.
>
> What is the preferred way to synchronize the databases between
> the servers?

I would recommend using the Bayes SQL storage module so that all your machines share a single global Bayes database. That would eliminate the need to synchronize the Bayes databases between them.
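As a rough illustration of why a shared store removes the need to synchronize: in SpamAssassin the SQL backend is selected in local.cf with `bayes_store_module Mail::SpamAssassin::BayesStore::SQL`, and `bayes_sql_dsn`, `bayes_sql_username` and `bayes_sql_password` point every scanner at the same database. The Python below only models that idea. It uses an in-memory SQLite database and a much simplified, hypothetical `bayes_token` table (the real SpamAssassin schema has more columns and tables), and the `open_shared_store`, `learn` and `counts` helpers are likewise hypothetical. Because both "servers" write to one table, a token learned on either one is immediately visible to both.

```python
import sqlite3


def open_shared_store(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified schema; not SpamAssassin's real SQL schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bayes_token ("
        " token TEXT PRIMARY KEY,"
        " spam_count INTEGER NOT NULL DEFAULT 0,"
        " ham_count INTEGER NOT NULL DEFAULT 0)"
    )


def learn(conn: sqlite3.Connection, tokens: list[str], is_spam: bool) -> None:
    # Column name comes from a fixed pair of values, so the f-string is safe.
    column = "spam_count" if is_spam else "ham_count"
    for tok in set(tokens):  # count each token at most once per message
        conn.execute("INSERT OR IGNORE INTO bayes_token (token) VALUES (?)", (tok,))
        conn.execute(
            f"UPDATE bayes_token SET {column} = {column} + 1 WHERE token = ?", (tok,)
        )
    conn.commit()


def counts(conn: sqlite3.Connection, token: str) -> tuple[int, int]:
    # Returns (spam_count, ham_count); unknown tokens count as (0, 0).
    row = conn.execute(
        "SELECT spam_count, ham_count FROM bayes_token WHERE token = ?", (token,)
    ).fetchone()
    return (row[0], row[1]) if row else (0, 0)


if __name__ == "__main__":
    shared = sqlite3.connect(":memory:")  # stands in for the shared DB server
    open_shared_store(shared)
    learn(shared, ["cheap", "offer"], is_spam=True)     # learned on "server A"
    learn(shared, ["offer", "meeting"], is_spam=False)  # learned on "server B"
    print(counts(shared, "offer"))  # (1, 1)
```

With a real deployment the per-node difference is only the DSN each scanner connects to; there is no copy of the data to keep in sync.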
>
> I did consider letting all the servers use a common database server
> with the Bayes SQL storage module, but I'd like to avoid the single
> point of failure that solution comes with.

You could always use MySQL replication to remove that single point of failure. There are several how-tos on setting up master-master replication; a Google search will turn them up.
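A common precaution in master-master setups is to give each master its own `auto_increment_increment` / `auto_increment_offset` server settings, so that rows inserted on both masters at the same moment cannot receive the same AUTO_INCREMENT key in any table that uses one. The Python function below is a simplified model of how MySQL picks the next value under those settings, not MySQL code; the function name and the scenario are illustrative only, and details such as per-engine counter behaviour are ignored.

```python
def next_auto_increment(current_max: int, offset: int, increment: int) -> int:
    """Simplified model of the value MySQL picks for a new AUTO_INCREMENT row.

    The value is the smallest member of the series offset + N * increment
    (N >= 0) that is greater than the current maximum in the column.
    Assumes 1 <= offset <= increment, as in typical master-master setups.
    """
    if current_max < offset:
        return offset
    n = (current_max - offset) // increment + 1
    return offset + n * increment


if __name__ == "__main__":
    seen_max = 4  # highest id both masters have seen before a concurrent insert

    # Both masters use identical settings: they pick the same id and collide.
    print(next_auto_increment(seen_max, offset=1, increment=1),
          next_auto_increment(seen_max, offset=1, increment=1))  # 5 5

    # Master A: increment=2, offset=1; master B: increment=2, offset=2.
    print(next_auto_increment(seen_max, offset=1, increment=2),
          next_auto_increment(seen_max, offset=2, increment=2))  # 5 6
```

With an increment of 2, one master only ever hands out odd ids and the other only even ones, so concurrent inserts that have not yet replicated still get distinct keys.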
>
> Making all the servers members of a MySQL cluster is an alternative,
> but I'd like to avoid that as well to keep the complexity of the system
> low.
> Is it possible to simply sum the token counters from each of the servers
> to merge the databases?
>
> Thank you in advance,
> Lukas Garberg
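On the last question, summing can work in principle, with caveats. Below is a minimal sketch, assuming each server's data has already been dumped (for example with `sa-learn --backup`) and parsed into the hypothetical `BayesSnapshot` structure shown; the parsing step is not shown, and the field layout is an assumption, not SpamAssassin's backup format. Per-token spam and ham counts are summed, the message totals (`nspam`, `nham`) are summed as well, since token probabilities are computed relative to them, and the most recent access time is kept.

```python
from dataclasses import dataclass, field


@dataclass
class BayesSnapshot:
    """Hypothetical in-memory view of one server's Bayes data."""
    nspam: int = 0  # number of spam messages learned
    nham: int = 0   # number of ham messages learned
    # token -> (spam_count, ham_count, atime)
    tokens: dict[str, tuple[int, int, int]] = field(default_factory=dict)


def merge(snapshots: list[BayesSnapshot]) -> BayesSnapshot:
    merged = BayesSnapshot()
    for snap in snapshots:
        # Message totals must be summed too, or probabilities are skewed.
        merged.nspam += snap.nspam
        merged.nham += snap.nham
        for tok, (s, h, atime) in snap.tokens.items():
            ms, mh, matime = merged.tokens.get(tok, (0, 0, 0))
            # Sum the counters; keep the most recent access time.
            merged.tokens[tok] = (ms + s, mh + h, max(matime, atime))
    return merged


if __name__ == "__main__":
    a = BayesSnapshot(nspam=2, nham=1,
                      tokens={"offer": (2, 0, 100), "hello": (0, 1, 50)})
    b = BayesSnapshot(nspam=1, nham=3, tokens={"offer": (1, 1, 200)})
    merged = merge([a, b])
    print(merged.nspam, merged.nham)  # 3 4
    print(merged.tokens["offer"])     # (3, 1, 200)
```

Two caveats apply. A message learned on more than one server (for instance, manual training run on every node) is counted once per server. And if the merged result is pushed back to every node, the next merge adds those counts again, so periodic merging would need to work on per-server changes since the last merge rather than on full databases.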

Thanks,
Majied Najjar

Received on Sun Oct 28 18:11:27 2007


