Pantek Library

Re: help with training bayesian filter

From: Loren Wilton <lwilton(at)earthlink.net>
Date: Thu Oct 18 2007 - 02:07:49 EDT


I think the first things I'd do would be to make some adjustments to the settings:

> bayes_auto_learn_threshold_nonspam 0.2
> bayes_min_ham_num 200

And probably leave the rest the same.
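
Applied to the site config quoted further down, that would look roughly like the fragment below. This is only a sketch: the other three lines are copied unchanged from the quoted config, and nothing else is assumed about the rest of the file.

```
# Sketch: quoted site config with the two adjustments applied.
required_hits 4
# was 1
bayes_auto_learn_threshold_nonspam 0.2
bayes_auto_learn_threshold_spam 8
# was 100
bayes_min_ham_num 200
score BAYES_99 5
```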

Then I'd train on 200 hams, which you can pull from your old mail; your ham messages probably don't change much from year to year.

Also train on at least 200 spams, which should be easy. In this case, though, you want recent junk, not something from 6 months ago. If it takes 6 days until you have enough spam to trigger Bayes, that's fine; just wait for it.
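
A minimal sketch of that initial training with sa-learn follows. The folder paths and the helper names (count_msgs, have_enough, train_bayes) are made up for illustration, and it assumes Maildir-style folders with one message per file. The sa-learn options used (--ham, --spam, --dump magic) are the standard ones.

```shell
#!/bin/sh
# Sketch only: HAM_DIR and SPAM_DIR are hypothetical Maildir-style folders
# (one message per file); point them at your own sorted mail.
HAM_DIR="${HAM_DIR:-$HOME/Maildir/.ham-archive/cur}"
SPAM_DIR="${SPAM_DIR:-$HOME/Maildir/.recent-spam/cur}"
MIN_MSGS="${MIN_MSGS:-200}"   # matches bayes_min_ham_num 200 above

# Print the number of message files in a folder.
count_msgs() {
    find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Succeed only if the folder holds at least MIN_MSGS messages.
have_enough() {
    [ "$(count_msgs "$1")" -ge "$MIN_MSGS" ]
}

# Train both classes, then print the database counters (nspam / nham).
train_bayes() {
    have_enough "$HAM_DIR"  || echo "warning: fewer than $MIN_MSGS hams" >&2
    have_enough "$SPAM_DIR" || echo "warning: fewer than $MIN_MSGS spams" >&2
    sa-learn --ham  "$HAM_DIR"
    sa-learn --spam "$SPAM_DIR"
    sa-learn --dump magic
}
```

Call train_bayes as whichever user owns the Bayes database that spamd actually reads. With spamd started as "-u nobody", as in the quoted command, that is probably not your own login account, so check that first.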

At that point Bayes should kick in. Now you get to the hard part. You need to watch Bayes like a hawk for a few weeks to make sure you really got it trained right! If you do this, and feed it corrections when you don't like how it scored a ham or spam, you will be fine. If you *don't* do this, you will probably end up with Bayes going off on an odd tangent, and you may end up with a database that is so badly trashed you will have to throw it away and start over.
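
Feeding in a correction can be as simple as running sa-learn on the one message with the label it should have had. The wrapper below is a sketch: the function name "correct" and the example file names are hypothetical. As far as I know, sa-learn forgets the earlier classification on its own when a message is relearned as the other type, so no separate --forget step should be needed.

```shell
#!/bin/sh
# Sketch: relearn a single message with the correct label.
# Usage: correct spam /path/to/missed-spam.eml
#        correct ham  /path/to/false-positive.eml
correct() {
    case "$1" in
        spam|ham) ;;
        *) echo "usage: correct spam|ham <message-file>" >&2; return 2 ;;
    esac
    [ -f "$2" ] || { echo "no such message file: $2" >&2; return 1; }
    # Expands to "sa-learn --spam FILE" or "sa-learn --ham FILE".
    sa-learn "--$1" "$2"
}
```

As with the initial training, this has to run against the same Bayes database that spamd uses, or the corrections will have no effect on scoring.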

But this business of watching closely and feeding in corrections to get things right should only take a few weeks at most, unless the kind of mail you get changes. I've had Bayes running for years on the same database, and quite honestly I haven't had to train a message in probably a year now. I also don't run auto-learning, and it is still giving me BAYES_99 on my spams and numbers around 0 to 10 on my hams. I guess that means my message types don't change much. ;-)

        Loren

----- Original Message -----
From: "sinnerman" <kris_kauper@excite.com>
To: <users@spamassassin.apache.org>
Sent: Wednesday, October 17, 2007 8:49 PM
Subject: Re: help with training bayesian filter

>
> I'm running spamd as:
>
> spamd -d -l -u nobody --siteconfigpath=<my site config's path>
>
> My config file is:
>
> required_hits 4
> bayes_auto_learn_threshold_nonspam 1
> bayes_auto_learn_threshold_spam 8
> bayes_min_ham_num 100
> score BAYES_99 5
>
> I don't have bayes_auto_learn set explicitly, but the docs indicate that
> enabled is the default setting.
>
>
> Mr. Gus wrote:
>>
>> I have a systemwide config so I don't know from experience, but are you
>> running spamd with -x or setting the user with -u? Because if you are,
>> that might be mucking you up.
>>
>> Do you have bayes_auto_learn set? That's what turns it on/off.
>>
>> --
>> Gus
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/help-with-training-bayesian-filter-tf4643977.html#a13267625
> Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Received on Thu Oct 18 02:08:43 2007

This archive was generated by hypermail 2.1.8 : Sat Jul 05 2008 - 19:38:17 EDT

