Re: A different approach to scoring spamassassin hits
From: Tom Allison <tom(at)tacocat.net>
Date: Sat Jun 30 2007 - 06:33:32 EDT

On Jun 30, 2007, at 4:46 AM, John Andersen wrote:

> For a purely bayesian filter this is always the case.

But I have found through mailing lists and personal experience that this can be mitigated through a variety of approaches.

The first approach is to implement SA after you have trained it on a corpus of past mail you've captured. Opinions on how many messages you need for it to be effective range from tens to thousands. This is strictly a YMMV issue.

Personally, I train on error: I never auto-train or train on everything, only the minimum needed to get the classification right. With that approach, training on 10 emails gets me above 90%. But my scoring is a little vague: I use a ternary Yes/No/Maybe scoring process. If I exclude the Maybe results, I reach 100% success in very short order. Including Maybe, I have 98% success after training on ~100 messages. Either way, the worst is over in the first day.

Another method would be to simply seed the data from a SQL script that preloads certain tokens and values. It's kind of a "hack" in my opinion, but it would be effective, and any discrepancies would quickly be resolved by training. In the case of SA, I would seed the rules into the tables for the simplest, yet effective, results.

Received on Sat Jun 30 06:34:18 2007
This archive was generated by hypermail 2.1.8 : Sat Jun 30 2007 - 06:40:02 EDT
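To make the ideas in the post concrete, here is a minimal Python sketch of train-on-error with a ternary Yes/No/Maybe verdict, plus seeding token counts from a SQL script. It is a toy token-based Bayesian filter, not SpamAssassin's actual Bayes implementation. Everything here is an assumption for illustration: the class name `TernaryBayes`, the 0.9/0.1 cutoffs, the `tokens` table layout in `SEED_SQL` (not SA's real SQL schema), the reading of "Yes" as spam, and the simplified Robinson-style smoothing that ignores total corpus sizes.

```python
import math
import re
import sqlite3

# Hypothetical seed script: the table and column names are illustrative only
# and are NOT SpamAssassin's real Bayes SQL schema.
SEED_SQL = """
CREATE TABLE tokens (token TEXT PRIMARY KEY, spam_count INTEGER, ham_count INTEGER);
INSERT INTO tokens VALUES ('viagra', 20, 0);
INSERT INTO tokens VALUES ('meeting', 0, 20);
"""


class TernaryBayes:
    """Toy Bayesian token filter returning 'Yes' (spam), 'No' (ham) or 'Maybe'."""

    def __init__(self, spam_cutoff=0.9, ham_cutoff=0.1):
        self.spam = {}  # token -> count of spam messages containing it
        self.ham = {}   # token -> count of ham messages containing it
        self.spam_cutoff = spam_cutoff
        self.ham_cutoff = ham_cutoff

    @staticmethod
    def tokens(text):
        # Crude tokenizer: lowercase alphanumeric runs, each counted once.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def seed_from_sql(self, script):
        # Preload token counts from a SQL script instead of a training corpus.
        conn = sqlite3.connect(":memory:")
        conn.executescript(script)
        rows = conn.execute("SELECT token, spam_count, ham_count FROM tokens")
        for tok, s, h in rows:
            self.spam[tok] = self.spam.get(tok, 0) + s
            self.ham[tok] = self.ham.get(tok, 0) + h
        conn.close()

    def token_prob(self, tok, strength=1.0, prior=0.5):
        # Simplified Robinson-style smoothing: unseen tokens score exactly 0.5,
        # and no token ever reaches 0 or 1.
        s, h = self.spam.get(tok, 0), self.ham.get(tok, 0)
        return (strength * prior + s) / (strength + s + h)

    def score(self, text):
        # Naive-Bayes combination in log space: P = prod(p) / (prod(p) + prod(1-p)).
        log_spam = log_ham = 0.0
        for tok in self.tokens(text):
            p = self.token_prob(tok)
            log_spam += math.log(p)
            log_ham += math.log(1.0 - p)
        return 1.0 / (1.0 + math.exp(min(log_ham - log_spam, 700.0)))

    def verdict(self, text):
        score = self.score(text)
        if score >= self.spam_cutoff:
            return "Yes"
        if score <= self.ham_cutoff:
            return "No"
        return "Maybe"

    def train(self, text, is_spam):
        table = self.spam if is_spam else self.ham
        for tok in self.tokens(text):
            table[tok] = table.get(tok, 0) + 1

    def train_on_error(self, text, is_spam):
        """Train only when the verdict is wrong or 'Maybe'; return True if trained."""
        expected = "Yes" if is_spam else "No"
        if self.verdict(text) == expected:
            return False
        self.train(text, is_spam)
        return True


if __name__ == "__main__":
    f = TernaryBayes()
    f.seed_from_sql(SEED_SQL)
    print(f.verdict("cheap viagra here"))     # seeded spam token
    print(f.verdict("team meeting at noon"))  # seeded ham token
    print(f.verdict("free offer inside"))     # nothing known yet
    f.train_on_error("free offer inside", is_spam=True)
    print(f.verdict("free offer inside"))
```

In this sketch a "Maybe" counts as an error, so `train_on_error` learns from it. A message the filter already classifies correctly adds nothing, which keeps training to the minimum, as the post describes. Seeding only changes the starting counts; later train-on-error corrections then adjust any seeded values that turn out to be wrong.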