Re: bayes_seen = 256GB

From: Magnus Holmgren <holmgren(at)lysator.liu.se>
Date: Sun Sep 23 2007 - 17:24:01 EDT


On Thursday 20 September 2007 07:59, Graham Murray wrote:
> "Loren Wilton" <lwilton@earthlink.net> writes:
> > If tokens are expired from the DB based on time, and assuming *all*
> > tokens older than some date are expired, wouldn't it be reasonable to
> > prune bayes_seen to the expiry date after the expiry run?
>
> You cannot assume that all tokens earlier than some date have expired. A
> token (in bayes_token) is only expired when its last occurrence in an
> email was before the expiry interval. So it is perfectly possible for a
> token from the very first email ever learnt to still be in bayes years
> later.
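Graham's point can be illustrated with a small, hypothetical model. This is not SpamAssassin's actual expiry code, which is more involved. Each token keeps only the time it was last seen in a learned message, and expiry drops tokens whose last sighting is older than a cutoff. The names `TokenDB`, `learn`, `expire`, and `max_age` are illustrative assumptions. A token from the very first message survives for as long as later messages keep refreshing it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TokenDB:
    # Hypothetical model: token -> last time (epoch seconds) it was seen
    # in a learned message.
    last_seen: dict[str, int] = field(default_factory=dict)

    def learn(self, tokens: list[str], msg_time: int) -> None:
        for tok in tokens:
            # Keep only the most recent occurrence of each token.
            self.last_seen[tok] = max(self.last_seen.get(tok, msg_time), msg_time)

    def expire(self, now: int, max_age: int) -> list[str]:
        # Drop tokens whose *last* occurrence is older than the cutoff.
        cutoff = now - max_age
        expired = [t for t, ts in self.last_seen.items() if ts < cutoff]
        for t in expired:
            del self.last_seen[t]
        return sorted(expired)


DAY = 86_400
db = TokenDB()
db.learn(["lottery", "hello"], msg_time=0)      # very first message learned
db.learn(["hello"], msg_time=1000 * DAY)        # "hello" shows up again much later
expired = db.expire(now=1010 * DAY, max_age=30 * DAY)
print(expired)                                  # ['lottery']
print("hello" in db.last_seen)                  # True: the token from message #1 survives
```

Because expiry is keyed on each token's last occurrence, the age of the oldest surviving token says nothing about which learned messages are still "covered". That is why bayes_seen cannot simply be pruned to a token-expiry date.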

It doesn't really matter whether the tokens have expired, I think. You probably don't want to relearn an old message anyway.

The Bayes system could record each learned message's date (e.g. taken from the topmost Received: header), expire bayes_seen entries older than a certain age, and refuse to learn messages older than that age unless explicitly overridden (for example, when populating a fresh Bayes DB with an initial corpus).
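Below is a minimal sketch of that proposal, written under stated assumptions. The date is parsed from the text after the last ';' in the topmost Received: header. The names `top_received_date`, `SeenDB`, `max_age`, and `force` are hypothetical and are not part of SpamAssassin. The key property is that any entry old enough to be expired is also too old to be learned again without `force`, so bayes_seen only needs to hold entries inside the age window.

```python
from __future__ import annotations

import email
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def top_received_date(raw: str) -> datetime | None:
    """Return the timestamp after the last ';' in the topmost Received: header."""
    msg = email.message_from_string(raw)
    received = msg.get_all("Received")
    if not received:
        return None
    _, sep, date_part = received[0].rpartition(";")
    if not sep:
        return None
    try:
        return parsedate_to_datetime(date_part.strip())
    except (TypeError, ValueError):  # unparseable date
        return None


class SeenDB:
    """Hypothetical bayes_seen replacement: Message-ID -> message date."""

    def __init__(self, max_age: timedelta):
        self.max_age = max_age
        self.seen: dict[str, datetime] = {}

    def learn(self, msg_id: str, msg_date: datetime, now: datetime,
              force: bool = False) -> bool:
        if msg_id in self.seen:
            return False  # already learned
        if not force and now - msg_date > self.max_age:
            return False  # too old: refuse unless explicitly overridden
        self.seen[msg_id] = msg_date
        return True

    def expire(self, now: datetime) -> int:
        # Entries this old could not be relearned without force anyway.
        cutoff = now - self.max_age
        old = [k for k, d in self.seen.items() if d < cutoff]
        for k in old:
            del self.seen[k]
        return len(old)


raw = (
    "Received: from mx.example.org by mail.example.com;\n"
    "\tSun, 23 Sep 2007 17:24:01 -0400\n"
    "Message-ID: <a@example.org>\n"
    "\n"
    "body\n"
)
now = datetime(2007, 10, 27, tzinfo=timezone.utc)
db = SeenDB(max_age=timedelta(days=90))
sent = top_received_date(raw)
print(db.learn("<a@example.org>", sent, now))               # True
print(db.learn("<a@example.org>", sent, now))               # False: already seen
old = datetime(2007, 1, 1, tzinfo=timezone.utc)
print(db.learn("<b@example.org>", old, now))                # False: too old
print(db.learn("<b@example.org>", old, now, force=True))    # True: initial corpus
print(db.expire(now))                                       # 1
```

With this design, the size of the seen database is bounded by the age window instead of growing forever. The trade-off is that a legitimately old message can only be learned with an explicit override.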

-- 
Magnus Holmgren        holmgren@lysator.liu.se
                       (No Cc of list mail needed, thanks)
Received on Sun Sep 23 17:31:29 2007

This archive was generated by hypermail 2.1.8 : Sat Oct 27 2007 - 11:15:59 EDT

