Re: Usage of journal in Bayesian Filtering.

From: Matt Kettler <mkettler_sa(at)verizon.net>
Date: Thu Aug 30 2007 - 09:43:27 EDT


Srilatha wrote:
> Hi,
>
> I am trying to understand the usage of the journal in Bayesian filtering.
>
> If bayes_learn_to_journal is set to 1, SA stores newly learnt tokens
> in the journal.

Correct.
>
>
> When the Bayesian filter is activated, while scanning a message
> SA reads tokens from BOTH 'bayes_tokens' database and 'bayes_journal'
No, it only reads bayes_tokens.

If it read bayes_journal while scanning, it would defeat the purpose of the journal.

The journal exists to be more readily writable. This is possible only because it is rarely read from. If scans read from the journal, its write lock wouldn't be any more available than the write lock for the main tokens database, so you might as well use that database for all your writes.

Data is merged from the journal into the tokens database in three cases: as part of SA's automatic sync process (once a day), when you run sa-learn --sync, or when you run sa-learn --force-expire.

This in general means data in the journal doesn't "go live" until a sync kicks off. This is why bayes_learn_to_journal defaults to 0. It improves learning performance, but also introduces a "lag" where the results don't take effect until there's a sync.
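
To make the lag concrete, here is a toy Python model of the behavior described above. It is only a sketch and not SpamAssassin's actual storage code: the class ToyBayesStore and its methods are hypothetical names invented for illustration, and the two containers merely stand in for bayes_tokens and bayes_journal.

```python
# Toy model only: NOT SpamAssassin's real implementation.
# All names here (ToyBayesStore, learn, scan_lookup, sync) are hypothetical.
class ToyBayesStore:
    def __init__(self, learn_to_journal=False):
        # Mirrors the idea of bayes_learn_to_journal (0 by default).
        self.learn_to_journal = learn_to_journal
        self.tokens = {}    # stands in for bayes_tokens: token -> (spam, ham)
        self.journal = []   # stands in for bayes_journal: append-only list

    def _apply(self, token, delta):
        spam, ham = self.tokens.get(token, (0, 0))
        self.tokens[token] = (spam + delta[0], ham + delta[1])

    def learn(self, token, is_spam):
        delta = (1, 0) if is_spam else (0, 1)
        if self.learn_to_journal:
            # Cheap append; the tokens database is not touched.
            self.journal.append((token, delta))
        else:
            # Direct write; takes effect immediately.
            self._apply(token, delta)

    def scan_lookup(self, token):
        # Scans consult only the tokens database, never the journal.
        return self.tokens.get(token, (0, 0))

    def sync(self):
        # Merge journaled entries into the tokens database, then empty it.
        for token, delta in self.journal:
            self._apply(token, delta)
        self.journal.clear()
```

In this model, a token learned with learn_to_journal=True stays invisible to scan_lookup until sync() runs, which is the "lag" mentioned above.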
>
> While scanning a message, tokens found in bayes_tokens database are
> written to bayes_journal with modified timestamp
Correct. Timestamp updates are always written to the journal, largely because they're only relevant during expiry scans, and SA always does a sync before it scans for expiry. There's no sense holding up scanners in order to update timestamps, as it has no effect at all on the scan results, so dumping it into the journal is ideal.
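
As a rough illustration of why journaling the timestamps is safe, here is a small Python sketch. It is a toy model under stated assumptions, not SA's real code: ToyAtimeStore, touch, sync and expire are hypothetical names, and timestamps are plain integers.

```python
# Toy model only: NOT SpamAssassin's real implementation.
# ToyAtimeStore and its methods are hypothetical names for illustration.
class ToyAtimeStore:
    def __init__(self, atimes):
        self.atimes = dict(atimes)  # token -> last-access time in the tokens db
        self.journal = []           # pending (token, timestamp) updates

    def touch(self, token, now):
        # Called while scanning: only appends to the journal, so the
        # scanner never waits on the tokens database write lock.
        self.journal.append((token, now))

    def sync(self):
        # Merge journaled timestamps, keeping the newest one per token.
        for token, ts in self.journal:
            if token in self.atimes:
                self.atimes[token] = max(self.atimes[token], ts)
        self.journal.clear()

    def expire(self, cutoff):
        # Sync first, so expiry sees up-to-date access times.
        self.sync()
        expired = sorted(t for t, ts in self.atimes.items() if ts < cutoff)
        for token in expired:
            del self.atimes[token]
        return expired
```

Because expire() always syncs first, a token touched during a recent scan is not expired even though its timestamp had only been journaled.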
>
>
> Is my understanding correct ?
> Please correct me if my understanding is wrong
Corrected where appropriate.

Received on Thu Aug 30 09:48:16 2007