Re: sender name same as recipient name
From: John D. Hardin <jhardin(at)impsec.org>
Date: Wed Sep 26 2007 - 15:10:50 EDT
> I have decided to restart this whole process... setting the bayes

It's not generally a good idea to use *somebody else's* data for your starter DB - the nature of their email traffic is not likely to be similar to yours.

This is why it's a good idea to keep the messages you use to train Bayes, if you're doing manual training - so that you can correct training errors, and retrain from scratch if necessary. Of course, that doesn't scale too well if you have large numbers of users and are autolearning...

If your users retrieve their email from your server using IMAP, here's one thing you can do: set up SpamAssassin-SPAM and SpamAssassin-HAM mail folders in each user's mailbox. Have them move missed spams to the SpamAssassin-SPAM folder, and *copy* false positives (SA says it's spam when it isn't) to the SpamAssassin-HAM folder. They can (and ideally *should*) also copy some legitimate messages to their SpamAssassin-HAM folder so that SA can get an idea of what "ham" looks like. You can then train off those folders, and retrain as needed.

To manage the training work, you can rotate those files on a schedule - e.g. on October 1, everybody's SpamAssassin-HAM becomes SpamAssassin-HAM-200709, etc. I have some scripting for that sort of thing here:

http://www.impsec.org/~jhardin/antispam/

-- 
 John Hardin KA7OHZ http://www.impsec.org/~jhardin/
 jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Pelley: Will you pledge not to test a nuclear weapon?
  Ahmadeinejad: CIA! Secret prison in Europe! Abu Ghraib!
      -- Mahmoud Ahmadeinejad clumsily dodges a question
         (60 minutes interview, 9/20/2007)
-----------------------------------------------------------------------
 242 days until the Mars Phoenix lander arrives at Mars

Received on Wed Sep 26 15:11:45 2007
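The monthly rotation described in the post (on October 1, SpamAssassin-HAM becomes SpamAssassin-HAM-200709) can be sketched roughly like this. It is not John's script from impsec.org. It assumes the training folders sit directly under a per-user mail directory as plain mbox files or directories, and the helper names (previous_month_suffix, rotated_name, rotate_user_folders) are hypothetical.

```python
from datetime import date
from pathlib import Path

# Folder names from the post; the dated suffix is YYYYMM of the month
# that has just ended (rotating on 2007-10-01 yields "...-200709").
TRAINING_FOLDERS = ("SpamAssassin-SPAM", "SpamAssassin-HAM")


def previous_month_suffix(today: date) -> str:
    """Return the YYYYMM string for the calendar month before `today`."""
    year, month = today.year, today.month - 1
    if month == 0:  # January rotates into December of the previous year
        year, month = year - 1, 12
    return f"{year:04d}{month:02d}"


def rotated_name(folder: str, today: date) -> str:
    """Archive name for `folder` when rotating on `today`."""
    return f"{folder}-{previous_month_suffix(today)}"


def rotate_user_folders(mail_root: Path, today: date) -> list:
    """Rename each training folder under `mail_root` to its dated name.

    Assumption (hypothetical layout): the folders live directly under
    `mail_root`, named exactly as in TRAINING_FOLDERS. Folders that are
    missing, or whose archive name already exists, are left alone.
    Returns the (old, new) path pairs that were renamed.
    """
    renamed = []
    for name in TRAINING_FOLDERS:
        src = mail_root / name
        dst = mail_root / rotated_name(name, today)
        if src.exists() and not dst.exists():
            src.rename(dst)
            renamed.append((src, dst))
    return renamed
```

Something like this would run from cron on the 1st of each month for every user. Depending on the IMAP server, you may also need to recreate empty SpamAssassin-SPAM and SpamAssassin-HAM folders afterwards so users have somewhere to drop the next month's messages.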
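Because the training messages are kept, "retrain from scratch if necessary" comes down to wiping the Bayes database and relearning from the saved folders. Here is a rough sketch that drives sa-learn's --clear, --spam, --ham and --mbox options. It assumes the script runs as the user whose Bayes database is being rebuilt (or that you add sa-learn's --dbpath option yourself). The function names and the injectable `run` parameter are illustrative only.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def learn_commands(folders: Sequence[Path], kind: str,
                   mbox: bool = False) -> List[List[str]]:
    """Build one `sa-learn --spam` or `sa-learn --ham` command per folder."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    commands = []
    for folder in folders:
        cmd = ["sa-learn", f"--{kind}"]
        if mbox:  # mbox files need --mbox; one-message-per-file dirs do not
            cmd.append("--mbox")
        cmd.append(str(folder))
        commands.append(cmd)
    return commands


def retrain_from_scratch(spam: Sequence[Path], ham: Sequence[Path],
                         mbox: bool = False,
                         run: Callable = subprocess.run) -> List[List[str]]:
    """Wipe the Bayes DB, then relearn from the kept spam and ham folders."""
    plan = [["sa-learn", "--clear"]]  # discard the old, possibly mis-trained DB
    plan += learn_commands(spam, "spam", mbox)
    plan += learn_commands(ham, "ham", mbox)
    for cmd in plan:
        run(cmd, check=True)  # raises on the first non-zero sa-learn exit
    return plan
```

Spam is learned before ham here, which is an arbitrary choice. Since each message is learned only once after --clear, the order should not matter. The `run` parameter is there so that the command list can be checked without touching a real database.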