Re: Some thoughts on Baysian Setup...
From: Matus UHLAR - fantomas <uhlar(at)fantomas.sk>
Date: Fri Aug 31 2007 - 10:59:29 EDT
On 27.08.07 09:46, Chris St. Pierre wrote:
Yes, but according to what YOU mention in point 2, this may be counterproductive...

> >2. Most people would consider the same emails to be SPAM. 90% of what I

> Strongly disagree. Many users consider anything they don't want to be

The fact that users don't distinguish mail they subscribed to from real spam may speak against a personalized BAYES database. Otherwise some users will taint their database and it will become less and less effective.

Of course, their reporting should go to their personal bayes, not the shared one. If they have to teach the bayes database, they should teach their own. However, users should be well informed that "report as spam" may be problematic in such ways.

> >4. Site wide bayes saves disk space and more importantly it saves

A shared database will take less disk space (and less memory when loaded) and will probably stay in memory most of the time, so it won't need to be loaded very often. However, I don't think this will help efficiency much...

> >5. A larger database leads to more accurate baysian identification - I am

Someone mentioned here that bayes poisoning is a myth... I'm not sure how much truth there is in that, but my BAYES filter has been working well for some time...

> So what's important is having a well-tuned database -- not necessarily a large

A large well-tuned database is much better than a small fine-tuned one. For many users it has to be larger, because many users receive many different kinds of e-mail.

> If Joe and Jane User get different kinds of mail, disagree on what spam

How can Jane get a legitimate newsletter on stock tips when she didn't ask for it? How can it be legitimate if she does not want it? (Provided she did what she could to avoid receiving it.)

> With a diverse user base, any sort of one-size-fits-all filtering is

Yes, however the default scores for the BAYES rules are not that big, so a shared database won't change the score that much :)

Also, note that one simple word will never change the BAYES score that much, so I would not be too afraid that the single word "viagra" would change the final score much.

-- 
Matus UHLAR - fantomas, uhlar(at)fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Due to unexpected conditions Windows 2000 will be released in first quarter of year 1901

Received on Fri Aug 31 11:05:21 2007
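The point above about a single word such as "viagra" can be illustrated with a minimal sketch. It uses the plain naive Bayes product rule, which is a simplification and not necessarily the combining method SpamAssassin itself uses. The function name `combine` and all per-token probabilities are hypothetical values chosen for illustration only.

```python
import math

def combine(token_probs):
    """Combine per-token spam probabilities with the naive Bayes product rule.

    Assumption: tokens are treated as independent and the prior is 50/50.
    This is an illustrative simplification, not SpamAssassin's actual combiner.
    """
    # Sum logs instead of multiplying raw probabilities to avoid underflow.
    log_spam = sum(math.log(p) for p in token_probs)
    log_ham = sum(math.log(1.0 - p) for p in token_probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical message: twenty tokens that each lean slightly towards ham.
ham_leaning = [0.4] * 20

without_word = combine(ham_leaning)
# Same message plus one token the database considers very spammy.
with_word = combine(ham_leaning + [0.99])

print(f"without the spammy word: {without_word:.4f}")
print(f"with the spammy word:    {with_word:.4f}")
```

Under these assumed numbers the combined probability rises when the spammy token is added, but it stays far below 0.5, because the other twenty tokens still outweigh it. A message made up mostly of spammy tokens would behave differently, so this only illustrates the single-word case.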