Re: why not doing a test that checks "name"-<email address> pairs
From: Chip M. <sa_chip(at)IowaHoneypot.com>
Date: Sat Aug 18 2007 - 14:41:04 EDT
I've implemented this as part of a qmail filter that runs after SA. As I've mentioned in other posts, I'm in a shared web hosting environment and have no control over SA, so I designed my filter to complement the great strengths of SA and fill in the holes that are created by a limited environment. Just over twenty domains use my filter, and we all share data so as to improve everyone's kill rates.

I have no idea how practical this would be as an SA plugin, and I am Perl-illiterate, so I will merely describe how I have approached it. More than a year ago, I started using _VERY_ crude general header-based (To/Cc checking) real name "pass" rules. Then in March of 2007 I added an explicit "RealName" virtual header so as to allow more powerful rules, including "match not" type penalty rules.
It took me much less than five minutes to generate such a data list AND all matching rules for the last person to join my Team (18 accounts, one week of data), and my tool merely dumps the per-account RealNames with frequencies. A slicker tool could make this VERY practical for larger userbases. Maintenance and verification would probably be an utter pain for anything in the 1000s, so best to let us small and nimble types prove its efficacy. :)

There is anecdotal evidence that Hotmail may be doing something with real name based rules, although there are reports that it's a somewhat suboptimal implementation. I speculate that they could easily pull the real name straight out of each user's settings.
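As a rough illustration of the "dump the per-account RealNames with frequencies" step, here is a minimal Python sketch. It is not the tool described above; the function name `collect_realnames` and the choice to read only To/Cc headers from raw message strings are assumptions made for the example.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def collect_realnames(raw_messages, account):
    """Count the real names used with `account` in To/Cc headers.

    Hypothetical helper: returns a Counter mapping lower-cased real names
    to how often they were seen. An empty real name is counted under "".
    """
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        for name, addr in getaddresses(headers):
            if addr.lower() == account.lower():
                counts[name.strip().lower()] += 1
    return counts
```

Sorting the result with `counts.most_common()` gives a frequency-ordered list that a human can review before any name is promoted into a "pass" rule.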
It's probably easier to understand how these work with a sample, so let's say we have a user whose account is "jcobb@firefly.example.com", the real name in his email client is "Jayne Cobb", and an automatic real name collector has shown that occasionally he receives important email that uses the real name "Hero of Canton". Somewhere, we would construct two data lists specific to his account, which would look something like this:

    realname_full = jayne cobb, hero of canton
    realname_words = jayne, cobb, hero, canton

The generic real name "match" test would only trigger if the extracted real name exactly (case-insensitive) matched either "jayne cobb" or "hero of canton", and the "match not" test would only trigger if NONE of the four words "jayne, cobb, hero, canton" was found anywhere in the real name. It's feasible to do "soft" matching instead of word boundary based matching (my code allows either).

Here are some examples:

    <jcobb@firefly.example.com>
    "Jayne Cobb" <jcobb@firefly.example.com>
    "Jayen Cobb" <jcobb@firefly.example.com>
    "Peter Petrelli" <jcobb@firefly.example.com>

The first triggers an "empty" test, but none of the other types of tests. The second triggers an exact "match" pass rule. The third has a misspelling, so it fails the exact "match" pass rule, AND it also fails the "match not" penalty rule because one of the words ("Cobb") does match. In other words, it receives ZERO total real name points. The fourth triggers a "match not" penalty rule, because NO words match.

By using a LIST of acceptable individual words in the "match not" rule, there's no need to mess about with fuzzy matching. It is still possible for a fuzzy misfire to occur, however so far I have not seen any actual FPs caused by them (in more than half a million human+machine reviewed emails). Our only FPs have contained words that were wildly off, so fuzzy matching would have made no difference. As always, careful scoring is appropriate, and your mileage may vary.
A fuzzy matching option might be more suited to a later version of a plugin.
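The "empty", "match" and "match not" tests described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual filter code: the function name `classify_realname`, the result labels, and the reading of "soft" matching as plain substring matching are all hypothetical.

```python
import re

# Per-account data lists, taken from the jcobb example above.
REALNAME_FULL = {"jayne cobb", "hero of canton"}
REALNAME_WORDS = {"jayne", "cobb", "hero", "canton"}


def classify_realname(realname, full=REALNAME_FULL, words=REALNAME_WORDS,
                      soft=False):
    """Return 'empty', 'match', 'match_not' or 'neutral' for a real name."""
    # Lower-case and collapse whitespace so the comparison is case-insensitive.
    name = " ".join(realname.lower().split())
    if not name:
        return "empty"
    if name in full:
        return "match"
    if soft:
        # Assumed meaning of "soft" matching: a listed word may appear
        # anywhere, even inside a longer token.
        hit = any(word in name for word in words)
    else:
        # Word boundary based matching: compare whole tokens only.
        hit = bool(set(re.findall(r"\w+", name)) & words)
    # At least one known word: neither the pass nor the penalty rule fires.
    return "neutral" if hit else "match_not"
```

With the four sample addresses, the extracted real names "", "Jayne Cobb", "Jayen Cobb" and "Peter Petrelli" classify as empty, match, neutral and match_not respectively, which mirrors the walk-through above.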
I score the "match" rule between -0.51 and -4.59, depending on whether the real name has been compromised (one of our users gets a lot of ED spam sent from Russia with his correct real name), and whether that person has critical "pass" needs. I have found it to be an EXCELLENT means of preventing FPs, particularly during times when I'm tinkering with stuff to fight an emerging threat, and make a dumb mistake. :) I score the "match not" rules typically in the 1.02 to 3.06 range (default of 2.60). FPs have been extremely low, with most being unimportant bulk/junk type mail. One weakness in my own filter is the lack of metas. If an SA Real Name plugin were developed, it would be more powerful, since it could be used to reject specific attachment types that also triggered a "match not" test. That level of control is more suited to a small business, but it sure is nice to have. :)
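To make the scoring concrete, here is a small Python sketch that maps a test result to points. The score ranges and the 2.60 default come from the paragraph above; everything else is an assumption for illustration, in particular the function name `realname_points`, the rule for picking a value inside the "match" range (including the -2.55 midpoint), and the zero score for an "empty" result, which the post does not specify.

```python
MATCH_STRONGEST = -4.59   # strongest "match" pass score quoted above
MATCH_WEAKEST = -0.51     # weakest "match" pass score quoted above
MATCH_NOT_MIN = 1.02      # typical "match not" range quoted above
MATCH_NOT_MAX = 3.06
MATCH_NOT_DEFAULT = 2.60


def realname_points(result, compromised=False, critical=False,
                    match_not_score=MATCH_NOT_DEFAULT):
    """Return real name points for 'match', 'match_not' or any other result."""
    if result == "match":
        if compromised:
            # Assumption: a real name known to spammers earns the weakest pass.
            return MATCH_WEAKEST
        # Assumption: critical "pass" needs get the full credit,
        # everyone else gets the midpoint of the range.
        return MATCH_STRONGEST if critical else -2.55
    if result == "match_not":
        # Keep per-account overrides inside the typical range.
        return max(MATCH_NOT_MIN, min(MATCH_NOT_MAX, match_not_score))
    # Misspelled-but-partially-matching names score zero, as described
    # earlier; the score for an "empty" real name is not given in the post.
    return 0.0
```

Keeping the policy in one small function makes it easy to tune a single account (for example, one whose real name has leaked) without touching the matching logic.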
I have no feel for the SA system performance issues. In my case, I do all the "simple" (fast) tests first, then exit if the score is high enough, and only then do DNS tests. My general impression is that overall performance is better, because on average these tests save more time by letting later tests be skipped than they consume themselves.

Bottom line, I think these can be very effective for a smallish environment. Granted, I really need to write some code to extract precise stats. I am confident of the beneficial effect on FPs, because I check ALL of those by hand.
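The "fast tests first, exit early, DNS tests last" ordering can be sketched as follows. This is a generic Python illustration, not the qmail filter itself; `staged_score`, the test callables and the `threshold` parameter are hypothetical names.

```python
def staged_score(message, fast_tests, slow_tests, threshold):
    """Run cheap tests first and skip the slow ones once the score is high.

    Each test is a callable taking the message and returning points.
    Returns (score, ran_slow_tests).
    """
    score = sum(test(message) for test in fast_tests)
    if score >= threshold:
        # Already over the spam threshold: no need for DNS lookups.
        return score, False
    score += sum(test(message) for test in slow_tests)
    return score, True
```

Real name tests are pure string comparisons, so they belong in `fast_tests`. Whether the early exit is a net win depends on how often the cheap tests alone push a message over the threshold.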
This archive was generated by hypermail 2.1.8 : Wed Oct 24 2007 - 19:13:06 EDT