Re: two supposedly identical SA boxes, with slightly different report output -- help find the diff?
From: Andy Dills <andy(at)xecu.net>
Date: Tue Aug 28 2007 - 21:59:35 EDT
> aha! ...

For what it's worth, the fuzzyocr hashing is of very limited value, and in many cases is a severe performance hit. I found that scanning the hashes is more costly than just rescanning the file with OCR, because *each* *and* *every* hash must be checked iteratively.

Because of the "fuzzy" nature, you can't just check the db to "see if this hash exists." You have to go through and compare the generated hash to every hash in the db, and it counts as a match if it's "close enough". It's far less computationally expensive to just rescan the damn image.

It won't matter if you only get a couple hundred emails per day, but once the number of stored hashes passes a fairly low threshold, it becomes faster to rescan the image than to go through every single stored hash to see if you've already scanned a similar image.

Andy

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---

Received on Tue Aug 28 22:00:16 2007

This archive was generated by hypermail 2.1.8 : Fri Oct 26 2007 - 03:04:44 EDT
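
To make the cost argument concrete, here is a rough Python sketch of the difference between an exact hash lookup and a "close enough" lookup. It is not FuzzyOcr's code: the integer hashes, the Hamming-distance comparison, and the names `exact_lookup` and `fuzzy_lookup` are all assumptions made up for illustration. The point is only that an exact match can use an indexed lookup, while a fuzzy match has to compare against stored hashes one by one.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes (assumed metric)."""
    return bin(a ^ b).count("1")


def exact_lookup(db: set, h: int) -> bool:
    """Exact match: a single set membership test, independent of db size on average."""
    return h in db


def fuzzy_lookup(db: list, h: int, max_distance: int):
    """'Close enough' match: walk the stored hashes until one is within range.

    Returns (found, comparisons) so the amount of work is visible.
    A miss always costs len(db) comparisons.
    """
    comparisons = 0
    for stored in db:
        comparisons += 1
        if hamming(stored, h) <= max_distance:
            return True, comparisons
    return False, comparisons


# Hypothetical stored hashes and a query that differs from the last one by one bit.
stored_hashes = [0b11110000, 0b00001111, 0b10101010]
query = 0b10101011

print(exact_lookup(set(stored_hashes), query))   # no exact match exists
print(fuzzy_lookup(stored_hashes, query, 1))     # match found only after scanning
print(fuzzy_lookup(stored_hashes, 0b01100110, 1))  # miss: every hash was compared
```

In this sketch the work for a fuzzy miss grows linearly with the number of stored hashes, which is the behaviour described above; whether that ends up slower than re-running OCR depends on the real hash comparison and the OCR cost, neither of which is modelled here.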