Thursday, August 14, 2003

Software I Want

I get a bunch of e-newsletters from InfoWorld. As a rule they are fairly interesting, and they replicate so much of the magazine's content that I rarely open the paper version any more. In the last couple of weeks they have started to roll over their newsletter source addresses from one sender address to another. This, combined with a set of new correspondents, has had me recreating a bunch of e-mail filters in an attempt to guide e-mail to its final location in my system.

I shouldn't have to do this, or at least I shouldn't have to do more than a little training. At the moment I am using POPFile as an e-mail proxy that performs an initial classification for me. I divide my mail into 11 broad categories (termed buckets), and then use Eudora's filters to subclassify the mail into the appropriate folder. I noticed that POPFile successfully identified the newsletters as still coming from InfoWorld (one of my buckets) despite the address change, but Eudora's address-based filters no longer matched, so the mail was left unfiled. This made me realise that I want an e-mail client with Bayesian filtering, and a filter attached to each mailbox. Then when I reclassify mail (i.e., move it from my Inbox to a particular folder) I want it to start to learn where I put things, so that it can file similar mail for me in the future. It shouldn't be so much to ask.
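A minimal sketch of that idea, assuming a plain naive Bayes word-count model with one table per folder. The class and method names are hypothetical and are not taken from POPFile or Eudora; a real mail client would also need to parse headers, persist its counts, and decide when it is confident enough to file a message on its own.

```python
import math
import re
from collections import Counter, defaultdict


class FolderClassifier:
    """Toy multinomial naive Bayes classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.message_counts = Counter()          # folder -> messages trained
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        # Lowercase the text and keep runs of letters, digits and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, folder):
        """Call whenever the user files a message into a folder."""
        words = self.tokenize(text)
        self.word_counts[folder].update(words)
        self.message_counts[folder] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable folder, or None if nothing is trained."""
        total_messages = sum(self.message_counts.values())
        if total_messages == 0:
            return None
        words = self.tokenize(text)
        # Add-one smoothing, with one extra slot reserved for unseen words
        # so the denominator can never be zero.
        smoothing_slots = len(self.vocabulary) + 1
        best_folder, best_score = None, -math.inf
        for folder, count in self.message_counts.items():
            # Log prior: fraction of trained messages filed in this folder.
            score = math.log(count / total_messages)
            folder_total = sum(self.word_counts[folder].values())
            for word in words:
                seen = self.word_counts[folder][word]
                score += math.log((seen + 1) / (folder_total + smoothing_slots))
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder


clf = FolderClassifier()
# Each manual move from the Inbox to a folder becomes one training call.
clf.train("InfoWorld newsletter: enterprise server news and reviews", "InfoWorld")
clf.train("Dinner on Saturday? Let me know if the family can come", "Personal")
suggested = clf.classify("This week's newsletter: server reviews and enterprise news")
```

Because the model looks at the words in the message rather than the sender address, a change of source address alone does not stop it from recognising the newsletter.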

With POPFile, I've had 98.97% accurate filtering since June 11th, spread over 11 buckets. If I were doing simple spam/ham filtering instead, it would probably be well over 99% accurate. In Eudora I have over a hundred mailboxes (I'm a packrat), but probably only 20 of them get significant traffic. With a learning Bayesian system, I would expect classification accuracy to climb rapidly with little or no direct action on my part beyond filing mail as I already do. This type of advance is where the productivity savings promised by the information age start to pay off.