Live From Lecture

Having EE2008 Data Structures and Algorithms now… Currently the lecturer is going through the divide and conquer method, which is the method used for merging and sorting arrays. I already went through it back when I was still in JC taking computing.
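For anyone who hasn't seen divide and conquer before, here is a rough sketch of the idea, assuming the sorting example in question is merge sort (my guess, the lecture slides may use something else). The function names `merge` and `merge_sort` are my own. Split the array in half, sort each half the same way, then merge the two sorted halves:

```python
def merge(left, right):
    """Combine two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element of the two lists.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; whatever remains in the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Divide: split in half. Conquer: sort each half. Combine: merge."""
    if len(items) <= 1:
        return list(items)  # a list of 0 or 1 items is already sorted
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```

Calling `merge_sort([5, 2, 9, 1])` gives `[1, 2, 5, 9]`, and the original list is left untouched.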

Before this was EE2006 Engineering Mathematics I. We went through this cool thing called Bayes’ Theorem and its use in Probability and Statistics. The cool thing is that Bayes’ Theorem is the exact same thing being used in POPFile, the program I use together with Mozilla Thunderbird to sort out which emails are supposed to be personal, blog comments, spam, from Friendster, NTU Stuff, subscribed groups, and subscribed emails.
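For reference, this is the theorem as I understand it. The second line is my own illustration of how it would apply to email, with "spam" and "word" as example events; it is not taken from the POPFile documentation:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

P(\text{spam} \mid \text{word}) = \frac{P(\text{word} \mid \text{spam})\, P(\text{spam})}{P(\text{word})}
```

In words: how likely an email is to be spam, given that it contains a certain word, depends on how often that word shows up in spam, how common spam is overall, and how common the word is overall.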

POPFile analyses the words in each incoming email and assigns the email a category. You train it by telling it the right category whenever it classifies an email wrongly; it then adjusts the probabilities of the words in its database so that the next time it encounters the same words, it puts the email in the correct category. It looks at the entire email, not just some words, links included.
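To give a feel for how this kind of learning can work, here is a toy sketch of a naive Bayes word-count classifier. This is only my own illustration and not POPFile's actual code: the class `NaiveBayesSorter`, its methods, the whitespace tokenising and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSorter:
    """Toy bucket classifier; a hypothetical stand-in, not POPFile itself."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.mail_counts = Counter()             # bucket -> mails trained

    def train(self, bucket, text):
        """Record that an email with this text belongs in this bucket."""
        self.mail_counts[bucket] += 1
        self.word_counts[bucket].update(text.lower().split())

    def classify(self, text):
        """Return the most probable bucket, or None if nothing was trained."""
        words = text.lower().split()
        vocab = {w for counts in self.word_counts.values() for w in counts}
        vocab_size = max(len(vocab), 1)  # avoid dividing by zero
        total_mails = sum(self.mail_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, n_mails in self.mail_counts.items():
            counts = self.word_counts[bucket]
            total_words = sum(counts.values())
            # log P(bucket) + sum of log P(word | bucket), add-one smoothed
            score = math.log(n_mails / total_mails)
            for w in words:
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


sorter = NaiveBayesSorter()
sorter.train("spam", "cheap pills buy now")
sorter.train("spam", "buy cheap watches now")
sorter.train("personal", "lunch tomorrow after lecture")
sorter.train("personal", "see you at lecture")
print(sorter.classify("buy cheap pills"))   # spam
print(sorter.classify("lecture tomorrow"))  # personal
```

Correcting a wrong classification is just another `train` call with the right bucket, which nudges the word counts (and so the probabilities) in that bucket's favour. Working with logs of probabilities avoids multiplying lots of tiny numbers together.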

Of course there will be mistakes from time to time, and the initial classification was pretty inaccurate, but after sorting about 15,255 emails (I didn’t know I had that many emails!) it has achieved an accuracy rating of 97.25%, which is impressive because I’ve set up 8 different possible categories for classification.

If you use Mozilla Thunderbird or MS Outlook and have been plagued by spam emails, or if you want a program to automatically categorise your emails, check out POPFile. You’re gonna love it.

Bored,
bcc

7 Responses to “Live From Lecture”

  1. nxva Says:

    Actually, if you’re plagued by spam, then you might want to seriously consider SpamAssassin. I’m actually thinking of getting SA to filter out spam before passing the result to POPFile for general classification. Alas, I don’t have spam mails in any of my addresses (apart from the occasional ones going to my uni address). ;)

  2. nicole Says:

    hmm….next time…pay more attention to ur work lolx….

    and hopes u are feeling ok…u know wat i mean =)

  3. bcc Says:

    nxva: Doesn’t SpamAssassin work the same way as POPFile?

    nicole: LOL. Can’t help it… Otherwise I’d fall asleep. >.<

    Yep, I’m OK… You take care k?

  4. nxva Says:

    Nope. POPFile is a program to classify/sort your mails. For all it’s concerned, a “spam” bucket is no different from a “hummingbird” bucket. It simply learns that a mail with certain words is more likely to be in one bucket than the other.

    SA is a program specifically designed to detect spam. It uses more advanced techniques, such as IP identification, regular expression parser, HTML percentage, and even Bayesian filtering (like POPFile).

  5. nicole Says:

    glad to hear that… i will..but i gonna struggle…but hey…my life has been hit hard since sec 1…i doubt u gone thru wat i have been… its hard to kick them all aside…

  6. nicole Says:

    blog 3 times… again lolx.. =)

  7. bcc Says:

    nxva: Ah, Ok… I haven’t had any problems with POPFile though so I guess I’ll just stick with it for the moment. ;)

    nicole: I’m sure you will, like you said yourself, take baby steps, slowly but surely. We’ll all be right behind you. :)