Archive

Posts Tagged ‘Email’

Mail Handling

May 28th, 2005

One of the other things that’s changed quite dramatically in the past three months is the way in which I deal with my mail. I’ve been through a number of different mail clients and setups in the past, and have never found an approach that quite fits my needs. I was a long-term fan of MH on unix, mostly because there was no real client getting in the way. Everything was just simple files on disk, with a series of commands that allowed you to slice and dice them in interesting ways. (Actually it was really only one big command, hard-linked under different names, that changed its behaviour depending on the name it was called as!)

This meant you could chain your mail commands into a normal unix pipeline, allowing you to manipulate your mail in simple ways – as long as you were comfortable with the unix command line, of course! And if MH didn’t offer something you needed, you could just write your own as a simple shell or perl script. One of my first perl scripts was a recursive grep through my mail archives.
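Because MH keeps each message as a separate numbered file inside a folder directory, a recursive grep over the archive is little more than a directory walk. The sketch below does it in Python rather than the original perl; it assumes an MH-style tree in which message files are named with plain integers (so sequence and context files are skipped), and the `grep_mail` name is purely illustrative.

```python
import os
import re
from typing import Iterator, Tuple


def grep_mail(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Walk an MH-style tree (one file per message) and yield
    (path, line_number, line) for every line matching `pattern`."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-folders in a predictable order
        # MH names messages with plain integers; skip everything else
        for name in sorted((f for f in filenames if f.isdigit()), key=int):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")
```

Assuming the archive lives under `~/Mail`, `grep_mail(os.path.expanduser("~/Mail"), r"(?i)maypole")` yields roughly what `grep -rn` would print, one tuple per matching line.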

Eventually I was persuaded to move to mutt, the one true unix mail client, and it has served me well for the past 7 or 8 years. Its scoring and colour coding became very useful when I started drowning in spam, and once I got spam detection tools working, the ability to set up keybindings to integrate mutt with them made my life much easier.

I’ve had a rather strange mutt set-up, though. I never really mastered procmail, or Mail::Audit, or equivalent, so rather than filtering on ‘From:’ lines, I’ve always filtered on ‘To:’ lines. (Having my own domain means I can have an unlimited number of email addresses.) Each of the many mailing lists I subscribe to, and each commercial site I have to sign up for, gets a unique address, which is then gatewayed to its own folder through a series of .forw.d files. A little script triggered from my mutt startup informs mutt of these folders’ existence, and I then read ‘groups’ of mail in turn, rather than having them all mixed up in one big box.
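The routing decision itself is simple: look at the recipient address, not the sender. A minimal Python sketch of that rule, assuming a hypothetical domain `example.org` and a hypothetical address-to-folder table (in the set-up described here the mapping lived in forwarding files, not in code):

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical values: one address per list or shop, all on a domain I own
MY_DOMAIN = "example.org"
ROUTES = {
    "maypole-list": "lists/maypole",
    "amazon": "shops/amazon",
    "ebay": "shops/ebay",
}


def folder_for(raw_message: str, default: str = "inbox") -> str:
    """Choose a folder from the recipient (To:/Cc:) address, not the sender."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _name, addr in getaddresses(fields):
        local, _, domain = addr.lower().partition("@")
        if domain == MY_DOMAIN and local in ROUTES:
            return ROUTES[local]
    return default
```

Because the address was handed out to exactly one list or shop, the sender can change freely without breaking the filter.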

I’ve experimented with other mail setups from time to time, and have been using Thunderbird via IMAP as my secondary mail reader for a while – particularly for my commercial mail folders, which tend to be mostly HTML these days (and, though some will be horrified to hear me say it, are usually better for it), and for dealing with attachments.

But two things have changed in my mail set-up recently, and it’s making me rethink the whole approach again.

Firstly, I’ve switched my spam detection to the new service we’ve set up in UNITE. This is proving much more accurate than bogofilter (which had gotten stuck at around 85-90%), and is currently running at 98.8% on the mail that arrives in my mailbox (with a much greater amount of spam also stopped upstream, never even reaching me). It also operates a web-based quarantine, so almost no spam arrives in my inbox. All my fancy mutt bindings to tag, report, and delete spam are of very little use any more, and with a much clearer inbox, my scoring and colour coding lose a lot of their value.

Secondly, I’ve moved almost all my mailing lists and commercial email to Gmail. As a general mail client, Gmail is disappointing; I don’t think I could use it as my primary tool. But its ‘conversation’ approach works surprisingly well for mailing lists. With judicious use of labels and filters, I can not only direct each list to a different ‘folder’, but actually allow a single message to appear in several ‘folders’ (so that, for example, Class::DBI discussion on the Maypole mailing list can also appear in my CDBI list, along with any CPAN uploads with Class::DBI in their name). This also allows me to stay on some of the mailing lists I was going to unsubscribe from. There are a variety of lists I subscribe to but rarely read (although I still skim the threaded subject lines now and then for posts that look interesting). These are usually related to technologies that I use from time to time, and whose lists don’t have good archives. Often, when I have a problem, I can then just search through my own archive of the list, and Gmail, of course, makes this even easier than before.
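The difference from a folder-per-list scheme is that every matching filter adds a label rather than moving the message, so one message can collect several labels. A toy model of that idea (not Gmail’s implementation; the list identifiers and label names are made up for illustration):

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Message:
    list_id: str  # which mailing list (or feed) the message came from
    subject: str
    labels: Set[str] = field(default_factory=set)


# Each filter pairs a predicate with a label. Filters are not exclusive:
# every one that matches adds its label.
FILTERS: List[Tuple[Callable[[Message], bool], str]] = [
    (lambda m: m.list_id == "maypole", "maypole"),
    (lambda m: m.list_id == "cpan-uploads", "cpan"),
    # CPAN distribution names use "Class-DBI"; discussion uses "Class::DBI"
    (lambda m: re.search(r"class(::|-)dbi", m.subject, re.IGNORECASE) is not None, "cdbi"),
]


def apply_filters(msg: Message) -> Set[str]:
    for predicate, label in FILTERS:
        if predicate(msg):
            msg.labels.add(label)
    return msg.labels
```

A Class::DBI thread on the Maypole list ends up under both the `maypole` and `cdbi` labels, which a single-folder move could never do.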

With these two changes, I now have a situation that I haven’t seen for close to 15 years, where almost all the email arriving on my desktop is either personal or work-related, and almost all requires some action! This theoretically enables me to massively simplify my mail set-up (and then probably add a new layer of complexity in a different way!)

I’m currently thinking of reverting to a single inbox, and doing all my filtering after reading rather than before, probably to a variety of categorised ‘TODO’ folders, leaving my inbox as a real inbox, ideally emptying it every time I access it.

The fact that it’s taken me 15 years to get close to the position that the majority of email users have as their normal set-up probably says something interesting. I just don’t know what that is yet.

Back Again

November 10th, 2004

It’s certainly been an eventful 6 months since I last posted.

Back in May we took over Ireland’s oldest ISP and have spent the last 6 months turning it around. They seemed to have an interesting business model whereby they would take all the revenue, give 50% of it to suppliers, 50% of it to staff, and spend the other 50% on overheads.

I can’t say much more about it all here yet, as there are still six ongoing legal cases, but I’m sure I’ll get to tell the stories some day. You really won’t believe some of them (like the story of the directors who locked themselves in their office and refused to talk to us…)

Conference season was interesting this year, whilst all this was going on. We ended up having to skip Oscon after O’Reilly messed up our tutorials, but I got to go to FoafCamp and FooCamp in Amsterdam, the FOAF Workshop in Galway, Web 2.0 in SF, and of course we were hosting YAPC::Europe this year. I’m sure I’ll get to talk more about those later.

The takeover also meant we had to put most of our ongoing projects on hold for a while. Simon and Marc have both moved on to other things, and Marty, Karen, and I have been working full-time at UNITE.

We said we’d put pretty much everything else on hold for six months, and so it’s time to start digging some of those out again. Everything moves so fast that we’ve had to rethink some of them significantly. Twingle, of course, now has Gmail to contend with. That doesn’t worry us too much though, as we believe that Twingle’s value is a lot more than just search. More on that later too.

This blog is probably going to be different this time around too. I used to use it as a place to store interesting things I came across. I’ve now switched to using del.icio.us for that, so this will be much more about what I’m doing. Hopefully that won’t be too boring for everyone else.

A Month of Bogofilter

January 12th, 2004

I’ve been using bogofilter against my spam for about a month now, and the results are looking good.

It’s catching a much higher percentage of my spam than SpamAssassin was, and I’ve only had one false positive. Although any false positive is a serious problem, this one doesn’t concern me, for two reasons.

Firstly, SpamAssassin was giving me at least one false positive every couple of days. I get a lot of solicited commercial email, including quite a lot of financial news. SpamAssassin had a nasty habit of assuming that anything that talked about mortgages etc. was spam. I trained bogofilter specifically on archives of this mail, and so far it hasn’t marked any of it as spam.

Secondly, the one false positive was a rather odd case. About a year ago I released a perl module, Games::Boggle, that finds words on a Boggle board.

Recently I received email from a user who was having difficulty getting a script that used it to work. Along with the script he included the entire dictionary file he was running it against!

According to the theory behind pseudo-Bayesian spam filtering, the detector should only pay attention to the 10 (or whatever) most significantly hammy or spammy words in your message (which is why the new wave of “include random phrases” or “include a chapter of a book” emails aren’t really causing me any difficulty). However, if there are more than 10 “definitely spam” and more than 10 “definitely ham” words, I’m guessing it doesn’t cope very well…
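The “most significant words” idea can be sketched as below. This follows the scheme in Paul Graham’s “A Plan for Spam” (unknown words scored at 0.4, probabilities clamped to [0.01, 0.99], only the n tokens furthest from 0.5 combined); bogofilter itself combines scores differently, so treat this as an illustration of the general idea, with a made-up token table.

```python
import math
from typing import Dict, Iterable


def spam_probability(tokens: Iterable[str], spamicity: Dict[str, float],
                     n: int = 10, unknown: float = 0.4) -> float:
    """Graham-style combination of the n most 'interesting' tokens.

    Illustrative only: bogofilter's own combining rule is different.
    """
    # One probability per distinct token, clamped away from 0 and 1
    probs = {t: min(0.99, max(0.01, spamicity.get(t, unknown)))
             for t in set(tokens)}
    # Order by token name first so ties below are broken deterministically
    ordered = [probs[t] for t in sorted(probs)]
    # Keep the n tokens furthest from neutral (0.5); sorted() is stable
    top = sorted(ordered, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    # Naive-Bayes product, computed in log space to avoid underflow
    log_spam = sum(math.log(p) for p in top)
    log_ham = sum(math.log(1.0 - p) for p in top)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

Padding a spam with hundreds of unknown words only adds tokens at 0.4, which never displace the strongly spammy ones from the top n. A message with more than n extreme tokens on each side is the awkward case: every one of them is equally far from 0.5, so whichever tokens win the tie decide the verdict.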


Spammy Words

November 28th, 2003

Today I trained Bogofilter on 85,000 spams (and an equally large number of normal messages).

A little playing with the resulting database reveals that my biggest spam giveaway words (not counting ‘header’ words, such as spamtrap addresses, and words added by earlier SpamAssassin rewriting) are:

  1. devnull
  2. dotted-decimal
  3. Impotence
  4. Enlargement
  5. zhtclxqx
  6. Medications
  7. Phentermine
  8. remove.html
  9. remove.php
  10. Soma
  11. eGroups
  12. Guaranteed!
  13. out.html
  14. windows-1251
  15. optout.html
  16. Citrate
  17. Sildenafil
  18. Ultram
  19. erections
  20. Adipex
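A list like this can be produced by ranking tokens on how much more often they appear in spam than in ham. A minimal sketch, assuming the word list has already been dumped to (token, spam count, ham count) triples (bogofilter’s `bogoutil -d` produces a dump along these lines, but check the column layout for your version) and normalising by the number of messages of each kind; the function name and the 5-occurrence cut-off are arbitrary choices for illustration.

```python
from typing import Iterable, List, Tuple


def spammiest(rows: Iterable[Tuple[str, int, int]], n_spam: int, n_ham: int,
              min_count: int = 5, top: int = 20) -> List[Tuple[str, float]]:
    """Rank tokens by P(spam | token), normalised for corpus sizes."""
    scored = []
    for token, spam, ham in rows:
        if spam + ham < max(min_count, 1):
            continue  # seen too rarely to mean anything
        spam_rate = spam / n_spam  # count relative to number of spams trained
        ham_rate = ham / n_ham
        scored.append((token, spam_rate / (spam_rate + ham_rate), spam))
    # Highest probability first; among equals, prefer the more frequent token
    scored.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return [(token, prob) for token, prob, _ in scored[:top]]
```

With the training set above, the call would be roughly `spammiest(rows, n_spam=85_000, n_ham=85_000)`.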

mutt tip of the day

August 28th, 2003

I’m an email packrat. I usually archive every email I receive, and as a mutt user, tend to save them, by default, to whatever folder mutt assigns them. This is fine for most emails from humans, or to recognised mailing lists, as it’ll generally do the right thing.

But I also get lots of emails from companies I shop from on-line, daily “info” lists that I’m on, ebay and half.com watchlist updates etc. In most of these cases I waste a lot of time having to override mutt’s defaults, as it usually wants to save to =info or =news or =updates or somesuch, rather than =amazon, or =ebay. Of course it never seems like a lot of time – it’s actually quite quick to type ‘s=ebay’. But it’s even quicker to type ‘s<return>’, and with the volume of email I get, that adds up.

I always had a niggling suspicion that there would be a better way, and today I finally got fed up enough to find out. The on-line manual is definitive, but mostly useless for finding answers to questions like this. So instead I did what I usually do in these cases – ask Marty. He introduced me to the wonderful concept of the save-hook.

Now I can create entries like: save-hook '.*.ebay.co.uk' =ebay and have my ‘s<return>’ just Do The Right Thing.
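The lookup a save-hook performs can be modelled roughly as “the first pattern that matches the sender wins, otherwise fall back to mutt’s own guess”. A simplified Python sketch under that assumption (in mutt itself a bare regex like the one above is expanded through `$default_hook`, which by default matches it against the sender; the hook list, function name and addresses here are hypothetical):

```python
import re
from typing import List, Tuple

# Hypothetical hooks, in the order they would appear in a muttrc
SAVE_HOOKS: List[Tuple[str, str]] = [
    (r"ebay\.co\.uk$", "=ebay"),
    (r"amazon\.co\.uk$", "=amazon"),
]


def save_folder(sender: str, default: str) -> str:
    """Folder from the first hook whose regex matches the sender address,
    otherwise the folder mutt would have suggested on its own."""
    for pattern, folder in SAVE_HOOKS:
        if re.search(pattern, sender, re.IGNORECASE):
            return folder
    return default
```

Escaping the dots keeps the patterns from matching look-alike strings; in the unescaped form each `.` matches any character, which happens to work for ebay but is looser than intended.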


Spam Growth Slowing

July 15th, 2003

Well, I didn’t manage to hit 20,000 spams in June. The final figure was just over 18,000. This means that my month-on-month growth was only 17% (21% if adjusted for the length of the month) – a noticeable slowdown from previous months.

However, there are two notable things happening. Firstly, the number of bounces I’m getting when spammers forge one of my domains as the sender is rising significantly (although as I don’t file bounces as spam yet, I don’t have figures). If I counted those in, I’d probably be closer to 25,000 now.

Secondly, SpamAssassin’s ‘catch-rate’ is slipping again. I don’t have exact figures yet (although I should probably put together a script to count this), but I find myself spending more and more time filing things manually as spam that SpamAssassin didn’t catch. Time to flesh out my spam catching strategy a bit more…
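Such a counting script could be quite small. A sketch, assuming all spam (whether SpamAssassin caught it or it was filed by hand) ends up in one mbox folder, and that SpamAssassin marks the messages it catches with an `X-Spam-Flag: YES` header; the function name and folder layout are hypothetical, and it says nothing about false positives.

```python
import mailbox


def catch_rate(spam_folder: str) -> float:
    """Fraction of messages in a spam mbox that SpamAssassin flagged itself,
    as opposed to ones filed there by hand."""
    box = mailbox.mbox(spam_folder)
    total = caught = 0
    try:
        for msg in box:
            total += 1
            if str(msg.get("X-Spam-Flag", "")).strip().upper() == "YES":
                caught += 1
    finally:
        box.close()
    return caught / total if total else 0.0
```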


Spam Growth

June 1st, 2003

May was my first month with more than 15,000 spams. I actually averaged 500 spams a day over the month. SpamAssassin is catching about two-thirds of them, so they don’t take up too much of my time, but I’m still having to flag a lot by hand.

However, the rate of growth has slowed slightly. The growth from March to April was over 41%, whereas April to May was only 37%. And if I adjust for the fact that April was a shorter month, it’s even more noticeable – 45% to 33%.
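The adjustment compares spams per day rather than spams per month: if the monthly total grows by g, the daily rate grows by (1 + g) × (days in the earlier month) / (days in the later month), minus 1. A small Python check of the quoted figures; the raw percentages are themselves rounded, so the results only agree to within about a percentage point.

```python
def per_day_growth(raw_growth: float, days_before: int, days_after: int) -> float:
    """Growth in the daily average, given growth in monthly totals.

    daily_after / daily_before
        = (total_after / days_after) / (total_before / days_before)
        = (1 + raw_growth) * days_before / days_after
    """
    return (1 + raw_growth) * days_before / days_after - 1


# March (31 days) -> April (30 days), then April (30) -> May (31)
march_april = per_day_growth(0.41, 31, 30)
april_may = per_day_growth(0.37, 30, 31)
print(f"March->April: {march_april:.1%}, April->May: {april_may:.1%}")
```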

It will be interesting to see whether I hit 20,000 spams in June.


Parliamentary Debate

May 23rd, 2003

Lord Sainsbury of Turville: My Lords, I am delighted that the noble Lord has asked me a Question about corned beef cans. I have been answering questions about them all my life and I regard them as one of my real areas of expertise.

Baroness Oppenheim-Barnes: My Lords, does the Minister agree, as the noble Baroness has demonstrated, that most home accidents are avoidable, arising out of carelessness, and that therefore paying attention is one of the best cures?

Lord Sainsbury of Turville: There are an estimated 55 accidents a year from putty, while toothpaste accounts for 73. I agree with the noble Baroness that it would be helpful if people paid careful attention.

Baroness Strange: My Lords, does the Minister agree that sardine tins and anchovy tins are also very difficult to open with their tin-openers?

Lord Mitchell asked Her Majesty’s Government: What are their plans to reduce the growth in spam?

Lord Sainsbury of Turville: My Lords, I hope noble Lords will appreciate how I move seamlessly from corned beef to spam.

Lord Renton: My Lords, will the Minister explain how it is that an inedible tinned food that lasted for ever and was supplied to those on active service can become an unsolicited e-mail, bearing in mind that some of us wish to be protected from having an e-mail?

Lord Faulkner of Worcester: My Lords, I can help the Minister with the origin of the word. It comes from aficionados of Monty Python, and the famous song, “Spam, spam, spam, spam”. It has been picked up by the Internet community and is used as a description of rubbish on the Internet. More seriously, is the Minister aware that up to 85,000 pieces of unsolicited e-mail are received by the Parliamentary Communications Directorate each month?

Hansard text for 6 May


Spam Spam Spam

March 22nd, 2003

Kasia decided to investigate her highest score from SpamAssassin.

41.90 points is fairly impressive, but I’ve had 22 mails with higher scores than that in the last month!

My current highest is 56.8:

Content analysis details:   (56.80 points, 5 required)
RCVD_FAKE_HELO_DOTCOM (2.3 points)  Received contains a faked HELO hostname
NO_REAL_NAME       (0.7 points)  From: does not include a real name
SUBJ_HAS_SPACES    (2.0 points)  Subject contains lots of white space
AS_SEEN_ON         (2.1 points)  BODY: As seen on national TV!
ONLY_COST          (0.2 points)  BODY: Only $$$
MLM                (0.8 points)  BODY: Multi Level Marketing mentioned
BANG_GUARANTEE     (0.5 points)  BODY: Something is emphatically guaranteed
EARN_MONEY         (1.2 points)  BODY: Message talks about earning money
EXCUSE_14          (0.1 points)  BODY: Tells you how to stop further spam
JODY               (2.9 points)  BODY: Contains "My wife, Jody" testimonial
OPT_IN             (0.3 points)  BODY: Talks about opting in (lowercase version)
BANG_MONEY         (0.7 points)  BODY: Talks about money with an exclamation!
BULK_EMAIL         (2.1 points)  BODY: Talks about bulk email
ORDER_REPORT       (2.9 points)  BODY: Order a report from someone
SENT_IN_COMPLIANCE (4.3 points)  BODY: Claims compliance with spam regulations
READ_TO_END        (2.9 points)  BODY: You'd better read all of this spam!
FINANCIAL          (4.3 points)  BODY: Financial Freedom
SECTION_301        (3.2 points)  BODY: Claims compliance with spam regulations
INVALUABLE_MARKETING (2.9 points)  BODY: Invaluable marketing information
NOT_INTENDED       (2.9 points)  BODY: Not intended for residents of somewhere or other
RISK_FREE          (1.0 points)  BODY: Risk free.  Suuurreeee....
COPY_ACCURATELY    (2.9 points)  BODY: Common pyramid scheme phrase (1)
INITIAL_INVEST     (2.9 points)  BODY: Requires Initial Investment
SERIOUS_CASH       (2.7 points)  BODY: Serious cash
MSGID_OUTLOOK_TIME (4.4 points)  Message-Id is fake (in Outlook Express format)
SUBJ_HAS_UNIQ_ID   (0.8 points)  Subject contains a unique ID
DATE_IN_FUTURE_12_24 (2.8 points)  Date: is 12 to 24 hours after Received: date
CASHCASHCASH       (0.0 points)  Contains at least 3 dollar signs in a row
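SpamAssassin’s verdict is additive: each rule that fires contributes its points, and the total is compared with the required score. Summing the rules transcribed from the report above reproduces the 56.8 total; the variable names are illustrative.

```python
# (rule, points) pairs transcribed from the report above
HITS = [
    ("RCVD_FAKE_HELO_DOTCOM", 2.3), ("NO_REAL_NAME", 0.7),
    ("SUBJ_HAS_SPACES", 2.0), ("AS_SEEN_ON", 2.1), ("ONLY_COST", 0.2),
    ("MLM", 0.8), ("BANG_GUARANTEE", 0.5), ("EARN_MONEY", 1.2),
    ("EXCUSE_14", 0.1), ("JODY", 2.9), ("OPT_IN", 0.3), ("BANG_MONEY", 0.7),
    ("BULK_EMAIL", 2.1), ("ORDER_REPORT", 2.9), ("SENT_IN_COMPLIANCE", 4.3),
    ("READ_TO_END", 2.9), ("FINANCIAL", 4.3), ("SECTION_301", 3.2),
    ("INVALUABLE_MARKETING", 2.9), ("NOT_INTENDED", 2.9), ("RISK_FREE", 1.0),
    ("COPY_ACCURATELY", 2.9), ("INITIAL_INVEST", 2.9), ("SERIOUS_CASH", 2.7),
    ("MSGID_OUTLOOK_TIME", 4.4), ("SUBJ_HAS_UNIQ_ID", 0.8),
    ("DATE_IN_FUTURE_12_24", 2.8), ("CASHCASHCASH", 0.0),
]
REQUIRED = 5.0  # the "5 required" threshold from the report header


def total_score(hits):
    """The message score is the sum of the points of every rule that fired."""
    return round(sum(points for _rule, points in hits), 2)


score = total_score(HITS)
is_spam = score >= REQUIRED
print(f"{score} points, {score / REQUIRED:.1f}x the spam threshold")
```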

SpamAssassin Upgrade

March 3rd, 2003

We waited until the debian package for the new SpamAssassin was out before upgrading, and I’ve now had a weekend to play with it.

Apart from the fact that the previous version was starting to produce far too many false negatives, the main reason I wanted to upgrade was that SA no longer breaks attachments. In the previous version, any ham marked as spam was pretty much ruined, as all the attachments were inlined into the main body. Now they stay as attachments.

As SA occasionally marked commercial emails that I had actually signed up for (such as the ebay auction watch mails) as spam, this was rather irritating.

[I discovered a few days ago that I can edit the mail and do ':%!spamassassin -d' to remove the mark-up but that's still annoying]

The main issue now is that SA no longer marks the Subject line with a **** SPAM *** tag. In general this is a good thing, but it also makes my deletion strategy more difficult. Previously I used mutt’s scoring rules to automatically decrease the score of all mail so marked, with a rule that marked anything below a score threshold as ready to be deleted as soon as I quit the folder.
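That arrangement relied on two mutt features: `score` rules, which add a value to a message’s score when a pattern matches, and `$score_threshold_delete`, which marks messages whose score is at or below the threshold for deletion. A simplified Python model of that behaviour, keyed on the subject tag; the tag regex, score value and threshold are illustrative, not taken from the original muttrc.

```python
import re
from typing import Iterable, List, Tuple

SCORE_RULES: List[Tuple[str, int]] = [
    (r"\*+ ?SPAM ?\*+", -100),  # the subject tag older SpamAssassin added
]
THRESHOLD_DELETE = -50  # like $score_threshold_delete: at or below => delete


def score(subject: str) -> int:
    """Sum the values of every rule whose pattern matches (scores add up)."""
    return sum(value for pattern, value in SCORE_RULES if re.search(pattern, subject))


def marked_for_deletion(subjects: Iterable[str]) -> List[str]:
    return [s for s in subjects if score(s) <= THRESHOLD_DELETE]
```

Once the tag disappears from the Subject line, `score` returns 0 for every message and nothing crosses the threshold, which is why a different mechanism is needed.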

Now I’ve had to bind a function key to a macro to do this as you can’t score on just any random header.

But for now I’ve decided to make that macro move the spam to a different folder rather than delete it, so I can play about with training the new Bayesian-style filter.
