Tuesday, July 21, 2009

Experiments with Flowing Data

I have been reading Nathan Yau's Flowing Data for some time now, and was intrigued by the announcement of Your Flowing Data (YFD), a data collection service that accepts input via Twitter. Around the same time, I read Nat's blog post about personal data logging. Then, today, came a post from Ed Murphy about visualization, with a back-cite to yours truly. Herewith, a report on my first experiments recording my personal data with the YFD service.

I decided to start by tracking things automatically. Traditionally, if I am required to perform some periodic action -- in this case, sending a tweet -- I will forget about it. So, I thought about what kind of data could be tracked without manual intervention, and I decided to start with email statistics: the size of my inbox and the number of unread messages in my inbox.

I decided to write some Python to get this task done. The first thing to do was figure out how to post to Twitter from Python, and that proved simple: Google told me about Mike Verdone's Python Twitter Tools (PTT), which was very easy to install and use.

The next task was to access my IMAP mailbox from Python, and that was easy as well: Python's standard library includes an IMAP client, imaplib. Getting the size of my inbox (the number of messages in it) was straightforward, but a little more research was needed to figure out the number of unread messages. It turns out there is an IMAP search term ("UNSEEN") that gave me exactly what I wanted.
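A minimal sketch of those two lookups, separated from the rest of the script. The helper names (parse_message_count, parse_unseen_count, inbox_stats) are made up for this illustration; only select() and search() come from imaplib. Opening the mailbox read-only is an assumption of the sketch, not something the script below does.

```python
def parse_message_count(select_response):
    """Return the message count from the (status, data) pair that select() returns."""
    status, data = select_response
    if status != 'OK':
        raise RuntimeError('SELECT failed: %r' % (data,))
    # data[0] holds the number of messages in the mailbox, e.g. b'42'
    return int(data[0])


def parse_unseen_count(search_response):
    """Return how many message numbers an UNSEEN search matched."""
    status, data = search_response
    if status != 'OK':
        raise RuntimeError('SEARCH failed: %r' % (data,))
    # data[0] is a space-separated list of message numbers; it is empty
    # (or missing) when nothing matched
    if not data or not data[0]:
        return 0
    return len(data[0].split())


def inbox_stats(imap):
    """Return (total, unseen) for the INBOX of a logged-in IMAP connection."""
    # readonly=True is an assumption here: it avoids changing message flags
    total = parse_message_count(imap.select('INBOX', readonly=True))
    unseen = parse_unseen_count(imap.search(None, 'UNSEEN'))
    return total, unseen
```

Given a logged-in imaplib.IMAP4_SSL connection named imap, inbox_stats(imap) would return the two numbers as a (total, unseen) pair.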

Anyway, to make a not-so-long story somewhat shorter, here is the script (edited to hide personal data) that I ended up running twice a day:

#!/usr/bin/python

import twitter, getpass, imaplib

# set some options
options = {}
options['imapserver'] = 'my IMAP server'
options['imapuser'] = 'my IMAP username'
options['twitteruser'] = 'my Twitter username'
options['imappass'] = 'my IMAP password'
options['twitterpass'] = 'my Twitter password'

# in case you don't want to hard code your passwords in the script:
# leave them empty (or None) and you will be prompted instead
if not options['imappass']:
    options['imappass'] = getpass.getpass('IMAP password:')

if not options['twitterpass']:
    options['twitterpass'] = getpass.getpass('Twitter password:')

# connect to mail server and get the data
imap = imaplib.IMAP4_SSL(options['imapserver'])
imap.login(options['imapuser'], options['imappass'])
# select() defaults to INBOX and returns (status, [message count])
inbox = int(imap.select()[1][0])
# search() returns (status, [space-separated message numbers])
unread = len(imap.search(None, 'UNSEEN')[1][0].split())
imap.close()
imap.logout()

# connect to twitter and post a message to @yfd
api = twitter.Api(options['twitteruser'], options['twitterpass'])
status = api.PostDirectMessage('yfd', 'inbox %d' % inbox)
status = api.PostDirectMessage('yfd', 'unread %d' % unread)
Obviously there are some changes I could (and indeed, should) make, such as reading the options from the command line or a file, but this is the basic script, implemented quickly for personal use.
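One way the command-line change might look, as a rough sketch rather than anything I have actually deployed. The flag names and the environment variable names (such as YFD_IMAP_PASS) are hypothetical, chosen only for this example. Passwords are kept off the command line: they come from the environment if set, and from a getpass prompt otherwise.

```python
import argparse
import getpass
import os


def parse_options(argv=None):
    """Read the non-secret options from the command line (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description='Send inbox statistics to YFD')
    parser.add_argument('--imap-server', required=True)
    parser.add_argument('--imap-user', required=True)
    parser.add_argument('--twitter-user', required=True)
    return parser.parse_args(argv)


def get_password(env_name, prompt, environ=None, ask=getpass.getpass):
    """Return the password from the environment, or prompt for it if unset or empty."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_name)
    return value if value else ask(prompt)
```

The script's options dictionary could then be filled from parse_options() plus two calls such as get_password('YFD_IMAP_PASS', 'IMAP password:'). The environment fallback matters for a job that runs unattended twice a day, where nobody is around to answer a prompt.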

In the next day or two I will post about my second experiment, using Google Latitude to track my location data.
