Moving my blog from Posterous to Pelican

Background

I do not want to start this blog post by bashing Posterous. Posterous is a great tool for quickly publishing blog posts. A couple of years back, some of its unique features convinced me to move my blog from WordPress to Posterous. Posterous offered a custom domain name for free, whereas WordPress charged for it. I really liked the email-to-blog-post feature, although I never used it beyond testing it a couple of times. Another amazing feature of Posterous was that it detected external objects such as YouTube videos and GitHub gists and rendered beautiful widgets for them.

Posterous provides some nice templates, but I wanted more control over presentation. A few days back, YouTube was blocked in Pakistan. Some misconfiguration caused problems in loading other Google sites as well. This affected sites that load resources from Google, such as the JavaScript for Google Analytics, Google Maps etc. The same thing happened with my site. The template I was using pulled in some resources from Google. I don't know why they were there, and there was no way to remove them. The end result was a slowly loading page for my Pakistani audience.

Another problem was how Posterous modifies the HTML of a blog post. Again, I wanted more control over the presentation of my posts. Inserting a table in a blog post was not a trivial task. The WYSIWYG editor cannot handle tables, even when they are copied and pasted. I had to draft the HTML manually and paste it into the HTML view of the WYSIWYG editor. And it still gets modified when rendered :-(

Using SSGs

The idea of a static site generator (SSG) is amazing. Why do I need a dynamic setup for content that hardly changes in a month? I tried Jekyll and Pelican and decided to use Pelican. Why Pelican? Because I am biased towards Python. Jekyll is an equally good, or maybe better, SSG.

Being a geek, I like writing in plain text editors more than WYSIWYG editors. Writing in Markdown and reStructuredText is fun. I can keep my energy focused on writing rather than on formatting. My content is saved as content, not as HTML markup. It gets better revision management through git or any other version control system. It can easily be imported into any other application. The content is saved in files, not in a database. I can write offline and publish when I am online.

I have full control over the rendered page. I can design and optimize it as I want. I do not have to worry about security or scaling, as all the content is purely static.

User Experience & Minimalism

I am not a UX expert but I do not want a lot of distractions in my content. Here is what I did to improve UX:

Migration

Jekyll provides a Posterous importer but Pelican does not. Currently Pelican provides only the following importers:

For Posterous I had to write my own importer, which consumes the Posterous API. Here is the code:

def posterous2fields(api_token, email, password):
    """Import Posterous posts and yield them as Pelican fields."""
    import base64
    from datetime import datetime, timedelta
    import json
    import urllib.request

    def get_posterous_posts(api_token, email, password, page=1):
        """Fetch one page of posts from the Posterous API (HTTP Basic auth)."""
        credentials = ('%s:%s' % (email, password)).encode('utf-8')
        base64string = base64.b64encode(credentials).decode('ascii')
        url = ("http://posterous.com/api/v2/users/me/sites/primary/posts"
               "?api_token=%s&page=%d" % (api_token, page))
        request = urllib.request.Request(url)
        request.add_header("Authorization", "Basic %s" % base64string)
        handle = urllib.request.urlopen(request)
        return json.loads(handle.read().decode('utf-8'))

    page = 1
    posts = get_posterous_posts(api_token, email, password, page)
    # An empty page means there are no more posts to fetch.
    while posts:
        for post in posts:
            slug = post.get('slug')
            if not slug:
                # slugify() is assumed to be provided by the surrounding
                # importer module.
                slug = slugify(post.get('title'))
            tags = [tag.get('name') for tag in post.get('tags')]

            # display_date ends in a UTC offset such as " -0700"; subtract
            # that offset (hours and minutes) to get the time in UTC.
            raw_date = post.get('display_date')
            date_object = datetime.strptime(raw_date[:-6], "%Y/%m/%d %H:%M:%S")
            sign = -1 if raw_date[-5] == '-' else 1
            delta = sign * timedelta(hours=int(raw_date[-4:-2]),
                                     minutes=int(raw_date[-2:]))
            date_object -= delta
            date = date_object.strftime("%Y-%m-%d %H:%M")

            yield (post.get('title'), post.get('body_cleaned'), slug, date,
                   post.get('user').get('display_name'), [], tags, "html")

        page += 1
        posts = get_posterous_posts(api_token, email, password, page)

The above code produces Pelican fields, which can later be passed to fields2pelican, which uses pandoc to transform the HTML content to Markdown or reStructuredText.
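
To make the shape of those fields a bit more concrete, here is a rough sketch that unpacks one tuple and renders the kind of metadata header Pelican reads at the top of a Markdown post. This is not Pelican's fields2pelican and it skips the pandoc conversion entirely. The helper name build_markdown_header, the sample values and my reading of the empty list as "categories" are all assumptions made for illustration.

```python
def build_markdown_header(fields):
    """Render Pelican-style Markdown metadata for one tuple of fields."""
    # Assumed order, mirroring the tuple yielded by posterous2fields.
    title, content, slug, date, author, categories, tags, markup = fields
    lines = [
        "Title: %s" % title,
        "Date: %s" % date,
        "Author: %s" % author,
        "Slug: %s" % slug,
    ]
    # Optional metadata is only written when there is something to write.
    if categories:
        lines.append("Category: %s" % categories[0])
    if tags:
        lines.append("Tags: %s" % ", ".join(tags))
    # A blank line separates the metadata from the post body.
    return "\n".join(lines) + "\n\n"


# Invented sample data in the same shape as the importer's output.
sample = ("Hello Pelican", "<p>First post</p>", "hello-pelican",
          "2012-10-21 16:15", "Jane Doe", [], ["python", "pelican"], "html")
print(build_markdown_header(sample))
```

The body of the post would follow this header once the HTML has been converted to Markdown.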

Deployment

The site is deployed on the Heroku Cedar stack, which supports Python applications. It is served by a great WSGI app called 'static', together with gunicorn and gevent.

Update: I am now using my own fork of static for performance tweaks.
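
For readers wondering what serving a static site through WSGI looks like, here is a minimal sketch using only the standard library. It is a stand-in for illustration, not the 'static' package and not my fork of it. The helper name make_static_app is hypothetical, and "output" is assumed to be the folder Pelican generates the site into.

```python
import mimetypes
import os


def make_static_app(root):
    """Return a WSGI app that serves files from `root` (hypothetical helper)."""
    root = os.path.abspath(root)

    def app(environ, start_response):
        # Map "/" and directory-style URLs to index.html.
        path = environ.get("PATH_INFO", "/")
        if path.endswith("/"):
            path += "index.html"
        full_path = os.path.abspath(os.path.join(root, path.lstrip("/")))

        # Refuse anything that escapes the root directory or does not exist.
        inside_root = full_path.startswith(root + os.sep)
        if not inside_root or not os.path.isfile(full_path):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]

        content_type = (mimetypes.guess_type(full_path)[0]
                        or "application/octet-stream")
        with open(full_path, "rb") as handle:
            body = handle.read()
        start_response("200 OK", [("Content-Type", content_type),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


# Assumption: the generated site lives in a folder named "output".
application = make_static_app("output")
```

If this were saved as app.py (a hypothetical file name), gunicorn could serve it with gevent workers using something like gunicorn -k gevent app:application.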