Friday, October 9, 2009

How to Remember CSS Shorthand Order

I have always had trouble remembering the correct order for CSS shorthand properties like this:

margin: 3px 4px 2px 3px;

The order is Top, Right, Bottom, Left, so the example above sets margin-top to 3px, margin-right to 4px, margin-bottom to 2px and margin-left to 3px.

I have just found two ways to remember this.

The first is using the word trouble:

T R o u B L e

The consonants give the order.

The second (pointed out by a colleague this morning) is using a clockface starting at the top and going clockwise.

12 is at the Top
3 is on the Right side
6 is at the Bottom
9 is on the Left.

Thursday, August 13, 2009

Speeding up content delivery on the Radio NZ website

We've made a few changes to the way content is served on radionz.co.nz

1. The first is that all static content is now served from a separate domain (static.radionz.net.nz). We are using Nginx for this task because it is faster and more light-weight than Apache.

One benefit is that the browser can make more connections to the site for downloading page elements in parallel.

The second benefit of using a completely different domain (rather than a sub-domain such as static.radionz.co.nz) is that it is cookie-less.

The browser does not send any cookies with requests for this static content. If it were on a sub-domain, any cookies set for the main domain would be sent with each request. In the case of our home page, this added about 6 KB to every page request.

One of the NZ sites I tested while working on this sends 65 KB of cookies with requests for static assets on its home page. Someone has to pay for that wasted bandwidth.

2. The expiry date of static content is set one year in the future, and assets are marked as Cache-Control: public.

This tells any application that the content passes through (including the end-user's browser) that the content can be cached, and that it won't expire for a year.
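
On a static asset the relevant response headers now look something like this (example values only - the exact expiry date depends on when the asset is requested):

Cache-Control: public
Expires: Fri, 13 Aug 2010 02:00:00 GMT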

Intermediate caches (like the ones most ISPs have) are more likely to keep a copy of the assets and pass them on instead of coming back to our servers. The browser will also check the content in its local cache and only fetch items that have expired.

The net effect is fewer requests for static assets. This makes for less bandwidth used by the client, and less bandwidth paid for by us.

Initial tests show that browsing between two recently visited pages (or pages that are cached locally) is about 100% faster than before the changes. Pages that are not already cached are about 15% faster, due to Nginx serving files more efficiently and the gzipping of our HTML content. Previously we did not gzip HTML content because of the relatively high server load, but Nginx seems to be able to do this without any impact.

In fact, the overall server load has been reduced by about 10%.

For those who are interested, browse around the site with the Firebug Net tab open and watch how the content loads. You'll notice two things. The first is that subsequent visits to a page only fetch the main page and the two analytics scripts. The second is that the page fully renders quickly, even though the browser is still getting content. This was achieved by careful placement of the CSS and JS files in the HTML markup.

The site now gets a YSlow 'A' rating (excluding the CDN test, which is not relevant to us).

Thursday, July 9, 2009

Browser and OS visitor trends at Radio NZ

Yesterday I took a look back at browser and OS usage changes over the last couple of years to see whether there were any useful trends.

Here are the browser stats from the last month, and the same period one and two years ago:

Browser     2007     2008     2009
IE          70%      65%      58%
Firefox     22%      27%      27%
Safari      4%       5.4%     8.6%
Chrome      -        -        3.1%
Opera       0.93%    1%       0.96%

IE version 5.x is virtually extinct.

Looking at the underlying data I'd draw the following conclusions:
  • IE use is on the decline generally, and IE users are the slowest at upgrading
  • Firefox usage appears to have plateaued (and they are the fastest to upgrade)
  • Most growth is in new browser entrants.
  • There is greater market fragmentation (more choice for consumers)
There are also changes in Operating Systems (using the same periods):

OS          2007     2008     2009
Windows     91.8%    90.4%    86.6%
Macintosh   7%       8%       10.7%
GNU/Linux   0.98%    1.3%     1.83%
iPhone      -        0.06%    0.33%

Around 73% of Windows users still use XP, and only 56% of Mac users are on OS X 10.5. The lower rate of Mac upgrades could be limited by hardware restrictions.

For many website developers these figures will represent a significant challenge - you can no longer design your website for any one browser or OS. The days of 'best viewed in browser X' are gone.

And yes, I still see sites that only work in IE. I went to get an on-line quote for something last week, and the site simply would not work in Firefox or Safari. I took my business elsewhere. The average punter isn't going to know why - they'll just think the site doesn't work. Is this the branding message you want to send to visitors?

Based on these stats, failing to design cross-platform websites will give at least 15% of your users an inferior experience. That is a lot of lost traffic (and business).

Saturday, July 4, 2009

Rails Active Directory Authentication

Radio New Zealand has released a Rails plugin that allows users to authenticate against Active Directory from a Rails application.

We use it with the restful authentication plugin to allow a single set of credentials to be used for each person, regardless of the application.

Michael Koziarski wrote the plugin, and we (RNZ) are releasing it under the MIT license.

The source is available on GitHub.

Friday, July 3, 2009

Deploying Acts as Solr on Tomcat with Rails

At Radio NZ we have a number of small Rails applications that need really good search. In several cases we want field-based searching.

e.g. title:"A Blog Post"

to look only in the field 'title' for "A Blog Post".

Apache Solr is:

"an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features."

Acts as Solr is a plugin for Rails that provides a Rails interface to Solr. The plugin includes Solr, and provides rake tasks to start and stop Solr and to rebuild the index.
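
For illustration (the model and field names here are made up), usage of the plugin looks roughly like this - declare the fields to index, then query them with the field:value syntax from the example above:

class Article < ActiveRecord::Base
  # Index these fields in Solr
  acts_as_solr :fields => [:title, :body]
end

# Field-based search, as in the example above
results = Article.find_by_solr('title:"A Blog Post"')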

This works fine in development, but gets tricky in production. If you have several apps, it does not really make sense to have several instances of Solr running on the same server (and you have to change the port number for each so they don't clash). There is also the question of how to ensure Solr restarts if the server reboots.

Because of this I decided to run the production instance of Solr in a Tomcat container. Each app has its own index on the development machine, and when deployed the solr.yml file tells each app to use the single Solr/Tomcat instance instead.

We use Debian Lenny on our production server. You should read these instructions first. Twice. They work for Lenny, and only a few minor tweaks are required to adapt them for Acts as Solr.

The first thing is to use the version of Solr that comes with the plugin - it has changes in the config files that make it work correctly with the plugin. I found this out the hard way.

Deploy your Rails app first and install Tomcat as outlined above. Then do this:

sudo cp path/to/your/app/vendor/plugins/acts_as_solr/solr/webapps/solr.war /usr/local/tomcat6/webapps/solr.war

sudo cp -r path/to/your/app/vendor/plugins/acts_as_solr/solr/solr /usr/local/tomcat6/solr/

Then carry on with the rest of the recipe and you are done.

Parsing Word HTML with Ruby and Rails

I promised I would write about my first Rails project, so here goes.

Within Radio NZ we have a web application for parsing Microsoft Word documents and reformatting them for the web. Content from Word is pasted into a WYSIWYG cell, and submitted to the application. A preview of the parsed content is presented for review.

If this looks OK, then the page is submitted again and an XML document is generated from the content. This XML is sent to our Content Management System to be imported.

If the content is NOT to be imported, then it can be cut from the preview page, and pasted into the CMS WYSIWYG.

The parser ensures that the code is cleaned of extraneous markup, and that it validates. In the case of pre-formatted documents like our schedules, it can split a week's content into 7 day-parts, format subheadings, add emphasis to times, and add links to programme names. You can see the result here and here.

The old application was written in PHP and used a combination of regular expressions, HTML Tidy, and some stream-based parsing to work its magic.

The updated application is written in Ruby on Rails, uses the Sanitize and Hpricot gems, and is much more modular. The reason for the change was to make the app more maintainable - I wanted to add more parsing filters, and the PHP code was a bit of a mess.

I could have refactored the PHP version, but I needed a real project to help me learn Rails, and I suspected it would be less work anyway. Also, having testing built in has advantages when you are writing a parser.

The Rails version has the same basic workflow. Content from Word is pasted into a WYSIWYG (in this case the FCK Editor). The HTML is sent to this bit of code, which cleans most of the rubbish and does a few RNZ-specific things like standardising time formatting.

The cleaner adds new lines after certain tags, and the result is passed to a stream-based parser, which walks through the document and processes it based on the document type (set via a drop-down when the content was submitted).
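
The production code is specific to our content, but the cleaning step is roughly along these lines (a simplified sketch, not the actual filter set):

require 'rubygems'
require 'sanitize'

def clean_word_html(html)
  # Strip Word's extraneous markup, keeping only a small whitelist of tags
  cleaned = Sanitize.clean(html,
    :elements   => %w(p h2 h3 ul li strong em a),
    :attributes => { 'a' => %w(href) })
  # Example of an RNZ-specific rule: standardise times like "7.30PM" to "7:30 pm"
  cleaned.gsub(/(\d{1,2})\.(\d{2})\s*(am|pm)/i) { "#{$1}:#{$2} #{$3.downcase}" }
end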

The new version is in production now, and is more reliable than the old. This is partly because I cleaned up some of the underlying algorithms, fixed some logic bombs, and added a lot more error checking.

One important check added in this version was for smart tags. This is a feature of Word that tags certain words to give them special attributes. The problem is that when content containing them is pasted and parsed, they do not appear in the final document. The new parser checks for these and reminds the user to remove them first.
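
The check itself can be as simple as looking for the namespaced elements Word uses for smart tags (typically the st1: prefix). A sketch of the idea, not the exact code from the app:

def contains_smart_tags?(html)
  # Word wraps smart-tagged words in elements such as <st1:City>...</st1:City>
  !!(html =~ /<\/?st1:/i)
end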

I really liked the Rails framework. The two best parts were:
  1. Having sensible defaults and behaviours for things. I used to spend most of my time in PHP just configuring stuff and getting things flowing.
  2. The Ruby language itself (and the Rails extension to it). Just brilliant. The language design is very good, with consistent interfaces and predictable syntax. It certainly made a nice change from working out which PHP function to use and what order the parameters should be in.
I have also coded in Perl and C (and Assembler), and I like some of the Perlish things that have made their way into Ruby. You can use =~ to compare a string with a /regex/. Cool. Being able to write

when /regex/

inside a

case string

block. Very cool.
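
A quick illustration of that pattern:

def day_type(day)
  case day
  when /^(Sat|Sun)/ then 'weekend'
  when /day$/       then 'weekday'
  else                   'not a day'
  end
end

day_type('Saturday')  # => "weekend"
day_type('Tuesday')   # => "weekday"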

(I still use a lot of Perl - the Radio NZ audio and text publishing engines are built with Perl).

There are some other Rails projects in the works at Radio NZ - one of them is a search tool based on Solr (an update of the BRAD application that I am working on with Marcus from AbleTech) - so expect some more Ruby and Rails posts in the near future.

Wednesday, June 10, 2009

Installing Git from source on Debian Lenny

I tend to use the most current release of Git, and had to install it on a new Lenny server today.

These are the steps required to do so:
  1. wget http://kernel.org/pub/software/scm/git/git-1.6.3.2.tar.gz
  2. tar -xzvf git-1.6.3.2.tar.gz
  3. cd git-1.6.3.2
  4. sudo apt-get install build-essential linux-headers-`uname -r` libssl-dev curl libcurl4-openssl-dev libexpat1-dev tcl
  5. make prefix=/usr/local/ all
  6. make prefix=/usr/local/ install
Done!

Saturday, May 30, 2009

Tracking forks on Github

I have been using github for a few of my projects, and occasionally have forked another project so I can add my own changes.

Today I forked a project and made some changes, and then found another fork with some code I wanted.

The project is simple-navigation - a Rails plugin to create and manage multi-layered navigation.

After creating a fork on github, I cloned my new fork locally:

$ git clone git@github.com:rhulse/simple-navigation.git

I then added my own changes and committed them.

User edk has added some breadcrumb code that works in a similar way to some standalone code I use in my projects. I want to grab this code, have a look at the differences, and merge it into my master branch.

The first step is adding edk's repository as a remote:

$ git remote add edk git://github.com/edk/simple-navigation.git

Step two is fetching any objects in that repository that I don't have in mine:

$ git fetch edk

The output from that command looks like this:

remote: Counting objects: 25, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 17 (delta 12), reused 0 (delta 0)
Unpacking objects: 100% (17/17), done.
From git://github.com/edk/simple-navigation
* [new branch] gh-pages -> edk/gh-pages
* [new branch] master -> edk/master

It is then a simple matter to create a branch and put edk's master branch in it:

$ git checkout -b edk edk/master

The next step is to compare the changes between my master and the edk branch. At this stage it might be tempting to just type this:

$ git diff master edk

The problem is that while you'll see all the additions made by edk, you will also see changes made on the master branch as removals.

To see just edk's changes I used this command:

$ git diff master...edk

This does a diff from the most recent common ancestor. The resulting output contains only changes made since the point at which edk started adding code. (Hat tip to Scott Chacon, who mentioned this in his Scotland on Rails git talk.)
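
The three-dot form is shorthand for diffing against that common ancestor, so it is equivalent to:

$ git diff $(git merge-base master edk) edk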

If I wanted to change any of the new code I could do it in the new edk branch, or make a third branch.

The last step is to merge the new code into my master branch. While on master:

$ git merge edk

When these changes are pushed back to github, the network display page will show the new activity and merges.

And before you ask, you can track as many remotes as you want.

Thursday, April 30, 2009

Installing a Rails plugin from GitHub through a firewall and proxy

I was working on a Ruby on Rails project today with Nigel from Able Technology, and we wanted to install a plugin from GitHub.

We had a few problems because of a corporate firewall and an HTTP proxy server. Only HTTP traffic on port 80 was allowed.

To do this on OS X the following steps were necessary:

1. Set the http_proxy variable in the shell:

export http_proxy=http://our.proxy:our_port


If your proxy requires a username and password, the command is:

export http_proxy=http://username:password@your.proxy:your_port


2. Run the import using http instead of the git protocol:

script/plugin install http://github.com/adamlogic/showoff.git

Monday, April 20, 2009

Converting English dates to Maori with Ruby

In the project I am working on at the moment (my first Rails project) I needed a quick way to convert a date written out in English (with the month and day of the week in full) to Māori.

This was a pretty simple function to write, so in plenty of time for Māori Language Week (27 July - 2 August 2009) here is the Ruby function.

It turns this:

Monday 20 April 2009

into this:

Rāhina 20 Paengawhā-whā 2009
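
A function like this can be little more than a lookup-and-substitute. Here is a minimal sketch (only the day and month names from the example above are included; the rest follow the same pattern):

# Translate English day and month names in a date string to Māori.
MAORI_DAYS   = { 'Monday' => 'Rāhina' }
MAORI_MONTHS = { 'April'  => 'Paengawhā-whā' }

def maori_date(english_date)
  MAORI_DAYS.merge(MAORI_MONTHS).inject(english_date) do |date, (english, maori)|
    date.gsub(english, maori)
  end
end

maori_date('Monday 20 April 2009')  # => "Rāhina 20 Paengawhā-whā 2009"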

Friday, April 17, 2009

Moving from PHP to Ruby (and Rails)

For the last year or so I have been porting a legacy PHP application to the Kohana framework. This is not really a legacy app that needs replacing - the system is still highly functional and meeting the needs of staff who use it.

The main problem has been maintaining the spaghetti code that is common in PHP apps written circa 2003. It's not really that bad, it's just that things have moved on, and it would be a nightmare for anyone else to make changes to the system. Actually it is getting to be a nightmare for me, mostly because adding new features requires an increasing number of ugly hacks.

The port has been an on-again, off-again affair. I began using the CodeIgniter framework, but changed to Kohana when they added support for PHP5 and started adding useful features not in the other framework.

One of the main issues though, has been the rate of change in both projects - CodeIgniter has been slow and steady, while there has (possibly) been more innovation in Kohana.

In the case of CodeIgniter, it did not always have what I needed (and they weren't taking patches), while Kohana was changing a lot with each major release.

I am definitely not saying that the approach taken by the writers of these frameworks is wrong. They are both good products. But for me I want to be writing business logic, not worrying about how to get a (not available) feature working, or having to rewrite code to account for internal framework changes. Life's too short...

My frustration with both frameworks has led me to look into Ruby on Rails. The byline on the website pretty much sums up what I wanted to experience in my daily work writing software - "Web development that doesn't hurt".

Alongside this we've also had the guys from AbleTech working here on a couple of projects, and I have been impressed with what they have been able to achieve with the Rails framework.

To get started I am porting another smaller web app to Rails. It is going well, and I'm getting to like the syntax very much.

True to the Rails methodology, I got the basic app running and deployed (via Capistrano) in about 20 minutes. From there it was straight into recoding the logic.

I'll be writing a follow-up post soon about the experience.

Wednesday, March 18, 2009

Copyright Act Time-line

This post has moved to here.

Thursday, February 26, 2009

My First Computer

My very first computer was a Video Genie (aka Dick Smith System 80). It was a clone of the popular TRS-80 (often called the trash 80). I was a student at the time and paid for it with money earned working on the weekends and school holidays. It took me a whole 3 months to save up to buy it. The guy I worked for could not see the point.

For those who don't know the machine, it had a cassette player for saving and loading programs in Basic.

The first game I remember playing was Hunt the Wumpus which I entered by hand from Creative Computing. I wrote my own version of the game with 3 interlinked layers, but sadly I have lost the code.

Very few of my peers had computers of their own. One guy (David ????) built his own computer and had to programme it by hand one step at a time, although I think this was before I got the System 80.

The second computer I owned was a Sinclair ZX-81. I was working by this stage and I bought an after-market keyboard and disc drive to augment the cassette drive.

After that it was onto a MicroBee, which I mainly bought because it had a Zilog Z80 microprocessor and I wanted to learn assembler. This was with the aim of making a Z80-based controller for my ICOM IC-720 amateur radio transceiver. The unit had a bi-directional data port on the back, and I initially built a discrete controller with about 30 integrated circuits. I still have it and it still works!

I never did get around to building the Z80-based unit, but I found the chips and data books in the garage a couple of months ago.

One odd memory I have is of selling the ZX-81. The guys who came to the house to look at it all had beards - not sure why I remember that - and they seemed intensely interested in all the details of the mods I'd made. At the time I thought they were a bit 'odd', but these days that's nothing out of the ordinary in the software world!

I was PC-less for several years after that - I thought I'd spent too much, and I got more interested in music (CDs had just come out).

What was your first PC (link from the comments)?

Tuesday, February 10, 2009

Book Review: The Public Domain by James Boyle

Over summer I read a bunch of books, including "The Public Domain: Enclosing the Commons of the Mind" by James Boyle.

In this book Boyle explains why the Public Domain is important and shows how it is being eroded by current Intellectual Property (IP) laws and practices. IP is defined as Copyrights, Patents, and Trademarks.

The book begins by explaining why we need IP, and then moves on to outline "The Jefferson Warning" - some basic principles that should govern the implementation of an IP system in society.

The central argument is that the scope of IP should be limited to cover only the minimum required to create the desired effect (an incentive to create).

The third chapter compares the enclosure (migration of property from public shared ownership to private controlled ownership) of land with the enclosure of intangible things and asks if this is a good idea.

The Internet Threat is explained in chapter 4, and this outlines how the threat of costless copying has been framed to law makers in a way that (possibly) overstates the impact the internet might have on IP.

Chapter 5 is an allegory called the Farmers Tale, which helps us to understand the concept of trespass and how this relates to the DMCA.

Chapter 6 has the history of the Ray Charles song I Got A Woman. The song was a product of what went before it, and has been the source for other material since. Boyle argues convincingly that this is the stuff of culture, but that the scope of IP law is too wide, shutting down important cultural innovation.

Chapter 7 moves the argument into the field of science and technology with two case studies. The second of these is a look at synthetic biology - something I knew nothing about - and it was interesting to get some understanding of the concerns of those involved in this bleeding edge research.

A Creative Commons is dealt with in chapter 8, where Boyle draws parallels between the CC movement and the Free and Open Source Software movement.

Chapter 9 is an exposé of the processes used to change laws, and of how these changes are often not supported by factual research. In the example used, the copyrighting of databases in the EU, Boyle shows that the law change did not have the desired effect.

In Chapter 10 Boyle asks if we can learn from the environmental movement; an understanding of these issues has become normalised over time. Will this happen with the Public Domain? Can society get to the point where there is a common understanding of the issues at stake?

I liked this book a lot. If you are having trouble understanding the Public Domain and its relationship to IP, you should read it. Boyle moves you from the familiar to the unfamiliar, frames his arguments in a logical way, and provides plenty of background material.

Boyle is clearly in favour of a strong Public Domain, but presents both sides of the debate in a way that helps you understand the issues from new perspectives.