Richard Hulse: July 2009

Thursday, July 9, 2009

Browser and OS visitor trends at Radio NZ

Yesterday I took a look back at browser and OS usage changes over the last couple of years to evaluate any useful trends.

Here are the browser stats from the last month, and from the same period one and two years ago:

Browser	2007	2008	2009
IE	70%	65%	58%
Firefox	22%	27%	27%
Safari	4%	5.4%	8.6%
Chrome	-	-	3.1%
Opera	0.93%	1%	0.96%

IE version 5.x is virtually extinct.

Looking at the underlying data I'd draw the following conclusions:

IE use is on the decline generally, and IE users are the slowest at upgrading
Firefox usage appears to have plateaued (and they are the fastest to upgrade)
Most growth is in new browser entrants.
There is greater market fragmentation (more choice for consumers)

There are also changes in Operating Systems (using the same periods):

OS	2007	2008	2009
Windows	91.8%	90.4%	86.6%
Macintosh	7%	8%	10.7%
GNU/Linux	0.98%	1.3%	1.83%
iPhone	-	0.06%	0.33%

Around 73% of Windows users still use XP, and only 56% of Mac users are on OS X 10.5. The lower rate of Mac upgrades could be limited by hardware restrictions.

For many website developers these figures will represent a significant challenge - you can no longer design your website for any one browser or OS. The days of 'best viewed in browser X' are gone.

And yes, I still see sites that only work in IE. I went to get an on-line quote for something last week, and the site simply would not work in Firefox or Safari. I took my business elsewhere. The average punter isn't going to know why - they'll just think the site doesn't work. Is this the branding message you want to send to visitors?

Based on these stats, failing to design cross-platform websites will give at least 15% of your users an inferior experience. That is a lot of lost traffic (and business).

Saturday, July 4, 2009

Rails Active Directory Authentication

Radio New Zealand has released a Rails plugin that allows users to authenticate against Active Directory from a Rails application.

We use it with the restful authentication plugin to allow a single set of credentials to be used for each person, regardless of the application.

Michael Koziarski wrote the plugin, and we (RNZ) are releasing it under the MIT license.

The source is available on GitHub.

Friday, July 3, 2009

Deploying Acts as Solr on Tomcat with Rails

At Radio NZ we have a number of small Rails applications that need really good search. In several cases we want field-based searching.

e.g. title:"A Blog Post"

to look only in the field 'title' for "A Blog Post".
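A query like this can be assembled from user input. Here is a minimal Ruby sketch of the idea. The helper name field_query is hypothetical and not part of Acts as Solr, and it assumes the usual Lucene query syntax, where a phrase is wrapped in double quotes and embedded quotes or backslashes are escaped with a backslash.

```ruby
# Hypothetical helper (not part of any plugin): build a Lucene/Solr
# field query such as title:"A Blog Post".
def field_query(field, phrase)
  # Escape backslashes and double quotes so the phrase stays one quoted term.
  escaped = phrase.gsub(/(["\\])/) { "\\#{$1}" }
  %(#{field}:"#{escaped}")
end

puts field_query("title", "A Blog Post")
```

In an app using the plugin, the resulting string would be handed to the model's find_by_solr finder.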

Apache Solr is:

"an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features."

Acts as Solr is a plugin for Rails that provides a Rails interface to Solr. The plugin includes Solr, and provides rake tasks to start, stop and reindex the Solr index.

This works fine in development, but gets tricky in production. If you have several apps, it does not really make sense to have several instances of Solr running on the same server (and you have to change the port number for each so they don't clash). There is also the question of how to ensure Solr restarts if the server reboots.

Because of this I decided to run the production instance of Solr in a Tomcat container. Each app has its own index on the development machine, and when deployed the solr.yml file tells each app to use the single Solr/Tomcat instance instead.
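To illustrate the per-environment switch, here is a rough Ruby sketch that reads a solr.yml-style file and picks the URL for the current environment. The layout (one url key per environment), the hosts, and the port numbers are all assumptions for illustration, so check them against the solr.yml that ships with your copy of the plugin and against your Tomcat setup.

```ruby
require "yaml"

# Assumed contents of config/solr.yml; hosts, ports and paths are placeholders.
SOLR_YML = <<~YAML
  development:
    url: http://127.0.0.1:8982/solr
  production:
    url: http://127.0.0.1:8080/solr
YAML

# Return the Solr URL configured for the given environment name.
# Raises KeyError if the environment or its url key is missing.
def solr_url_for(env, yaml = SOLR_YML)
  config = YAML.load(yaml)
  config.fetch(env).fetch("url")
end

puts solr_url_for("development")
puts solr_url_for("production")
```

Using fetch rather than [] means a missing environment fails loudly at boot instead of quietly pointing the app at nothing.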

We use Debian Lenny on our production server. You should read these instructions first. Twice. They work for Lenny, and only a few minor tweaks are required to adapt them for Acts as Solr.

The first thing is to use the version of Solr that comes with the plugin - it has changes in the config files that make it work correctly with the plugin. I found this out the hard way.

Deploy your Rails app first and install Tomcat as outlined above. Then do this:

sudo cp path/to/your/app/vendor/plugins/acts_as_solr/solr/webapps/solr.war /usr/local/tomcat6/webapps/solr.war

sudo cp -r path/to/your/app/vendor/plugins/acts_as_solr/solr/solr /usr/local/tomcat6/solr/

Then carry on with the rest of the recipe and you are done.

Parsing Word HTML with Ruby and Rails

I promised I would write about my first Rails project, so here goes.

Within Radio NZ we have a web application for parsing Microsoft Word documents and reformatting them for the web. Content from Word is pasted into a WYSIWYG cell, and submitted to the application. A preview of the parsed content is presented for review.

If this looks OK, then the page is submitted again and an XML document is generated from the content. This XML is sent to our Content Management System to be imported.

If the content is NOT to be imported, then it can be cut from the preview page, and pasted into the CMS WYSIWYG.

The parser ensures that the code is cleaned of extraneous markup, and validates. In the case of pre-formatted documents like our schedules, it can split a week's content into 7 day-parts, format subheadings, add emphasis to times, and add links to programme names. You can see the result here and here.

The old application was written in PHP and used a combination of regular expressions, HTML Tidy, and some stream-based parsing to work its magic.

The updated application is written in Ruby on Rails, uses the Sanitize and Hpricot gems, and is much more modular. The reason for the change was to make the app more maintainable - I wanted to add more parsing filters - and the PHP code was a bit of a mess.

I could have refactored the PHP version, but I needed a real project to help me learn Rails, and I suspected it would be less work anyway. Also, having testing built in has advantages when you are writing a parser.

The Rails version has the same basic workflow. Content from Word is pasted into a WYSIWYG (in this case the FCK Editor). The HTML is sent to this bit of code, which cleans out most of the rubbish and does a few RNZ-specific things like standardising time formatting.

The cleaner adds new lines after certain tags, and this is passed to a stream-based parser. That walks through the document and processes it based on the document type (set via a drop-down setting when the content was submitted).
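The two stages can be sketched roughly like this. It is a simplified illustration and not the production code: the list of tags, the method names, and the :schedule document type are assumptions made for the example.

```ruby
# Tags after which a newline is inserted (an assumed list).
BREAK_AFTER = %w[p h2 h3 li ul ol].freeze

# Put each block-level element on its own line so the document
# can be read one line at a time.
def add_line_breaks(html)
  pattern = %r{(</(?:#{BREAK_AFTER.join('|')})>)\s*}i
  html.gsub(pattern) { "#{$1}\n" }
end

# Walk the cleaned HTML line by line, handling each line
# according to the document type.
def parse(html, doc_type)
  add_line_breaks(html).each_line.map do |line|
    line = line.chomp
    if doc_type == :schedule && line.start_with?("<h2>")
      "DAY: #{line}"   # in this sketch, a schedule heading starts a new day-part
    else
      "LINE: #{line}"
    end
  end
end

puts parse("<h2>Monday</h2><p>6:00 Morning Report</p>", :schedule)
```

Putting one element per line keeps the second stage simple: it only ever has to look at the current line and a little bit of state.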

The new version is in production now, and is more reliable than the old. This is partly because I cleaned up some of the underlying algorithms, fixed some logic bombs, and added a lot more error checking.

One important check added in this version was for smart tags. This is a feature of Word that tags certain words to give them special attributes. The problem is that when pasted and parsed they do not appear in the final document. The new parser checks for these and reminds the user to remove them first.
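A check along these lines might look like the sketch below. It assumes that smart tags show up in the pasted HTML as namespaced elements such as <st1:City w:st="on">, which is what Word typically produces. The st prefix pattern, the method names, and the warning text are assumptions for illustration, not the code used in the app.

```ruby
# Return the distinct smart tag names found in pasted Word HTML.
# Assumes smart tags use a namespace prefix like st1, st2, ...
def smart_tags_in(html)
  html.scan(/<st\d+:([A-Za-z]\w*)/i).flatten.uniq
end

# Build a reminder for the user, or nil when no smart tags were found.
def smart_tag_warning(html)
  tags = smart_tags_in(html)
  return nil if tags.empty?
  "Please remove smart tags in Word before pasting (found: #{tags.join(', ')})."
end

pasted = '<p><st1:City w:st="on">Wellington</st1:City> news</p>'
puts smart_tag_warning(pasted)
```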

I really liked the Rails framework. The two best parts were:

Having sensible defaults and behaviours for things. I used to spend most of my time in PHP just configuring stuff and getting things flowing.
The second was the Ruby language (and the Rails extensions to it). Just brilliant. The language design is very good, with consistent interfaces and predictable syntax. It certainly made a nice change from working out which PHP function to use and what order the parameters should be in.

I have also coded in Perl and C (and Assembler), and I like some of the Perlish things that have made their way into Ruby. You can use =~ to compare a string with a /regex/. Cool. Being able to write

when /regex/

inside a

case string

block. Very cool.
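Both idioms fit in a few lines. This is only a toy example; the classify method and the time patterns are made up for illustration.

```ruby
# Classify one line of text with a case statement whose
# when clauses are regular expressions.
def classify(line)
  case line
  when /\A\d{1,2}[:.]\d{2}/ then :time_entry   # starts with a time such as 6:00 or 6.00
  when /\A\s*\z/            then :blank        # empty or whitespace only
  else                           :text
  end
end

puts classify("6:00 Morning Report")

# =~ returns the position of the first match, or nil if there is none.
puts("7.30 pm News" =~ /\d{1,2}\.\d{2}/)
```

The case statement works because when uses ===, and a Regexp's === tests whether the string matches.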

(I still use a lot of Perl - the Radio NZ audio and text publishing engines are built with Perl).

There are some other Rails projects in the works at Radio NZ - one of them is a search tool based on Solr (an update of the BRAD application that I am working on with Marcus from AbleTech) - so expect some more Ruby and Rails posts in the near future.