Wednesday, August 27, 2008

Creating a Pseudo Daemon Using Perl

One of the trickier software jobs I've worked on is a Perl script that runs almost continuously.

The script was needed to check a folder for new content (in the form of news stories) and process them.

The stories are dropped into a folder as a group via ftp, and the name of each story is written into a separate file. The order in this file is the order the stories need to appear on the website.

The old version of the script ran by cron every minute, but there were three problems.

The first is that you might have to wait a whole minute for content to be processed, which is not really ideal for a news service.

The second is that should the script be delayed for some reason, it is possible to end up with a race condition with two (or more scripts) trying to process the same content.

The third is that the script could start reading the order file before all the files were uploaded.

In practice 2 and 3 were very rare, but very disruptive when they did occur. The code needed to avoid both.

The new script is still run once per minute via cron, but contains a loop which allows it to check for content every three seconds.

It works this way:

1. When the script starts it grabs the current time and tries to obtain an exclusive write lock on a lock file.
2. If it gets the lock it starts the processing loop.
3. When the order file is found the script waits for 10 seconds. This is to allow any current upload process to complete.
4. It then reads the file, deletes it, and starts processing each of the story files.
5. These are all written out to XML (which is imported into the CMS), and the original files are deleted.
6. When this is done, the script continues to look for files until 55 seconds have elapsed since it started.
7. When no time is left it exits.

This is what the loop code looks like:

my $stop_time = time + 55;

my $loop = 1;
my $locking_loop = 1;
my $has_run = 0;

while( $locking_loop ){

# first see if there is a lock file and
# wait till the other process is done if there is
if( open my $LOCK, '>>', 'inews.lock' ){

flock($LOCK, LOCK_EX) or die "could not lock the file";
print "** lock gained " . localtime(time) . " running jobs: " if ($debug);

while( $loop ){
my $job_count = keys(%jobs);
for my $job (1..$job_count){
# run the next job if we are within the time limit
if( time < $stop_time ){ $has_run ++; print "$job "; process( $job ); sleep(3); } else{ $loop = 0; $locking_loop = 0; } } } close $LOCK; print "- lock released\n" } else{ print "Could not open lock file for writing\n"; $locking_loop = 0; # nothing happens here } unless($has_run){ print localtime(time) . " No jobs processed\n" } }

This is the output of the script (each number is the name of a job - we have three jobs, each for one ftp directory:

** lock gained Tue Jul 22 08:17:00 2008running jobs:1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 - lock released
** lock gained Tue Jul 22 08:18:00 2008 running jobs: 1 2 3 1 2 3 1 2 3 1

24 news files to process
* Auditor-General awaits result of complaint about $100,000 donation
* NZ seen as back door entry to Australia
.
snip
.
- lock released
** lock gained Tue Jul 22 08:19:23 2008 running jobs: 1 2 3 1 2 3 1 2 3 1 2 - lock released


You can see that the script found some content part way through its cycle, and ran over the allotted time, so the next run of the script did not get a full minute to run.

This ensures that there is never a race between two scripts. You can see next happens when a job take more than 2 minutes:

- lock released
** lock gained Tue Jul 22 10:46:32 2008 running jobs: - lock released
Tue Jul 22 10:46:32 2008 No jobs processed
** lock gained Tue Jul 22 10:46:32 2008 running jobs: - lock released
Tue Jul 22 10:46:32 2008 No jobs processed
** lock gained Tue Jul 22 10:46:32 2008 running jobs: - lock released
Tue Jul 22 10:46:32 2008 No jobs processed
** lock gained Tue Jul 22 10:46:32 2008 running jobs: 1 2 3 1 2 3 1 2


All the scripts that were piling up waiting to get the lock exited immediately, once they found that there time was up.

I am using the same looping scheme to process a queue elsewhere in our publishing system and I'll explain this in my next post.

No comments: