Mastering space and time with jQuery deferreds
Recently Zarino and I were pairing on making improvements to a new scraping tool on ScraperWiki. We were working on some code that allows the person using the tool to pick out parts of some scraped data in order to extract a date into a new database column. For processing the data on the server side we were using a little helper library called scrumble, which does some cleaning in Python to produce dates in a standard format. That's great for the server side, but we also needed to display a preview of the cleaned dates to the user before the data is finally sent to the server for processing.
Rather than rewrite this Python code in JavaScript we thought we’d make a little script which could be called using the ScraperWiki exec endpoint to do the conversion for us on the server side.
Our code looked something like this:
var $tr = $('<tr>');
// for each cell in each row…
$.each(row, function (index, value) {
  var $td = $('<td>');
  var date = scraperwiki.shellEscape(JSON.stringify(value));
  // execute this command on the server…
  scraperwiki.exec('tools/do-scrumble.py ' + date, function (response) {
    // and put the result into this table cell…
    $td.html(JSON.parse(response));
  });
  $td.appendTo($tr);
});
Each time we needed to process a date with scrumble we made a call to our server side Python script via the exec endpoint. When the value comes back from the server, the callback function sets the content of the table cell to the value.
However, when we started testing our code we hit a limit placed on the exec endpoint to prevent overloading the server (currently no more than 5 exec calls can be executing at once).
Our first thought was to just limit the rate at which we made requests so that we didn’t trip the rate limit, but our colleague Pete suggested we should think about batching up the requests to make them faster. Sending each one individually might work well with just a few requests, but what about when we needed to make hundreds or thousands of requests at a time?
How could we change it so that the conversion requests were batched, and the results were inserted into the right table cells once they’d been computed?
jQuery.Deferred() to the rescue
We realised that we could use jQuery deferreds to allow us to do the batching. A deferred is like an I.O.U that says that at some point in the future a result will become available. Anybody who’s used jQuery to make an AJAX request will have used a deferred – you send off a request, and specify some callbacks to be executed when the request eventually succeeds or fails.
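If you've only ever met deferreds through AJAX calls, here's the idea reduced to its essentials (a standalone sketch, not code from our tool):

var d = $.Deferred();

// a consumer attaches a callback to the I.O.U. via its promise…
d.promise().done(function (value) {
  console.log('got ' + value);
});

// …and when the value finally turns up, resolving the deferred
// fires every callback that was attached to it
setTimeout(function () {
  d.resolve(42);
}, 1000);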
By returning a deferred we could delay the call to the server until all of the values to be converted have been collected and then make a single call to the server to convert them all.
Below is the code which does the batching:
var scrumble = {
  deferreds: {},

  as_date: function (raw_date) {
    // create a deferred for this raw date string, or reuse an existing one
    if (!this.deferreds[raw_date]) {
      this.deferreds[raw_date] = $.Deferred();
    }
    return this.deferreds[raw_date].promise();
  },

  process_dates: function () {
    var self = this;
    // send every collected string to the server in one batch
    var raw_dates = _.keys(self.deferreds);
    var date_list = scraperwiki.shellEscape(JSON.stringify(raw_dates));
    var command = 'tools/do-scrumble-batch.py ' + date_list;
    scraperwiki.exec(command, function (response) {
      var response_object = JSON.parse(response);
      $.each(response_object, function (key, value) {
        // resolving fires every callback waiting on this date
        self.deferreds[key].resolve(value);
      });
    });
  }
};
Each time as_date is called it creates or reuses a deferred, which is stored in an object keyed on the raw_date string, and then returns a promise (a deferred with a restricted interface) to the caller. The caller attaches a callback to the promise that will use the value once it is available.
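A caller might look something like this (a hypothetical snippet; the date string and callback are invented for illustration):

// nothing is sent to the server at this point; we just get a promise back
scrumble.as_date('7 Feb 2014').done(function (cleaned) {
  console.log(cleaned); // runs later, once process_dates() has resolved it
});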
To actually send the batch of dates off to be converted, we call the process_dates method. It makes a single call to the server with all of the strings to be processed. When the result comes back from the server, it "resolves" each of the deferreds with the processed value, which causes all of the callbacks to fire, updating the user interface.
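For this to work we assume the batch script replies with a JSON object mapping each raw input string to its cleaned date, something like this (all values invented):

{
  "7 Feb 2014":    "2014-02-07",
  "Feb 7th, 2014": "2014-02-07"
}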
With this design the changes we had to make to our code were minimal. It was already using a callback to set the value of the table cell. It was just a case of attaching that callback to the jQuery promise returned by the scrumble.as_date method, and then calling scrumble.process_dates once all of the items had been added, to make the server side call that converts all of the dates.
var $tr = $('<tr>');
$.each(row, function (index, value) {
  var $td = $('<td>');
  var date = scraperwiki.shellEscape(JSON.stringify(value));
  // attach the existing callback to the promise instead of an exec call
  scrumble.as_date(date).done(function (response) {
    $td.html(JSON.parse(response));
  });
  $td.appendTo($tr);
});
scrumble.process_dates();
Now, instead of one call being made for every value that needs converting (whether or not that string has already been processed), a single call is made to convert all of the values at once. When the response comes back from the server, the promises are resolved and the user interface updates, showing the user the preview as required. jQuery deferreds allowed us to make this change with minimal disruption to our existing code.
And it gets better…
Further optimisation (not shown here) is possible if process_dates is called multiple times. A little-known feature of jQuery deferreds is that they can only be resolved once. If you make an AJAX call like $.get('http://foo').done(myCallback) and then, some time later, call .done(myCallback) on that AJAX response again, the callback myCallback is immediately called with the exact same arguments as before. It's like magic.
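You can see this behaviour with a bare deferred (another standalone sketch):

var d = $.Deferred();
d.resolve('hello');

// d is already resolved, so this callback fires straight away,
// synchronously, with the original resolution value
d.done(function (value) {
  console.log(value); // 'hello'
});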
We realised we could turn this quirky feature to our advantage. Rather than checking whether we'd already converted a date and returning the pre-converted value on subsequent calls, we just attach the .done() callback regardless, as if this were the first time. Deferreds that have already been resolved fire their callbacks immediately, meaning we only send requests to the server when there are new dates that haven't been processed yet.
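Our actual change isn't shown here, but here's a rough sketch of the idea: record which raw strings have already been sent, and have process_dates only ship the new ones. The processed bookkeeping object is hypothetical.

// hypothetical bookkeeping: which raw strings have already been sent
scrumble.processed = {};

scrumble.process_dates = function () {
  var self = this;
  // only batch up the strings the server hasn't seen yet
  var new_dates = _.filter(_.keys(self.deferreds), function (raw_date) {
    return !self.processed[raw_date];
  });
  if (new_dates.length === 0) return; // everything already in flight or done

  _.each(new_dates, function (raw_date) { self.processed[raw_date] = true; });

  var date_list = scraperwiki.shellEscape(JSON.stringify(new_dates));
  scraperwiki.exec('tools/do-scrumble-batch.py ' + date_list, function (response) {
    $.each(JSON.parse(response), function (key, value) {
      self.deferreds[key].resolve(value);
    });
  });
};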
jQuery deferreds helped us keep our user interface responsive, our network traffic low, and our code refreshingly simple. Not bad for a mysterious set of functions hidden halfway down the docs.