A faster, safer sandbox to play in

When programmers first hear about ScraperWiki, their initial reaction is often “What! You let anyone edit general-purpose code and run it on your servers?!”

The answer is that, yes, we do, but in an isolated environment. Your own “sandbox” if you like, where you can safely build castles without knocking others over. Or, as The Julian calls it, a “firebox” where you can burn logs without burning down the whole house.

We’re rolling out an upgrade to that environment, changing to a new core technology. We used to use a thing called UML (User Mode Linux), and now we’re changing our sandbox to use a thing called LXC (Linux Containers).

It’s just been deployed, but enabled only for beta test users. Changes are:

  • Safe: The scripts now run in better isolation from each other. This is so that we can offer private scrapers securely, making sure they cannot read each other’s data and code.
  • Fast: Both in the editor and when scheduled, scrapers and views run a lot quicker. The old system used a particularly slow method to identify scrapers, which made it pause for half a second on each page scraped and on each write to the datastore (for Unix geeks, it spawned “lsof” each time). This is now down to a fraction of the time (it just looks at a bridge network IP address).
  • Robust: We don’t have any long-running virtual machines any more. LXC is light enough that it effectively “boots up” each time the script is run. After we’ve fixed any bugs in the daemon that manages all this, it should be fundamentally more reliable.
  • Updated languages: With the migration, we’re also moving from Python 2.6.2 to Python 2.7.1, and from Ruby 1.8.7 to Ruby 1.9.2. The Ruby move is particularly significant: it should be faster and make scraping Unicode easier.
  • Updated libraries: We’ve updated all the third-party libraries in the sandbox to their most recent versions.
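
For the curious, here is a rough sketch of the idea behind the “Fast” point above: instead of spawning a process to work out which scraper is making a request, keep a table keyed by each container’s bridge network IP address and look the caller up in it. This is only an illustration, not our actual daemon code. The subnet, the `ContainerRegistry` class and the scraper name are all made up for the example.

```python
import ipaddress

# Hypothetical bridge subnet for the containers; the real one is not stated here.
BRIDGE_SUBNET = ipaddress.ip_network("10.0.1.0/24")


class ContainerRegistry:
    """Maps each running container's bridge IP to the scraper it is running."""

    def __init__(self, subnet=BRIDGE_SUBNET):
        self.subnet = subnet
        self._by_ip = {}

    def register(self, ip, scraper_name):
        # Called when a container starts; rejects addresses off the bridge.
        addr = ipaddress.ip_address(ip)
        if addr not in self.subnet:
            raise ValueError(f"{ip} is not on the bridge network {self.subnet}")
        self._by_ip[addr] = scraper_name

    def unregister(self, ip):
        # Called when a container exits, so its IP can be reused safely.
        self._by_ip.pop(ipaddress.ip_address(ip), None)

    def identify(self, source_ip):
        # A single dictionary lookup per request; returns None if unknown.
        return self._by_ip.get(ipaddress.ip_address(source_ip))


registry = ContainerRegistry()
registry.register("10.0.1.17", "example_scraper")
print(registry.identify("10.0.1.17"))
print(registry.identify("10.0.1.99"))
```

The point of the design is that identifying the caller costs one in-memory lookup rather than a new process for every page scraped or datastore write.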

What next? We’ll spend about a week with beta testers, testing the new containers for bugs, compatibility and performance. If you’d like to help test, please do get in touch. We can enable it so that all scrapers and views you own will run in the new LXC environment.

After that, we will start rolling it out whether you like it or not! This will break some scrapers. Specifically, there are some minor syntax changes in Ruby 1.9, and some of the library upgrades might cause problems. We’ll be eliminating as many of these as possible in the test phase, and will make another announcement before we start rolling it out for everyone. But it is possible that you will have to fix up some of your scrapers. Let us know if you need help fixing them and we’ll do our best to get one of our developers to help you out.

Bear in mind that, after that, everything will be faster 🙂

2 Responses to “A faster, safer sandbox to play in”

Trackbacks/Pingbacks

  1. New backend now fully rolled out | ScraperWiki Data Blog - October 6, 2011

    […] new faster, safer sandbox that powers ScraperWiki is now fully rolled out to all […]
