

Local ScraperWiki Library

It quite annoyed me that you can only use the scraperwiki library on a ScraperWiki instance; most of it could work fine elsewhere. So I’ve pulled it out (well, for Python at least) so you can use it offline.

How to use

pip install scraperwiki_local 

You can then import scraperwiki in scripts run on your local computer. The scraperwiki.sqlite component is powered by DumpTruck, which you can optionally install independently of scraperwiki_local.

pip install dumptruck


DumpTruck works a bit differently from (and better than) the hosted ScraperWiki library, but the change shouldn’t break much existing code. To give you an idea of the ways they differ, here are two examples:

Complex cell values

What happens if you do this?

import scraperwiki
shopping_list = ['carrots', 'orange juice', 'chainsaw']
scraperwiki.sqlite.save([], {'shopping_list': shopping_list})

On a ScraperWiki server, shopping_list is converted to its unicode representation, which looks like this:

[u'carrots', u'orange juice', u'chainsaw'] 

In the local version, it is encoded to JSON, so it looks like this:

["carrots","orange juice","chainsaw"] 

If the value can’t be encoded to JSON, you get an error. When you retrieve the value later, it comes back as a list rather than as a string.
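You can see the round trip with the standard json module alone; this sketch mirrors what the local library does to a list-valued cell (the library itself isn’t needed here):

```python
import json

shopping_list = ['carrots', 'orange juice', 'chainsaw']

# On save, the value is stored as its JSON encoding...
stored = json.dumps(shopping_list)

# ...and on select, it is decoded again, so a list comes back as a list.
retrieved = json.loads(stored)
print(retrieved == shopping_list)  # True

# A value with no JSON encoding, such as a set, raises a TypeError.
try:
    json.dumps({'carrots', 'chainsaw'})
except TypeError:
    print('not JSON-encodable')
```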

Case-insensitive column names

SQL is less sensitive to case than Python: column names in SQL are case-insensitive, while dictionary keys in Python are not. The following code works fine in both versions of the library.

In [1]: shopping_list = ['carrots', 'orange juice', 'chainsaw']
In [2]: scraperwiki.sqlite.save([], {'shopping_list': shopping_list})
In [3]: scraperwiki.sqlite.save([], {'sHOpPiNg_liST': shopping_list})
In [4]: scraperwiki.sqlite.select('* from swdata')
Out[4]: [{u'shopping_list': [u'carrots', u'orange juice', u'chainsaw']}, {u'shopping_list': [u'carrots', u'orange juice', u'chainsaw']}]

Note that the key in the returned data is ‘shopping_list’ and not ‘sHOpPiNg_liST’; the database uses the first one that was sent. Now let’s retrieve the individual cell values.

In [5]: data = scraperwiki.sqlite.select('* from swdata')
In [6]: print([row['shopping_list'] for row in data])
Out[6]: [[u'carrots', u'orange juice', u'chainsaw'], [u'carrots', u'orange juice', u'chainsaw']]

The code above works in both versions of the library, but the code below only works in the local version; it raises a KeyError on the hosted version.

In [7]: print(data[0]['Shopping_List'])
Out[7]: [u'carrots', u'orange juice', u'chainsaw']

Here’s why. In the hosted version, scraperwiki.sqlite.select returns a list of ordinary dictionaries. In the local version, scraperwiki.sqlite.select returns a list of special dictionaries that have case-insensitive keys.
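If you’re curious what such a dictionary looks like, here’s a toy sketch of case-insensitive keys. This is an illustration only, not DumpTruck’s actual implementation, and the class name is made up:

```python
class CaseInsensitiveRow(dict):
    """Toy dict whose string keys match regardless of case."""

    @staticmethod
    def _norm(key):
        return key.lower() if isinstance(key, str) else key

    def __setitem__(self, key, value):
        super().__setitem__(self._norm(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._norm(key))

    def __contains__(self, key):
        return super().__contains__(self._norm(key))


row = CaseInsensitiveRow()
row['shopping_list'] = ['carrots', 'orange juice', 'chainsaw']
print(row['Shopping_List'])  # any casing finds the same cell
```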

Develop locally

Here’s a start at developing ScraperWiki scripts locally, in whatever coding environment you’re used to. For many things, the local library behaves the same as the hosted one; for many others it behaves differently, but the differences won’t matter.

If you want to develop locally (just Python for now), you can use the local library and then move your script to a ScraperWiki script when you’ve finished developing it (perhaps using Thom Neale’s ScraperWiki scraper). Or you could just run it somewhere else, like your own computer or web server. Enjoy!


5 Responses to “Local ScraperWiki Library”

  1. Eric January 2, 2013 at 12:11 pm #

    Is there anything like this for Ruby?

  2. Francis Irving January 4, 2013 at 6:36 pm #

    Alas no! We hope somebody will make one as we roll out x.scraperwiki.com to help migrate scripts to it 🙂

  3. Francis Irving January 31, 2013 at 9:03 am #

Nowadays you do “pip install scraperwiki” (instead of scraperwiki_local)

  4. carl plant  (@carlplant) March 9, 2013 at 12:37 pm #

    Hi, I’ve tried to install scraperwiki but need pdftoxml module, can’t for the life of me find the module to install! Please could you advise where to find the module?

    • Tom Fisher September 9, 2013 at 2:46 pm #

      Same problem here! Would love to know where to find the pdftohtml module
