
In addition to all the standard Python libraries for downloading and parsing pages from the web, ScraperWiki provides the scraperwiki Python library.

Access it like this:

import scraperwiki

The source code that implements these functions can be found in our Bitbucket repository.

Scraping

As an alternative to scraperwiki.scrape (below), you can use any Python HTTP library, such as urllib2.

scraperwiki.scrape(url[, params][, user_agent])

Returns the downloaded string from the given url.

params are sent as a POST if set.

user_agent sets the user-agent string if provided.
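
For example, a sketch (the URLs are placeholders, and params is assumed to accept a dict of form fields):

# Fetch a page as a string
html = scraperwiki.scrape("http://example.com/")

# Send a POST request with form parameters and a custom user-agent
html = scraperwiki.scrape("http://example.com/search",
                          params={"q": "rainfall"},
                          user_agent="my-scraper/1.0")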

Datastore (SQLite)

ScraperWiki provides a fully-fledged SQLite database for each scraper which you can save to. You can read back data that has been committed by other scrapers, or extract it through the API.

See the Datastore copy & paste guide for examples, and SQLite's "SQL as understood by SQLite" documentation for the query language.

scraperwiki.sqlite.save(unique_keys, data[, table_name="swdata", verbose=2])

Saves a data record into the datastore, in the table given by table_name.

data is a dict object with field names as keys; unique_keys is a subset of data.keys() which determines when a record is overwritten.

For large numbers of records data can be a list of dicts.

verbose alters what is shown in the Data tab of the editor.
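
For example (a sketch; the field names are illustrative):

# Save one record; saving again with the same "name" overwrites the row
scraperwiki.sqlite.save(unique_keys=["name"],
                        data={"name": "Alice", "occupation": "diver"})

# Save several records at once by passing a list of dicts
records = [{"name": "Bob", "occupation": "builder"},
           {"name": "Cara", "occupation": "chemist"}]
scraperwiki.sqlite.save(["name"], records)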

scraperwiki.sqlite.attach(name[, asname])

Attaches to the datastore of another scraper named name (which should be the short-name of the scraper, as it appears in the URL of its overview page).

asname is an optional alias for the attached datastore.

Attached scrapers are mounted read-only. You can see some examples in this post on our mailing list.
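
For example (a sketch; the short-name other_scraper is illustrative):

# Attach another scraper's datastore under the alias "src"
scraperwiki.sqlite.attach("other_scraper", asname="src")

# Query the attached datastore (read-only)
rows = scraperwiki.sqlite.select("* from src.swdata limit 10")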

scraperwiki.sqlite.execute(sql[, vars], verbose=1)

Executes any arbitrary SQLite command (except attach), for example create, delete, insert, or drop.

vars is an optional list of parameters, substituted where the command contains ‘?’s. For example:

scraperwiki.sqlite.execute("insert into swdata values (?,?,?)", [a,b,c])

The ‘?’ convention is like "paramstyle qmark" from Python's DB API 2.0 (though the API to the datastore is otherwise nothing like Python's DB API). In particular, a ‘?’ does not itself need quoting, and can in general only be used where a literal would appear.

scraperwiki.sqlite.select(sqlfrag[, vars], verbose=1)

Executes a select command on the datastore. For example:

scraperwiki.sqlite.select("* from swdata limit 10")

Returns a list of dicts that have been selected.

vars is an optional list of parameters, inserted when the select command contains ‘?’s. This is like the feature in the .execute command, above.
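
For example (a sketch; the column name is illustrative):

# '?' is filled in from vars
rows = scraperwiki.sqlite.select("* from swdata where name = ?", ["Alice"])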

scraperwiki.sqlite.commit()
Commits to the file after a series of execute commands. (sqlite.save auto-commits after every action).
scraperwiki.sqlite.show_tables([dbname])
Returns an array of tables and their schemas in either the current or an attached database.
scraperwiki.sqlite.table_info(name)
Returns an array of attributes for each element of the table.
scraperwiki.sqlite.save_var(key, value)
Saves an arbitrary single value into a table called swvariables. Intended to store scraper state so that a scraper can continue after an interruption (see the sketch after this list).
scraperwiki.sqlite.get_var(key[, default])
Retrieves a single value that was saved by save_var. Only works for string, float, or int types. For anything else, use the pickle library to turn it into a string.
scraperwiki.sqlite.SqliteError
An exception that is raised when there is, for example, a syntax error in your sql query.
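
Here is a sketch combining execute, commit, save_var, and get_var so a scraper can resume after an interruption (the table layout and variable names are illustrative):

# Resume from the last page processed, defaulting to 1 on the first run
last_page = scraperwiki.sqlite.get_var("last_page", 1)

for page in range(last_page, 100):
    scraperwiki.sqlite.execute("insert into swdata values (?, ?, ?)",
                               [page, "title", "body"])
    scraperwiki.sqlite.commit()                     # flush the inserts to the file
    scraperwiki.sqlite.save_var("last_page", page)  # checkpoint progress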

Views

scraperwiki.utils.httpresponseheader(headerkey, headervalue)

Sets an HTTP response header. Use it, for example, to set the content type to something other than HTML when using a ScraperWiki "view":

scraperwiki.utils.httpresponseheader("Content-Type", "image/png")
scraperwiki.dumpMessage({"content":base64.encodestring(binstring), "message_type":"console", "encoding":"base64"})
This is the method for outputting a binary string binstring that contains, for example, a PNG image.
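
Putting the two together, here is a sketch of a view that serves a PNG image (the filename is illustrative):

import base64
import scraperwiki

# Read the raw image bytes (the filename is a placeholder)
binstring = open("chart.png", "rb").read()

# Tell the browser the response is a PNG rather than HTML
scraperwiki.utils.httpresponseheader("Content-Type", "image/png")

# Emit the binary data, base64-encoded, as described above
scraperwiki.dumpMessage({"content": base64.encodestring(binstring),
                         "message_type": "console",
                         "encoding": "base64"})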

Geocoding

Some installed functions to help you transform between different (Earth) coordinate systems.

scraperwiki.geo.os_easting_northing_to_latlng(easting, northing[, grid='GB'])
Converts an OSGB or OSIE (grid='IE') grid reference to a WGS84 (lat, lng) pair.
scraperwiki.geo.extract_gb_postcode(string)
Attempts to extract a UK postcode from a given string.
scraperwiki.geo.gb_postcode_to_latlng(postcode)
Returns a WGS84 (lat, lng) pair for the central location of a UK postcode.
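
For example (a sketch; the coordinates and address are illustrative, and extract_gb_postcode is assumed to return the postcode as a string):

# Convert an Ordnance Survey GB easting/northing to WGS84
lat, lng = scraperwiki.geo.os_easting_northing_to_latlng(530000, 180000)

# Pull a postcode out of free text, then geocode it
postcode = scraperwiki.geo.extract_gb_postcode("10 Downing Street, London SW1A 2AA")
lat, lng = scraperwiki.geo.gb_postcode_to_latlng(postcode)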

Miscellaneous

scraperwiki.pdftoxml(pdfdata, options='')
Converts a byte string containing a PDF file into XML containing the coordinates and font of each text string.
options is an optional string of options to be passed to the underlying pdftohtml command (see the pdftohtml documentation for details).
Refer to the example for more details.
modulename = scraperwiki.utils.swimport(name)
Imports the code from another scraper as the module modulename.
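
For example, a sketch that pulls the text elements out of a PDF (the URL is a placeholder; lxml is used here only to parse the returned XML):

import lxml.etree

pdfdata = scraperwiki.scrape("http://example.com/report.pdf")
xmldata = scraperwiki.pdftoxml(pdfdata)

# Each <text> element carries position and font attributes
root = lxml.etree.fromstring(xmldata)
for el in root.findall(".//text"):
    print el.attrib.get("top"), el.attrib.get("left"), el.text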

Exceptions

scraperwiki.Error
This is the base class for all exceptions raised by the ScraperWiki library code. Currently there is only one subclass used (see below), but we like to leave room for future expansion.
scraperwiki.CPUTimeExceededError

This is raised when a script running on ScraperWiki has used too much CPU time. This is implemented in a similar fashion across all our supported languages and is explained in a bit more detail in the FAQ.

Here is a simple sketch of how to catch the exception and checkpoint state so the next run can resume (the function and variable names are illustrative):
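
def run_scraper():
    # ... your scraping code; the body here is illustrative
    pass

try:
    run_scraper()
except scraperwiki.CPUTimeExceededError:
    # Checkpoint progress so the next run can resume where this one stopped
    scraperwiki.sqlite.save_var("status", "interrupted")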