ScraperWiki has always made it as easy as possible to write scripts that get data from web pages. Our new platform is no exception. The new browser-based coding environment is a tool like any other. Here are 9 things you should know about it. 1. You can use any language you like. We recommend Python, as it is […]
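For a taste of what a scraper on the platform can look like in Python, here is a minimal sketch. It assumes the scraperwiki Python library (its scrape helper and SQLite datastore) is installed; the URL and table layout are illustrative placeholders, not a real target.

```python
# Minimal sketch of a Python scraper, assuming the scraperwiki
# library is available; the URL and table layout are placeholders.
import scraperwiki
import lxml.html

html = scraperwiki.scrape("http://example.com/prices")  # fetch the page
root = lxml.html.fromstring(html)

for row in root.xpath("//table//tr"):
    cells = [td.text_content().strip() for td in row.xpath(".//td")]
    if len(cells) == 2:
        # save each row into the built-in SQLite datastore
        scraperwiki.sqlite.save(
            unique_keys=["item"],
            data={"item": cells[0], "price": cells[1]})
```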
Introducing the Search for Tweets tool
Hey – my name is Ed Cawthorne, and I have recently started at ScraperWiki as the resident product manager. My first task is to let you know about the “Search for Tweets” tool on the new ScraperWiki platform. To understand how the Twitter tool came about, it helps to know some of the background. […]
pdftables – a Python library for getting tables out of PDF files
Got PDFs you want to get data from? Try our web interface and API over at PDFTables.com! One of the top searches bringing people to the ScraperWiki blog is “how do I scrape PDFs?” The answer has typically been “with difficulty”, but things are getting better all the time. PDF is a page description format; it […]
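As a hedged sketch of using the library, assuming pdftables exposes a get_tables() function that accepts an open file object (check the project README for the exact API); the file name is a placeholder:

```python
# Sketch of extracting tables from a PDF with pdftables; assumes
# get_tables() takes an open file object and returns a list of
# tables, each a list of rows of cell strings. File name is a
# placeholder.
from pdftables import get_tables

with open("report.pdf", "rb") as fh:
    for table in get_tables(fh):
        for row in table:
            print(row)
```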
Uploading a (structured) spreadsheet
We’ve made a new tool to help you upload a structured spreadsheet. That is to say, one that contains a table with headers. I’m trying it out with an old spreadsheet of expenses from when I worked at mySociety. If your spreadsheet isn’t consistent enough, it tells you where you’ve gone wrong. In my case, I […]
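This isn’t the tool itself, but a sketch of the kind of consistency check it performs, assuming a CSV export where every row should have one value per header; the file name is a placeholder:

```python
# Sketch of a header-consistency check on a CSV export; not the
# upload tool itself. File name is a placeholder.
import csv

with open("expenses.csv", newline="") as fh:
    reader = csv.reader(fh)
    headers = next(reader)
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(headers):
            print(f"Row {lineno}: expected {len(headers)} columns, got {len(row)}")
```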

Book Review: Clean Code by Robert C. Martin
Following my revelations regarding sharing code with other people, I thought I’d read more about the craft of writing code in the form of Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin. Despite the appearance of the word Agile in the title, this isn’t a book explicitly about a particular methodology […]
Sharing in 6 dimensions
Hands up everyone who’s ever used Google Docs. Okay, hands down. Have you ever noticed how many different ways there are to ‘share’ a document with someone else? We have. We use Google Docs a lot internally to store and edit company documents. And we’ve always been baffled by how many steps there are to […]
We’ve migrated to EC2
When we started work on the ScraperWiki beta, we decided to host it ‘in the cloud’ using Linode, an IaaS (Infrastructure as a Service) provider. For the uninitiated, Linode allows people to host their own virtual Linux servers without having to worry about things like maintaining their own hardware. On April 15th 2013, Linode were […]
ScraperWiki – Professional Services
How would you go about collecting, structuring and analysing 100,000 reports on Romanian companies? You could use ScraperWiki to write and host your own computer code that carries out the scraping you need, and then use our other self-service tools to clean and analyse the data. But sometimes writing your own code is not a […]
What Does It All Mean? Find out with Summarise This Data
Every time I generate a new dataset, the first thing I want is a high-level overview of what’s going on. I can’t digest millions of individual rows of data – I need a way to zoom out and get the bigger picture. Take the table below which shows all the National Trust […]
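For a comparable quick overview outside the tool, pandas can produce the same kind of zoomed-out summary; this is not the Summarise This Data tool itself, and the file name is a placeholder:

```python
# A comparable high-level overview using pandas; not the Summarise
# This Data tool itself. File name is a placeholder.
import pandas as pd

df = pd.read_csv("national_trust_properties.csv")
print(df.dtypes)                    # what kind of data each column holds
print(df.describe(include="all"))   # counts, uniques, means and ranges
```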
Open your data with ScraperWiki
Open data activists, start your engines. Following on from last week’s announcement about publishing open data from ScraperWiki, we’re now excited to unveil the first iteration of the “Open your data” tool, for publishing ScraperWiki datasets to any open data catalogue powered by the OKFN’s CKAN technology. Try it out on your own datasets. You’ll […]
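The tool targets catalogues powered by CKAN, whose action API looks roughly like the sketch below using package_create; the catalogue URL, API key and dataset fields are illustrative placeholders, not the tool’s own internals:

```python
# Minimal sketch of publishing a dataset to a CKAN catalogue via its
# action API (package_create). URL, API key and fields are placeholders.
import requests

CKAN_URL = "https://demo.ckan.org"
API_KEY = "your-api-key"

resp = requests.post(
    CKAN_URL + "/api/3/action/package_create",
    json={
        "name": "national-trust-properties",
        "title": "National Trust properties",
        "notes": "Published from ScraperWiki",
    },
    headers={"Authorization": API_KEY},
)
resp.raise_for_status()
print(resp.json()["result"]["id"])  # id of the newly created dataset
```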