I am finishing up my MSc Data Science placement at ScraperWiki and, by extension, my MSc Data Science (Computing Specialism) programme at Lancaster University. My project was to build a website to enable users to investigate the MOT data. This week the result of that work, the ScraperWiki MOT website, went live. The aim of […]
And fast streaming CSV download…
We’re rolling out a series of performance improvements to ScraperWiki. Yesterday, we sped up the Tableau/OData connector. Today, it’s the turn of the humble CSV. When you go to “Download a spreadsheet” you’ll notice the CSV file is now always described as “live”. This means it is always up to date, and streams at full […]
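The post doesn't show the mechanics, but "streaming" here means rows are sent as they are produced rather than after the whole file is built. As a self-contained sketch (not ScraperWiki's actual code), here is how a client can parse CSV rows incrementally from network chunks without buffering the whole download:

```python
import csv
import io

def stream_rows(chunks):
    """Parse CSV rows incrementally from an iterable of text chunks,
    yielding each row as soon as its line is complete.
    (Simplified: doesn't handle quoted fields containing newlines.)"""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line; keep any trailing partial line.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            yield next(csv.reader(io.StringIO(line)))
    if buffer:
        yield next(csv.reader(io.StringIO(buffer)))

# Simulated network chunks that arrive mid-row:
chunks = ["id,name\n1,al", "ice\n2,bob\n"]
rows = list(stream_rows(chunks))
# rows == [["id", "name"], ["1", "alice"], ["2", "bob"]]
```

Because rows are yielded as they arrive, memory use stays flat no matter how large the dataset is — which is what makes an always-up-to-date "live" download practical.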
Super-faster Tableau integration
We’ve just rolled out a change to make our OData endpoint much faster. For example, importing 150,000 tweets is down from 20 minutes to 6 minutes. If you have Tableau, or other software that can read OData, please try it out! If you’ve already got a connection set up, you need to go and get the […]
Introducing “30 day free trial” accounts
Last May, we launched free Community accounts on ScraperWiki. We’ve since found that the limit on the number of datasets isn’t enough to convert heavy users into paying customers. This matters, because we want to invest more in improving the product, and adding new tools. Today, we’re pleased to announce that we’re introducing a new Free Trial […]
ScraperWiki’s response to the Heartbleed security failure
Et tu, Heartbleed? “Catastrophic” is the right word. On the scale of 1 to 10, this is an 11. ― Security expert Bruce Schneier responds to Heartbleed On Monday the 7th of April 2014, a software flaw was identified which exposed approximately two thirds of the web to the risk of catastrophic security failure. The flaw has […]
ScraperWiki Classic retirement guide
In July last year, we announced some exciting changes to the ScraperWiki platform, and our plans to retire ScraperWiki Classic later in the year. That time has now come. If you’re a ScraperWiki Classic user, here’s what will be changing, and what it means for you: Today, we’re adding a button to all ScraperWiki Classic […]
9 things you need to know about the “Code in your browser” tool
ScraperWiki has always made it as easy as possible to code scripts to get data from web pages. Our new platform is no exception. The new browser-based coding environment is a tool like any other. Here are 9 things you should know about it. 1. You can use any language you like. We recommend Python, as it is […]
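The full post lists all nine points; as a flavour of the kind of script the browser tool runs, here is a minimal sketch of the parsing half of a scraper in Python's standard library. (In the tool you would fetch a live page first; the HTML string here stands in for that, and the URLs in it are made up.)

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag -- the core of many scrapers."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# A stand-in for a fetched page:
page = '<p><a href="/data.csv">Download</a> or <a href="/about">about</a></p>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/data.csv', '/about']
```

Nothing here is specific to ScraperWiki — which is the point of item 1: the environment runs ordinary scripts in whatever language you prefer.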
Uploading a (structured) spreadsheet
We’ve made a new tool to help you upload a structured spreadsheet. That is to say, one that contains a table with headers. I’m trying it out with an old spreadsheet of expenses from when I worked at mySociety. If your spreadsheet isn’t consistent enough, it tells you where you’ve gone wrong. In my case, I […]
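The post doesn't spell out the tool's exact checks, but the core idea — a table with headers, validated for consistency before upload — can be sketched like this (the expense data below is invented for illustration):

```python
import csv
import io

def check_consistency(csv_text):
    """Report rows whose column count doesn't match the header's --
    one simple kind of 'where you've gone wrong' feedback."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    problems = []
    for i, row in enumerate(rows[1:], start=2):  # row 1 is the header
        if len(row) != len(header):
            problems.append((i, f"expected {len(header)} columns, got {len(row)}"))
    return problems

data = "Date,Item,Amount\n2009-01-05,Train,12.50\n2009-01-06,Lunch\n"
print(check_consistency(data))
# [(3, 'expected 3 columns, got 2')]
```

Pointing at the offending row number, rather than rejecting the whole file, is what makes this kind of check useful for cleaning up old spreadsheets.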
We’ve migrated to EC2
When we started work on the ScraperWiki beta, we decided to host it ‘in the cloud’ using Linode, an IaaS (Infrastructure as a Service) provider. For the uninitiated, Linode allows people to host their own virtual Linux servers without having to worry about things like maintaining their own hardware. On April 15th 2013, Linode were […]
Your questions about the new ScraperWiki answered
You may have noticed we launched a completely new version of ScraperWiki last week. Here’s a suitably meta screengrab of last week’s #scraperwiki twitter activity, collected by the new “Search for tweets” tool and visualised by the “Summarise this data” tool, both running on our new platform. These changes have been a long time coming, […]