Data analysis using the Query with SQL tool

Ferdinand Magellan, the Renaissance’s most prodigious explorer. He almost certainly knew lingua franca – but did he know SQL? It’s Summer 1513. Rome is the centre of the Renaissance world, and Spanish, Italian, and Portuguese merchant ships criss-cross the oceans, ferrying textiles from the North, spices from the East, and precious metals from the newly-discovered Americas. […]

Announcing the new ScraperWiki.com

Today is a big day for ScraperWiki – our new platform is coming out of beta. Sign up and give it a go! We think you’ll like it. ScraperWiki is about liberating data from silos and empowering you to do what you want with it. You can either write your own code, or use our […]

It’s all about tools

The new ScraperWiki is all about tools. People talk a lot about data, big data, data mining, data science. But the action happens in tools. Tools like Excel, R, SPSS, Python. Or, on the new ScraperWiki, tools like View in a table, Summarise this data and Query with SQL. We’ve just pushed an improvement to […]

10 technical things you didn’t know about the new ScraperWiki

1. Scrapers are now completely language neutral. Not just Python and Ruby – but anything open source that can make or read an SQLite file, from R to Clojure. 2. Scrapers can have as many files as they like. So you can use modules, write separate tests… whatever you want to do. 3. You can […]
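As a rough illustration of point 1, here is a minimal sketch of what "making or reading an SQLite file" can look like, using Python's standard `sqlite3` module. The table name `data`, its columns, and the file name `example.sqlite` are assumptions made for this example only; they are not taken from the post or from the ScraperWiki platform.

```python
import os
import sqlite3
import tempfile


def write_rows(db_path, rows):
    """Create the (hypothetical) `data` table if needed and store rows in it."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS data (name TEXT, value INTEGER)")
        conn.executemany("INSERT INTO data (name, value) VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()


def read_rows(db_path):
    """Read every row back, sorted by name, as a list of tuples."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT name, value FROM data ORDER BY name"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Write to a throwaway file so the sketch does not touch real data.
    path = os.path.join(tempfile.mkdtemp(), "example.sqlite")
    write_rows(path, [("spices", 3), ("textiles", 5)])
    print(read_rows(path))
```

Any other language with an SQLite binding could produce or consume the same file, which is the sense in which the output format, rather than the language, is the common interface.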

Quick start guide: Make your first data tool

The new ScraperWiki is all about tools, like a hammer rather than a washing machine. For data scientists and developers we’ve made a new quick start guide that takes you through making your first tool. We’re looking forward to seeing what kind of tools you make!

Tools liberate your data from apps

An app is something that makes it easy for a user to achieve one thing. It’s an appliance. Like a washing machine. On the new ScraperWiki platform we talk a lot about tools. A tool is something you use with materials and with other tools to achieve a variety of things. Like a hammer. At ScraperWiki, […]

Summarise #4: Images and domains

(This is the fourth part in a series of posts about the “Summarise this dataset” tool on the new beta.scraperwiki.com platform – go there and sign up for free to try it out! The code is open source; take a look in facts.js for the key parts) URLs are a type of data that is particularly […]

Scheduling! Keep your data fresh

We’ve added scheduling to the “Code in your browser” tool on beta.scraperwiki.com. For now it is daily, as that covers most people’s uses. Please ask if you need something else! Or have a look at the tool’s source code. Want to know how to use the new ScraperWiki? There’s a quick start guide to coding […]

Summarise #3: Buckets of time and numbers

In the last two weeks I introduced the “Summarise automatically” tool, which magically shows you interesting facts about any dataset in the new ScraperWiki. It’s an open source tool – geeks can play along on Github, or use the SSH button to log into the tool and see the code running in action. After adding […]

Summarising Serendipity

5 years ago, a friend and I sat down in a pub in Shrewsbury, drank some beer, and chatted about the web. Every month since, people have been doing that in Shrewsbury (and a few times in Ludlow). It’s called ShropGeek (we’re very savvy in our naming conventions, you see). It was started and organised […]

ScraperWiki

Extract tables from PDFs and scrape the web
