It can be hard to tell what somebody else’s job actually is. If you’ve never done it, you don’t know what really matters. Job adverts with bulleted lists of skills give some indication, yet somehow don’t get to the heart of it. The language really matters: writing clearly, describing tasks in a concrete way. […]
10 technical things you didn’t know about the new ScraperWiki
1. Scrapers are now completely language neutral. Not just Python and Ruby – but anything open source that can make or read an SQLite file, from R to Clojure. 2. Scrapers can have as many files as they like. So you can use modules, write separate tests… whatever you want to do. 3. You can […]
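The first point above, that a scraper is anything that can make or read an SQLite file, can be sketched in a few lines. This is a minimal, hypothetical illustration in Python; the filename `scraperwiki.sqlite` and table name `swdata` are assumptions for the example, not the platform’s documented conventions:

```python
import sqlite3

# A scraper, at its simplest, is any program that writes rows into an
# SQLite file. This sketch stores one scraped row and reads it back.
conn = sqlite3.connect("scraperwiki.sqlite")
conn.execute("DROP TABLE IF EXISTS swdata")
conn.execute("CREATE TABLE swdata (name TEXT, value INTEGER)")
conn.execute("INSERT INTO swdata VALUES (?, ?)", ("example", 42))
conn.commit()

rows = list(conn.execute("SELECT name, value FROM swdata"))
print(rows)
conn.close()
```

The same file could just as well be written from R or Clojure, which is what makes the approach language neutral.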
Job: Product Marketing Manager
Our new platform and associated data science services are going well, so we’re hiring an ambitious marketeer to help us communicate better what we’re doing. Full job advert and how to apply here. It’s our first full-time marketing role, and since we’re a start-up it needs to be someone quite versatile. It’s also a great […]
Quick start guide: Make your first data tool
The new ScraperWiki is all about tools, like a hammer rather than a washing machine. For data scientists and developers we’ve made a new quick start guide that takes you through making your first tool. We’re looking forward to seeing what kind of tools you make!
Tools liberate your data from apps
An app is something that makes it easy for a user to achieve one thing. It’s an appliance. Like a washing machine. On the new ScraperWiki platform we talk a lot about tools. A tool is something you use with materials and with other tools to achieve a variety of things. Like a hammer. At ScraperWiki, […]
Summarise #4: Images and domains
(This is the fourth part in a series of posts about the “Summarise this dataset” tool on the new beta.scraperwiki.com platform – go there and sign up for free to try it out! The code is open source; take a look in facts.js for the key parts) URLs are a type of data that is particularly […]
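One way a summariser can treat URLs specially is to group them by domain. This is a hypothetical Python sketch of that idea, not the actual logic in facts.js; the sample URLs are made up:

```python
from collections import Counter
from urllib.parse import urlparse

# Summarise a column of URLs by counting how often each domain appears.
urls = [
    "https://blog.scraperwiki.com/post/1",
    "https://blog.scraperwiki.com/post/2",
    "http://example.org/page",
]
domains = Counter(urlparse(u).netloc for u in urls)
print(domains.most_common())
```

Counting domains rather than whole URLs collapses thousands of distinct links into a handful of meaningful facts about where the data points.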
Scheduling! Keep your data fresh
We’ve added scheduling to the “Code in your browser” tool on beta.scraperwiki.com. For now it is daily, as that covers most people’s uses. Please ask if you need something else! Or have a look at the tool’s source code. Want to know how to use the new ScraperWiki? There’s a quick start guide to coding […]
Summarise #3: Buckets of time and numbers
In the last two weeks I introduced the “Summarise automatically” tool, which magically shows you interesting facts about any dataset in the new ScraperWiki. It’s an open source tool – geeks can play along on GitHub, or use the SSH button to log into the tool and see the code running in action. After adding […]
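The idea of “buckets” can be sketched quickly: put each value into a fixed-width range, then count the ranges. This is a hypothetical Python illustration of the technique, not the tool’s own code:

```python
from collections import Counter

def bucket(value, width):
    """Return the lower bound of the fixed-width bucket containing value."""
    return (value // width) * width

# Six numbers fall into three buckets of width 10: 0-9, 10-19, 20-29.
values = [3, 7, 12, 15, 18, 27]
counts = Counter(bucket(v, 10) for v in values)
print(sorted(counts.items()))
```

The same trick works for timestamps by bucketing into days, months, or years instead of fixed numeric widths.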
Internships – coding and data science
For the last two summers we’ve had a really good intern (Aidan Hobson Sayers – thanks for finding him for us, John!). We’d like to do it again this year. We’ve opportunities in three areas, depending on your skills and interests. Platform team – CoffeeScript, Backbone, Unix. We use Extreme Programming. Data science team – Python, R. Scraping, […]
Summarise #2: Pies and facts
In a previous blog post, I showed how by counting the most common values in each column (like a pivot table, or “group by” in SQL), I managed to make a tool that can automatically summarise datasets. I quickly realised that there were better ways of visualising the data than just showing tables. For example, […]
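The counting described above, finding the most common values in a column, maps directly onto a SQL `GROUP BY`. Here is a hypothetical sketch using an in-memory SQLite database; the table and column names are made up for the example:

```python
import sqlite3

# Count the most common values in one column, pivot-table style.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swdata (colour TEXT)")
conn.executemany(
    "INSERT INTO swdata VALUES (?)",
    [("red",), ("blue",), ("red",), ("red",), ("blue",), ("green",)],
)
top = conn.execute(
    "SELECT colour, COUNT(*) AS n FROM swdata GROUP BY colour ORDER BY n DESC"
).fetchall()
print(top)
conn.close()
```

Running the same query over every column of a dataset gives a crude but surprisingly effective automatic summary, which is the starting point the post describes.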