ScraperWiki is a place where data professionals get, clean, analyse, visualise and manage data.
Our services include:
Regular scheduled custom data provision for a monthly fee with a setup charge. This might include pricing data, lead-generation data or research data; monitoring services for social media and websites; or market data collection.
A collaborative platform for sharing, managing and analysing your data. This might be a standalone product where you carry out data provision and analysis yourselves, or a joint endeavour where we source and analyse data with you.
Data collection at scale through scraping external or internal resources with transformation to formats for reuse.
Analysis and visualisation of data, either acquired through scraping or ingested via more conventional means.
The ScraperWiki platform includes ‘off the shelf’ data tools that our data services team use for customers. In addition, they can write customised data tools for specific projects.
Sessions to help businesses understand methods in data collection, analysis and management.
UK Government Digital Services
When the government wanted to migrate its disparate departmental websites to a unified central location, gov.uk, they brought us in to carry out content scraping and transformation for upload to their new content management system.
We delivered this work to a tight timeline, scraping dozens of sites and thousands of web pages. As a result of this work, we are now suppliers on G-Cloud.
Channel 4 Dispatches
When Channel 4’s Dispatches programme wished to investigate whether selling off Britain’s assets could cut the national debt, they turned to us to extract data from The National Asset Register 2007, which was locked up in PDF files.
We created a sophisticated visualisation which enabled reporters and viewers to get an overview of the data with the ability to drill down to single lines in the original source documents.
As an extension of our work with Channel 4 Dispatches, we were asked to help establish whether councils could sell 11,000 acres of land to fill budget holes. Our response was to create a visualisation tool where any viewer could type in a postcode to see available brownfield sites in their area.
This visualisation was based on 25,000 data points stored in an Excel spreadsheet, which we ingested into our data hub for richer analysis, visualisation and publication.
Corporate data hub
A major business-to-business publisher has an ongoing need to source and share commodity pricing data amongst its journalists, and finds conventional mechanisms such as document management systems and shared drives lacking.
The ScraperWiki data hub allows them to easily share both data which they have sourced themselves and data we provide using scrapers. The data hub also provides analysis and visualisation through an ever-growing set of data tools. As part of our service, we have delivered a Data 101 workshop to their journalists and data professionals.
US law firm
A firm of lawyers in the US receives a ‘Judgement Abstract Report’ as a PDF from the state every week. The report is used to identify potential clients for the law firm. It has a regular layout, but it is not machine readable, and reusing it previously required manual retyping.
We wrote a script which automatically receives the PDF, ingests it into their ScraperWiki data hub, runs on a weekly schedule to collect the incremental data, and makes the data available privately to the lawyers.
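The core of such a pipeline can be sketched roughly as follows — a minimal illustration only, not the actual ScraperWiki code. The report’s real fields and layout are not public, so the column positions, field names (case number, debtor, amount) and sample text here are all invented for the example; in practice a PDF-to-text step would come first.

```python
def parse_report(text):
    """Split each line of the (hypothetical) fixed-layout report text
    into a record; columns are separated by runs of two or more spaces."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = [p for p in line.split("  ") if p.strip()]
        case_no, debtor, amount = parts[0], parts[1].strip(), float(parts[2])
        records.append({"case": case_no, "debtor": debtor, "amount": amount})
    return records

def incremental_update(stored, new_records):
    """Append only records with unseen case numbers, mimicking the
    weekly incremental collection step."""
    seen = {r["case"] for r in stored}
    return stored + [r for r in new_records if r["case"] not in seen]

# Toy stand-in for the text extracted from one weekly PDF.
SAMPLE_REPORT = """\
2013-00417  SMITH, JOHN          1250.00
2013-00418  DOE, JANE            3420.50
"""

store = incremental_update([], parse_report(SAMPLE_REPORT))
# Re-running on the same report adds nothing, so weekly runs only
# accumulate genuinely new judgements.
store = incremental_update(store, parse_report(SAMPLE_REPORT))
print(len(store))  # → 2
```

In the real service this logic would sit behind a scheduler and write into the customer’s private data hub rather than an in-memory list.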
Liverpool John Moores University health map
A research department at the university wanted to provide data to the emergency services to support their work: for example, identifying factors such as the number of assaults near licensed premises and schools, or the number of ambulance call-outs to falls, broken down by local demographics.
For this work, we matched public data from the Office for National Statistics and local authorities with internal data from the emergency services. This data was then displayed on an interactive map which enabled researchers to make sense of the complex interactions in the data.
Devon and Somerset Fire Incident Data
As data scientists, we are always eager to probe new and interesting data sources. One of our founders, Julian Todd, was interested in the fire incident data published by Devon & Somerset Fire and Rescue Service. The data is clearly presented, but it offers no way to see the bigger picture of how the service’s resources are used and applied.
By ingesting the data into the ScraperWiki data hub and combining it with a sophisticated interactive visualisation, we are able to make much more of the underlying data.
EU NewsReader Project (FP7)
We are working with academic groups and companies in a pan-European collaboration under the EU’s Seventh Framework Programme (FP7) for research and innovation. The NewsReader project ingests vast quantities of news and related data and uses natural language processing to offer enhanced navigation and analysis.
Our role is to provide data sources in the form of scrapers for openly available material such as parliamentary proceedings and to support the exploitation and dissemination of results and technologies arising from the project.
Company data for the academic sector
When Marko Klasnja, a PhD politics student at New York University, wanted to get data on thousands of companies from the website of the Romanian Ministry of Finance, he turned to us.
Following his exemplary brief, we were able to extract the data quickly from the HTML tables on the site and present it to him, along with a report highlighting discrepancies in the underlying data. Our experience in and with academia means we understand the problems academic customers face.
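The basic job here — pulling rows out of HTML tables — can be sketched with Python’s standard-library `html.parser`. This is an illustrative toy, not our production scraper: the ministry site’s actual markup and field names are not reproduced, and the sample table below is invented.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row."""
    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows
        self._row = None      # cells of the row currently being parsed
        self._in_cell = False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")
    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False
    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

# A toy stand-in for one page of company data.
html = """<table>
<tr><th>Company</th><th>Turnover</th></tr>
<tr><td>Example SRL</td><td>120000</td></tr>
</table>"""

parser = TableExtractor()
parser.feed(html)
print(parser.rows)  # → [['Company', 'Turnover'], ['Example SRL', '120000']]
```

Once the rows are in plain Python lists, checks for discrepancies (missing cells, non-numeric turnover values, duplicate company names) become simple comparisons over the data.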