My unstoppable reading continues: this time I’ve polished off The Tableau 8.0 Training Manual: From Clutter to Clarity by Larry Keller. This post is part review of the book and part review of Tableau. Tableau is a data visualisation application which grew out of academic research on visualising databases. I’ve used Tableau Public a little bit […]
Making a ScraperWiki view with R
In a recent post I showed how to use the ScraperWiki Twitter Search Tool to capture tweets for analysis. I demonstrated this using a search on the #InspiringWomen hashtag, using Tableau to generate a visualisation. Here I’m going to show a tool made using the R statistical programming language which can be used to view […]
#InspiringWomen – catching twitter with ScraperWiki
Those of you on Twitter may have caught the recent #InspiringWomen hashtag, a response to the online abuse and threats received by many women in the public eye. On Sunday 4th August people tweeted about women who inspired them, marking their tweets with the #InspiringWomen hashtag. #InspiringWomen was launched by my friend, […]
Adventures in Tableau – loading files
Tableau is a widely used visualisation tool, particularly in the business intelligence area. It grew out of the Polaris project at Stanford University, subtitled “interactive database visualisation”. This is worth bearing in mind since it is the context in which Tableau deals with data. It anticipates that the data you are interested in comes in the form of […]
Exploring Stack Exchange Open Data
Inspired by my long commute and the pretty dreadful EDM blasting out in my gym, I’ve found myself on a bit of a podcast kick lately. Besides my usual NPR fare (if you’ve not yet listened to an episode of This American Life with Ira Glass, you’ve missed out), I’ve been checking out the […]
pdftables – a Python library for getting tables out of PDF files
Got PDFs you want to get data from? Try our web interface and API over at PDFTables.com! One of the top searches bringing people to the ScraperWiki blog is “how do I scrape PDFs?” The answer is typically “with difficulty”, but things are getting better all the time. PDF is a page description format, it […]
ScraperWiki – Professional Services
How would you go about collecting, structuring and analysing 100,000 reports on Romanian companies? You could use ScraperWiki to write and host your own computer code that carries out the scraping you need, and then use our other self-service tools to clean and analyse the data. But sometimes writing your own code is not a […]
What Does It All Mean? Find out with Summarise This Data
Every time I generate a new dataset, the first thing I want is a high-level overview of what’s going on. I can’t digest millions of individual rows of data – I need a way to zoom out and get the bigger picture. Take the table below which shows all the National Trust […]
Testing, testing…
Data science is a distinct profession from software engineering. Data scientists may write a lot of computer code, but the aim of their code is to answer questions about data. Sometimes they might want to expose the analysis software they have written to others so that they can answer questions for themselves, and this is […]
Book review: Natural Language Processing with Python by Steven Bird, Ewan Klein & Edward Loper
I bought Natural Language Processing with Python by Steven Bird, Ewan Klein & Edward Loper for a couple of reasons. Firstly, ScraperWiki are part of the EU Newsreader Project which seeks to make a “history recorder” using natural language processing to convert large streams of news articles into a more structured form. ScraperWiki’s role in […]