My unstoppable reading continues; this time I’ve polished off The Tableau 8.0 Training Manual: From Clutter to Clarity by Larry Keller. This post is part review of the book and part review of Tableau. Tableau is a data visualisation application which grew out of academic research on visualising databases. I’ve used Tableau Public a little bit […]
Making a ScraperWiki view with R
In a recent post I showed how to use the ScraperWiki Twitter Search Tool to capture tweets for analysis. I demonstrated this using a search on the #InspiringWomen hashtag, using Tableau to generate a visualisation. Here I’m going to show a tool made using the R statistical programming language which can be used to view […]
#InspiringWomen – catching twitter with ScraperWiki
Those of you on Twitter may have caught the recent #InspiringWomen hashtag, a response to the online abuse and threats received by many women in the public eye. On Sunday 4th August people tweeted about women who inspired them, marking their tweets with the #InspiringWomen hashtag. #InspiringWomen was launched by my friend, […]
Adventures in Tableau – loading files
Tableau is a widely used visualisation tool, particularly in the business intelligence area. It grew out of the Polaris project at Stanford University, subtitled “interactive database visualisation”. This is worth bearing in mind since it is the context in which Tableau deals with data. It anticipates that the data you are interested in comes in the form of […]
pdftables – a Python library for getting tables out of PDF files
Got PDFs you want to get data from? Try our web interface and API over at PDFTables.com! One of the top searches bringing people to the ScraperWiki blog is “how do I scrape PDFs?” The answer is typically “with difficulty”, but things are getting better all the time. PDF is a page description format, it […]
Book Review: Clean Code by Robert C. Martin
Following my revelations regarding sharing code with other people, I thought I’d read more about the craft of writing code in the form of Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin. Despite the appearance of the word Agile in the title, this isn’t a book explicitly about a particular methodology […]
ScraperWiki – Professional Services
How would you go about collecting, structuring and analysing 100,000 reports on Romanian companies? You could use ScraperWiki to write and host your own computer code that carries out the scraping you need, and then use our other self-service tools to clean and analyse the data. But sometimes writing your own code is not a […]
Testing, testing…
Data science is a distinct profession from software engineering. Data scientists may write a lot of computer code, but the aim of their code is to answer questions about data. Sometimes they might want to expose the analysis software they have written to others so that they can answer questions for themselves, and this is […]
Book review: Natural Language Processing with Python by Steven Bird, Ewan Klein & Edward Loper
I bought Natural Language Processing with Python by Steven Bird, Ewan Klein & Edward Loper for a couple of reasons. Firstly, ScraperWiki are part of the EU Newsreader Project which seeks to make a “history recorder” using natural language processing to convert large streams of news articles into a more structured form. ScraperWiki’s role in […]
Data Science London 12th June – a speaker speaks
Data Science London run an approximately monthly programme of evening events comprising short talks, beer and pizza. Last week I was invited to give a talk on Scraping and Parsing PDF using Python. The venue for these events is the Westminster Hub in central London – we were diverted in our approach by the premier […]