Yahoo!Finance to Tableau via ScraperWiki

Our recently announced OData connector gives Tableau users access to a world of unstructured and semi-structured data. In this post I’d like to demonstrate the power of a Python library, Pandas, and the Code in a Browser tool to get “live” stock market data from Yahoo!Finance into Tableau. Python is a well-established programming language with […]

Book review: The Signal and the Noise by Nate Silver

Nate Silver first came to my attention during the 2008 presidential election in the US. He correctly predicted the November result in 49 of 50 states, missing only Indiana, where Barack Obama won by just a single percentage point. This is part of a wider career in prediction: aside from a […]

Digitally enhanced social research

Guest post by Dr Rebecca Sandover. The continued expansion of social media activity raises many questions about how this ever-changing digital life spreads ideas and how ‘contagious’ online events arise. Exeter University’s Contagion project has been running since September 2013, funded by the UK Economic and Social Research Council to explore how such events spread […]

NewsReader – one year on

ScraperWiki has been contributing to NewsReader, an EU FP7 project, for over a year now. In that time, we’ve discovered that all the TechCrunch articles would make a pile 4 metres high, and that’s just one relatively small site. The total volume of news published every day is enormous, but the tools we use to process it […]

Face ReKognition

I’ve previously written about social media and the popularity of our Twitter Search and Followers tools. But how can we make Twitter data more useful to our customers? Analysing the profile pictures of Twitter accounts seemed like an interesting thing to do since they are often the faces of the account holder and a face […]

Book review: Hadoop in Action by Chuck Lam

Hadoop in Action by Chuck Lam provides a brief, fairly technical introduction to the Hadoop Big Data ecosystem. Hadoop is an open-source implementation of the MapReduce framework originally developed by Google to process huge quantities of web search data. The name MapReduce refers to dividing up jobs amongst multiple processors (“Mapping”) and then recombining […]

Book review: Python for Data Analysis by Wes McKinney

As well as developing scrapers and a data platform, at ScraperWiki we also do data analysis. Sometimes this is just because we’re interested; other times it’s because clients don’t have the tools or the time to do the analysis they want themselves. Often the problem is with the size of the data. Excel is […]

Book review: Mining the Social Web by Matthew A. Russell

The Twitter Search and Followers tools are amongst the most popular on the ScraperWiki platform, so we are looking to provide more value in this area. To this end I’ve been reading “Mining the Social Web” by Matthew A. Russell. In the first instance the book looks like a run through the APIs for various […]

Book review: Data Mining – Practical Machine Learning Tools and Techniques by Witten, Frank and Hall

I’ve been doing more reading on machine learning, this time in the form of Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank and Mark A. Hall. This comes on the recommendation of my academic colleagues on the NewsReader project, who rely heavily on machine learning techniques to do natural language […]

Archive by Author