NewsReader – one year on

ScraperWiki has been contributing to NewsReader, an EU FP7 project, for over a year now. In that time, we’ve discovered that all the TechCrunch articles would make a pile 4 metres high, and that’s just one relatively small site. The total volume of news published every day is enormous but the tools we use to process it […]

Book review: Mining the Social Web by Matthew A. Russell

The Twitter search and follower tools are amongst the most popular on the ScraperWiki platform, so we are looking to provide more value in this area. To this end, I’ve been reading “Mining the Social Web” by Matthew A. Russell. In the first instance, the book looks like a run through the APIs for various […]

Asking data questions of words

The vast majority of my contributions to the web have been loosely encoded in the varyingly standard-compliant family of languages called English. It’s a powerful language for expressing meaning, but the inference engines needed to parse it are pretty complex, staggeringly ancient, yet cutting edge (i.e. brains). We tend to think about data a lot […]

ScraperWiki

Extract tables from PDFs and scrape the web

Tag Archives | natural language processing
