Comments on: 600 Lines of Code, 748 Revisions = A Load of Bubbles
https://blog.scraperwiki.com/2011/03/600-lines-of-code-748-revisions-a-load-of-bubbles/
Extract tables from PDFs and scrape the web
Thu, 14 Jul 2016 16:12:42 +0000

By: Scraping guides: Excel spreadsheets | ScraperWiki Data Blog
https://blog.scraperwiki.com/2011/03/600-lines-of-code-748-revisions-a-load-of-bubbles/#comment-601
Wed, 14 Sep 2011 15:55:38 +0000
[…] We used an Excel scraper that pulls together 9 spreadsheets into one dataset for the brownfield sites map used by Channel 4 News. […]
By: #hhhglas – the really late “live” blog » Nicola Osborne
https://blog.scraperwiki.com/2011/03/600-lines-of-code-748-revisions-a-load-of-bubbles/#comment-600
Mon, 09 May 2011 16:31:04 +0000
[…] So, to finish off the introductory part of the day, you might be wondering how ScraperWiki makes money? Well, private scrapers (the default is public and shared) or excessive API calls can be charged for. And there is consulting/pieces of work for others – big organisations don’t need to do this stuff regularly, just occasionally, so it makes sense to contract it out. So we worked with Channel 4 News and the Dispatches programme. […]
By: All the news that’s fit to scrape | Online Journalism Blog
https://blog.scraperwiki.com/2011/03/600-lines-of-code-748-revisions-a-load-of-bubbles/#comment-599
Fri, 25 Mar 2011 13:54:14 +0000
[…] the Open Knowledge Foundation blog (more on Scraperwiki’s blog): “ScraperWiki worked with Channel 4 News and Dispatches to make two supporting data […]
By: National Asset Bubbles – All in a day’s data « Data Miner UK
https://blog.scraperwiki.com/2011/03/600-lines-of-code-748-revisions-a-load-of-bubbles/#comment-598
Wed, 09 Mar 2011 14:26:17 +0000
[…] This is a visual made from the most inaccessible (both in data and journalistic terms) PDFs of the National Asset Register. The information it contained was used for a Dispatches live debate, and this repurposing was put into an article on the Channel 4 News website. I was fortunate enough to be part of the ScraperWiki team that took on the project and produced it in a matter of days. I have written a blog post on ScraperWiki here. […]
By: nicolahughes
https://blog.scraperwiki.com/2011/03/600-lines-of-code-748-revisions-a-load-of-bubbles/#comment-597
Tue, 08 Mar 2011 17:06:00 +0000
The cropper and annotator are new features just rolled out. We’re testing to see what is most useful. Those that are will be made much more user-friendly. Our goal is to make unstructured data more usable, but also much more explorable. Sadly, the data, as it is collected, has very little information attached. Hopefully with exposure it will be collected in a much more usable manner.
By: Tim Green
https://blog.scraperwiki.com/2011/03/600-lines-of-code-748-revisions-a-load-of-bubbles/#comment-596
Tue, 08 Mar 2011 17:01:13 +0000
Now I feel a bit silly after finding the ‘Instructions’ bit. I guess I was mostly wondering if it could also do OCR, but the PDF annotator seems to be the tool for actually extracting information.
By: Tim Green
https://blog.scraperwiki.com/2011/03/600-lines-of-code-748-revisions-a-load-of-bubbles/#comment-595
Tue, 08 Mar 2011 16:58:48 +0000
Is there an explanation anywhere of the PDF cropper? I’ve seen it a few times in the Dispatches data, but the page doesn’t give much explanation. I assume it’s only been used internally so far.