Here’s an example of why you have to be very careful when scraping, and why your normal run-of-the-mill technology that makes assumptions won’t cut it: One of our super-users, Julian Todd, decided to scrape the Vehicle Certification Agency (VCA) website on new car fuel consumption and exhaust emissions figures. And he spotted this: And another […]
New event! Hacks and Hackers Hack Day Cardiff (#hhhCar)
The UK Hacks & Hackers tour carries on – into 2011. Our first stop: Wales. ScraperWiki, which provides award-winning tools for screen scraping, data mining and visualisation, will hold a one-day practical hack day* at the Atrium in Cardiff on Friday 11 March, 2011. Web developers and designers will pair up with journalists and bloggers […]
Be alert! Your scrapers need alerts
It’s important to know when your scrapers have stopped working, so you can fix them. And if someone else makes a change to one of your scrapers, you need to know, so you can check it’s OK and thank them. Over the next day or two, if you have made or contributed to a scraper […]
Ruby screen scraping tutorials
Mark Chapman has been busy translating our Python web scraping tutorials into Ruby. They now cover three tutorials on how to write basic screen scrapers, plus extra ones on using .ASPX pages, Excel files and CSV files. We’ve also installed some extra Ruby modules – spreadsheet and FastCSV – to make them possible. These Ruby scraping […]
Views part 2 – Lincoln Council committees
(This is the second of two posts announcing ScraperWiki “views”, a new feature that Julian, Richard and Tom worked away on and secretly launched a couple of months ago. Once you’ve scraped your data, how can you get it out again in just the form you want? See also: Views part 1 – Canadian weather stations.) Lincoln […]
Student scraping in Liverpool: football figures and flying police
A final Hacks & Hackers report to end 2010! Happy Christmas from everyone at ScraperWiki! Earlier this month ScraperWiki put on its first-ever student event, at Liverpool John Moores University in partnership with Open Labs, for students from both LJMU’s School of Journalism and the School of Computing & Mathematical Sciences, as well as […]
Scraping PDFs: now 26% less unpleasant with ScraperWiki
Got a PDF you want to get data from? Try our easy web interface over at PDFTables.com! Scraping PDFs is a bit like cleaning drains with your teeth. It’s slow, unpleasant, and you can’t help but feel you’re using the wrong tools for the job. Coders try to avoid scraping PDFs if there’s any other option. But […]
Belfast Hacks & Hackers – the video
As we’ve previously reported, the Belfast Hacks and Hackers Hack Day in November was a great success with some brilliant projects emerging, and we’re thrilled to post this video, courtesy of the School of Media, Film and Journalism at the University of Ulster. Enjoy! Hacks and Hackers Hack Day Belfast short film, by Eleaner Mulholland […]
Hacks & Hackers RBI: The video
Media reporter Rachel McAthy has produced this excellent video from last month’s Hacks & Hackers Hack Day at RBI. View it on Journalism.co.uk, or below. More on the event at this link.
Hacks & Hackers RBI: Snow mashes, truckstops and moving home
Sarah Booker (@Sarah_Booker on Twitter), digital content and social media editor for the Worthing Herald series, has kindly provided us with this guest blog from the recent ScraperWiki B2B Hacks and Hackers Hack Day at RBI. Pictures courtesy of RBI’s Adam Tinworth. Dealing with data is not new to me. Throughout my career I have […]