New Ruby scraping tutorials – PDFs and Mechanize

Got a PDF you want to get data from?
Mark Chapman has made us two new Ruby tutorials.

Advanced Scraping: Pages Behind Forms shows you how to get data that is buried behind search boxes and drop-down query lists. It uses the Mechanize library, which pretends to be a web browser, so it can work with cookies, and it has a familiar interface.

Advanced Scraping: PDFs shows you how to extract information from Adobe Portable Document Format (PDF) files. It uses the Ruby library PDF::Reader, which handles the text extraction phase; working out how to parse that text is a later skill.

You can find all the Ruby tutorials (and links to Python and PHP ones) on one page.

Thanks, Mark!

