Comments on: Handling exceptions in scrapers
https://blog.scraperwiki.com/2012/05/handling-exceptions-in-scrapers/
Extract tables from PDFs and scrape the web

By: Andrew
https://blog.scraperwiki.com/2012/05/handling-exceptions-in-scrapers/#comment-789
Mon, 21 May 2012 10:14:33 +0000

Good article – most scrapers will need to handle a lot of errors a lot of the time.

However, the flip side of this coin is that service providers (views) should be alerting downstream users (scrapers) to error conditions when they occur – so, ScraperWiki folks, how about doing this for ScraperWiki Views? We should be able to set an HTTP status code when the view is not going to respond in the normal way, for example:


import sys

scraperwiki.utils.httpbusy()  # proposed API: would set a 503 status code
print "Too busy today"
sys.exit()
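On the scraper side, a 503 like that could be honoured with a simple retry loop – a minimal sketch in Python 3 syntax (`fetch_with_retry` is a made-up helper name, and the retry count and delay are arbitrary):

```python
import time
import urllib.error
import urllib.request

def fetch_with_retry(url, tries=3, delay=5):
    """Fetch a URL, backing off and retrying when the server reports 503."""
    for attempt in range(tries):
        try:
            return urllib.request.urlopen(url).read()
        except urllib.error.HTTPError as e:
            # Only retry on "Service Unavailable"; re-raise anything else,
            # and give up once the last attempt has failed.
            if e.code != 503 or attempt == tries - 1:
                raise
            time.sleep(delay)
```

Any other HTTP error (404, 500, …) still raises immediately, so genuine failures are not silently swallowed.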

By: pguardiario
https://blog.scraperwiki.com/2012/05/handling-exceptions-in-scrapers/#comment-788
Thu, 17 May 2012 03:49:54 +0000

Do yourself a favor and switch to Ruby; you won't ever look back:

# ** date parsing **
require 'chronic'
puts Chronic.parse('2012-04-19')
#> 2012-04-19 12:00:00 +0800
puts Chronic.parse('April 19, 2012')
#> 2012-04-19 12:00:00 +0800

# ** retrying **
require 'retryable'
require 'open-uri'

html = retryable(:tries => 3, :on => OpenURI::HTTPError, :sleep => 42) do
  open('http://thomaslevine.com').read
end
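For readers staying with Python, much the same flexible date parsing can be sketched with the standard library alone – assuming the possible formats are known in advance (Chronic guesses them automatically; `parse_date` below is a hypothetical helper that just tries a list of explicit patterns):

```python
from datetime import datetime

def parse_date(text, formats=("%Y-%m-%d", "%B %d, %Y")):
    """Try each known date format in turn, returning the first that matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong format, try the next one
    raise ValueError("no known format matched %r" % text)
```

Both of the example strings above parse to the same date:

```python
parse_date("2012-04-19")
parse_date("April 19, 2012")
```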
