Is scraping legal?
Lots of people, when they hear about ScraperWiki, ask “Is scraping legal? How can you build a business off that?”. They usually follow up by saying “we do it in our company, but we would never tell anyone”.
This is strange to us, as we have come from a world of good scraping: taking government data and making it easier for people to use for things that benefit all of society. We’re in favour of that kind of scraping.
It’s obviously a spectrum. At the other extreme, the most evil scraping would be to steal content that somebody else sells, and then republish it in a way that harms their business. We’re against that kind of scraping.
It’s not scraping itself which is good or bad, or legal or illegal, but the circumstances in which you’re doing it.
We’ve written up our policy on legality in full; it’s in our FAQ under ‘What’s your policy on what’s legal to scrape?’. It has lots of details about robots.txt and takedown notices, and about what is our legal responsibility and what is yours.
Finally, ScraperWiki isn’t just about scraping.
We’re a data hub, and you need to get data into a data hub. As well as scraping, lots of people make API calls to do that on ScraperWiki, or download their own files from their own servers.
This is much more profound than it sounds – when you are using data for a new purpose, even if it is already structured, you still need to get it and convert it to your new needs. How you do that is a detail that depends on the circumstances.
The difference between parsing HTML web pages, and using a JSON REST API is surprisingly small. As an example, Thomas scraped EventBrite even though it has an API (see the post at the end of that thread by Ryan who works at EventBrite!), because it was easier at the time for him.
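To make that comparison concrete, here is a minimal sketch in Python using only the standard library. The sample HTML page, the sample JSON response, and the function names are all hypothetical, made up for illustration; they are not taken from EventBrite or from any real API. Both routes end up with the same list of event titles.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample data: the same two events, once as HTML, once as JSON.
HTML_PAGE = """
<ul>
  <li class="event">Data Hack Day</li>
  <li class="event">Scraping Workshop</li>
</ul>
"""

JSON_RESPONSE = '{"events": [{"title": "Data Hack Day"}, {"title": "Scraping Workshop"}]}'


class EventTitleParser(HTMLParser):
    """Collect the text inside every <li class="event"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_event = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "event") in attrs:
            self._in_event = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_event = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_event and text:
            self.titles.append(text)


def titles_from_html(page):
    """Scraping route: pull the titles out of the page markup."""
    parser = EventTitleParser()
    parser.feed(page)
    return parser.titles


def titles_from_json(body):
    """API route: pull the titles out of the decoded JSON."""
    return [event["title"] for event in json.loads(body)["events"]]


if __name__ == "__main__":
    print(titles_from_html(HTML_PAGE))
    print(titles_from_json(JSON_RESPONSE))
```

Either way, the work is the same in outline: fetch some text, pick out the fields you care about, and store them in the shape you need.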
What matters is getting the data, and converting it into a form where it can do something useful for the world. And doing that legally. Whether you’re using Nokogiri or Nestful.
Reblogged this on Media law and ethics and commented:
ScraperWiki is a Liverpool-based data tools service and community I did some work for in 2010/11 and a winner of the Knight News Challenge 2011. In this post, its CEO Francis Irving looks at the legal issues around screen scraping.
Very interesting post.
In my opinion, if data is publicly viewable or indexed by search engines, expect it to be scraped. There are ways to prevent scraping, and anyone who really wants to stop their data being scraped should implement them within their website or service.
Although I understand the concern, I find this question absurd, considering that the leading technology companies make regular use of scraping. Google is nothing more than a very large scraping service that scours the internet for keywords. Need I say more?
Is viewing a website’s HTML in a browser illegal? Here’s our take on the legality of web crawling/scraping – http://blog.promptcloud.com/2013/01/is-crawling-legal.html