We’ve migrated to EC2

When we started work on the ScraperWiki beta, we decided to host it ‘in the cloud’ using Linode, a VPS (virtual private server) provider. For the uninitiated, Linode allows people to run their own virtual Linux servers without having to worry about things like maintaining their own hardware.

On April 15th 2013, Linode were hacked via a ColdFusion zero-day exploit. The hackers were able to access some of Linode’s source code, one of their web servers, and notably, their customer database. In a blog post released the next day, they assured us that all the credit card details they store are encrypted.

Soon after, however, we noticed fraudulent purchases on the company credit card we had associated with our Linode account. It seems that we were not alone in this. We immediately cancelled the card and started to make plans to switch to another VPS provider.

These days, one of the biggest names in cloud infrastructure is Amazon AWS. They’re the market leader, and their ecosystem and SLA are more in line with the expectations of our corporate customers. Their API is also incredibly powerful. It’s no wonder that, even prior to the Linode hack, we had investigated migrating the ScraperWiki beta platform to Amazon EC2.
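To give a flavour of what that API makes possible, here’s a minimal sketch using the boto3 Python SDK. It’s purely illustrative: the region, AMI ID and instance type are placeholders, and this isn’t our actual provisioning code.

```python
import boto3

# Connect to EC2 in a placeholder region.
ec2 = boto3.resource("ec2", region_name="eu-west-1")

# Launch a single small instance from a placeholder AMI.
instances = ec2.create_instances(
    ImageId="ami-00000000",   # placeholder: not a real AMI ID
    InstanceType="t1.micro",
    MinCount=1,
    MaxCount=1,
)
print("Started:", instances[0].id)

# List the instances that are currently running.
running = ec2.instances.filter(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)
for instance in running:
    print(instance.id, instance.instance_type, instance.public_dns_name)
```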

Since mid-June, all code and data on scraperwiki.com has been stored on Amazon’s EC2 platform. Amongst other improvements, you should already have noticed a significant increase in the speed of ScraperWiki tools.

We have a lot of confidence in the EC2 platform. Amazon have been in the business for a very long time and have an excellent track record in the cloud infrastructure field, where they have earned a reputation for reliability and security. It is for these reasons that we feel confident putting our users’ data on their servers.

The integrity of any data stored on our service is paramount. We are therefore greatly encouraged by AWS’ EBS (Elastic Block Store), which we are currently using for backups. It allows us to store our backups in two different geographical regions, so should a region ever go down, we can quickly and easily restore ScraperWiki, ensuring a minimum of disruption for our customers.
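For the curious, cross-region backups of this kind are typically done by copying EBS snapshots between regions. Here’s a minimal sketch with boto3, again illustrative only: the snapshot ID and regions are placeholders, and this is not necessarily the exact mechanism we use.

```python
import boto3

SOURCE_REGION = "us-east-1"   # placeholder: where the live volume lives
BACKUP_REGION = "eu-west-1"   # placeholder: second geographical region

# The copy request is issued against the *destination* region.
ec2_backup = boto3.client("ec2", region_name=BACKUP_REGION)

response = ec2_backup.copy_snapshot(
    SourceRegion=SOURCE_REGION,
    SourceSnapshotId="snap-00000000",   # placeholder snapshot ID
    Description="Cross-region copy of a ScraperWiki backup",
)

print("New snapshot in %s: %s" % (BACKUP_REGION, response["SnapshotId"]))
```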

Finally, we’re excited to announce that we’re using Canonical’s Juju to manage how we deploy our servers. We’re impressed with what we’ve seen of it so far: it’s a powerful, feature-rich product and it has already saved us a lot of time. We’re looking forward to it helping us scale our product better and spend less time on migrations and deployments. It will also allow us to easily migrate our servers to any OpenStack provider, should we wish to.

The changes we’re making to our platform will make ScraperWiki faster and more resistant to disruption. As developers and data scientists ourselves, we understand the need for reliable tools, and we’re really looking forward to giving you, our users, an even better ScraperWiki experience.
