Finding contact details from websites Mon, 25 Nov 2013 17:00:28 +0000

Since August, I’ve been an intern at ScraperWiki. Unfortunately, that time’s shortly coming to an end. Over the past few months, I’ve learnt a huge amount. I’ve been surprised at just how fast-moving things are in a startup and I’ve been involved with several exciting projects. Before the internship ends, I thought it would be a good time to blog about some of them.

When I’ve visited company websites to find, say, an email address or telephone number, it can be a frustrating experience if it’s not immediately clear where that information lives. If you’re carrying out a research project, and have a list of companies or organisations that you have to repeat this process for, it becomes tedious very quickly.

Enter the Contact Details Tool!

User interface of the Contact Details Tool

As its straightforward name suggests, the Contact Details Tool is designed to get contact details from websites automatically. The user interface is definitely still a prototype, but perfectly functional! (It’s a placeholder barebones Bootstrap frontend.) All we need to do is to type in website URLs and then click “Get Details”. In this case, let’s suppose that we want to find out about ScraperWiki itself (yes, very meta).

A few seconds (and some behind the scenes scraping) later, we get back a table of results:
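To give a flavour of what that behind-the-scenes scraping involves, here is a minimal sketch of contact extraction using pattern matching. This is not the tool’s actual code: the regular expressions and the `extract_contacts` helper are my own simplified assumptions, and real extraction needs far more robust patterns and HTML-aware parsing.

```python
import re

def extract_contacts(html):
    """Pull rough contact details out of a page's HTML.

    A simplified illustration only: real-world pages need much
    sturdier patterns and proper HTML parsing.
    """
    # Email addresses: word characters, dots, plus and hyphen around an @
    emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", html)
    # Twitter handles often appear as twitter.com/<name> links
    twitters = re.findall(r"twitter\.com/(\w+)", html)
    # Very loose phone pattern: runs of digits, spaces and dashes
    phones = re.findall(r"\+?\d[\d\s-]{7,}\d", html)
    return {"emails": emails, "twitter": twitters, "phones": phones}

page = ('<a href="mailto:hello@example.org">Email</a> '
        '<a href="https://twitter.com/ScraperWiki">Follow us</a> '
        'Call +44 151 123 4567')
print(extract_contacts(page))
```

Running this on the sample snippet picks out the email address, the Twitter handle and the phone number in one pass, which is essentially what the tool does across every page it visits.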

Contacts extracted using ScraperWiki's Contact Details Tool

Addresses extracted using ScraperWiki's Contact Details Tool

Everything you need to say hello to us in just about every way possible: by Twitter, email, telephone or maybe even, if you’re in Liverpool, in person!

With a couple of clicks, these results can be downloaded directly as a spreadsheet for offline use. You can do quick searches or even more complicated SQL database queries on the results directly using ScraperWiki’s platform to carry out any further filtering. For instance, we might only be interested in the Twitter accounts we’ve retrieved. (If you’re then interested in more about particular Twitter users, you can search for their tweets or find their Twitter followers using ScraperWiki’s other tools.)
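As a sketch of that kind of filtering, here is roughly what a query against a downloaded results table might look like, run locally with Python’s built-in sqlite3. The table and column names here are invented for illustration; the real export’s schema may differ.

```python
import sqlite3

# Build a tiny stand-in for an exported results table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (url TEXT, kind TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("scraperwiki.com", "twitter", "@ScraperWiki"),
        ("scraperwiki.com", "email", "hello@scraperwiki.com"),
        ("example.org", "twitter", "@example"),
    ],
)

# Keep only the Twitter accounts, one row per site.
rows = conn.execute(
    "SELECT url, value FROM contacts WHERE kind = 'twitter' ORDER BY url"
).fetchall()
print(rows)
```

The same `WHERE` clause works whether you run it in a local spreadsheet-turned-database like this or directly against the results on the platform.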

Where the Contact Details Tool really starts saving you time is when you need information from several websites. Instead of sitting there having to scrape websites by hand or tediously conducting lots of internet searches, you can just enter the URLs of interest and let the tool do the work for you.

For a short project that Sean (another intern) and I, with adult supervision from Zarino, put together in a few weeks, it’s been great to see a prototype product go from concept to reality and, moreover, to find that it’s useful.

As you’ve seen, it’s easy to contact us! So, if you’re interested in what the Contact Details Tool can do for you, please send us an email and we’ll get back in touch.

Up in the Air with ScraperWiki and Tropo Fri, 23 Dec 2011 11:55:24 +0000

We came across this blog post a few days ago from these cool guys at Tropo in Florida, and thought you’d be interested in how they’ve used ScraperWiki. Tropo is a simple API for adding voice and other goodies to your apps and, as Mark Headd explains, it can be really powerful when combined with data from ScraperWiki…

ScraperWiki is a powerful cloud-based service that lets you scrape data from online documents and websites.

When you write a scraper – a script to pull information from a web resource and then parse out the bits you want – it will execute inside the ScraperWiki environment.

SMS flight information

You can store the data that is scraped inside a data store and then access the data from outside the ScraperWiki environment using their API. Scrapers can be written in one of several different languages – Ruby, PHP and Python.
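The shape of such a call can be sketched as an HTTP GET with an SQL query passed as a URL parameter. The endpoint and parameter names below are purely illustrative, not the API’s actual URL scheme; the point is that any HTTP client, on any platform, can pull the scraped data out.

```python
from urllib.parse import urlencode

def datastore_url(scraper_name, query,
                  base="https://api.example-scraperwiki.com/datastore"):
    """Build a GET URL asking a (hypothetical) data store API to run
    an SQL query against a named scraper's data."""
    params = urlencode({"name": scraper_name, "format": "json",
                        "query": query})
    return f"{base}?{params}"

url = datastore_url("phl_flights", "SELECT * FROM flights LIMIT 5")
print(url)
# urllib.request.urlopen(url) would then fetch the rows as JSON.
```

Because the result comes back over plain HTTP, the consuming script (a Tropo voice app, say) needs nothing more than an HTTP library and a JSON parser.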

ScraperWiki and Tropo operate in a very similar way. The Tropo scripting environment allows you to write scripts in one of several different languages, including Ruby, PHP and Python (Groovy and JavaScript are also supported).

Your script executes inside the Tropo environment, which means you can make direct connections to external resources – like the ScraperWiki API – from within your executing script. There is no extra HTTP overhead, and no additional step of posting to a back-end server to connect to other APIs or resources.

In the following screencast, I demonstrate how to use Tropo and ScraperWiki to quickly and easily build an airport information system for the Philadelphia International Airport.

All of the code for this example can be found here. If you’d like to view the actual scraper I wrote on ScraperWiki, you can find it here.

This is still a work in progress – I’d like to run this script multiple times per day (ideally once an hour) to pick up updates to flight information and ensure that the app always has the most up-to-date flight status. The voice dialog could also use a little tweaking, and I’d like to offer the option of repeating the information.
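Most of that dialog logic boils down to composing the sentence the app reads back to the caller. Sketched in plain Python (this is not Tropo’s scripting API, and the `flight` dict shape is invented for illustration; the real app reads rows scraped from the airport’s flight information):

```python
def flight_announcement(flight):
    """Compose the sentence a voice app would speak to a caller.

    `flight` is an invented dict shape for illustration; a real app
    would build it from scraped flight-status rows.
    """
    status = flight["status"].lower()
    sentence = (f"Flight {flight['number']} from {flight['origin']} "
                f"is {status}")
    # Delayed flights get the revised arrival time appended.
    if status == "delayed" and flight.get("new_time"):
        sentence += f", now expected at {flight['new_time']}"
    return sentence + "."

msg = flight_announcement({"number": "AA 123", "origin": "Chicago",
                           "status": "Delayed", "new_time": "3:45 PM"})
print(msg)
```

Offering a “repeat that” option is then just a matter of looping over this announcement until the caller declines.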

But even setting these refinements aside, it is evident how combining these two powerful cloud resources can produce a pretty useful application in a very short time.

Tropo and ScraperWiki are a powerful combination. Happy flying!