Hi, I’m Leonisha Tue, 07 Jul 2015 14:19:17 +0000 My name is Leonisha Barley, and I am the latest addition to the people fortunate enough to have an internship opportunity at ScraperWiki. I have just finished my second year at The University of Manchester, studying for a BA (Hons) degree in Sociology and Criminology, and I am excited about developing my skills further this summer.

There is a quote that ‘every accomplishment begins with the decision to try’, which I agree with, and which led me to my internship at ScraperWiki. I say this because I do not actually have an academic background in social statistics; however, during the second year of my course I wanted to push myself and try to conquer (or at least reduce) my fear and dislike of mathematics and IT, so I chose a module from a different degree programme called Social Statistics.

The module I completed is called The Survey Method in Social Research. Whilst completing this course I learnt about designing surveys, such as different types of sampling and what types of questions to ask. I also learnt how to use data from an existing survey to publish frequency tables and graphs, as well as cross tabulations and basic recoding, using the SPSS software package. I really enjoyed taking data from the Crime Survey for England and Wales and using SPSS to do cross tabulations and produce graphs showing what effect age, sex or level of education had on agreement about whether gay or lesbian couples should be allowed to get married.
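Although I used SPSS, the same kind of cross tabulation can be sketched in Python with pandas. The tiny survey below is invented toy data, not the real Crime Survey for England and Wales:

```python
import pandas as pd

# Toy survey responses (hypothetical, not the real Crime Survey data)
survey = pd.DataFrame({
    "sex": ["F", "F", "M", "M", "F", "M"],
    "agrees_with_marriage_equality": ["yes", "yes", "no", "yes", "no", "no"],
})

# Cross tabulation of sex against agreement, like an SPSS crosstab
table = pd.crosstab(survey["sex"], survey["agrees_with_marriage_equality"])
print(table)
```

Each cell counts how many respondents of that sex gave that answer, which is exactly what a frequency crosstab in SPSS shows.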

Then along came an opportunity to do an internship organised by Q-Step, which aims to help students develop their quantitative skills within a real-life work environment. After reading about ScraperWiki I was really impressed by, and interested in, how it makes data such as government data more accessible and easier to understand, and I was lucky enough to secure an internship here. The fact that it is located in my hometown of Liverpool made it even better.

I have only been here for a few hours so far, but I have been welcomed by a friendly group of programming experts who have put my nerves at rest. The team is really quite small (everyone fits into one room), which makes it comfortable to communicate, and I have been introduced to Slack, the chat room used by the organisation, which makes it easy to ask a question of a specific person without disturbing others.

During my eight-week internship I hope to be looking at GP prescribing data, searching for trends such as whether there is seasonal periodicity in drug prescriptions and whether there are demographic variations in GP prescribing. I also look forward to developing my skills in SPSS and Excel, and to learning other programming tools such as R and Python.
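As a sketch of what a seasonal periodicity check might look like, here is a minimal pandas example; the column names and figures are invented, not the real GP prescribing dataset:

```python
import pandas as pd

# Hypothetical monthly prescribing counts: 36 months of toy data
# with an artificial spike every December
data = pd.DataFrame({
    "month": pd.date_range("2012-01-01", periods=36, freq="MS"),
    "items": [100 + 20 * (m % 12 == 11) for m in range(36)],
})

# Average items prescribed per calendar month, pooled across years;
# a strong seasonal pattern shows up as large month-to-month differences
seasonal = data.groupby(data["month"].dt.month)["items"].mean()
print(seasonal.idxmax())  # calendar month (1-12) with the highest average
```

Grouping by calendar month across several years is one of the simplest ways to make a seasonal signal visible before reaching for anything more sophisticated.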

ScraperWiki is going to be a completely new experience for me and I look forward to tapping into the quantitative part of my brain with the help of the friendly experts working here.

Hi, I’m Sophie Tue, 25 Feb 2014 11:31:53 +0000 Hi, my name is Sophie Buckley, I’m an AS student studying Computing, Maths, Physics and Chemistry who’s interested in programming, games, art and reading. I’m also the latest in a line of privileged ScraperWiki interns and although my time here has been tragically short, the ScraperWiki team have done an amazing job in making every second of my experience interesting, informative and enjoyable. 🙂

One of the first things I learned about ScraperWiki is their use of the XP (Extreme Programming) methodology. On my first day I arrived just in time for ‘stand up’, where every morning the members of the team stand in a circle and share with the rest of us what they did the previous working day, what they intend to do that day and what they’re hoping to achieve. Doing this every morning was a bit nerve wracking because I haven’t got the best memory in the world, but Zarino (who’d been assigned the role of looking after me for the week) was always there to give me a helping hand and go first.

He also showed me how they use cards to track and estimate the logical steps that were needed to complete tasks, and how they investigated possible routes with the use of ‘spike’ cards. The time taken to complete a ‘spike’ isn’t estimated, it’s artificially time-boxed (usually to ½ or 1 day). The purpose of a spike is to explore a possible route and find out whether it’s the best one, without having to commit to it.

Zarino had set aside the week of my internship so that both of us could work on a new feature: an automatic way to get data out of ScraperWiki and into Tableau. On Monday we investigated and concluded that we had two options: either generate “Tableau Data Extract” files for users to download, or create an “OData” endpoint that could serve up the live data. Both routes were completely unknown, so we wrote a spike card for each of them to determine which one was best.

Monday and Tuesday consisted of trying to make a TDE file, and during this time I used Vim, SSH and Git, participated in pair programming, was introduced to CoffeeScript (which I really enjoyed using), and was shown how to write unit tests for what we had written.

On Wednesday we decided to look further into the OData endpoint, and for the rest of the week I learned more about Atom/XML, wrote a Python ‘Flask’ app, and built a user interface with HTML and JavaScript.
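A stripped-down version of such an endpoint can be sketched in a few lines of Flask. This is only an illustration under assumed names (the `/odata/rows` route, the `swdemo.sqlite` file and the `swdata` table are made up), and it serves the OData-style JSON wrapper rather than the Atom/XML feed we actually worked on:

```python
from flask import Flask, jsonify
import sqlite3

DB = "swdemo.sqlite"  # hypothetical local SQLite file, not ScraperWiki's real store
app = Flask(__name__)

@app.route("/odata/rows")
def odata_rows():
    # Read live rows from the SQLite file and wrap them in a "value"
    # array, following the OData JSON response convention
    conn = sqlite3.connect(DB)
    conn.row_factory = sqlite3.Row
    rows = [dict(r) for r in conn.execute("SELECT * FROM swdata LIMIT 100")]
    conn.close()
    return jsonify({"value": rows})

if __name__ == "__main__":
    app.run()
```

The appeal of this route over generating TDE files is that the endpoint always serves whatever is currently in the database, so the consuming tool sees live data rather than a stale export.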

One of the great things about ScraperWiki is the friendly nature of everybody who works here. Other members of the team were willing to help where they could and were more than happy to share what they were working on when I was curious. They were genuinely interested in me and my studies, and were kind enough to share their experiences, which meant that tea breaks (tea being in abundance in the ScraperWiki office!) and lunches were never awkwardly spent with people I barely knew.

The guys at ScraperWiki like to do stuff outside the office too, and on Tuesday I was invited to see Her at FACT, which was definitely one of the highlights of the week! Other highlights included the awesome burgers at FSK and Ian’s ecstatic reaction when our spike magically piped his data into Tableau.

Overall, I’m so glad that I took those hesitant first steps in trying to become an intern at ScraperWiki by emailing Francis all those months ago; this has truly been an amazing week and I’m so grateful to everyone (especially Zarino!) for teaching me so much and putting up with me!

If you’d like to hear more from me and keep up with what I’m doing then you can check out my twitter page or you can email me. 🙂


Hi, I’m Steve Mon, 02 Sep 2013 16:00:41 +0000 Hi, I’m Steve and I’m the most recent addition to ScraperWiki’s burgeoning intern ranks. So, how exactly did I end up here?

Looking at ScraperWiki’s team page, you can see that scientists are a common theme among the people working here. I’m no different in that regard. Until recently, I was working as a university scientific researcher (looking at new biomedical materials).

As much as I’ve enjoyed that work, I began to wonder what other problems I could tackle with my scientific training. I’ve always had a big interest in technology. And, thanks to the advent of free online courses from the likes of edX and Coursera, I’ve recently become more involved with programming. When I heard about data science a few months ago, it seemed like it might be an ideal career for me, using skills from both of these fields.

Having written a web scraper myself to help in my job searching, I had some idea of what that involves. I’d also previously seen ScraperWiki’s site while reading about scrapers. When I heard that ScraperWiki were advertising for a data science intern, I knew it would be a great chance to gain a greater insight into what this work entails.

Since I didn’t have any prior notions of what working in a technology company or a startup involves, I’m pleased that it’s been so enjoyable. For an outsider coming in, there are many positive aspects of how the company works:

ScraperWiki is small (but perfectly formed): the fact that everyone is based in the same office makes it easy to ask a question directly to the most relevant person. Even if people are working remotely, they are in contact via the company’s Internet Relay Chat channel or through Google Hangouts. This also means that I’m seeing both sides of the company: both what the Data Services team do and the ongoing work to constantly improve the platform.

Help’s on hand: having knowledgeable and experienced people around in the office is a huge benefit when I encounter a technical problem, even if it’s not related to ScraperWiki. When I’m struggling to find a solution myself, I can always ask and get a quick response.

There’s lots of collaboration: pair programming is a great way to pick up new skills. Rather than struggling to get started with, say, some new module or approach, you can see someone else start working with it and pick up tips to push you past the initial inertia of trying something new.

And there’s independence too: as well as working with others on what they are doing and trying to help where I can, I’ve also been given some small projects of my own. Even in the short time I’m here, I should be able to construct some useful tools that might be made publicly available via ScraperWiki’s platform.

(Oh, I shouldn’t miss out refreshments: as Matthew, another intern, recently pointed out, lunch often involves a fun outing to one of Liverpool’s many fine eateries. As well as that, tea is a regular office staple.)

It’s definitely been an interesting couple of weeks for me here. You can usually see what I’m up to via Twitter or my own blog. Over the next few weeks, I’m looking forward to writing here again about what I’ve been working on.

Hello, I’m Ed Mon, 19 Aug 2013 16:36:02 +0000 Last week, Pete wrote his welcome post announcing he was the new guy. Not strictly true, as I am the real new guy! Actually, Pete and I started on the same day, and there are two other new starters you will be hearing about soon, so we are both old news!

Unlike some of the big developer/data science brains we have at ScraperWiki, I have a different role – making sure what we build adds enough value that our customers are willing to pay for some of it!

I started out my working life in the RAF as a trainee pilot, but a hockey stick to the eye cut that short. It did allow me to start down a more technical career path, though, by completing a Masters in Information Systems and Technology at City University.

From there I was a Sales Engineer for different software vendors and moved into product management. I have worked for different sized companies from the very large (IBM) to the very small (my own startup building tools for product managers) before arriving at ScraperWiki.

Products are created to solve problems for people and companies, and thereby make their lives easier, so my first task is to work out what problems our customers, and others interested in data, experience with ScraperWiki and its tools today. ScraperWiki can then evolve to become the tool of choice for tomorrow 🙂

Let me know if you have any feedback or ideas – I am really interested to hear your thoughts.

Hi, I’m Peter Mon, 12 Aug 2013 16:06:42 +0000 … and I’m the new guy. I’ve just completed my PhD in particle physics on the ATLAS experiment at CERN. I loved the physics (because “searching for extra dimensions of space” sounds so cool!) but after 8 years I decided I wanted to do something different. At heart, I’m a programmer and a hacker who is fascinated by computers and the immense power they put in your hands. We live in an age where a single person can sift through billions of records in an instant. Even today I repeatedly find myself saying “we live in the future, Man”. Yet we take Google (or DuckDuckGo) for granted.

On my travels I have spent a lot of time with the lower levels of the machine, writing an optimized data format for ATLAS’ huge amount of data. I also collaborated with friends on a tool for visualizing the nature of our proton collisions. My default state is to be immersed in code and data.

I was searching for a new job to start my future career outside of academia and there was little to be found. Outside of London or Silicon Valley, there seemed to be very few companies in the world — let alone in my locality — which understood who I was and what made me tick. It is very fortuitous that I’ve found myself working with this band of awesome people on stuff we care about at ScraperWiki.

In the short term, the focus of my efforts will be building tools for ScraperWiki’s new platform and enhancing the platform itself to make it work faster so that we can provide deeper value to our customers. In the medium term I’m hoping to introduce Docker to our toolset and eventually expose it to our users, so that you can trivially run your tools and code anywhere!

Think I might be able to help you? Shoot me a mail.

My First Month As an Intern At ScraperWiki Fri, 09 Aug 2013 16:37:43 +0000 The role of an intern is often a lowly one. Intern duties usually consist of the provision of caffeinated beverages, screeching ‘can I take a message?’ into phones and the occasional promenade to the photocopier and back again.

ScraperWiki is nothing like that. Since starting in late May, I’ve taken on a number of roles within the organization and learned how a modern-day, Silicon Valley style startup works.

How ScraperWiki Works

It’s not uncommon for computer science students to be taught some project management methodologies at university. For the most part though, they’re horribly antiquated.

ScraperWiki is an XP/Scrum/Agile shop. This is definitely not something that is taught at university!

Each day starts off with a ‘stand up’, where each member of the ScraperWiki team says what they intend to accomplish that day. It’s also a great opportunity to see if one of your colleagues is working on something on which you’d like to collaborate.

Collaboration is key at ScraperWiki. From the start of my internship, I was pair programming with the many other programmers on staff. For those of you who haven’t heard of it before, pair programming is where two people use one computer to work on a project.

This is awesome, because it’s a totally non-passive way of learning. If you’re driving, you’re getting first-hand experience of writing code. If you’re navigating, then you get the chance to mentally structure the code that you’re working on.

In addition to this, every two weeks we have a retrospective, where we look at how the previous fortnight went and at the next steps we intend to take as an organization. We write a bunch of sticky notes listing what was good and what was bad about the previous fortnight. These are then put into logical groups, and we vote for the group of stickies which best represents where we feel we should focus our efforts as an organization.

What We Work On

Perhaps the most compelling argument for someone to do an internship at ScraperWiki is that you can never really predict what you’re going to do from one day to the next. You might be working on an interesting data science project with Dragon or Paul, doing front end development with Zarino or making the platform even more robust with Chris. As a fledgling programmer, you really get an opportunity to discover what you enjoy.

During my time working at ScraperWiki, I’ve had the opportunity to learn about some new, up and coming web technologies, including CoffeeScript, Express and Backbone.js.  These are all pretty fun to work with.

It’s not all work and no play, either. Most days we go out to a local restaurant and eat lunch together. Usually it’s some variety of Middle Eastern, American or Chinese food, and it’s usually pretty delicious!


All in all, ScraperWiki is a pretty awesome place to intern. I’ve learned so much in just a few weeks, and I’ll be sad to leave everyone when I go back to my second year of university in October.

Have you interned anywhere before? What was it like? Let me know in the comments below!

Hi, I’m Matthew Hughes Fri, 07 Jun 2013 16:40:04 +0000 Hello! My name is Matthew Hughes, and I am ScraperWiki’s newest intern. I will be working predominantly on product and tools, alongside the likes of Chris Blower and David Jones.

Currently, I’m reading Computing at Liverpool Hope University, where I am about to enter my second year of study. When I’m not hammering out code or squinting at an error log, you’ll likely find me with a cup of coffee in my hand, curled up with a John Green novel or watching a Woody Allen film.

In this brief introductory blog post, I’ve been tasked with telling you about why I wanted to work at ScraperWiki. Truth be told, there are a great many reasons. It’s an awesome company to work for and is staffed with some of the most amazingly smart people I’ve ever had the fortune to come across. The product itself matters to a great many people, and has been lovingly crafted by people who are amongst the best in their field. There is also a culture within the company that fosters a great deal of creativity and respects the creative process. The coffee is pretty great too.

From the perspective of an internship, I’ve learned a great deal. In just five days, I’ve gotten a better understanding of how Express and Backbone work. They’ve also achieved the impossible and pried me away from my text-editor of choice and turned me into a proud Vim user. This is a job where I’m constantly challenged and learning, and I’m incredibly grateful for the opportunity I’ve been provided.

I don’t use Twitter, but you can read my blog here or contact me on Facebook here.

Hi, I’m Paul Thu, 18 Apr 2013 11:05:37 +0000 Hi!

I’m the latest member of ScraperWiki, joining the Data Science team this week.

Data Science is a fascinating new direction for me, being “officially” an Electronic Engineer. I’ve spent the last couple of years in a large company hammering out fast C++ and trying (unsuccessfully) to convert everyone to Python. But what really excites me about Data Science is the application of software to discover meaning in data. With the amount of data we’re generating every minute, I feel there must be countless opportunities to understand and exploit the information contained within.

I’ve written some scrapers in the past to try to discover investment opportunities. The first compared sales and rental prices from RightMove to identify good buy-to-let areas, and more recently I’ve been analysing dividend payments of companies listed on the London Stock Exchange. Once these are a bit more polished and migrated to the new ScraperWiki platform, I’ll post an update and hopefully others will find the data useful.

First impressions of ScraperWiki are great: I’m surrounded by talented and enthusiastic people, and it’s hard to ask for more than that.


Here’s my Twitter and blog.

I am Ian, Ian I am* Tue, 05 Mar 2013 11:56:31 +0000
I have an 8-year itch: I spent the first 8 years of my career as an academic, ending up a lecturer in physics at UMIST. Then I was a research scientist at a large “fast-moving consumer goods” company for another 8 years. On Monday I started work at ScraperWiki as Senior Data Scientist, so called for my great age compared to the many other youthful members of the company.

To my mind there’s been a lot of data science in the scientific research I’ve done: extracting data from odd instrument file formats, mulling over which bits of data are important (and which aren’t), building physical and statistical models, and visualising the resulting data to gain insights and convince others.

More specifically, I’ve been writing scrapers for a while. My first scraper split out the parts of the table of contents of the journal Physical Review E in which I was interested, the full table being rather long. At my last employer, I worked out how to scrape PDF files from the internal reports database; this went well until someone spotted what I was doing and my scraper started receiving an HTML message saying “Please don’t do that”! Recently, I’ve been writing scrapers and making visualisations on my blog, so when I saw the advert for data scientist positions at ScraperWiki it seemed too good an opportunity to miss.

I’ve never worked in an organisation whose members could comfortably fit into a modest sized lift, and I’m looking forward to learning new stuff in a completely different environment!

If there’s some data you’d like to extract, analyse, visualise then drop us a line.

I’m also available on twitter, linkedin and my own blog.

*I’m reading a lot of Dr Seuss at the moment!

How do? I’m Zach. Tue, 19 Feb 2013 16:29:39 +0000


So, a few years ago, I tended to spend my working time explaining emerging tech ideas (generally around Linked Data, Open Data, and APIs) for a UK-based Semantic Web company called Talis. I helped people tell stories, edited an industry magazine, blogged, podcasted and hosted events.

Over time, I found the role evolving naturally into looking after the people involved in our tech, and my business cards changed from “evangelist” at a software company to “community manager” at a data startup called Kasabi (a data marketplace incubated by Talis).

Yesterday, I joined the team here at ScraperWiki, and will be helping out with storytelling, events, blogging and no doubt many other things to help turn needs into ideas. They’ve hired me to look after the people using their code, and to find ways of meeting new communities of data scientists and other data-centric specialists.

It looks like the people here are keen to grow and learn, and that’s something I find very attractive in a company. I’m excited to be here, and look forward to meeting folk I haven’t yet (hi), and to catch up with old friends involved in data and its associated emerging buzzwords.

I’m also keen to learn the backstory of ScraperWiki from people in-house, and those who read up on it here. So, please, please drop me a line, and tell me what I should know while I’m still new! You can find me other places online, where I sometimes tweet, take snapshots, and blog about other things like coffee.