Sophie Buckley – ScraperWiki: Extract tables from PDFs and scrape the web

Hi, I’m Sophie: part II Wed, 03 Sep 2014 11:11:19 +0000 Hi again all! After completing a week of work experience in February, I was eager to return to ScraperWiki as an intern as soon as possible and, a few emails to Francis later, we had organised for me to do a month’s internship this summer, which has now unfortunately come to an end.

My first two weeks at ScraperWiki mostly consisted of writing reports for Aidan and researching possible competitors to ScraperWiki. I was also tasked with writing a blog post on ‘The history of the Pivot table’, which taught me that the history of spreadsheet software alone is surprisingly interesting!

Ian then asked me to make a front-end GUI for the NewsReader Simple API that ScraperWiki created. The NewsReader project (full title: ‘NewsReader: Building structured event indexes of large volumes of financial and economic data for decision making’) started on 1 January 2013 and is due to finish on 1 January 2016. In creating the beginnings of this GUI, I used HTML and CSS, and started using JavaScript for the first time – a language which I now really enjoy. I was also introduced to jQuery and SPARQL.

In the few moments that I didn’t have anything to do, Peter was always on hand to give me an impromptu lesson on what he does on a daily basis here at ScraperWiki and the many concepts that he has to bring to the forefront of his mind on a regular basis. For example, he spent one afternoon enlightening me on the relationship between C and Python and some of the key differences and similarities between the two.

One of the great things about working at ScraperWiki is how the people here get work done whilst also making the work day feel as if you’re just being a big nerd with a bunch of friends. One afternoon during a rare lull in the office, I asked David (drj) a question about the IBM Model M keyboard which he uses when he’s in the office. He explained that he had bought it some time in 1999, but that the keyboard itself was made in May 1996 – making it older than myself! However, his keyboard isn’t older than ScraperWiki’s other intern, Sean, who also interned during the summer of 2013 and who is still working there until September 2014.

The keyboard in question!

But being at ScraperWiki doesn’t always mean that your day solely consists of sitting at a computer. Whilst I was there, the Giant Spectacular was taking place in Liverpool, something which Peter, Aine and I took advantage of during our lunch hour and after work.

Aine, Sean and I also had a look at what Young Rewired State and the iDEAS team were doing down at the SAE Institute in the city centre, where Aine gave a presentation sharing business advice and where ScraperWiki ‘alumnus’ Zarino also gave a presentation on design.

Overall, my time at ScraperWiki has been interesting, helpful, educational and incredibly enjoyable. I can’t thank everybody enough for all that they’ve taught me, all the advice I’ve been given and all of the great places in Liverpool I’ve been introduced to. Thanks guys!

The history of the Pivot table Wed, 16 Jul 2014 15:38:05 +0000 A pivot table is a spreadsheet feature that lets a data table be rearranged in many ways, giving different views of the same data (hence ‘pivot’: from one view to another).
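As a concrete illustration (not from the original post), here’s a sketch of that rearrangement using pandas; the column names and figures are invented for the example.

```python
import pandas as pd

# A small "flat" table of records, one row per observation (invented data).
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 90, 110],
})

# Pivot: regions become rows, quarters become columns,
# and revenue fills the cells of the new view.
view = sales.pivot_table(index="region", columns="quarter", values="revenue")
print(view)
```

Dragging a category between rows and columns in a spreadsheet corresponds to swapping the `index` and `columns` arguments here: same data, different view.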

Pivot Tables have become ubiquitous amongst power users of Excel, even being listed as a skill in CVs and a “desirable” in job specifications – but it was not always so. Pivot tables were invented in 1986 by the Father of Pivot Tables, Pito Salas, who was at that time working for Lotus Corp. They didn’t see the light of day until five years later, when Lotus Improv was released for the NeXT platform in 1991.

Lotus Improv on NeXTSTEP.

The Lotus team found development on OS/2 was difficult. They switched to NeXTSTEP, resulting in ‘truckloads’ of flowers from Steve Jobs.

One of the revolutionary things about Lotus Improv was that users could utilise it to define and store sets of categories and then change views by dragging category names with their mouse. This functionality provided the core model for the pivot table feature that is commonly used today.

Pivot Tables can cause problems for data hubs, and people wanting programmatic access to data. This is because the data isn’t “flat” and database-like. For Tableau, it’s a big enough problem that they have specific advice on how to prepare your Excel files.
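The fix for programmatic access is to “unpivot” the table back into one flat row per observation. A minimal sketch with pandas, again with invented column names and data:

```python
import pandas as pd

# A pivoted table, as you might find it laid out in an Excel sheet (invented data).
pivoted = pd.DataFrame({
    "region": ["North", "South"],
    "Q1": [100, 90],
    "Q2": [120, 110],
})

# melt() "unpivots": each (region, quarter) cell becomes its own row,
# the flat, database-like shape that tools like Tableau prefer.
flat = pivoted.melt(id_vars="region", var_name="quarter", value_name="revenue")
print(flat)
```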

It’s sad that Improv has died, but it provided the childhood home for Pivot Tables, which now live on in Excel (since 1993, when Microsoft Excel 5 was launched), and Tableau.

Hi, I’m Sophie Tue, 25 Feb 2014 11:31:53 +0000 Hi! My name is Sophie Buckley and I’m an AS student studying Computing, Maths, Physics and Chemistry, with an interest in programming, games, art and reading. I’m also the latest in a line of privileged ScraperWiki interns, and although my time here has been tragically short, the ScraperWiki team have done an amazing job of making every second of my experience interesting, informative and enjoyable. 🙂

One of the first things I learned about ScraperWiki is their use of the XP (Extreme Programming) methodology. On my first day I arrived just in time for ‘stand up’, where every morning the members of the team stand in a circle and each share what they did the previous working day, what they intend to do today and what they’re hoping to achieve. Doing this every morning was a bit nerve-wracking because I haven’t got the best memory in the world, but Zarino (who’d been assigned the role of looking after me for the week) was always there to give me a helping hand and go first.

He also showed me how they use cards to track and estimate the logical steps needed to complete tasks, and how they investigate possible routes with the use of ‘spike’ cards. The time taken to complete a ‘spike’ isn’t estimated; it’s artificially time-boxed (usually to ½ or 1 day). The purpose of a spike is to explore a possible route and find out whether it’s the best one, without having to commit to it.

Zarino had set aside the week of my internship so that both of us could work on a new feature: an automatic way to get data out of ScraperWiki and into Tableau. We investigated on Monday and concluded that we had two options: either generate “Tableau Data Extract” (TDE) files for users to download, or create an “OData” endpoint that could serve up the live data. Both routes were completely unknown to us, so we wrote a spike card for each of them, to determine which one was best.

Monday and Tuesday consisted of trying to make a TDE file. During this time I used Vim, SSH and Git, took part in pair programming, was introduced to CoffeeScript (which I really enjoyed using) and was shown how to write unit tests for what we had written.

On Wednesday we decided to look further into the OData endpoint, and for the rest of the week I learned more about Atom/XML, wrote a Python ‘Flask’ app, and built a user interface with HTML and JavaScript.
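This isn’t the app we wrote, but a minimal sketch of the idea behind that spike: a Flask route serving live rows over HTTP so another tool can pull them on demand. The route name and sample rows are invented, and a real OData endpoint would emit Atom/XML or OData-flavoured JSON with metadata rather than the plain JSON shown here.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the live dataset the endpoint would serve (invented sample rows).
ROWS = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
]

@app.route("/odata/rows")
def rows():
    # Serve the current rows on every request, so consumers always
    # see live data instead of a stale downloaded file.
    return jsonify(ROWS)

if __name__ == "__main__":
    app.run()
```

The appeal over the TDE route is visible even in this toy: nothing is exported or downloaded, so the consumer re-fetches whenever it wants fresh data.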

One of the great things about ScraperWiki is the friendly nature of everybody who works here. Other members of the team were willing to help where they could and were more than happy to share what they were working on when I was curious. They were genuinely interested in me and my studies, and were kind enough to share their experiences, which meant that no tea break (tea being in abundance in the ScraperWiki office!) or lunch was ever awkwardly spent with people I barely knew.

The guys at ScraperWiki like to do stuff outside the office too, and on Tuesday I was invited to see ‘Her’ at FACT, which was definitely one of the highlights of the week! Other highlights included the awesome burgers at FSK and Ian’s ecstatic reaction when our spike magically piped his data into Tableau.

Overall, I’m so glad that I took those hesitant first steps in trying to become an intern at ScraperWiki by emailing Francis all those months ago; this has truly been an amazing week and I’m so grateful to everyone (especially Zarino!) for teaching me so much and putting up with me!

If you’d like to hear more from me and keep up with what I’m doing then you can check out my twitter page or you can email me. 🙂