Lancaster Data Science – ScraperWiki
https://blog.scraperwiki.com
Extract tables from PDFs and scrape the web

Hi, I’m Pius….
https://blog.scraperwiki.com/2015/06/hi-im-pius/
Thu, 04 Jun 2015 09:48:10 +0000

…and I’m the new thing at ScraperWiki. Yes, you heard right: thing, not person or guy or anything human. Ever since I learnt that real-world entities can be modeled as programming language objects in order to answer questions or make inferences, one weird part of my brain has interpreted it the other way round – that real-world entities are the abstractions and programming language objects are the real thing. So I am an object* rather than a person – enough said.

I have always been intrigued by mathematics and computer programs and this informed my choice of Computer Science with Mathematics as my first degree. It’s amazing how far technology has come in simplifying processes and making life more interesting in general (now is not the time to talk about technology’s negative effects, if any!).

While my first real programming exposure came at a normal and acceptable pace, the same cannot be said of my introduction to databases and data management. My first database-related experience was when I joined a team consulting for a multi-national communications service provider. The application we managed ran on an Oracle database holding data for more than 63 million subscribers, with many tables having about half a billion rows of data! (OK, take away the exclamation mark; that is no longer ‘big’ these days.)

It is from this exposure that I developed a real interest in data (data management, data analytics, databases, data manipulation) and started looking out for opportunities to hone my skills in this area. So when the chance to do a Master’s came along, I chose to do it in Data Science. And as ‘serendipity’ would have it, I ended up doing an internship at ScraperWiki as part of the course. Needless to say, the dataset I’m working on, namely the UK MOT dataset, is ‘big’, and I hope to make the best of it.

With the size of the data come concerns about how fast it can be processed to derive insights, as well as about memory and disk space. ScraperWiki’s team of experts, especially Peter, has been really helpful in providing tips and tricks in this direction – and of course we’re just starting. Watch this space for developments and updates as this project progresses.

But don’t think I’m all about work and more work; that’s why I like ScraperWiki’s ‘work hard, play hard’ approach. If you’d like to see more of my other side, do feel free to take me out for food or drink (not tea, as I have enough in the office!). I also enjoy swimming and cycling, although I usually only get the chance to walk or run.

Let me end this by saying the ScraperWiki environment is exactly the kind of work environment I wished for. You are given the independence to use whatever technology you deem fit to accomplish your tasks, and you are surrounded by experts and solution-oriented individuals ever willing to help, so you have the confidence that you can get anything and everything done!

*footnote: we at ScraperWiki do not consider Pius to be an object 🙂
