Case Studies – ScraperWiki

Learning to code bots at ONS (12 July 2016)

The Office for National Statistics releases over 600 national statistics every year. They came to ScraperWiki to help improve their backend processing, so they could build a more usable web interface for people to download data.

We created an on-premises environment where their numerate staff learnt a minimal amount of coding, and now create short scripts to transform data they previously didn’t have the resources to process.

Matthew Jukes, Head of Product, Office for National Statistics said:

Who knew a little Python app spitting out CSVs could make people so happy but thank you team @ScraperWiki – great stuff 🙂

Spreadsheets

The data the team were processing was in spreadsheets which look like this:

(Not shown: splits by age range, seasonal adjustments, or the whole pile of other similar spreadsheets)

They needed to turn them into a standard CSV format used internally at the ONS. Each spreadsheet could contain tens of thousands of observations, each becoming a row in the output file.

We created an on-premises ScraperWiki environment for the ONS, using standard text editors and Python. Each type of spreadsheet needs one short recipe, just a few lines of Python expressing the relative relationship of headings, sub-headings and observations.
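As a rough illustration of the idea only (a toy sketch, not the actual ONS recipe format), the relationship between row headings, column headings and observations can be expressed like this:

```python
# Illustrative sketch only: a toy "recipe" matching observations to the
# headings above them and to their left, the kind of relationship a real
# recipe expresses. Not the actual ONS recipe format.
import csv

# A tiny spreadsheet-like grid: years across the top, regions down the side.
grid = [
    ["",           "2014", "2015"],
    ["North East", "1200", "1250"],
    ["North West", "3400", "3500"],
]

column_headings = grid[0][1:]          # years, read from the top row

rows = []
for row in grid[1:]:
    row_heading, observations = row[0], row[1:]
    for year, value in zip(column_headings, observations):
        # Each observation becomes one output row: (region, year, value)
        rows.append({"region": row_heading, "year": year, "value": value})

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["region", "year", "value"])
    writer.writeheader()
    writer.writerows(rows)
```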

The environment included a coloured debugger for checking that headings and cells were correctly matched:

Databaker highlighter

Most of the integration work involved making it easy to code scripts which could transform the data ONS had – coping with specific ways numbers are written, and outputting the correct CSV file format.

Training

As part of the deployment, we gave a week of hands-on script development training to three members of staff. Numerate people learning some coding is, we think, vital to improving how organisations use data.

Before the training, Darren Barnes (Open Datasets Manager) said learning to code felt like crossing a “massive chasm”.

EOT team

Within a couple of hours he was writing scripts that were then used operationally.

He said it was much easier to write code than to use the data applications with complex graphical interfaces that he often has to work with.

Conclusion

Using graphical ETL software, it took two weeks for an expert consultant to make the converter for one type of spreadsheet. With staff in the business coding Python in ScraperWiki’s easy environment themselves, it takes a couple of hours.

This has saved the ONS time on the initial conversion of each type of spreadsheet. When new statistics come out in later months, those spreadsheets can easily be converted again, with any problems fixed quickly and locally, saving even more time.

The ONS have made over 40 converters so far. ScraperWiki has been transformational.

Running a code club at DCLG (8 June 2016)

The Department for Communities and Local Government (DCLG) has to track activity across more than 500 local authorities and countless other agencies.

They needed a better way to handle this diversity and complexity of data, so decided to use ScraperWiki to run a club to train staff to code.

Martin Waudby, data specialist, said:

I didn’t want us to just do theory in the classroom. I came up with the idea of having teams of 4 or 5 participants, each tasked to solve a challenge based on a real business problem that we’re looking to solve.

The business problems being tackled were approved by Deputy Directors.

Phase one

The first club they ran had 3 teams, and lasted for two months so participants could continue to do their day jobs whilst  finding the time to learn new skills. They were numerate people – statisticians and economists (just as in our similar project at the ONS). During that period, DCLG held support workshops, and “show and tell” sessions between teams to share how they solved problems.

As ever with data projects, lots of the work involved researching sources of data and their quality. The teams made data gathering and cleaning bots in Python using ScraperWiki’s “Code in Browser” product – an easy way to get going, without anything to install and without worrying about where to store data, or how to download it in different formats.

Here’s what two of the teams got up to…

Team Anaconda

The goal of Team Anaconda (they were all named after snakes, to keep the Python theme!) was to gather data from Local Authority (and other) sites to determine intentions relating to Council Tax levels. The business aim is to spot trends and patterns, and to pick up early on rises which don’t comply with the law.

Local news stories often talk about proposed council tax changes.

Council tax change article

The team in the end set up a Google alert for search terms around council tax changes, and imported that into a spreadsheet. They then downloaded the content of those pages, creating an SQL table with a unique key for each article talking about changes to council tax:

Screenshot of the resulting articles table

They used regular expressions to find the phrases describing a percentage increase / decrease in Council Tax.
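As an illustration of the approach (the URL, regular expression and table layout here are hypothetical, not the team’s actual code), a bot along these lines fetches an article, pulls out the percentage change with a regex, and stores one row per article under a unique key:

```python
# Hypothetical sketch of the approach: fetch an article found via the Google
# Alert, find a phrase describing a council tax change with a regular
# expression, and store one row per article keyed on its URL.
import re
import sqlite3

import requests

# Hypothetical article URL; in practice these came from the Google Alert feed.
url = "https://example.com/news/council-tax-rise"
html = requests.get(url, timeout=30).text

# Illustrative pattern: "council tax ... rise/cut ... by 3.99%".
pattern = re.compile(
    r"council tax\D{0,80}?(rise|increase|cut|fall)\D{0,40}?(\d+(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)
match = pattern.search(html)

conn = sqlite3.connect("counciltax.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS articles "
    "(url TEXT PRIMARY KEY, direction TEXT, percent REAL)"
)
if match:
    conn.execute(
        "INSERT OR REPLACE INTO articles VALUES (?, ?, ?)",
        (url, match.group(1).lower(), float(match.group(2))),
    )
conn.commit()
```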

The team liked using ScraperWiki – it was easy to collaborate on scrapers there, and easier to get into SQL.

The next steps will be to restructure the data to be more useful to the end user, and improve content analysis, for example by extracting local authority names from articles.

Team Boa Constrictor

It’s Government policy to double the number of self-built homes by 2020, so this team was working on parsing sites to collect baseline evidence of the number being built.

They looked at various sources – quarterly VAT receipts, forums, architecture websites, sites which list plots of land for sale, planning application data, industry bodies…

The team wrote code to get data from PlotBrowser, a site which lists self-build land for sale.

Plot Browser scraper

And analysed that data using R.

PlotBrowser data graph

They made scripts to get planning application data, for example in Hounslow, although they found the data they could easily get from within the applications wasn’t enough for what they needed.


They liked ScraperWiki, especially once they understood the basics of Python.

The next step will be to automate regular data gathering from PlotBrowser, and count when plots are removed from sale.

Phase two

At the end of the competition, teams presented what they’d learnt and done to Deputy Directors. Team Boa Constrictor won!

The teams developed a better understanding of the data available, and the level of effort needed to use it. There are clear next steps to take the projects onwards.

DCLG found the code club so useful that they are running another, more ambitious one. They’re going to have 7 teams, and are extending their ScraperWiki licence so everyone can use it. A key goal of this second phase is to really explore the data that has been gathered.

We’ve found at ScraperWiki that a small amount of coding skills, learnt by numerate staff, goes a long way.

As Stephen Aldridge, Director of the Analysis and Data Directorate, says:

ScraperWiki added immense value, and was a fantastic way for team members to learn. The code club built skills at automation and a deeper understanding of data quality and value. The projects all helped us make progress at real data challenges that are important to the department.

Highlights of 3 years of making an AI newsreader (6 April 2016)

We’ve spent three years working on a research and commercialisation project making natural language processing software to reconstruct chains of events from news stories, representing them as linked data.

If you haven’t heard of Newsreader before, our one year in blog post is a good place to start.

We recently had our final meeting in Luxembourg. Some highlights from the three years:

Papers: The academic partners have produced a barrage of papers. This constant, iterative improvement to knowledge of Natural Language Processing techniques is a key thing that comes out of research projects like this.

Open data: As an open data fan, I like some of the new components that came out of the project which will be of permanent use to anyone in NLP. For example, the MEANTIME corpus of news articles in multiple languages, annotated with their events, for use in training.

Meantime example

Open source: Likewise, as an open source fan, Newsreader’s policy was to produce open source software, and it made lots. As an example, the PIKES Knowledge Extraction Suite applies NLP tools to a text.

PIKES overview

Exploitation: This is happening via integration into existing commercial products. All three commercial consortium members are working on it in some way (often confidentially for now). Originally at ScraperWiki, we thought it might plug into our Code in Browser product; now our attention is more on using PDFTables with additional natural language processing.

Simple API: A key part of our work was developing the Simple API, making the underlying SPARQL database of news events accessible to hackers via a simpler REST API. This was vital for the Hackdays, and for making the technology more accessible.
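As a rough illustration of what a hacker-friendly REST layer over a SPARQL store looks like to the caller (the host, path and parameters below are invented for illustration, not the real Simple API):

```python
# Illustrative only: the endpoint, path and parameters below are invented
# to show the shape of a simple REST query layered over a SPARQL store.
import requests

BASE = "https://newsreader.example.org/simple-api"  # hypothetical host

response = requests.get(
    f"{BASE}/events",
    params={"actor": "dbpedia:Volkswagen", "limit": 10},
    timeout=30,
)
response.raise_for_status()
for event in response.json().get("events", []):
    print(event.get("date"), event.get("label"))
```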

Hackdays: We ran several across the course of the project (example). They were great fun, working on World Cup and automotive related news article datasets, drawing a range of people from students to businesses.

Thanks Newsreader for a great project!

Together, we improved the quality of news data extraction, began the process of assembling that into events, and made steps towards commercialisation.

6 lessons from sharing humanitarian data (13 October 2015)

This post is a write-up of the talk I gave at Strata London in May 2015 called “Sharing humanitarian data at the United Nations“. You can find the slides on that page.

The Humanitarian Data Exchange (HDX) is an unusual data hub. It’s made by the UN, and is successfully used by agencies, NGOs, companies, Governments and academics to share data.

They’re doing this during crises such as the Ebola epidemic and the Nepal earthquakes, and every day to build up information in between crises.

There are lots of data hubs which are used by one organisation to publish data, far fewer which are used by lots of organisations to share data. The HDX project did a bunch of things right. What were they?

Here are six lessons…

1) Do good design

HDX started with user needs research. This was expensive, and was immediately worth it because it stopped a large part of the project which wasn’t needed.

The user needs led to design work which has made the website seem simple and beautiful – particularly unusual for something from a large bureaucracy like the UN.

HDX front page

2) Build on existing software

When making a hub for sharing data, there’s no need to make something from scratch. Open Knowledge’s CKAN software is open source; this stuff is a commodity. HDX has developers who modify and improve it for the specific needs of humanitarian data.

ckan
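CKAN’s standard action API is part of what makes it a good commodity base to build on. For example, searching any CKAN site for datasets looks roughly like this (the site URL and query term are just examples, not HDX specifics):

```python
# CKAN's standard "action API": search a CKAN-based data hub for datasets.
# The site URL and query term here are examples, not HDX specifics.
import requests

site = "https://demo.ckan.org"  # any CKAN instance
resp = requests.get(
    f"{site}/api/3/action/package_search",
    params={"q": "ebola", "rows": 5},
    timeout=30,
)
resp.raise_for_status()
for dataset in resp.json()["result"]["results"]:
    print(dataset["title"])
```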

3) Use experts

HDX is a great international team – the leader is in New York, most of the developers are in Romania, there’s a data lab in Nairobi. Crucially, they bring in specific outside expertise: frog design do the user research and design work; ScraperWiki, experts in data collaboration, provide operational management.

ScraperWiki logo

4) Measure the right things

HDX’s metrics are about both sides of its two-sided network. Are users who visit the site actually finding and downloading data they want? Are new organisations joining to share data? They’re avoiding “vanity metrics”, taking inspiration from tech startup concepts like “pirate metrics“.

HDX metrics

5) Add features specific to your community

There are endless features you can add to data hubs – most add no value, and end up as a cost to maintain. HDX adds specific features that are valuable to its community.

For example, much humanitarian data is in “shape files”, a standard for geographical information. HDX automatically renders a beautiful map of these – essential for users who don’t have ArcGIS, and a good check for those that do.

Syrian border crossing
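Rendering a shape file without desktop GIS software is straightforward to do yourself too. A minimal sketch using geopandas, with a placeholder filename:

```python
# Minimal sketch: read a shape file and render it as a map without ArcGIS.
# "admin_boundaries.shp" is a placeholder filename.
import geopandas as gpd
import matplotlib.pyplot as plt

gdf = gpd.read_file("admin_boundaries.shp")
gdf.plot(edgecolor="black", figsize=(8, 8))
plt.axis("off")
plt.savefig("preview_map.png", dpi=150)
```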

6) Trust in the data

The early user research showed that trust in the data was vital. For this reason, not just anyone can come along and add data to it. New organisations have to apply, proving either that they’re known in humanitarian circles or that they have quality data to share. Applications are checked by hand. It’s important to get this kind of balance right – being too ideologically open or closed doesn’t work.

Apply HDX

Conclusion

The detail of how a data sharing project is run really matters. Most data in organisations gets lost, left in spreadsheets on dying file shares. We hope more businesses and Governments will build a good culture of sharing data in their industries, just as HDX is building one for humanitarian data.

GOV.UK – Contracts Finder… £1 billion worth of carrots! (7 October 2015)

Carrots by Fovea Centralis /CC BY-ND-2.0

This post is about the government Contracts Finder website. This site has been created with a view to helping SMEs win government business by providing a “one-stop-shop” for public sector contracts.

Government has been doing some great work transitioning their departments to GOV.UK and giving a range of online services a makeover. We’ve been involved in this work, in the first instance scraping the departmental content for GOV.UK, then making some performance dashboards for content managers on the Performance Platform.

More recently we’ve scraped the content for databases such as the Air Accidents Investigation Branch, and made the new Civil Service People Survey website.

As well as this we have an interest in other re-worked government services such as the Charity Commission website, data.gov.uk and the new Companies House website.

Getting back to Contracts Finder – there’s an archive site which lists opportunities posted before 26th February 2015, and a live site, the new Contracts Finder website, which has live opportunities from after 26th February 2015. Central government departments and their agencies were required to advertise contracts over £10k on the old Contracts Finder website. The wider public sector could advertise contracts there too, but weren’t required to (although on the new Contracts Finder they are required to for contracts over £25k).

The confusingly named Official Journal of the European Union (OJEU) also publishes calls to tender. These are required by EU law over a certain threshold value, depending on the area of business in which they are placed. Details of these thresholds can be found here. Contracts Finder also lists opportunities over these thresholds, but it is not clear whether this is required.

The interface of the new Contracts Finder website is OK, but there is far more flexibility to probe the data if you scrape it from the website. For the archive data this is more a case of downloading the CSV files provided, although it is worth scraping the detail pages linked from the downloads in order to get additional information, such as the supplier to which work was awarded.

The headline data published in an opportunity is the title and description, the name of the customer with contact details, the industry – a categorisation of the requirements, a contract value, and a closing date for applications.

We run the scrapers on our Platform which makes it easy to download the data as an Excel spreadsheet or CSV, which we can then load into Tableau for analysis. Tableau allows us to make nice visualisations of the data, and to carry out our own ad hoc queries of the data free from the constraints of the source website. There are about 15,000 entries on the new site, and about 40,000 in the archive.

The initial interest for us was just getting an overview of the data: how many contracts were available, and in what price range? As an example we looked at proposals in the range £10k-£250k in the Computer and Related Services sector. The chart below shows the number of opportunities in this range grouped by customer.

customers-10k-250k
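A pandas sketch of the kind of filter-and-group query behind a chart like this (the column names are assumptions about the scraped data, not the actual Contracts Finder schema):

```python
# Sketch of the ad hoc query behind a chart like the one above. The column
# names ("sector", "value_gbp", "customer") are assumptions about the
# scraped data, not the actual Contracts Finder schema.
import pandas as pd

df = pd.read_csv("contracts_finder_scrape.csv")  # placeholder filename

subset = df[
    (df["sector"] == "Computer and Related Services")
    & df["value_gbp"].between(10_000, 250_000)
]
counts = subset.groupby("customer").size().sort_values(ascending=False)
print(counts.head(20))
```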

These opportunities are actually all closed. How long were opportunities open for? We can see in the histogram below: most adverts are open for 2-4 weeks, however a significant number have closing dates before their publication dates – it’s not clear why.

new-site-ad-duration

There is always fun to be found in a dataset of this size. For example, we learn that Shrewsbury Council appears to have tendered for up to £1bn worth of fruit and vegetables (see here). With a kilogram of carrots costing less than £1, this is a lot of veg, or maybe a mis-entry in the data!

Closer to home we discover that Liverpool Council spent £12,000 on a fax service for 2 years! There is also a collection of huge contracts for the MOD, which appears to do its contracting from Bristol.

Getting down to more practical business, we can use the data to see what opportunities we might be able to apply for. We found the best way to address this was to build a search tool in Tableau which allowed us to search and filter on multiple criteria (words in title, description, customer name, contract size) and view the results grouped together. So it is easy, for example, to see that Leeds City Council has tendered for £13 million in Computer and Related Services, the majority of which went on a framework contract with Fujitsu Services Ltd, or that Oracle won a contract for £6.5 million from the MOD for their services. You can see the austere interface we have made to this data below:

Screenshot of the Tableau search tool

Do you have some data which you want exploring? Why not get in touch with us!

Henry Morris (CEO and social mobility start-up whizz) on getting contacts from PDF into his iPhone (30 September 2015)

Henry Morris

Meet @henry__morris! He’s the inspirational serial entrepreneur who set up PiC and upReach. They’re amazing businesses that focus on social mobility.

We interviewed him for PDFTables.com

He’s been using it to convert delegate lists that come as PDFs into Excel, and then into his Apple iPhone.

It’s his preferred personal Customer Relationship Management (CRM) system: a simple and effective solution for keeping his contacts up to date and in context.

Read the full interview

Got a PDF you want to get data from?
Try our easy web interface over at PDFTables.com!

 

Civil Service People Survey – Faster, Better, Cheaper (8 September 2015)

Civil Service Reporting Platform

The Civil Service is one of the UK’s largest employers. Every year it asks every civil servant what they think of their employer: UK plc.

For Sir Jeremy Heywood the survey matters. In his blog post “Why is the People Survey Important?” he says

“The survey is one of the few ways we can objectively compare, on the basis of concrete data, how things are going across departments and agencies.  …. there are common challenges such as leadership, improving skills, pay and reward, work-life balance, performance management, bullying and so on where we can all share learning.”

The data is collected by a professional survey company called ORC International.  The results of the survey have always been available to survey managers and senior civil servants as PDF reports. There is access to advanced functionality within ORC’s system to allow survey managers more granular analysis.

So here’s the issue. The Cabinet Office wants to give access to all civil servants, in a fast and reliable way. It wants to give more choice and speed in how the data is sliced and diced – in real time. Like all government departments, it is also under pressure to cut costs.

ScraperWiki built a new Civil Service People Survey Reporting Platform, and it’s been challenging. It’s a moderately large data set. There are close to half a million civil servants – over 250,000 answered the last survey, which contains 100 questions. There are 9,000 units across government. This means 30,000,000 rows of data per annum, and we’ve ingested 5 years of data.

The real challenges were around:

  • Data Privacy
  • Real Time Querying
  • Design

Data privacy

The civil servants are answering questions on their attitudes to their work, their managers and the organisations they work in, along with questions on who they are: gender, ethnicity, sexual orientation – demographic information. Their responses are strictly confidential, and one of the core challenges of the work is maintaining this confidentiality in a tool available over the internet, with a wide range of data filtering and slicing functionality.

A naïve implementation would reveal an individual’s responses either directly (i.e. if they are the only person in a particular demographic group in a particular unit), or indirectly, by taking the data from two different views and taking a difference to reveal the individual. ScraperWiki researched and implemented a complex set of suppression algorithms to allow publishing of the data without breaking confidentiality.
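The real rules are more involved than this, but as an illustration of the principle, a simple primary suppression pass hides any figure built from fewer responses than a threshold:

```python
# Illustrative primary suppression only: hide any aggregate built from fewer
# than THRESHOLD responses. The real People Survey rules also have to handle
# secondary (differencing) suppression and are more complex than this.
THRESHOLD = 10  # assumed value for illustration

def suppress(cells):
    """cells: mapping of group label -> (respondent_count, percent_positive)."""
    published = {}
    for group, (count, score) in cells.items():
        published[group] = score if count >= THRESHOLD else None  # None = suppressed
    return published

print(suppress({"Women, grade 7, unit A": (4, 62.0),
                "All staff, unit A": (118, 58.5)}))
```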

Real-time queries

Each year the survey generates 30,000,000 data points, one for each answer given by each person. This is multiplied by five years of historical data. To enable a wide range of queries our system processes this data for every user request, rather than rely on pre-computed tables which would limit the range of available queries.

Aside from the moderate size, the People Survey data is rich because of the complexity of the Civil Service organisational structure. There are over 9,000 units in the hierarchy which is in some places up to 9 links deep. The hierarchy is used to determine how the data are aggregated for display.
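A toy sketch of that roll-up (unit names and counts invented): a unit’s figure includes the responses of every unit beneath it in the hierarchy.

```python
# Toy roll-up over an organisational hierarchy (unit names and counts
# invented): a unit's total includes the responses of every unit beneath it.
children = {
    "Department": ["Agency A", "Agency B"],
    "Agency A": ["Team A1", "Team A2"],
    "Agency B": [],
    "Team A1": [],
    "Team A2": [],
}
responses = {"Department": 3, "Agency A": 40, "Agency B": 25,
             "Team A1": 10, "Team A2": 5}

def rolled_up(unit):
    """Total responses for a unit and everything beneath it."""
    return responses.get(unit, 0) + sum(rolled_up(c) for c in children.get(unit, []))

print(rolled_up("Department"))  # 83
```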

Standard design

An early design decision was to use the design guidelines and libraries developed by the Government Digital Service for the GOV.UK website. This means the Reporting Platform has the look and feel of GOV.UK and, we hope, follows their excellent usability guidelines.

Going forward

The People Survey digital reporting platform alpha was put into the hands of survey managers at the end of last year. We hope to launch the tool to the whole civil service after the 2015 survey, which will be held in October. If you aren’t a survey manager, you can get a flavour of the People Survey Digital Reporting Platform in the screenshots in this post.

Do you have statistical data you’d like to publish more widely, and query in lightning fast time? If so, get in touch.

Horizon 2020 – Project TIMON (2 September 2015)

ScraperWiki are members of a new EU Horizon 2020 project: TIMON, “Enhanced real time services for optimized multimodal mobility relying on cooperative networks and open data”. This is a 3.5 year project that commenced in June 2015, whose objectives are:

  • to improve road safety;
  • to provide greater transport flexibility in terms of journey planning across multiple modes of transport;
  • to reduce emissions;

From a technology point of view these objectives are to be achieved by deploying sensing equipment to cars, and bringing together the data from these sensors, roadside sensors and public data. This data will then be used to develop a set of services for users, as listed below:

  • Driver assistance services to provide real-time alerting for hazards, higher than usual traffic density, pedestrians and vulnerable road users;
  • Services for vulnerable road users to provide real-time alerting and information to vulnerable road users, initially defined as users of two-wheeled vehicles (powered and unpowered);
  • Multimodal dynamic commuter service to provide adaptive routing which is sensitive to road, weather and public transport systems;
  • Enhanced real time traffic API to provide information to other suppliers to build their own services;
  • TIMON collaborative ecosystem will give users the ability to share transport information on social media to enhance the data provided by sensors;

In common with all projects of this type, the consortium is comprised of a wide variety of different organisations from across Europe, these include:

  • University of Deusto in Bilbao (Spain) who are coordinating the work. Their expertise is in artificial intelligence and communications technologies for transport applications;
  • Fraunhofer Institute for Embedded Systems and Communication Technologies ESK in Munich (Germany). Their expertise is in communications technologies;
  • Centre Tecnològic de Telecommunicacions de Catalunya (CTTC) in Castelldefels (Spain). Their expertise is in positioning using global navigation satellite systems (GNSS), of which GPS is an example;
  • INTECS headquartered in Rome (Italy). INTECS are a large company which specialise in software and hardware systems particularly in the communications area, covering defence, space as well as automotive systems;
  • ScraperWiki in Liverpool (United Kingdom). We specialise in Open Data, producing data-rich web applications and data ingestion and transformations;
  • GeoX KFT in Budapest (Hungary). Their expertise is in Geographical Information Systems (GIS) delivered over the web and to mobile devices;
  • XLAB in Ljubljana (Slovenia). Their expertise is in cloud computing, cloud security and integrated data systems;
  • ISKRA Sistemi in Ljubljana (Slovenia) are the technical coordinator of the project. ISKRA are developers and providers of process automation, communications and security systems for power distribution, telecommunications, and railway and road traffic;
  • JP LPT in Ljubljana (Slovenia). JP LPT are fully owned by the municipality of Ljubljana, and are responsible for controlling and managing urban mobility in that city;
  • Confederation of Organisations in Road Transport Enforcement (CORTE) in Brussels (Belgium). They are an  international non-profit association in the transport area, specialising in technology dissemination and user needs research;
  • TASS International Mobility Center in Helmond (Netherlands). For this project, TASS are providing access to their traffic test-bed in Helmond for the technologies developed.

Our role in the work is to bring data from a variety of sources into the project and make it available, via an API, to other project partners to provide the services. We anticipate bringing in data such as public transport schedules, live running information, bicycle scheme points and occupancy and so forth. We will also be doing work, alongside all the other partners, on user needs and requirements in the early part of the project.
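As a rough sketch of what making ingested data available via an API can look like (the endpoint and fields here are invented, not the TIMON API design):

```python
# Hypothetical sketch only: a tiny API serving ingested transport data.
# The endpoint path and fields are invented, not the TIMON API design.
from flask import Flask, jsonify

app = Flask(__name__)

# In the real system this would be refreshed from live feeds and open data.
BIKE_DOCKS = [
    {"id": "dock-001", "lat": 53.4084, "lon": -2.9916, "bikes_available": 7},
    {"id": "dock-002", "lat": 53.4106, "lon": -2.9779, "bikes_available": 0},
]

@app.route("/bike-docks")
def bike_docks():
    return jsonify(BIKE_DOCKS)

if __name__ == "__main__":
    app.run(port=8000)
```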

ScraperWiki has been a partner in the very successful NewsReader project of FP7. We enjoy the stimulating environment of working with capable partners across Europe; if you are building a consortium to make a proposal then please get in touch!

You can find out more about the project on the TIMON website, and follow our accounts on Twitter, Facebook and LinkedIn.

Or if you’d like to get in touch with ScraperWiki directly to talk about the project, then contact us at hello@scraperwiki.com.

Got a PDF you want to get into Excel?
Try our easy web interface over at PDFTables.com!
Number of prescriptions by location (28 August 2015)

There are 211 clinical commissioning groups (CCGs) across England dispensing a range of medications every day. These CCGs have demographic factors that could affect how much medication is dispensed. I therefore thought it would be interesting to compare the number of items dispensed in CCGs across England for a number of different medications, using the Clinical Commissioning Group Prescribing dataset for January – March 2015.

Antidepressants Blackpool and Windsor

Figure 1: Number of antidepressants dispensed in Blackpool and Windsor, Ascot and Maidenhead

One of the medications I looked at was antidepressant drugs. Figure 1 shows the number of antidepressant items dispensed in the Blackpool CCG and the Windsor, Ascot and Maidenhead CCG between January and March 2015. Antidepressants were dispensed at almost three times the rate in Blackpool CCG as in Windsor, Ascot and Maidenhead CCG: 53,986 items per 100,000 people in Blackpool, compared with only 18,898 items per 100,000 people in Windsor, Ascot and Maidenhead. According to research by The Department for Communities and Local Government, Blackpool is among the 10% most deprived areas in England, while Windsor, Ascot and Maidenhead is described as one of the richest areas in England. This may explain why antidepressant drugs are dispensed more in Blackpool CCG than in Windsor, Ascot and Maidenhead CCG.
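The per-100,000 figures used throughout this post come from combining prescribing counts with CCG population estimates. A pandas sketch, with assumed file and column names rather than the exact headings of the source files:

```python
# Sketch of the normalisation used throughout this post: items dispensed per
# 100,000 people, by CCG. File and column names are assumptions, not the
# exact headings in the source files.
import pandas as pd

prescribing = pd.read_csv("ccg_prescribing_jan_mar_2015.csv")   # ccg, bnf_section, items
population = pd.read_csv("ccg_population_estimates.csv")        # ccg, population

antidep = (
    prescribing[prescribing["bnf_section"] == "Antidepressant drugs"]
    .groupby("ccg", as_index=False)["items"].sum()
    .merge(population, on="ccg")
)
antidep["items_per_100k"] = antidep["items"] / antidep["population"] * 100_000
print(antidep.sort_values("items_per_100k", ascending=False).head())
```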


Figure 2a: Antidepressant prescribing per 100,000 of population by Care Commissioning Group

Figure 2a compares the number of antidepressant items dispensed in CCGs across England with the population of the CCGs. The orange map shows the number of antidepressants dispensed in CCGs between January and March 2015: areas shaded cream dispensed small amounts of antidepressants, while the brown areas dispensed large amounts. The green map shows the population of CCGs in 2012: areas shaded light green have smaller populations, while the dark green areas have the largest populations.


Figure 2b: Population for each Care Commissioning Group (CCG)

Figure 2b shows that the population size of a CCG did not have much effect on the number of antidepressants dispensed. Cambridgeshire and Peterborough CCG had a population of 850,073, one of the highest, but dispensed only 29,502 antidepressant items per 100,000 people. Conversely, although Durham Dales, Easington and Sedgefield CCG does not have a large population, it dispensed 47,641 antidepressant items per 100,000 people, when the highest figure across all CCGs was 53,986 items per 100,000 people. The maps also show that, generally, more antidepressants were dispensed in the north of England than in the south.


Diabetes Richmond and Lincolnshire

Figure 3: Number of drugs used in diabetes dispensed in Richmond CCG and Lincolnshire CCG

Figure 3 shows the number of items of drugs used in diabetes dispensed in Richmond CCG and Lincolnshire East CCG between January and March 2015. Lincolnshire East CCG dispensed over twice as many diabetes drugs as Richmond CCG: 34,368 items per 100,000 people in Lincolnshire East, compared with only 13,106 items per 100,000 people in Richmond. According to Public Health England, Lincolnshire was worse than the benchmark for excess weight in adults, while Richmond was better than the benchmark. This may help explain why fewer diabetes drugs were dispensed in Richmond CCG than in Lincolnshire East CCG, as diabetes is often linked to being overweight.

Age comparisons Islington and Somerset

Figure 4: Age distribution in Islington and Somerset 2013

From Figure 4 it can be concluded that Islington had a higher proportion of young people than Somerset: 25% of Islington’s population was aged 21-30, compared to 10% of Somerset’s. It also shows that there were more older people in Somerset than in Islington, as the largest age group in Somerset was those aged 61-70: 15% of Somerset’s population was in this age category, compared to only 7% of Islington’s.

Dementia Islington and Somerset

Figure 5: Number of drugs for dementia dispensed in Islington and Somerset

Figure 5 shows the number of items of drugs for dementia dispensed in Islington CCG and Somerset CCG between January and March 2015. More dementia drugs were dispensed in Somerset than in Islington: 1,333 items per 100,000 people in Somerset, compared to 831 items per 100,000 people in Islington. Somerset’s older population may contribute to why more dementia drugs were dispensed there than in Islington, which has more young people.

Corticosteroids (Respiratory)

Figure 6: Levels of corticosteroids dispensed

Figure 6 shows the number of items of corticosteroids (respiratory) dispensed in different CCGs across England between January and March 2015. The areas shaded light grey dispensed the smallest numbers of corticosteroids (respiratory), while the areas shaded dark pink dispensed the highest number of items. The map shows that the East of England dispensed large numbers of corticosteroids, especially when compared to areas in and around London. Blackpool CCG dispensed the highest number of corticosteroids (respiratory), at 16,194 items per 100,000 people, while the smallest number was dispensed in Southwark, at 4,141 items per 100,000 people.

Conclusion

I found that for many of the drugs the north of England dispensed more than the south of England; it might therefore be argued that the south is generally healthier than the north. From looking at a range of medications, I would also conclude that Durham Dales, Easington and Sedgefield CCG, the Lincolnshire CCGs, Norfolk CCG and Blackpool CCG are among the unhealthiest CCGs in England, as they dispensed the highest number of items for many medications.

Branded and Generic medication compared (26 August 2015)

According to the Office of Health Economics for the Association of the British Pharmaceutical Industry (ABPI), the total medicines bill in the UK was £13.6 billion in 2011, and £10.8 billion of this was spent on branded medication. Prescribers such as GPs are encouraged to prescribe generic medicine instead of its branded version. This is because, as stated by NHS Choices, generic medication can cost up to 80% less than branded medicine, whilst still being as effective. With this in mind, I have enjoyed comparing the number of items and the cost per item of branded medications and their generic alternatives dispensed by GP practices in the UK.

 

Findings:


Figure 1 – Number of items dispensed for branded (Zocor tablets) and generic (Simvastatin)

 

Figure 1 shows the number of items of the branded medication Zocor tablets 40mg and of its generic version, Simvastatin tablets 40mg, dispensed in August 2014. The generic medication was dispensed at a much higher rate than its branded alternative: there were 1,748,989 items of Simvastatin tablets dispensed, compared to just 345 items of Zocor tablets. These tablets are used to reduce levels of low-density lipoprotein and increase levels of high-density lipoprotein.


Figure 2: Cost per item of branded (Zocor tablets) and generic (Simvastatin tablets)

Figure 2 shows the cost per item of Zocor 40mg tablets and their generic alternative, Simvastatin 40mg tablets, dispensed in August 2014. The branded medication was a lot more expensive than the generic: Zocor 40mg tablets cost £39.10 per item, while Simvastatin 40mg tablets cost only £1.33 per item. Together, Figures 1 and 2 illustrate how general practitioners often dispense more generic items than branded items to reduce the expenditure pressure on the NHS: the cheaper generic version of this medication was dispensed far more than the expensive branded version.


Figure 3: Number of branded and generic items of skin cream dispensed

Figure 3 shows the number of branded and generic items of skin creams dispensed in March 2014. It shows that the branded medication, called Aveeno cream, was dispensed to a much larger extent than its generic alternative, called Dimeticone/Cetrimide cream 10%. There were 72,597 items of Aveeno Cream dispensed, whereas there were only 170 items of the generic version dispensed.

These creams are used to prevent and treat dry skin.


Figure 4: Cost per item of the branded and generic versions of skin creams

Figure 4 shows the cost per item of the branded and generic versions of the skin cream. The branded cream, Aveeno Cream, was more expensive than the generic version, Dimeticone/Cetrimide cream 10%: almost £6 more per item, with Aveeno Cream costing £8.92 per item while Dimeticone/Cetrimide cream cost £3.14.

Patients trusting the branded medication they are used to over newer generic versions could affect this, and doctors may give in to their requests.


Figure 5: Number of branded and generic oral powder sachets dispensed

Figure 5 shows the number of branded and generic oral powder sachets dispensed in August 2014. The branded version is called Laxido oral powder sachets and the generic version is called Macrogol Co oral powder sachets. The branded version was dispensed more than twice as often as its generic alternative: 209,449 items of Laxido oral powder sachets, compared to only 94,864 items of Macrogol Co oral powder sachets. This medication is a laxative used to treat patients with long-term constipation.


Figure 6: Cost per item of branded and generic versions of oral powder sachets

Figure 6 shows the cost per item of the branded and generic versions of the oral powder sachets in August 2014. The branded medication was roughly half the price of the generic version: Laxido oral powder sachets cost £5.78 per item, whereas Macrogol Co oral powder sachets cost £10.80 per item. Taken together, Figures 5 and 6 show that the branded medication was cheaper than the generic and was dispensed more than the more expensive generic version.

The table below summarises the data on the number of branded and generic items dispensed and the ratio for each of the medications I looked at, as well as the cost per item and the ratio of that.

Branded/Generic          Items (branded)  Items (generic)  Items ratio  Cost/item branded (£)  Cost/item generic (£)  Cost ratio
Aricept/Donepezil                     54           90,669     0.000596                  57.09                   1.25       45.67
Celluvisc/Carmellose              18,499            1,379        13.42                  11.57                  12.51        0.93
Panadol/Paracetamol                   24        1,436,494     1.67E-05                   5.39                   3.35        1.61
Piriton/Chlorphenamine             4,760           61,822         0.08                   0.86                   1.70        0.51
Yasmin/Ethinylestr                38,420            9,236         4.16                  22.60                  22.30        1.01
Aveeno/Dimeticone                 72,597              170       427.04                   8.92                   3.14        2.84
Pancrease/Pancreatin                  96                1        96.00                 111.63                  33.30        3.35
Zocor/Simvastatin                    345        1,748,989     0.000197                  39.10                   1.33       29.40
Laxido/Macrogol                  209,449           94,864         2.21                   5.78                  10.80        0.54
Zaroxolyn/Metolazone                   4              619         0.01                  95.62                   2.00       47.81
Seroquel/Quetiapine                  113           17,893         0.01                  95.62                   2.00       47.81
Prograf/Tacrolimus                 2,454              410         5.99                  81.77                  81.80        1.00
Voltarol/Diclofenac                  155            1,716         0.09                  17.73                  12.78        1.39
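The ratios in the table are simple to reproduce once the data is paired up by branded and generic name. A pandas sketch using two of the rows above (the column names are my own, not the raw prescribing file headings):

```python
# Sketch of how the ratios in the table above are derived, given one row per
# branded/generic pair. Column names are assumptions about the prepared data,
# not the raw GP prescribing file headings.
import pandas as pd

pairs = pd.DataFrame({
    "pair": ["Zocor/Simvastatin", "Laxido/Macrogol"],
    "items_branded": [345, 209_449],
    "items_generic": [1_748_989, 94_864],
    "cost_branded": [39.10, 5.78],
    "cost_generic": [1.33, 10.80],
})

pairs["items_ratio"] = pairs["items_branded"] / pairs["items_generic"]
pairs["cost_ratio"] = pairs["cost_branded"] / pairs["cost_generic"]
print(pairs.round(4))
```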

Conclusion:

I was expecting to find that the generic version would be cheaper, and so dispensed more, than the branded version of each medicine, and I did find this for a few medications. However, I was surprised to find that there are medicines where the branded medication was dispensed more than the generic alternative even though it was more expensive, and that branded medication can be cheaper than its generic alternative. The absolute level of prescription for the different medications is also interesting: there were only 72,597 items of Aveeno Cream dispensed, compared to 1,748,989 items of Simvastatin tablets. The difference could be because Aveeno Cream can be bought without a prescription and is used for dry skin, whereas Simvastatin tablets are used to lower cholesterol and the risk of heart disease, diabetes and stroke.

 
