Searching by facets
Last year, ScraperWiki helped migrate lots of specialist datasets to GOV.UK.
The user interface is called Finder Frontend, and is used by GOV.UK wherever the user needs to search for items by varying criteria. In the jargon, it’s called “faceted search”.
We enabled this type of searching by scraping the “Aircraft category”, “Report type” and “Date” fields. Users can then filter the accident reports by one or more of those criteria at once.
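To illustrate what filtering "by one or more of those criteria at once" means, here is a minimal Python sketch. It is not how Finder Frontend is implemented. The `Report` class, its field names and the sample facet values ("Fixed wing", "Bulletin" and so on) are hypothetical placeholders, not the real GOV.UK data. Any facet the user leaves unset matches everything, and the facets they do set are combined with AND.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Report:
    # Hypothetical shape of one scraped accident report.
    title: str
    aircraft_category: str
    report_type: str
    report_date: date


def filter_reports(
    reports: List[Report],
    aircraft_category: Optional[str] = None,
    report_type: Optional[str] = None,
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
) -> List[Report]:
    """Keep reports that match every facet supplied; a facet left as None matches anything."""
    matches = []
    for r in reports:
        if aircraft_category is not None and r.aircraft_category != aircraft_category:
            continue
        if report_type is not None and r.report_type != report_type:
            continue
        if date_from is not None and r.report_date < date_from:
            continue
        if date_to is not None and r.report_date > date_to:
            continue
        matches.append(r)
    return matches


def facet_counts(reports: List[Report], field: str) -> Counter:
    """Count how many reports carry each value of one facet, e.g. to label filter options."""
    return Counter(getattr(r, field) for r in reports)


# Made-up sample records, for illustration only.
SAMPLE_REPORTS = [
    Report("Report A", "Fixed wing", "Bulletin", date(2010, 1, 15)),
    Report("Report B", "Fixed wing", "Formal report", date(2012, 5, 3)),
    Report("Report C", "Rotorcraft", "Bulletin", date(2014, 9, 20)),
]

if __name__ == "__main__":
    hits = filter_reports(SAMPLE_REPORTS, aircraft_category="Fixed wing", report_type="Bulletin")
    print([r.title for r in hits])  # ['Report A']
    print(facet_counts(SAMPLE_REPORTS, "report_type"))  # Counter({'Bulletin': 2, 'Formal report': 1})
```

A real faceted search would typically recompute the counts over the current matches, so each remaining option shows how many results it would leave.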
Most accident prone
Because we scraped it, we also happen to have the data in an SQL database on our Data Science Platform. A quick query reveals which aircraft is the subject of the most accident reports.
The answer is G-AWNB, a Boeing 747-136. It was built in 1970 and has 10 accident reports (some of those are errata, so that doesn’t mean 10 accidents).
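The query itself isn't shown here, and neither the database engine nor the table layout on the platform is described. The following is therefore only a sketch under assumptions: an in-memory SQLite database, a hypothetical `accident_reports` table with one row per report, and a hypothetical `is_erratum` flag. Grouping by registration and counting rows gives the "most reports" answer. Filtering out errata first shows why a report count is not the same as an accident count.

```python
import sqlite3

# Hypothetical schema: one row per scraped report. The real table and
# column names are not given in the post.
SCHEMA = """
CREATE TABLE accident_reports (
    registration TEXT NOT NULL,               -- aircraft registration mark
    title        TEXT NOT NULL,
    is_erratum   INTEGER NOT NULL DEFAULT 0   -- 1 if the report only corrects an earlier one
)
"""

# Rank aircraft by number of reports; registration breaks ties deterministically.
MOST_REPORTED = """
SELECT registration, COUNT(*) AS n_reports
FROM accident_reports
GROUP BY registration
ORDER BY n_reports DESC, registration
LIMIT 1
"""

# Same ranking, ignoring errata: closer to (but still not exactly) a count of accidents.
MOST_REPORTED_EXCLUDING_ERRATA = """
SELECT registration, COUNT(*) AS n_reports
FROM accident_reports
WHERE is_erratum = 0
GROUP BY registration
ORDER BY n_reports DESC, registration
LIMIT 1
"""


def most_reported(conn: sqlite3.Connection, exclude_errata: bool = False):
    """Return (registration, report_count) for the aircraft with the most reports."""
    query = MOST_REPORTED_EXCLUDING_ERRATA if exclude_errata else MOST_REPORTED
    return conn.execute(query).fetchone()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO accident_reports (registration, title, is_erratum) VALUES (?, ?, ?)",
        [
            ("REG-A", "Report 1", 0),
            ("REG-A", "Erratum to report 1", 1),
            ("REG-A", "Second erratum to report 1", 1),
            ("REG-B", "Report 2", 0),
            ("REG-B", "Report 3", 0),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(most_reported(conn))                       # ('REG-A', 3)
    print(most_reported(conn, exclude_errata=True))  # ('REG-B', 2)
```

In the made-up data, REG-A tops the raw count but falls behind REG-B once its two errata are excluded. One accident can still have several non-erratum reports, so even the filtered figure is only an approximation of the number of accidents.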
Here are three of its accidents, chosen to span time:
- In 1975 in Scotland, part of a flap detached during a training flight and struck the cabin door.
- In 1987, shortly after takeoff, a steward noticed that a skin panel on the left wing had ruptured, and the hapless plane had to jettison its fuel and return to Heathrow.
- Lest you think it was just a badly made or badly maintained plane: in 1995, also at Heathrow, it was simply unlucky. A faulty passenger jetty rose up and damaged the cabin door, and repairs took several days.
ScraperWiki often helps with data migration projects like this AAIB one. As another example, we’re currently migrating insurance data between two ERP systems.
The skill of understanding a (poorly) documented dataset and producing the best-quality output for re-use is an important part of data science. We use it in lots of other projects too.
Understanding data fully is the first stage of doing useful analysis with data.