5 yr old goes ‘potty’ at Devon and Somerset Fire Service (Emergencies and Data Driven Stories)

It’s 9:54am in Torquay on a Wednesday morning:

One appliance from Torquays fire station was mobilised to reports of a child with a potty seat stuck on its head.

On arrival an undistressed two year old female was discovered with a toilet seat stuck on her head.

Crews used vaseline and the finger kit to remove the seat from the childs head to leave her uninjured.

A couple of different interests directed me to scrape the latest incidents of the Devon and Somerset Fire and Rescue Service. The scraper that has collected the data is here.

Why does this matter?

Everybody loves their public safety workers — Police, Fire, and Ambulance. They save lives, give comfort, and are there when things get out of hand.

Where is the standardized performance data for these incident response workers? Real-time and rich data would revolutionize its governance and administration, would give real evidence of whether there are too many or too few police, fire or ambulance personnel/vehicles/stations in any locale, or would enable the implementation of imaginative and realistic policies resulting from major efficiency and resilience improvements all through the system?

For those of you who want to skip all the background discussion, just head directly over to the visualization.

The easiest method to monitor the needs of the organizations is to see how much work each employee is doing, and add more or take away staff depending on their workloads. The problem is, for an emergency service that exists on standby for unforeseen events, there needs to be a level of idle capacity in the system. Also, there will be a degree of unproductive make-work in any organization — Indeed, a lot of form filling currently happens around the place, despite there being no accessible data at the end of it.

The second easiest method of oversight is to compare one area with another. I have an example from California City Finance where the Excel spreadsheet of Fire Spending By city even has a breakdown of the spending per capita and as a percentage of the total city budget. The city to look at is Vallejo which entered bankruptcy in 2008. Many of its citizens blamed this on the exorbitant salaries and benefits of its firefighters and police officers. I can’t quite see it in this data, and the story journalism on it doesn’t provide an unequivocal picture.

The best method for determining the efficient and robust provision of such services is to have an accurate and comprehensive computer model on which to run simulations of the business and experiment with different strategies. This is what Tesco or Walmart or any large corporation would do in order to drive up its efficiency and monitor and deal with threats to its business. There is bound to be a dashboard in Tesco HQ monitoring the distribution of full fat milk across the country, and they would know to three decimal places what percentage of the product was being poured down the drain because it got past its sell-by date, and, conversely, whenever too little of the substance had been delivered such that stocks ran out. They would use the data to work out what circumstances caused changes in demand. For example, school holidays.

I have surveyed many of the documents within the Devon & Somerset Fire & Rescue Authority website, and have come up with no evidence of such data or its analysis anywhere within the organization. This is quite a surprise, and perhaps I haven’t looked hard enough, because the documents are extremely boring and strikingly irrelevant.

Under the hood – how it all works

The scraper itself has gone through several iterations. It currently operates through three functions: MainIndex(), MainDetails(), MainParse(). Data for each incident is put into several tables joined by the IncidentID value derived from the incident’s static url, eg:

http://www.dsfire.gov.uk/News/Newsdesk/IncidentDetail.cfm?IncidentID=7901&siteCategoryId=3&T1ID=26&T2ID=41

MainIndex() operates their search incidents form grabbing 10 days at a time and saving URLs for each individual incident page into the table swdata.

MainDetails() downloads each of those incident pages, parsing the obvious metadata, and saving the remaining HTML content of the description into the database. (This used to attempt to parse the text, but I then had to move it into the third function so I could develop it more easily.) A good way to find the list of urls that have not been downloaded and saved into the swdetails is to use the following SQL statement:

select swdata.IncidentID, swdata.urlpage 
from swdata 
left join swdetails on swdetails.IncidentID=swdata.IncidentID 
where swdetails.IncidentID is null 
limit 5

We then download the HTML from each of the five urlpages, save it into the table under the column divdetails and repeat until no more unmatched records are retrieved.

MainParse() performs the same progressive operation on the HTML contents of divdetails, saving it into the the table swparse. Because I was developing this function experimentally to see how much information I could obtain from the free-form text, I had to frequently drop and recreate enough of the table for the join command to work:

scraperwiki.sqlite.execute("drop table if exists swparse")
scraperwiki.sqlite.execute("create table if not exists swparse (IncidentID text)")

After marking the text down (by replacing the <p> tags with linefeeds), we have text that reads like this (emphasis added):

One appliance from Holsworthy was mobilised to reports of a motorbike on fire. Crew Commander Squirrell was in charge.

On arrival one motorbike was discovered well alight. One hose reel was used to extinguish the fire. The police were also in attendance at this incident.

We can get who is in charge and what their rank is using this regular expression:

re.findall("(crew|watch|station|group|incident|area)s+(commander|manager)s*([w-]+)(?i)", details)

You can see the whole table here including silly names, misspellings, and clear flaws within my regular expression such as not being able to handle the case of a first name and a last name being included. (The personnel misspellings suggest that either these incident reports are not integrated with their actual incident logs where you would expect persons to be identified with their codenumbers, or their record keeping is terrible.)

For detecting how many vehicles were in attenence, I used this algorithm:

appliances = re.findall("(S+) (?:(fire|rescue) )?(appliances?|engines?|tenders?|vehicles?)(?: from ([A-Za-z]+))?(?i)", details)
nvehicles = 0
for scount, fire, engine, town in lappliances:
    if town and "town" not in data:
        data["town"] = town.lower(); 
    if re.match("one|1|an?|another(?i)", scount):  count = 1
    elif re.match("two|2(?i)", scount):            count = 2
    elif re.match("three(?i)", scount):            count = 3
    elif re.match("four(?i)", scount):             count = 4
    else:                                          count = 0
    nvehicles += count

And now onto the visualization…

It’s not good enough to have the data. You need to do something with it. See it and explore it.

For some reason I decided that I wanted to graph the hour of the day each incident took place, and produced this time rose, which is a polar bar graph with one sector showing the number of incidents occurring each hour.

You can filter by the day of the week, the number of vehicles involved, the category, year, and fire station town. Then click on one of the sectors to see all the incidents for that hour, and click on an incident to read its description.

Now, if we matched our stations against the list of all stations, and geolocated the incident locations using the Google Maps API (subject to not going OVER_QUERY_LIMIT), then we would be able to plot a map of how far the appliances were driving to respond to each incident. Even better, I could post the start and end locations into the Google Directions API, and get journey times and an idea of which roads and junctions are the most critical.

There’s more. What if we could identify when the response did not come from the closest station, because it was over capacity? What if we could test whether closing down or expanding one of the other stations would improve the performance in response to the database of times, places and severities of each incident? What if each journey time was logged to find where the road traffic bottlenecks are? How about cross-referencing the fire service logs for each incident with the equivalent logs held by the police and ambulance services, to identify the Total Response Cover for the whole incident – information that’s otherwise balkanized and duplicated among the three different historically independent services.

Sometimes it’s also enlightening to see what doesn’t appear in your datasets. In this case, one incident I was specifically looking for strangely doesn’t appear in these Devon and Somerset Fire logs: On 17 March 2011 the Police, Fire and Ambulance were all mobilized in massive numbers towards Goatchurch Cavern – but the Mendip Cave Rescue service only heard about it via the Avon and Somerset Cliff Rescue. Surprise surprise, the event’s missing from my Fire logs database. No one knows anything of what is going on. And while we’re at it, why are they separate organizations anyway?

Next up, someone else can do the Cornwall Fire and Rescue Service and see if they can get their incident search form to work.

Tags: data, html, javascript, open data, scraper, scrapers, views

ScraperWiki

Extract tables from PDFs and scrape the web

Blog

5 yr old goes ‘potty’ at Devon and Somerset Fire Service (Emergencies and Data Driven Stories)

Under the hood – how it all works

And now onto the visualization…