Wednesday, July 23, 2008

The Data Smog

So, I was passed on a link this morn to the NSW Food Authority's register of penalty notices, released in early May this year. My contact expressed her happiness that such information would be released. Her justification, unsurprisingly, was that now people can know who is violating what when it comes to food. Note that this is the penalty register: the offences register is on a separate page. We'll get to a comparison of the two momentarily. For now I want to focus on the former.

Because, being a PhD student, I have a problem with the site, much as I have a problem with lots of things. Too many books = whinges a lot.

So what is wrong with this site, for me? It can be summed up best in Alasdair Roberts' catchphrase: data smog. Data smog is the effect where an institution releases a whole bunch of information under the guise of being a good citizen and letting the folk know what's going on in their world, but ends up releasing so much information so quickly and in such a messy state that it may as well have not released any data at all. The site is a table (sortable by a number of categories) that lists those businesses that the Food Authority has slapped with a fine regarding a food-related matter: storage, cleaning of food or premises, handling, sale, labelling, etc. On the surface, it looks reasonably manageable. However, the table itself doesn't give any substantive insights. To find out what particular penalty a particular business has been given, one must click on the link embodied by the penalty code off to the right of the table. This code, again non-descriptive, takes you to the particular information on the case, where you can see the details of the infraction for that business. So, conceivably, if one knew where one was going to dine, or in what area, one could scroll through the table to find specific restaurants or restaurants in a particular area.

However, this is all done manually, and requires shifting back and forth from the table on the front page to the specific penalty incurred, and back again, for each offence. Not just each premises, but each offence. So, if an inspector has been having a bad day and goes nuts on a particular joint, there may be a number of incidental penalties that appear in sequence, each of which one has to look at separately to assess how serious it is. For instance, a particular supermarket was fined for labelling a pack of mutton as lamb. Not a serious offence, but still, it is flagged.

Now, I have no problem with all these incidentals being released per se. I'm sure that there are particular religious denominations or others who will find value in such info. But there is no way to screen for a particular offence, for example. So, if someone has a life-threatening condition, say, a bad nut allergy, they can't find who has been charged within the last 2 years (the period for which penalties are listed) for accidentally introducing nuts into meals. This is important for them, and may influence their choice of where to go based on repeat or single infractions. However, if one wanted to know this, one would have to search through all this data manually. Simple filtering systems aren't exactly new or complicated: if a Microsoft product can do it, it shouldn't be too problematic. Even just putting a brief description of the infraction within the initial table, for easy viewing, would help.
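To show how little is being asked for here, below is a rough sketch of the kind of filter I mean. Everything in it is assumed for the sake of illustration: the field names, the sample rows and the `filter_penalties` function are all made up, and none of it is taken from the Food Authority's actual register.

```python
from datetime import date

# Hypothetical records: the field names and sample rows are invented for
# illustration and do not come from the Food Authority's register.
PENALTIES = [
    {"business": "Cafe A", "suburb": "Suburb X", "date": date(2008, 3, 14),
     "description": "Undeclared peanuts in a packaged slice"},
    {"business": "Supermarket B", "suburb": "Suburb Y", "date": date(2008, 5, 2),
     "description": "Mutton labelled as lamb"},
    {"business": "Cafe A", "suburb": "Suburb X", "date": date(2005, 1, 9),
     "description": "Undeclared tree nuts in a salad"},
]


def filter_penalties(records, keyword, since):
    """Return records dated on or after `since` whose description mentions `keyword`."""
    needle = keyword.lower()
    return [r for r in records
            if r["date"] >= since and needle in r["description"].lower()]


# Nut-related penalties from roughly the last two years.
recent_nut_cases = filter_penalties(PENALTIES, "nut", date(2006, 7, 23))
for r in recent_nut_cases:
    print(r["business"], r["suburb"], r["date"].isoformat(), r["description"])
```

A plain keyword match like this is crude (it would miss an offence described only as "allergen not declared"), but it only works at all if the description sits alongside the business name, which is exactly what the current table lacks.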

Never mind that there is no consistency between the penalties and offences tables. In penalties, you click on the penalty rego number to get details; in offences, the business name. Of course, this isn't noted.

I'm all for free information. But it seems like a waste of time and money to throw it out there without at least some rudimentary ability to filter through such information. I mean, because of the two-tier structure of the penalty notices, one couldn't even copy the first table into, say, Excel and go from there. They'd have to copy each offence individually. I don't believe it is enough for governments to provide information to their citizens. They have to provide it in such a way that someone without expert knowledge on the subject can approach the data and manipulate it to achieve their goals, particularly when such goals are related to health.
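For what it's worth, collapsing the two tiers into one spreadsheet-friendly table is not hard once the data is in hand. Here is a minimal sketch, assuming the front-page rows and the per-penalty details have already been collected and are keyed by penalty number. The penalty numbers, field names and sample values are hypothetical, not the register's own.

```python
import csv
import io

# Hypothetical data standing in for the front-page table and the detail
# pages, keyed by penalty number. None of it comes from the real register.
SUMMARY_ROWS = [
    {"penalty_no": "P001", "business": "Cafe A", "suburb": "Suburb X"},
    {"penalty_no": "P002", "business": "Supermarket B", "suburb": "Suburb Y"},
]
DETAILS = {
    "P001": "Failure to keep premises clean",
    "P002": "Mutton labelled as lamb",
}


def flatten(summary_rows, details):
    """Attach each penalty's description to its summary row."""
    return [dict(row, description=details.get(row["penalty_no"], ""))
            for row in summary_rows]


def to_csv(rows):
    """Render the flattened rows as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["penalty_no", "business", "suburb", "description"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


flat = flatten(SUMMARY_ROWS, DETAILS)
print(to_csv(flat))
```

The resulting single table, with one row per penalty and the description included, is the thing one could paste into Excel and sort or filter from there.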


Jason said...

But you could get Catie to download the whole site and spend about 5 seconds of her fearsome programming might writing a script to put it into whatever format you wanted.

It's harder for a government agency to do that because they haven't got Catie and instead they've got a lot of red tape.

Catherine said...

Jason you're a bad man, distracting me with the idea of doing things like that! Rawr!

*goes back to writing her thesis chapter, and *not* writing a script to do that* :P

Nick said...

I'd be willing for my meagre tax dollars to go to paying large salaries to Catie and a number of my highly computer-masterful friends, if they clean up the interwebs. Great idea, Jason!

Nicholas said...

Wow. I just had a quick browse and found the case of some guy convicted for selling fake Scotch Whisky. It turns out that to call something Scotch Whisky it has to contain at least 40% ethanol, but this guy's whisky only contained 38.6% ethanol. For that missing 1.4% he had all his whisky confiscated and paid $95,000 in fines and court costs. I don't know if that's a fair punishment, but I bet he needed a drink afterwards.