Towards a National Information Infrastructure

There have been a lot of unpublished datasets appearing on the government’s open data portal over the past couple of months. This is part of the response to Stephan Shakespeare’s review of Public Sector Information.

In his review, Shakespeare recommended that the government identify what he referred to as National Core Reference Data. He defined this as being the high quality core data that the public sector maintains already and said that he would “expect to find the connective tissue of place and location, the administrative building blocks of registered legal entities, the details of land and property ownership” in this collection.

The government’s response has been to rename National Core Reference Data to the National Information Infrastructure. Rather than deciding themselves which datasets should be part of that infrastructure, they have been releasing the details of unpublished datasets held within government on to data.gov.uk.

When doing this they are asking members of the public to comment on them and say whether releasing them would create economic, social and/or efficiency benefits. Following an initial consultation period, potential candidates for the National Information Infrastructure are being discussed ahead of release during the Open Government Partnership’s Summit at the end of the month.

However, this release will be a first draft and the National Information Infrastructure will change over time. That is a good thing because, with over 4000 unpublished datasets on data.gov.uk, there haven’t been all that many people commenting on them. At the time of writing, just 95 out of 4305 unpublished datasets have any feedback against them.

If you look at the unpublished datasets that have been listed by the Ministry of Justice, 36 out of 43 have no feedback against them.
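To put those counts in proportion, here is a quick sketch of the arithmetic. The figures are the ones quoted above; the helper name `feedback_share` is my own and not anything provided by data.gov.uk.

```python
def feedback_share(with_feedback: int, total: int) -> float:
    """Return the percentage of datasets that have at least one comment."""
    return 100 * with_feedback / total


# Counts quoted in the post, at the time of writing.
overall = feedback_share(95, 4305)

# The Ministry of Justice figure is given as datasets WITHOUT feedback,
# so subtract from the total to get the number with feedback.
moj = feedback_share(43 - 36, 43)

print(f"All departments: {overall:.1f}% of unpublished datasets have feedback")
print(f"Ministry of Justice: {moj:.1f}% of unpublished datasets have feedback")
```

On these numbers, roughly one unpublished dataset in fifty has any feedback overall, and about one in six of those listed by the Ministry of Justice.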

Given the work that has been put in by people participating on the Open Data Challenge Series on Crime and Justice – the competition weekend is next week and there’s still time to sign up and take part – we ought to be in a good place to increase the number of comments on unpublished datasets relating to crime and justice. I’m going to spend some time going through them over the coming weeks, especially after the weekend.

There may be some significant unpublished datasets that are not listed, and Owen Boswara has been collating these on an openly editable Google doc. One of the advantages of taking an open approach and crowdsourcing the datasets in the National Information Infrastructure is that it makes it easier to identify inaccuracies and incompleteness.

One thing I’ve noticed is that there has been some concern expressed that this could be an exercise that initiates the publication of private data. An example of this is the listing of the Dartford Crossing Payment System whose fields include: “Names, addresses (inc e-mail), telephone numbers, vehicles registration, bank / credit card details, for users of DART-Tag.” I somehow doubt that there is any intention of releasing loads of drivers’ bank details to the web, but some clarification on the site would be helpful.

This is an opportunity to influence the priorities of open data releases and it would be good if we were to increase the amount of participation in the process.

2 comments

Giuseppe Sollazzo says:

October 2, 2013 at 2:38 am

The approach of opening up at this level was going to be a tricky one. Assessing datasets is, in many cases, a highly specialised task. I’m not suggesting hiding it from the general public, but ways to encourage the public to get involved and understand the data should be sought. The obscure naming of datasets is another issue that will need to be addressed, as well as their tagging.

siwhitehouse says:

October 2, 2013 at 4:18 am

Hi Giuseppe

Yes, I think it’s hard to achieve, and with the tight timescales for the initial publication it was always going to be difficult to get the public involved. Hopefully that’ll improve over time.
As for it being a specialised task, I quite like the idea of professionals identifying a large number of the core datasets that should form part of the NII. But conversations such as the one around Libra demonstrate that there is value in the exercise.
Libra is also an example of the obscure naming issue you highlight. I’d imagine that most people inside the Home Office and Ministry of Justice know of it and what it is, but we certainly need it to be obvious to general viewers without specialist knowledge when we publish online. It’s a good point and one I hadn’t fully thought through before, thanks.
