Road Blocks for Using Open Data

This week, I was invited to give a talk at a workshop on Open Data and Open Government at the Innovationsforum Semantic Media Web in Berlin. There were short talks on different aspects of the topic; I was asked to provide a local Berlin perspective (see the slides at the bottom of this post), based on the work on the Berlin Open Data Portal that I recently started doing on behalf of BerlinOnline.

The presentations and discussions during the day touched on many different topics, and people were generally enthusiastic about the potential of Open Data as an enabler of greater transparency, new business models, etc. However, several crucial issues surfaced again and again:

  1. Data Quality – One benefit of Open Data that was mentioned constantly is that it would provide valuable raw material for businesses to build services and products on. However, this requires a certain level of data quality: data needs to be correct, stable and consistent. There was a lot of concern during the day that much of the available Open Data is of poor quality and therefore not very usable.
    This might seem to contradict the “raw data now!” motto and to play into the hands of administrations that say “We don’t want to publish our data because it’s not perfect!” That’s not the case, though. Data can and should still be published early – if it’s interesting and relevant, data users will give feedback and help improve its quality. However, that doesn’t change the fact that data quality matters for any business that wants to build something on Open Data.
  2. Availability of Data – If a data provider announces a dataset at a certain URI today, then in order to build a service that relies on that data, I need to be sure that I can still access it at the same URI tomorrow. Several people at the workshop complained that links given for datasets they wanted to explore were broken, or that APIs didn’t work. This is a special case of the general “Cool URIs don’t change” requirement, but it’s definitely crucial if a dataset is supposed to be the raw material for a product. In fact, I wonder if we should think about some kind of explicit assertion (the flipside of a license, as it were) that data providers could publish to indicate that they are committed to keeping a dataset or API available.
  3. Data Coverage – In many cases, the coverage of Open Datasets is spotty – geographically, temporally or otherwise. To give just one example, there is a dataset on the locations of bottle banks in Berlin’s Wilmersdorf district. It’s great that someone in the local administration put in the time and effort to publish this, but what about all the other districts of the city? If I want to provide a city-wide service, then information about just one of its districts is not going to help me much. Another example was a Germany-wide real estate service that had considered including Open Data on its site, but decided against it because the data was only available for one of the major cities (Berlin). There are of course many good reasons for this situation, such as the federal structure of Berlin and Germany in the two examples given here, but the fact remains that poor coverage can diminish the usefulness of datasets in many cases.
  4. Political Commitment – Addressing the issues listed above takes time, resources and the will to change and agree on things. One ingredient that could really push these aspects further is the official, political decision to support and implement an Open Data strategy, and ideally a prominent political figure to announce and back this publicly. In countries where this has happened, notably the UK and the US, Open (Government) Data is quite advanced and public administration is much more willing to put in the effort required. Both countries have high-profile national Open Data platforms and programmes. Of course there are still many problems, but the situation is nevertheless much better than in a country like Germany, where no such prominent commitment has been made.
    The bottom-up, grass-roots approach taken here is fantastic and absolutely necessary, e.g. in Berlin, where the Open Data movement started with some local interest groups who were then able to get parts of the administration on board. However, to really get the ball rolling and convince those parts of the administration that are hesitant and lagging behind, I believe a more prominent commitment has to come from higher up.

Banksy by photomonkey on Flickr, licensed under CC BY-NC 2.0.
