A little less than a month ago the Open Data Institute launched Open Data Certificates, a new scheme to promote open datasets and help build trust between data producers and data consumers. Essentially, data producers fill in a detailed online survey for a particular dataset, covering topics such as legal information (rights, licensing, privacy), practical information (findability, accuracy, quality, guarantees), technical information (locations, formats, trust) and social information (documentation, support, services). Based on the answers, a certificate is created (see this example), with one of four ratings, from Raw through Pilot and Standard to Expert (where Expert is an exceptionally high rating that so far no one has achieved). The certificate comes complete with code that the data producer can embed on their dataset’s webpage (either just a badge or the complete certificate), as a way to inform and reassure potential users about the quality of the dataset.

Certificates are “self certified” (and say so at the top). In other words, the ODI does not verify or review the submitted forms in any way. However, the terms state that the ODI reserves “the right to revoke an open data certificate at any time”. So, if future Open Data fraudsters (I wonder when we will first see such a thing) lie when filling in the form in order to get an Expert rating, the ODI can simply remove the certificate. This is possible because the certificate always remains on the ODI’s website; the data owner only links to it.

In a way these certificates are similar to Tim Berners-Lee’s 5-Star Open Data scheme. However, where the 5-Star scheme focusses mainly on technical aspects such as data formats and licences, the OD certificates rate datasets on a much wider basis (where technical aspects are just one component), covering “soft” aspects such as ease of use, timeliness, documentation, support, reliability, etc. Also, the well-designed and good-looking tools (forms, badges, copy&paste code) available on the certificates website make them much easier for data producers to use. Of course, the fact that they come from a respectable organisation like the ODI gives the certificates additional weight – both for consumers, who will trust them more, and for producers, who will be keen to have them.

The ODI’s Gavin Starks calls the new scheme the “first robust quality badge for open data”. I believe this is true, as far as this is possible for a self-certified badge. What I particularly like is that we now have something like the counterpart of a licence: where a licence dictates how a consumer may use a dataset, the certificate makes promises and assurances about quality from the side of the data producer. We’ll see how it works out, but I have been thinking for a while that we really need something like this for Open Data to see broader use.


Tom Lee Yamaha Music Course Certificate Concert by denniswong on Flickr, licensed under CC BY 2.0.


In my last post I mentioned a talk I gave about my work for berlinonline on the Berlin Data Portal. Several months have passed since then (so quickly!), and yesterday we were finally able to launch the portal’s new version at daten.berlin.de (aka “Offene Daten Berlin”)!

Compared to the upheaval in the open data community caused by the launch of the federal German data portal, the re-launch of the Berlin portal has gone rather quietly so far. This is probably because, from an Open Data point of view, not that much has changed – by far most datasets still use Creative Commons licences, and the number of datasets has grown steadily since the portal’s launch in 2011. Also, and this is probably even more important, the number of departments from the Berlin administration participating in the city’s Open Data initiative has grown as well.

The most obvious changes coming with the relaunch are cosmetic – they are, however, pretty drastic: there has been a complete change of the portal’s layout to suit the new (future) design of the city’s berlin.de site. Offene Daten Berlin is one of the first parts of berlin.de to showcase the reboot of the layout; the only other part of the site that has already implemented the new layout is the citizens’ service portal service.berlin.de. I believe the new design (which was not developed by me – I only implemented it for the portal) is a lot cleaner, more user-friendly and overall gives a much more pleasant user experience than the old one.

There are, however, a good few changes under the hood as well:

  • The (non-public) backend of the portal is based on the brilliant CKAN platform, for which we redeveloped a Berlin-specific plugin from scratch (a rough sketch of what such a plugin looks like follows after this list).
  • Quite a bit of work also went into maintenance-related inner workings of the (Drupal-based) portal – not something that a regular user will easily notice, of course, but these are things that make administrating and moderating the site much smoother.
  • daten.berlin.de is now integrated into a much larger, Germany-wide Open Data ecosystem: upstream to the federal govdata.de, and downstream from the Open Data portal of Berlin’s energy provider Stromnetz Berlin GmbH (a subsidiary of Vattenfall) at netzdaten-berlin.de (more local portals might follow). This has been made possible quite painlessly with CKAN’s harvester infrastructure.
  • Last but not least, the new portal also comes with a significant increase in available data. The largest and latest addition of course comes from the Stromnetz Berlin data (there is a hackday specifically targeted at this data this weekend), but data from the local public transport association VBB has also recently been made available through daten.berlin.de, along with many other quite interesting datasets.
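To give an idea of what such a CKAN extension looks like, here is a minimal sketch of a plugin in Python. The class, directory and facet names are hypothetical and not taken from the actual Berlin extension; only the plugin interfaces and toolkit calls are standard CKAN plugin API.

```python
# Minimal sketch of a CKAN extension plugin (assuming CKAN 2.x).
# Class, directory and facet names are hypothetical, not the actual Berlin code.
import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit


class BerlinPortalPlugin(plugins.SingletonPlugin):
    """Registers portal-specific templates and an extra search facet."""
    plugins.implements(plugins.IConfigurer)
    plugins.implements(plugins.IFacets, inherit=True)

    def update_config(self, config):
        # Point CKAN at this extension's own templates and static assets.
        toolkit.add_template_directory(config, 'templates')
        toolkit.add_public_directory(config, 'public')

    def dataset_facets(self, facets_dict, package_type):
        # Expose an additional, portal-specific metadata field as a search facet.
        facets_dict['berlin_type'] = toolkit._('Dataset type')
        return facets_dict
```

A plugin like this would be registered through an entry point in the extension’s setup.py and enabled via the ckan.plugins option in the CKAN configuration.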

In the future, I will start looking in more detail at some of these datasets, maybe in the form of a series of blog posts à la “dataset of the month”, where I will explore and highlight what particular datasets have to offer, what they might be used for, how they might be combined with other data, etc.


Road Blocks for Using Open Data

This week, I was invited to give a talk at a workshop on Open Data and Open Government at the Innovationsforum Semantic Media Web in Berlin. There were short talks on different aspects of the topic; I was asked to provide a local Berlin perspective (see the slides at the bottom of this post), based on my work on the Berlin Open Data Portal, which I have recently started doing on behalf of BerlinOnline.

The presentations and discussions during the day touched on a lot of different topics, and people were generally enthusiastic about the potential of Open Data as an enabler for more transparency, new business models, etc. However, several crucial issues kept surfacing again and again:

  1. Data Quality – One of the main benefits of Open Data that is constantly mentioned is that it provides a valuable raw material for businesses to build services and products. However, this requires a certain level of data quality. Data needs to be correct, stable and consistent. There was a lot of concern during the day that much of the available Open Data is of poor quality and therefore not very usable.
    It might seem that this contradicts the “raw data now!” motto and plays into the hands of administrations that say “We don’t want to publish our data because it’s not perfect!” That’s not the case, though. Data can and should still be published early – if it’s interesting and relevant, data users will give feedback and ensure its quality improves. However, that doesn’t change the fact that for a business to build something on Open Data, data quality matters.
  2. Availability of Data – If a data provider announces a dataset at a certain URI today, then in order to build a service that relies on that data, I need to be sure that I can still access it at the same URI tomorrow. Several people at the workshop complained that links given for datasets they wanted to explore were broken, or that APIs didn’t work. This is a special case of the general Cool URIs don’t change requirement, but it’s definitely crucial if a dataset is supposed to be the raw material for a product. In fact, I wonder if we should think about some kind of explicit assertion (like the flipside of a licence) that data providers could publish to indicate they are committed to keeping a dataset or API available (a sketch of the kind of availability check a data consumer might run follows after this list).
  3. Data Coverage – In many cases, the coverage of Open Datasets is spotty – geographically, temporally or otherwise. To give just one example, there is a dataset on the locations of bottle banks in the Wilmersdorf district of Berlin. It’s great that someone in the local administration put in the time and effort to publish this, but what about all the other districts of the city? If I want to provide a city-wide service, then information about just one of its districts is not going to help me much. Another example: a Germany-wide real estate service was considering including Open Data on their site, but decided against it because the data was only available for one of the major cities (Berlin). There are of course many good reasons for this situation, such as the federal structure of Berlin and Germany in the two examples given here, but the fact remains that bad coverage can diminish the usefulness of datasets in many cases.
  4. Political Commitment – Addressing the issues listed above takes time, resources and the will to change and agree on things. One ingredient that could really push these aspects further is the official, political decision to support and implement an Open Data strategy, and ideally a prominent political figure to announce and back this publicly. In countries where this has happened, notably the UK and the US, Open (Government) Data is quite advanced and public administration is much more willing to put in the effort required. Both countries have high-profile national Open Data platforms and programmes. Of course there are still many problems, but the situation is nevertheless much better than in a country like Germany, where no such prominent commitment has been made.
    The bottom-up, grass-roots approach taken here is fantastic and absolutely necessary, e.g. in Berlin, where the Open Data movement started with some local interest groups who were then able to get parts of the administration on board. However, to really get the ball rolling and convince those parts of the administration that are hesitant and lagging behind, I believe some more prominent commitment has to come from higher up.
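To make the availability point a little more concrete, here is a small sketch in Python (using the requests library) of the kind of link check a data consumer might run against a CKAN portal’s public API to see whether the resource URLs behind a dataset still respond. The portal URL and dataset name are placeholders, not real identifiers.

```python
# Sketch: check whether a dataset's resource URLs are still reachable,
# using CKAN's public package_show API. Portal URL and dataset name are placeholders.
import requests

PORTAL = "https://example-portal.example.org"  # placeholder CKAN instance
DATASET = "example-dataset"                    # placeholder dataset name


def check_dataset_resources(portal, dataset):
    # Fetch the dataset's metadata, including its list of resources.
    meta = requests.get(
        f"{portal}/api/3/action/package_show",
        params={"id": dataset},
        timeout=10,
    ).json()["result"]

    for resource in meta["resources"]:
        url = resource["url"]
        try:
            # A HEAD request is usually enough to see whether the file is still there.
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException as exc:
            status = f"error: {exc}"
        print(f"{status}\t{url}")


if __name__ == "__main__":
    check_dataset_resources(PORTAL, DATASET)
```

Running something like this periodically (and publishing the results) would already go some way towards the explicit availability assertion mentioned above.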


Banksy by photomonkey on Flickr, licensed under CC BY-NC 2.0.


Open Data and Digital Cities

When talking about the foundation necessary to build smarter, digital cities, the focus is often on ubiquitous broadband, wireless access or sensors – in other words hardware. However, another important ingredient is data, ideally of the open flavour. In other words, open data is one of the key enablers for better services, improved opportunities for citizen participation and new business models (have a look at lokaler for a startup making use of open data). In this discussion, open data is often named synonymously with open government data, i.e., public sector data from local, national or even international government bodies. However, public sector data is not all there is – open data can also include data from businesses (opening hours, details of product and service offers, locations, etc.), from non-governmental organisations, or even from individual citizens.

Data foundations for the digital city

Data portals such as daten.berlin.de are an established solution for providing a central place to find relevant public sector data. What is missing is a way for other local (regional, national, etc.) organisations to make use of the same portal. Of course, the data itself is published de-centrally, but a single channel for finding and announcing relevant datasets from all sectors would make a lot of sense. Even data aggregated through embedded metadata à la schema.org could be included – e.g., the Berlin open data portal could make available the subset of all schema.org information relevant to Berlin.
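As a rough sketch of how such embedded metadata could be aggregated, the following Python snippet pulls JSON-LD blocks out of a webpage (using requests and BeautifulSoup) and keeps those typed as schema.org Datasets. The page URL is a placeholder, and a real aggregator would also need to handle microdata and RDFa markup and map the results onto the portal’s metadata schema.

```python
# Sketch: harvest schema.org Dataset descriptions embedded as JSON-LD in a page.
# The page URL is a placeholder; real pages may also use microdata or RDFa.
import json

import requests
from bs4 import BeautifulSoup


def extract_datasets(page_url):
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    datasets = []
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except ValueError:
            continue  # skip malformed JSON-LD blocks
        # A page may embed a single object or a list of objects.
        items = data if isinstance(data, list) else [data]
        datasets.extend(
            item for item in items
            if isinstance(item, dict) and item.get("@type") == "Dataset"
        )
    return datasets


if __name__ == "__main__":
    for ds in extract_datasets("https://example.org/some-page"):
        print(ds.get("name"), "-", ds.get("url"))
```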

So, what about “digital graffiti”? I have stolen the label from a talk on the Data Foundations for Digital Cities recently given by my former Talis/Kasabi colleague Leigh Dodds at the Open Data Cities event in Brighton, UK. Basically, digital graffiti means the traces that citizens can leave in their digital cities; it’s a way of contributing to their city’s web of data. This would usually happen through applications such as Fix my Street or any of the similar sites that followed in its wake. In an ideal world, even commercial services such as Qype or Amen could contribute by opening up portions of their data, particularly when they have data with a geographical context.

As a final requirement, all these different data sources would have to be linked into a larger whole, rather than remaining isolated data islands. There is no easy solution for achieving this, but I believe applying Linked Data principles – publishing 5-star Open Data with URI identifiers and a graph data model – is the way to go here.
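To illustrate what “URI identifiers and a graph data model” can mean in practice, here is a minimal sketch using the Python rdflib library to describe a dataset with the DCAT vocabulary, which is a common choice for data catalogues. All URIs and property values below are made up for the example.

```python
# Minimal sketch: describing a dataset as Linked Data with DCAT and rdflib.
# All URIs and values below are illustrative, not real identifiers.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

dataset = URIRef("https://example.org/dataset/bottle-banks")

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Bottle bank locations", lang="en")))
# Linking to a shared identifier for Berlin is what turns this from an
# isolated record into part of a larger graph.
g.add((dataset, DCTERMS.spatial, URIRef("http://dbpedia.org/resource/Berlin")))
g.add((dataset, DCAT.landingPage, URIRef("https://example.org/dataset/bottle-banks.html")))

# Serialise as Turtle, a 5-star-friendly, linkable representation.
print(g.serialize(format="turtle"))
```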

A while ago I gave a presentation with a more elaborate version of these thoughts at the Xinnovations 2012 conference in Berlin, where Open Data (and Open Government Data in particular) was one of the major topics that surfaced again and again throughout all three days. As this was a German-language event, the slideset is in German as well.
