ZBW MediaTalk

The Linked Open Data Cloud is a global network of semantically interconnected data and databases. The basic principle is that everyone can publish their data and databases in the Linked Open Data Cloud under an “open” license, that is, a license allowing free use, distribution and reuse of these data and databases. There are two incentives for libraries to publish their data in the Linked Open Data Cloud:

  1. A published dataset in the Linked Open Data Cloud increases the visibility of an organisation.
  2. Libraries can develop and offer new services by connecting their own data with data of other libraries.

The license most often applied is Creative Commons Zero (CC0, “no rights reserved”). CC0 is not really a license but rather a waiver of all rights. It is applicable to data and databases, and it is widely used in the international library community to publish bibliographic data in the Linked Open Data Cloud.

 

The three surprises related to CC0

We faced many surprises when we started our internal discussion about the license under which our data should be published:

Surprise 1: Few deep and comprehensive discussions take place about which license should be used to protect library data in the Linked Open Data Cloud. So far in 2012, not a single one of the large library conferences in Germany has devoted a workshop or special track to license models for library data, although many have covered publishing data in the Linked Open Data Cloud!

Surprise 2: The library community often has only very simple answers to the question “Why are you licensing your data under CC0?” Typical answers are “because other big players are doing it”, “because you cannot protect catalog data”, “because it is a requirement if you want to link up with Europeana”, and “because it was paid for with public money”.

Surprise 3: Some argue that the only way to prevent our data from being exploited commercially is to waive all rights we have, that is, to apply CC0. It is too early to assess this view. We still have to wait for the first project results, e.g. from EC-funded projects targeting new commercial services that rely fully on data licensed under CC0 (cf. the SME initiative on Digital Content and Languages of the European Commission).

CC0 – No attribution to the library

Of course, the advantage of CC0 is that there is no need to monitor whether third parties comply with the conditions of a license agreement. For third parties the benefit lies in the unrestricted use of the data and databases, i.e. commercial exploitation is also allowed. At the same time, products licensed under CC0 are compatible with other data or databases that are published under an open license. This facilitates the development of new products and services, which in turn can increase the world-wide use of the data.

Many organisations therefore expect that as world-wide use increases, the world-wide visibility of the originator will increase too. But this is not necessarily the case. CC0 does not require any attribution of the organisation which originally provided the data in the Linked Open Data Cloud. The most significant disadvantage therefore is that data provenance becomes impossible if attribution is not required.

But why is this a disadvantage? Answering this requires a deep understanding of the full logic of Linked Open Data. The Linked Open Data Cloud is a network which nobody owns, which nobody controls and which does not have any quality assurance mechanism. One important indicator of quality is the reputation of the organisation providing data to the Linked Open Data Cloud. If data licensed under CC0 is used without attribution (e.g. for a new library service), it is no longer possible to assess the quality of the data. Are libraries aware of this? But what else, if not CC0? To answer this question, we suggest considering the Open Database License or other Open Data Commons licenses:

Alternatives worth thinking about

The Open Database License (ODbL) allows the free reuse, distribution and modification of a database as well as the creation of new products from it. ODbL is applicable to databases. Besides attribution of the creator of the database, ODbL requires that new products generated from an ODbL-licensed database be released under ODbL or under an equivalent license (ShareAlike). This ensures that one can track where the specific database is used and which new products are generated with it. If needed, one has the opportunity to negotiate with commercially interested providers. Attribution is a mandatory requirement of ODbL, which always ensures the visibility of the creator of the database. Outside the library community, ODbL received much attention when OpenStreetMap changed its license from Creative Commons Attribution-ShareAlike (CC BY-SA) to ODbL.

If one does not want to apply ODbL to one's data, other models exist. Two of them are described here:

  • ODC-by: As a compromise between CC0 and ODbL, one can choose the Open Data Commons Attribution License (ODC-by). This license does not have a ShareAlike clause but requires attribution of the database creator. This ensures visibility but does not prohibit commercial use of the database.
  • Core metadata set: Another way to combine CC0 and ODbL is the creation of a reduced metadata set (core metadata set) based upon the full data set. The idea of a core metadata set is pursued by the German National Library (PDF). While the full database version is released under ODbL, the database with the reduced metadata set is published under CC0.
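The differences between these options can be summarised in a short Python sketch. It encodes only the two properties discussed above (attribution and ShareAlike) and is not a legal reading of the license texts. The names `LicenseTerms`, `obligations` and `split_release`, as well as the record fields, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Simplified view of a license: only the two properties discussed here."""
    name: str
    requires_attribution: bool
    requires_share_alike: bool


# Properties as described in the text above (a simplification, not legal advice).
LICENSES = {
    "CC0": LicenseTerms("CC0", requires_attribution=False, requires_share_alike=False),
    "ODC-By": LicenseTerms("ODC-By", requires_attribution=True, requires_share_alike=False),
    "ODbL": LicenseTerms("ODbL", requires_attribution=True, requires_share_alike=True),
}


def obligations(license_id):
    """Return the duties a reuser takes on under the given license."""
    terms = LICENSES[license_id]
    duties = []
    if terms.requires_attribution:
        duties.append("attribute the database creator")
    if terms.requires_share_alike:
        duties.append("release derived databases under the same or an equivalent license")
    return duties


def split_release(record, core_fields):
    """Core metadata set model: reduced record under CC0, full record under ODbL."""
    core = {key: value for key, value in record.items() if key in core_fields}
    return {"CC0": core, "ODbL": dict(record)}


if __name__ == "__main__":
    for license_id in LICENSES:
        print(license_id, obligations(license_id))

    # Hypothetical bibliographic record; the field names are illustrative only.
    record = {"title": "Example title", "creator": "Example author", "abstract": "Example abstract"}
    print(split_release(record, core_fields={"title", "creator"}))
```

Under these assumptions, CC0 yields an empty list of obligations, ODC-By yields attribution only, and ODbL yields both attribution and ShareAlike.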

Perishing will follow the publishing hype

We are convinced that more options exist if more effort is devoted to the question of which open license should be used for which type of library data and for which purpose. Still, to trigger this thinking, a more visible and lively discussion is necessary, and we hope it will kick off soon. One opportunity will be the next Conference on Semantic Web in Libraries (SWIB). If not, perishing will follow the current Linked Open Data publishing hype.



View Comments

  • Klaus,

    Thanks for the write-up! Two comments: in fact, less than 20% of the LOD cloud datasets provide explicit license information [1]. On a related note: we’ve provided guidance on how to improve that situation [2].

    Cheers,
    Michael

    [1] http://lod-cloud.net/state/#license
    [2] http://www.w3.org/TR/void/#license


  • The most significant disadvantage therefore is that data provenance becomes impossible if attribution is not required.

    This is not true. Attribution licenses aren’t a necessary condition for others to provide provenance information. I too think that data provenance information is an important feature and one should of course provide provenance information – even for the data you have completely produced yourself.

    I’d argue the other way round: The importance of provenance information can lead to the conclusion that a legal requirement like an attribution license isn’t necessary. Thus, I think that any responsible data publisher who wants to build a good reputation will indicate the provenance of her data – whether the underlying third-party data is CC0-licensed or ODC-BY or whatever. And good provenance information will be better than attribution as provenance information goes beyond saying “This dataset contains data from the ZBW library catalog.”

    Thus, I would be happy to see more discussion about actual provision of provenance information than about which open (ODC-BY and ODbL both – like CC0 – comply with the open definition) license to choose.

    See also .


  • Oups, the link to a post over at the Creative Commons blog titled “Library catalog metadata: Open licensing or public domain?” didn’t get through. Here you are: http://creativecommons.org/weblog/entry/33768.

