The programme of this 2014 conference, following a successful precedent set last year with similar meetings held in Taipei and Beijing, was designed to provide a 360° picture of the DOI system, presenting the most recent developments in DOI services, while unveiling the aspects and applications less familiar to the general public.
Starting from the best-known use cases of DOI for persistent citation and linking of academic publications and scientific datasets, the Conference moved on to different contexts of application and different content types, such as films and television assets, educational content, Massive Open Online Courses and public sector information. The Conference clearly demonstrated the cross-sector nature of the DOI and how it has enabled targeted services.
With over 100 million DOIs now allocated to many kinds of content, the next step was naturally to put the DOI in a broader environment and to see how the existence of DOIs is beneficial for external projects and initiatives. The choice made this year was to open a window on one of the “hot topics” for the content industries: rights management and easy licensing.
Finally, to satisfy the geeks in the audience, each session was followed by a short “Tech Corner” to explain the technical solutions and the common approaches to support interoperability among services driven by DOI.
But let’s look at each session in more detail.
Session 1: Joining the dots in the academic and research environment
This session was dedicated to the use of DOI to create a linked ecosystem for the academic and research community and was opened by Ed Pentz, the Executive Director of CrossRef, one of the DOI Registration Agencies (RAs). CrossRef is a not-for-profit cross-publisher membership association set up in 2000 to provide reference linking services. With 70 million DOIs registered, it is the largest of the current RAs. Although there are different DOI Registration Agencies providing different services, over the years several have put in place collaborations with CrossRef, and now a number of the RAs (mEDRA, Airiti, JaLC, ISTIC, PO) also provide CrossRef services, so that a greater number of academic publications can benefit from the citation-linking network.
Everyone in academic publishing knows the basics about “CrossRef DOIs”. Not everyone, though, knows the broad range of services that CrossRef has developed over the years. Just to mention a few: Cited-by linking (know who has cited your publication), plagiarism screening, text and data mining, and integration with ORCID (the identifier for researchers). One of the newest initiatives is FundRef: “a standard way of reporting funding sources for published scholarly research”. The FundRef Registry collects, identifies (with a DOI) and makes available names of funding bodies. When these are included in the DOI metadata of the publication, the funder and the publications become linked and funding bodies can easily track the published output of funding. Finally, using standard identifiers and metadata, researchers, funders and publications are connected in a single linking environment.
But…what if any type of scientific content could be citable?
This simple question was the starting point for Jan Brase. If research data could be cited the same way publications can be, then the benefits for the scientific community would be huge. The visibility of the content would be improved. Re-use and verification of data would be easier and duplication of existing research would be avoided, not to mention that it would incentivise new research. And that’s exactly what DataCite does.
Founded in 2009, DataCite has registered over 3.7 million DOIs, making primary data easily citable. With the awareness that science is carried out locally but outputs of scientific research have a global reach, DataCite has connected 350 data centres in a single networked infrastructure providing DOI registration on datasets and other non-textual information to local research institutions. But assigning a DOI is not the end of the story. Research data can be linked to the researcher’s profile using ORCID and to the published outcomes of the research, thanks to the collaboration with CrossRef started in 2012 with a joint statement (including STM) to improve the availability and discoverability of research data, also through the use of DOI. This collaboration has deepened and continues to be a key factor in enabling bi-directional linking between datasets and publications, and in ensuring that researchers can seamlessly navigate among all research results. This is an important topic now, as seen in the Research Data Alliance, in which DataCite also plays a key role.
A practical example of how DOIs and services of different RAs can be integrated to support the operations of the academic community was then given by Susanna Mornati, who took the audience for a journey into the services integrated in CINECA’s Research Information Management Systems (RIMS).
RIMS are linked ecosystems that support Italian universities and research centres with a variety of services, from collection, management, preservation and discovery of research information to their dissemination, reuse, citation and impact measurement.
This level of integration has been accomplished through interoperability by implementing open standards, open technologies and ontologies, and by using unique and persistent identifiers for each type of information: DOIs and ISBNs for publications and datasets (integrating metadata from CrossRef, mEDRA and DataCite), ORCID for researchers’ profiles, and a project for the integration of the FundRef registry. Susanna described the example of IRIS (Institutional Research Information System), a solution adopted by many Italian universities to manage all the aspects of their research activities, including evaluation and review of research outputs.
Some dots have already been joined.
For the first Tech Corner, Geoffrey Bilder invited the audience into a world populated by bots living beneath the surface of our daily online life. He explained how RAs have decided to provide different representations of the metadata associated with their DOIs through content negotiation, which is supported by the DOI system. These representations can be used for citations, in linked data environments and in a wide range of applications. The richer the metadata, the more innovative the services.
Even the Daleks are convinced. Negotiate!
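For readers who want to see what a bot actually says when it "negotiates", here is a minimal sketch. It only builds the HTTP request, it does not send it. The DOI is a made-up placeholder, the helper name `negotiation_request` is ours, and the three media types are ones commonly documented for DOI content negotiation; which of them a given DOI honours depends on its Registration Agency.

```python
from urllib.request import Request

# Media types commonly documented for DOI content negotiation.
# Assumption: not every Registration Agency serves every format.
FORMATS = {
    "csl-json": "application/vnd.citationstyles.csl+json",
    "bibtex": "application/x-bibtex",
    "rdf-xml": "application/rdf+xml",
}


def negotiation_request(doi: str, fmt: str) -> Request:
    """Build (but do not send) a request asking the resolver for metadata."""
    return Request(f"https://doi.org/{doi}", headers={"Accept": FORMATS[fmt]})


# Hypothetical DOI, used only to show the shape of the request.
req = negotiation_request("10.1000/example", "bibtex")
print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup. The point is that the same DOI link serves a landing page to a human and structured metadata to a machine, depending only on the `Accept` header.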
Session 2: Expanding the DOI to new contexts of application
Let’s forget for a moment what you know about DOI and its massive success in the academic and research environment and enter into a completely new world of DOI applications.
Have you ever thought that Hollywood could benefit from the use of DOIs? No kidding, that’s what Raymond Drewry showed as he opened the second session of the Conference. The Entertainment Identifier Registry (EIDR) is a global registry built on DOI for the unique identification of movie and TV content, which allows a higher degree of automation and efficiency in operations along the supply chains. The EIDR Id in fact does not replace the multiple existing identifiers used in the audiovisual industry, but links them to a unique identifier. This reduces the misidentification of content across different platforms, workflows and distribution channels caused by duplication and lack of ID uniqueness, and so reduces costs.
Three use cases were presented to show the benefits EIDR can bring to an industry that has to face an increasingly complex digital supply chain.
The first use case focused on the value of the EIDR Id in applications across media windows (from theatrical release to online distribution, and from TV broadcast to Video on Demand services) to streamline the consolidation of supply chain and performance data and to reduce customer queries on deliveries, versions and assets.
The second use case explained the advantages of a direct relation between two players, with the example of one studio (Warner Bros) and one retailer (Microsoft Xbox Live), where the addition of EIDR to ordering, delivery, sales and royalty reporting resulted in a direct saving of 650 hours of manual work a year to fulfil the same operations. Multiplying that by the number of such two-player relations in the industry demonstrates a vast potential saving.
The third use case presented the results of including the EIDR Id into “avails” (a term that describes the available information about the time, location and business rules relating to an offer of an asset) for Google Play. With the use of the EIDR Id, the processing time for reconciling information coming from multiple sources on 1,000 avails updates has been reduced from 50 hours to less than 30 minutes. Clearly, considering the tens of thousands of avails to be processed, the benefits become impressive.
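To put the Google Play figures in perspective, the arithmetic can be worked through in a few lines. The 50 hours, 30 minutes and 1,000 avails come from the presentation; treating 30 minutes as an upper bound and extrapolating linearly to 10,000 avails are our own assumptions, made only for illustration.

```python
AVAILS = 1_000
BEFORE_HOURS = 50.0  # reported processing time without the EIDR Id
AFTER_HOURS = 0.5    # "less than 30 minutes", taken here as an upper bound

before_per_avail_s = BEFORE_HOURS * 3600 / AVAILS  # seconds per avail, before
after_per_avail_s = AFTER_HOURS * 3600 / AVAILS    # seconds per avail, after
speedup = BEFORE_HOURS / AFTER_HOURS               # lower bound on the speed-up


def hours_saved(n_avails: int) -> float:
    """Hours saved for n_avails, ASSUMING per-avail times scale linearly."""
    return n_avails * (before_per_avail_s - after_per_avail_s) / 3600


print(f"{before_per_avail_s:.0f} s -> {after_per_avail_s:.1f} s per avail, "
      f"at least {speedup:.0f}x faster")
print(f"10,000 avails: about {hours_saved(10_000):.0f} hours saved")
```

In other words, each avail update drops from about three minutes of processing to under two seconds, a speed-up of at least a factor of 100.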
And now for something completely different…
Carol Riccalton brought the focus back to the heart of Europe and to the integration of DOIs into the activities of the Publications Office of the European Union (PO). This is the inter-institutional office whose task is to publish the output of the institutions of the European Union. Since 2004 it has also been the DOI RA responsible for DOI assignment on the EU institution content.
Through the PO, a huge amount of public sector information is published every year and needs to be as discoverable as possible for the benefit of all citizens of the European Union. Just to mention some figures: the Official Journal of the European Union is published daily in over 20 languages, and in 2013 over 9,000 titles were published in paper or electronic versions in more than 50 possible languages (including Arabic, Chinese, Japanese and Russian), as well as around 450 thousand public procurement notices.
Over 1 million DOIs have been assigned to EU content, mainly to journal articles and general publications (books and monographic content), with DOI registration integrated in the publication workflows of the different author services early in the production process. There is a collaboration with CrossRef to provide citation-linking services. Increasing the visibility and citability of EU publications, both those directly created by the EU institutions and those created with the support of EU funding programmes, is clearly a strategic factor, and the use of community standards like the DOI is consequently endorsed and recommended within different EU initiatives, like OpenAire and Horizon2020, to further increase the use of DOI.
The next steps planned for the PO are to explore the possibility of assigning DOIs to EU datasets, including the creation of links between publications and datasets, following the path opened by DataCite and CrossRef; to explore further connections and links with ISNI, ORCID and FundRef; and to maximize the synergy between the functions of DOI RA and of ISBN RA by integrating DOI and ISBN into a single citable identifier (ISBN-A), as currently offered by mEDRA in collaboration with the Italian and German ISBN Agencies.
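As a rough illustration of what "integrating DOI and ISBN" means in practice, the sketch below turns a hyphenated ISBN-13 into an ISBN-A style DOI name. It assumes the layout `10.<978 or 979>.<registration group + publisher>/<title enumerator + check digit>`, which is our understanding of the ISBN-A syntax and was not spelled out at the conference. The function name and the ISBN are made up, and the check digit is not verified.

```python
def isbn_to_isbn_a(hyphenated_isbn13: str) -> str:
    """Turn a hyphenated ISBN-13 into an ISBN-A style DOI name.

    Assumed layout: 10.<978|979>.<group + publisher>/<title + check digit>.
    """
    parts = hyphenated_isbn13.strip().split("-")
    if len(parts) != 5 or not all(p.isdigit() for p in parts):
        raise ValueError("expected five hyphen-separated numeric elements")
    ean, group, publisher, title, check = parts
    if ean not in ("978", "979") or sum(map(len, parts)) != 13:
        raise ValueError("not a 13-digit ISBN with a 978/979 prefix")
    return f"10.{ean}.{group}{publisher}/{title}{check}"


# Made-up ISBN; prints 10.978.12345/99990
print(isbn_to_isbn_a("978-1-2345-9999-0"))
```

The hyphens matter: without them there is no way to tell where the publisher element ends and the title enumerator begins, so an unhyphenated ISBN is rejected rather than guessed at.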
With Chuuk Wei, the audience was taken far away from Europe in geographical terms but not so much in terms of the evolution of the global education environment. Airiti, the DOI RA in Taiwan, has been experimenting recently with the use of DOI to identify educational content in open online courses, delivered either to local university students via the university platforms or globally via Massive Open Online Courses (MOOCs).
The first use case is provided by the Taiwan OpenCourseWare Consortium (TOCWC), which hosts a platform for students to browse and search for online courses across 27 universities and 1 high school. The online courses are created by the university professors, and the course contents (presentation slides, class video, and other course material possibly including several lessons or sessions) are delivered by each university portal. The TOCWC platform assigns DOIs at both the course and the lesson level and aggregates the metadata for the central discovery service. For students, the benefit is the possibility to access some information about the courses using DOI links even when the semester is over and they are looking for similar courses by the same professor. For the university system, it makes it possible to include online courses in the list of works by the professor. In addition, the popularity of the courses can be taken into account in the evaluation of academic performance. Moreover, users who are interested in re-using course content can get rights information by querying the DOI.
The second use case concerns the application of DOI to MOOCs, in the framework of a project supported by the Taiwan Ministry of Education. Unlike the previous use case, MOOCs are courses designed for an online learning environment, and virtually anyone in the world can enroll in a MOOC programme and sign up to participate actively in the class. As part of the course material, teachers prepare a reading list including papers and books. The integration with Airiti’s DOI database allows the automatic look-up of DOIs and metadata to be added to the MOOC page, including licensing information about the conditions under which the material is accessible to enrolled MOOC users. Other users, on the contrary, will not have access to the materials. If the content does not have a DOI (or the DOI does not support this service), Airiti can help the publishers with a comprehensive line of services, from digitizing the content to rights negotiation.
And now back to the future.
For the second Tech Corner Godfrey Rust (Rightscom) set out the theoretical principles behind the need to build a linked identification and metadata network supporting cross-domain interoperability and exploit network technologies to enable new services. For creations (content) and parties (people and organisations) identification and interoperability issues have been at least partially tackled, as seen also by the wide range of examples and services implemented within the DOI community and presented in the first two sessions of the conference. However, recently a strong need for additional identification standards and interoperability has arisen in the “new” area of rights information and rights management. Building on several earlier projects in this area from the EU, the Linked Content Coalition has been formed to promote interoperability in the rights data network: a network of authoritative linked data in which each of the entities (creations, parties, “rights”) is identified and the IDs are resolvable to a URL for some action, making content management function really effectively in the digital network.
An approach that the DOI community has implemented and that leads straight to the next session.
Session 3: How DOI can be used to support easy rights management
Following the lead provided by the principles of the Linked Content Coalition (LCC), the third and closing session presented use cases sharing a basic assumption: “If you don’t know what it is, you can’t talk about the rights”. Although these use cases have their own specific objectives and business purposes, they are linked to each other by mutual collaboration and knowledge sharing and so require common “rules of the road”.
Paola Mazzucchi (yes, it’s me) opened the session with the joint initiative between mEDRA and CLEARedi, the Italian collective management organization for publishers, to facilitate rights transactions and offer online licensing options in collaboration with CCC. At the core of the initiative is the creation of an aggregation platform to manage the mandates given by Italian publishers for licensing their repertoire of textual works and the related data exchange. This has been identified as a real use case for RDI (see later) as it will be piloting the use of the LCC-based Common Rights Format, to support rights information exchanges between parties. At the same time it will promote the discoverability of the rights related to textual content through the use of rights-aware DOIs. The term “rights-aware DOI” is a specific application of the concept of context-aware resolution. This is the ability to automatically select one or more resolutions out of the available options for a DOI resolution according to the preferences of the user, thus linking a DOI to a context of use. Imagine a piece of content is identified with a DOI (a book, an educational course, a journal article or a video) through which it is linked to a range of different information and services (such as how to purchase or get a license to reuse the content, know whether the content is accessible for a type of impairment or just retrieve the metadata). In this way, the right service can be delivered to the right user.
An example of how to act here and now while thinking global and forward.
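The idea of context-aware resolution described above can be sketched in a few lines. Everything here is hypothetical: the type names, the URLs and the `resolve` helper are ours and do not reflect how mEDRA or the DOI system actually implement it. The sketch only shows the principle of choosing one resolution out of several according to the preferences of the user.

```python
from typing import Optional

# Hypothetical resolution options attached to one DOI: type name -> target URL.
RESOLUTIONS = {
    "landing-page": "https://publisher.example/book/123",
    "licence": "https://rights.example/licence?work=123",
    "accessibility": "https://publisher.example/book/123/accessibility",
    "metadata": "https://registry.example/metadata/123.xml",
}


def resolve(options: dict[str, str], preferences: list[str],
            default: str = "landing-page") -> Optional[str]:
    """Return the URL of the first preferred type that the DOI offers."""
    for wanted in preferences:
        if wanted in options:
            return options[wanted]
    # No preference matched: fall back to the default type, if present.
    return options.get(default)


# A licensing tool asks for rights information first, then plain metadata.
print(resolve(RESOLUTIONS, ["licence", "metadata"]))
# A user with no matching preference falls back to the landing page.
print(resolve(RESOLUTIONS, ["purchase"]))
```

A "rights-aware DOI" is then simply a DOI whose set of options includes a rights or licensing entry that a rights-minded client can ask for.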
Thinking big is the aim of the Copyright Hub, a not-for-profit, permanent initiative based in, but not limited to, the UK that exists to facilitate content licensing using LCC and RDI solutions as well as to develop technologies to “help copyright work the way the internet works”.
Mark Hewis captured the audience with a two-minute video explaining what the Copyright Hub is and then moved on to present the strategic and technical approach adopted and to highlight possible synergies with the DOI system.
Technically (and simplifying), the Copyright Hub is made up of a core set of open source services that connect an ecosystem of companies and third parties’ services through a mechanism that routes rights and licence information between services in the ecosystem. All of this is based on identifiers and on implementing interoperability between different formats using the LCC model. After a successful Alpha phase, the Copyright Hub has entered the Beta phase, where further synergies can be investigated, for instance with the DOI community in areas such as the use of DOI and its underlying Handle technology; the resolution “types” model discussed in the framework of context-aware DOI resolution; and the possibility to track more complex assertions of ID relationships beyond just “sameAs”.
An effective example of how tracking relations between identifiers and identified entities can be applied to the context of rights management was again provided by EIDR. Raymond Drewry stressed in this session that EIDR doesn’t know directly about rights, but by identifying works, cross-referencing other identifiers and tracking the relations between parts (for example series and episodes) and versions (for example in case of music changes, extended editions, voice talent for dubbing, linguistic versions, etc.), the use of the EIDR Id can support optimisation and automation in transactions related to rights management.
This could help EIDR members and users in their rights-related operations, for example in rights recovery, in the distribution to rightholders, in the distribution reporting chain or in communications about licences, as in the Google and Microsoft examples from Session 2. Finally, film archives need rights information for programmes in order to make content more generally available and to participate in Orphan Works projects.
For all these reasons, EIDR is part of the collaboration network linking all the initiatives included in this session.
Almost last but definitely not least, Gabriella Scipione had the task of bringing all the different inputs and suggestions together under the umbrella of two major projects in the field of rights management, demonstrating that the implementation of different services based on a shared vision is not only possible but already on its way.
FORWARD is building an ARROW-like infrastructure and discovery service for the audio-visual sector through an automated system that will search, harvest and process metadata from film archives and producers. Here the lack of pervasive use of standard identifiers for audiovisual archive materials increases the complexity of the integration process. An approach based on matching metadata describing works and parties from different sources is therefore needed to link existing local Ids and deduplicate entities. A synergy with the solution implemented in the EIDR registry is also being explored.
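To give a feel for what linking local Ids by matching metadata can involve, here is a deliberately simple sketch. It is not FORWARD's algorithm, which was not detailed in the session. The records, archive names and field names are invented, and exact matching on a normalised title, year and director stands in for whatever fuzzier matching a real system would need.

```python
import re
import unicodedata
from collections import defaultdict


def match_key(title: str, year: int, director: str) -> tuple[str, int, str]:
    """Normalise descriptive metadata into a comparable key."""
    def norm(text: str) -> str:
        # Strip accents, lowercase, and collapse punctuation into spaces.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(c for c in text if not unicodedata.combining(c))
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return norm(title), year, norm(director)


def link_local_ids(records: list[dict]) -> list[list[str]]:
    """Group local IDs whose normalised metadata keys coincide."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for rec in records:
        key = match_key(rec["title"], rec["year"], rec["director"])
        groups[key].append(rec["id"])
    return list(groups.values())


# Made-up records from two hypothetical archives.
records = [
    {"id": "archiveA:0042", "title": "La Città Perduta", "year": 1952, "director": "M. Rossi"},
    {"id": "archiveB:f-981", "title": "la citta perduta!", "year": 1952, "director": "m rossi"},
    {"id": "archiveB:f-982", "title": "La Città Perduta", "year": 1987, "director": "M. Rossi"},
]
print(link_local_ids(records))
```

The first two records end up in the same group despite their different spellings, while the 1987 title stays separate. Where a standard identifier already exists, as with an EIDR Id, this guesswork is avoided altogether.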
RDI (Rights Data Integration) is building a prototype multi-media rights data “hub” implementing the LCC data model for metadata interoperability, to enable content and rights data with any structure and controlled vocabularies to be mapped and integrated, either for querying or when dealing with results. In this case the key approach is to integrate data between domains using a linked data methodology, maintaining the relation of the standard IDs adopted with the other Ids while mapping different standards and models to obtain interoperability.
These are two examples that also demonstrate how distributed databases linked through standard IDs can enable rapid implementation of new ideas.
The third Tech Corner was maybe the most mysterious and therefore fascinating of the conference.
Larry Lannom took the audience along an evolutionary line from simple hyperlinking, through DOI multiple resolution using web technology, to resolution using defined DataTypes. Resolutions can be of different “types”, describing different functions or services associated with a DOI, e.g. bibliographic data or rights information. A prototype DOI Type Registry exists that describes, also in machine-processable format, the resolution types discussed in the framework of context-aware DOI resolution. Finally, a zero-knowledge demo application was shown to the audience as an example of the kind of applications that could be built by combining the multiple resolution capabilities of DOI with the new Type Registry, such as displaying all possible resolutions available for a DOI and providing contextual links to access the desired services. Practically: how to know from the outset what is available from the service network around a single DOI, before resolving it.
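The "know what is available before resolving" idea can be sketched as follows. The record below is invented. It only mimics the general shape of a typed Handle record, in which each value has an index, a type and some data. The type names are placeholders, not entries from the prototype Type Registry, and `available_resolutions` is our own helper name.

```python
import json

# Invented record: a DOI's handle values, each with an index, a type and data.
# The type names are placeholders, not entries from the actual Type Registry.
SAMPLE_RECORD = json.loads("""
{
  "handle": "10.1000/example",
  "values": [
    {"index": 1, "type": "URL",
     "data": {"format": "string", "value": "https://publisher.example/item/1"}},
    {"index": 2, "type": "example.bibliographic",
     "data": {"format": "string", "value": "https://registry.example/meta/1"}},
    {"index": 3, "type": "example.rights",
     "data": {"format": "string", "value": "https://rights.example/item/1"}}
  ]
}
""")


def available_resolutions(record: dict) -> dict[str, str]:
    """Map each value type in the record to its target, without following any link."""
    return {v["type"]: v["data"]["value"] for v in record["values"]}


for rtype, target in available_resolutions(SAMPLE_RECORD).items():
    print(f"{rtype:24} {target}")
```

A client that knows nothing about the DOI in advance can list these types, look each one up in a registry to learn what it means, and only then decide which link to follow.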
To sum up
The conference was an example of how everything can be combined into a coherent picture when we are able to build synergies effectively and to exploit the work and knowledge acquired in other projects and initiatives, and of how well DOI fits into this picture.
All the presentations are available on the IDF website.
About the IDF and the DOI Registration Agencies
The International DOI Foundation (IDF) is a not-for-profit membership organization that is the governance and management body for the federation of Registration Agencies providing DOI services and registration, and is the registration authority for the ISO standard (ISO 26324) for the DOI system. The DOI system provides a technical and social infrastructure for the registration and use of persistent interoperable identifiers, called DOIs, for use on digital networks. Norman Paskin is the Managing Agent of the IDF.
102 million DOIs registered
Annual growth rate of 16.1%
2 billion resolutions each year
A robust technical infrastructure and a consolidated social and governance infrastructure
9 Registration Agencies worldwide:
- China National Knowledge Infrastructure (CNKI)
- Entertainment Identifier Registry (EIDR)
- The Institute of Scientific and Technical Information of China (ISTIC)
- Japan Link Center (JaLC)
- Multilingual European DOI Registration Agency (mEDRA)
- Publications Office of the European Union (OP)