According to tradition, the Frankfurter Buchmesse starts a day earlier for those who attend that customary fixture, the International Supply Chain Seminar organized by EDItEUR. Speaking of tradition, the event has returned to its roots and to its original format: after four years as a track within the Tools of Change and CONTEC conferences, this year’s 37th International Supply Chain Seminar was relaunched as an independent event on the Tuesday before the Fair.
Together with the Frankfurter Buchmesse and the International ISBN Agency, TISP was a partner of this event, which gathered diverse contributions from different actors in the book value chain: from publishers to books-in-print services, from experiences in the R&D field to innovative services for content enrichment, the programme offered stimulating contributions, issues and possible solutions, with a particular focus on standards for book and e-book metadata communication – the trademark of EDItEUR’s initiatives.
Graham Bell, executive director of EDItEUR, welcomed the participants, noting that the delegates in the sold-out audience represented 17 different countries and all parts of the book and e-book supply chains. He thanked the sponsors and partner projects and introduced the first session on the use of metadata standards, particularly Thema, the new and expanding global subject classification scheme for books.
The keynote speech “Metadata and standards: the good, the bad and the counterfactual” was delivered by Jonathan Nowell, president of Nielsen Book, a subsidiary of the world’s largest research business, Nielsen, and a metadata supplier, sales measurement and consumer research centre for the book industry. Nowell identified some key benefits and challenges related to the adoption of standards in various segments of the publishing sector (the “good” and the “bad” of his presentation). Among identification standards, the ISBN is certainly “good”, as it drives supply chain efficiency, facilitates metadata management, makes sales measurable and is adopted globally – despite some black marks in its history, when it was misused (for example under the Franco regime in Spain) as a means of censorship, to control access to the book market.
The history of the ISBN shows that the most successful standards are those that best meet supply chain needs and that are able to adapt as user requirements change over time.
In STM publishing, the DOI (Digital Object Identifier) identification system has been widely adopted and successfully implemented in the CrossRef system as the main standard for citation linking; the DOI story, however, shows how market fragmentation can be an obstacle to the take-up of standards, as proved by its slower diffusion in HSS (Humanities and Social Sciences) disciplines compared with STM subject areas.
As for bibliographic standards, the transition of ONIX for Books from version 2.1 to 3.0 is fundamental for the proper description of digital publications, and the use of Thema, the “lingua franca” of book classification, is doubtless one of the most valuable achievements.
What can threaten the value of metadata to the industry is the way metadata is sometimes managed: as shown in a white paper published by Nielsen on the link between metadata and sales, good metadata and the proper use of standard identifiers drive sales, increase discovery and facilitate sales measurement. Given this key role, metadata curation must be well planned and delivered by qualified personnel who know the books well – not handed to the most junior person in the office. Finally, cooperation among players is beneficial as long as they share metadata and avoid proprietary schemes.
As for the counterfactual, argued Nowell, what would life be like without metadata and standards? In one word, it would be chaos.
From this high-level view of metadata issues, the panel moved to some practical implementations. Howard Willows, senior manager of data development at Nielsen Book, talked about “Thema: a history and overview”, starting from its early steps: from the first discussion at the London Book Fair in 2011 to a joint coalition of EU, US and Canadian partners at the Frankfurt Book Fair in 2012, which laid the foundations of a global standard based on a multilingual, internationalised and elaborated version of the BIC classification scheme; the standard saw the light in late 2013 and is currently managed by EDItEUR. The great benefit of using Thema emerged clearly: instead of maintaining many one-to-one mappings between national or proprietary schemes, Thema’s global approach allows each local scheme to be mapped to a central standard, which is also structured to accommodate national extensions: national groups participating in Thema’s development can add codes to the scheme, so that players implementing Thema can “look local and act global”. The alternative to the so-called “map-o-rama” between national or specialist schemes is therefore a common language that trading partners can refer to in order to communicate unambiguously.
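To make the hub-and-spoke idea concrete, here is a minimal sketch in Python: each local scheme is mapped to Thema once, and any pair of schemes can then be translated through that single hub instead of through a dedicated pairwise mapping. The scheme names and codes are invented placeholders, not real Thema or national codes.

```python
# Each local scheme is mapped to the Thema hub once; translating between any
# two local schemes then goes through the hub instead of a pairwise mapping.
# All scheme names and codes below are invented placeholders.
LOCAL_TO_THEMA = {
    "scheme_a": {"110": "FBA"},    # hypothetical national trade code -> Thema code
    "scheme_b": {"3435": "FBA"},   # hypothetical specialist code     -> Thema code
}

# Invert each table so we can also go from Thema back to a local scheme.
THEMA_TO_LOCAL = {
    scheme: {thema: local for local, thema in table.items()}
    for scheme, table in LOCAL_TO_THEMA.items()
}

def translate(code, source, target):
    """Translate a subject code between two local schemes via the Thema hub."""
    thema = LOCAL_TO_THEMA[source].get(code)
    return THEMA_TO_LOCAL[target].get(thema) if thema else None

print(translate("110", "scheme_a", "scheme_b"))  # -> 3435
```

With N schemes, each new participant only has to maintain one mapping to the hub instead of N-1 pairwise mappings, which is the whole point of the “look local and act global” approach.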
In view of an increasingly international book trade, the ability to exchange data in an interoperable fashion becomes more and more important, and Thema is designed to interact with other standards besides ONIX: MARC (one of the most widespread bibliographic standards for libraries) recognises Thema categories too.
Among the international players that have already adopted Thema as their primary classification scheme, VLB, the German books in print (BiP), has fully integrated the standard into its systems: Ronald Schild, CEO of MVB, gave a presentation on “Establishing Thema in the German language book market: the stick and carrot”. After an overview of VLB, which provides a single point of access for the book trade in Germany by managing a number of data flows between trading partners, Schild explained how data quality is managed in the VLB system, based on solid criteria of metadata timeliness, completeness and consistency. The publisher interface of VLB allows for an accurate metadata communication process: besides the basic bibliographic description, enriched semantic metadata and Thema categories can also be entered, while automated data cleaning, validation and normalization procedures support quality checks.
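As an illustration of the kind of automated validation described here (not MVB’s actual implementation), the following sketch checks a record for completeness and verifies the ISBN-13 check digit; the set of required fields is an assumption made for the example.

```python
def isbn13_is_valid(isbn):
    """Verify the ISBN-13 check digit (weights alternate between 1 and 3)."""
    digits = [int(c) for c in isbn.replace("-", "") if c.isdigit()]
    if len(digits) != 13:
        return False
    weighted = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - weighted % 10) % 10 == digits[12]

# The required fields are an assumption chosen for the example.
REQUIRED_FIELDS = ("isbn", "title", "contributor", "thema_code", "availability")

def quality_issues(record):
    """List missing required fields and flag an invalid ISBN, if any."""
    issues = ["missing " + field for field in REQUIRED_FIELDS if not record.get(field)]
    if record.get("isbn") and not isbn13_is_valid(record["isbn"]):
        issues.append("invalid ISBN-13 check digit")
    return issues

print(quality_issues({"isbn": "9780306406157", "title": "Example"}))
# -> ['missing contributor', 'missing thema_code', 'missing availability']
```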
The “stick” consists of a higher price per title for maintenance in the BiP if the publisher doesn’t provide good, rich metadata with subject classification, or forgets to update title availability regularly. But that is an affordable effort compared with the added-value services that publishers can benefit from: VLB-TIX, for instance, collects all newly published titles in a separate database, each with promotional materials and enriched metadata, which are sent to booksellers for targeted browsing, selection and ordering. In this way new releases gain in discoverability and, at the same time, booksellers are trained to use Thema, included in the metadata records, as a search criterion. The second “carrot” is a VLB service providing semantic enrichment of book metadata: all semantic information extracted from book records is matched against a set of thesauri and automatically completed with Thema categories, so that publisher records are improved with added-value information. It seems that the “stick and carrot” method works well, concluded Schild, as long as publishers understand the value of having rich metadata and perceive the disadvantage of not using Thema in their business.
The stage of adoption of Thema and “The supply chain in Turkey: Potential and possibilities” were reported by Merve Okçuoğlu, representing the Turkish Publishers Association. Despite the Association’s recent establishment, the Turkish book market is well structured, counting 1,732 publishers, about 6,000 retail outlets, 145 distribution companies, a considerable number of titles issued in 2014 (more than 50,000) and a high number of copies sold (more than 560 million): figures that show how the market is growing and thus in need of a consistent way of managing the book trade, with standard information formats able to interoperate across different systems. But this goal is proving hard to achieve: ISBN and anti-piracy bandrole data, for instance, are collected by different bodies that lack data integration. Thema can surely be part of an improvement in data management, and it is actually being implemented and translated, although adapting it to local subject headings and achieving full adoption by the players of the supply chain (both publishers and distributors/retailers) is still a slow process.
The second keynote speech, by Andrew Weinstein, VP of Content Acquisition at Scribd, focussed on the illusory dichotomy of “Ownership or access – What do consumers prefer?”. Starting from the observation that consumers are shifting towards access models in all content fields (music streaming, VOD, etc.), Weinstein argued that access is actually an additional business model, not a replacement for content ownership, and offered several arguments in support of this claim. First, subscription adds value to discovery, making backlists and rare titles available and influencing discovery methods through recommendations. Second, Nielsen research on reading and access models shows that US subscribers spend more on book purchases than non-subscribers. Moreover, the impact of subscriptions on the supply chain would be frictionless if publishers’ upstream systems adopted standards like ONIX, which supports access models but is often not fully integrated in publishing workflows. So what can publishers take away from consumers’ reaction? It seems that reading activity is spread across a wider base of titles like those offered by subscription services, which should therefore drive sales to titles that other channels don’t, and represent a new sales window bringing direct benefits to reading and to the market as a whole. Subscriptions and purchases are not mutually exclusive choices: rather than an either/or, subscriptions offer an either/and reading solution, increasing the revenue opportunity for publishers.
So in the end, what do consumers prefer? Weinstein said: both!
After the networking break, the conference entered the “hot” session on ONIX for Books best practices and implementations. Graham Bell opened with an overview of “ONIX: transitions, extensions and updates”, in which he stressed the importance of standards as a critical enabler for the book business. From the roots of ONIX and its first release in 2000 to the current state of adoption, ONIX has grown in importance in the major book markets worldwide and is widely used in the US, Canada, Western Europe, parts of Eastern Europe and Russia, Korea, China and Japan. Worth noting are its broad interoperability and ease of integration with other metadata sets (e.g. the BISAC and Thema subject schemes), as well as its evolution in line with market requirements: the transition from version 2.1 to version 3.0 reflected the critical introduction of digital publications into the book trade, and the current BISG working group mapping key ONIX data elements to the Schema.org mark-up vocabulary for web pages highlights how book data communication can adapt to the semantic aspects of the Internet.
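As a rough illustration of what such a crosswalk enables, the sketch below turns a few ONIX-derived fields into Schema.org “Book” JSON-LD. The field names and mapping choices are illustrative assumptions, not the BISG working group’s actual crosswalk.

```python
import json

def onix_fields_to_schema_org(product):
    """Build Schema.org Book JSON-LD from a few ONIX-derived fields."""
    book = {"@context": "https://schema.org", "@type": "Book"}
    if product.get("isbn13"):
        # e.g. taken from an ONIX <ProductIdentifier> with <ProductIDType>15 (ISBN-13)
        book["isbn"] = product["isbn13"]
    if product.get("title"):
        # e.g. taken from ONIX <TitleDetail>/<TitleElement>/<TitleText>
        book["name"] = product["title"]
    if product.get("author"):
        # e.g. taken from an ONIX <Contributor> with an author role
        book["author"] = {"@type": "Person", "name": product["author"]}
    return json.dumps(book, indent=2)

print(onix_fields_to_schema_org({
    "isbn13": "9780000000000",
    "title": "An Example Title",
    "author": "Jane Doe",
}))
```

Embedding the resulting JSON-LD in a product page is one way the book data already flowing through the supply chain can be reused for search engines and the semantic web.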
A valuable set of recommendations on metadata best practice was provided by Patricia Payton, senior manager of provider relations for ProQuest and Bowker, who presented “Metadata and discoverability: key data and common problems”. With the aims of improving the accuracy of product data, increasing efficiency between trading partners and making content more discoverable, she identified a number of issues and shared a series of good practices to be applied to the main metadata fields of bibliographic records. To give just a few examples:
- the product title is critical for discovery and must not include other data (e.g. subtitle, edition, format, other values);
- the contributor names must be correctly structured (e.g. first name separate from family name) so they can be sorted and displayed properly;
- the audience range is an important indicator that has to be combined consistently with the subject classification: it is critical for purchasing decisions, and conflicting data may lead to lost sales;
- subject categories must be specific and consistent between print and digital formats: retailer budgeting, merchandising and marketing plans, as well as drivers for consumer discovery, depend on the proper use of classification schemes, as demonstrated by other presentations throughout the conference.
These and many other suggestions emerged from Payton’s talk, which was not a mere list of instructions but an effective set of criteria for managing book product information efficiently, in the interest of a smoother and more profitable trade.
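Several of these rules lend themselves to automated checking. The sketch below shows how a few of them might be tested against a product record; the field names, patterns and rules are illustrative assumptions, not Bowker’s or ProQuest’s actual checks.

```python
import re

def record_issues(record, digital_record=None):
    """Run a few illustrative checks inspired by the recommendations above."""
    issues = []
    # 1. The title should not carry format or edition statements.
    if re.search(r"\b(paperback|hardback|e-?book|\d+(st|nd|rd|th) edition)\b",
                 record.get("title", ""), re.IGNORECASE):
        issues.append("title appears to contain format/edition information")
    # 2. Contributor names should be structured, not a single display string.
    if not {"first_name", "family_name"} <= set(record.get("contributor", {})):
        issues.append("contributor name not split into first/family name")
    # 3. A subject category should be accompanied by an audience range.
    if record.get("subject") and not record.get("audience_range"):
        issues.append("subject category given without an audience range")
    # 4. Subject categories should match between print and digital records.
    if digital_record and record.get("subject") != digital_record.get("subject"):
        issues.append("subject differs between print and digital records")
    return issues

print(record_issues({"title": "An Example Title (Paperback)",
                     "contributor": {"display_name": "Doe, Jane"},
                     "subject": "FBA"}))
```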
A very good example of metadata best practices and of technical developments based on standards applied to the publishing workflow was given by Gregorio Pellegrino, consultant and production supervisor at the Italian publishing company Effatà Editrice, who showed how “My publishing house is ruled by bots”. As a consequence of a precarious economic situation, explained Pellegrino, the company decided to reorganize its publishing process in order to optimize procedures and reduce management costs. The main issues to be addressed were the high cost of unsold books (implying both printing and stock management costs) and the difficulty, for a small publisher, of “acting like a firm” able to forecast sales volumes and absorb any extra expenses.
The solution was found in a re-engineering process based on automated actions and on the use of ONIX 3.0 to structure book metadata communication both internally and externally: a central platform was developed to automatically monitor stock levels, print orders driven by the actual demand received from distributors, book movements in the warehouse, and also editorial agreements and the accounting department.
The database is connected to an ONIX-based metadata manager that centrally feeds both the e-commerce website – which imports product data in ONIX 3.0 and maps ONIX elements to Schema.org to enrich the semantics of its HTML content – and the reporting platform. Other developments are of course still underway, but the considerable results already obtained (costs reduced and profits increased by 45% in 10 months) show how the use of standards facilitates production and monitoring, while also acting as a flexible enabler for communication with trading partners.
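To give a flavour of the kind of rule such a “bot” might apply when weighing stock against distributor demand, here is a minimal sketch; the thresholds, fields and figures are assumptions, not Effatà Editrice’s actual logic.

```python
from dataclasses import dataclass

@dataclass
class TitleStock:
    isbn: str
    on_hand: int           # copies currently in the warehouse
    weekly_demand: int     # average copies ordered by distributors per week
    lead_time_weeks: int   # time needed to reprint and restock

def reprint_quantity(title, cover_weeks=8):
    """Suggest a print order only when stock will not cover the reprint lead time."""
    needed_until_restock = title.weekly_demand * title.lead_time_weeks
    if title.on_hand > needed_until_restock:
        return 0
    # Print enough to cover the chosen horizon, net of the stock still on hand.
    return max(0, title.weekly_demand * cover_weeks - title.on_hand)

slow_seller = TitleStock("9780000000001", on_hand=200, weekly_demand=3, lead_time_weeks=2)
fast_seller = TitleStock("9780000000002", on_hand=20, weekly_demand=15, lead_time_weeks=2)
print(reprint_quantity(slow_seller), reprint_quantity(fast_seller))  # -> 0 100
```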
The “technology gradient” of the session rose with the presentation of various solutions for the semantic enrichment of content applied to the publishing industry. Sam Herbert, founder and client services director of 67 Bricks, explained in “Semantic enrichment: the promise and the practicalities” how they work with publishers to make their content more structured, granular, reusable and flexible for enrichment with extra information. He presented the case of a medical text in which the top-scoring keywords are captured and an entity identification process runs against the underlying taxonomy and knowledge model; entities are measured for statistical trend analysis, classified and related to other chapters to build a “semantic fingerprint”. This can be used to enrich other content to be published, or to produce external resources such as marketing materials, showing how the addition of extra meaning to content can improve production processes and the discoverability of products.
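A minimal sketch of the “semantic fingerprint” idea might look like this: each chapter is reduced to a weighted vector of taxonomy terms it mentions, and chapters are related by cosine similarity. The tiny taxonomy, the keyword extraction and the scoring are illustrative assumptions, not 67 Bricks’ actual pipeline.

```python
from collections import Counter
from math import sqrt

# A tiny taxonomy standing in for the publisher's knowledge model.
TAXONOMY = {"hypertension", "diabetes", "insulin", "beta-blocker"}

def fingerprint(text):
    """Count the taxonomy terms a chapter mentions (its 'semantic fingerprint')."""
    tokens = [token.strip(".,;:()").lower() for token in text.split()]
    return Counter(token for token in tokens if token in TAXONOMY)

def similarity(a, b):
    """Cosine similarity between two fingerprints."""
    dot = sum(a[term] * b[term] for term in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chapter_1 = fingerprint("Insulin resistance is common in type 2 diabetes; insulin dosing ...")
chapter_2 = fingerprint("Diabetes and hypertension frequently occur together ...")
print(round(similarity(chapter_1, chapter_2), 2))  # -> 0.32
```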
Another use case for semantic technologies was provided by Frank Salliau, senior researcher at iMinds, the Flanders digital research and entrepreneurship hub. His experience with “FREME and Storyblink – Research applying semantic technologies in book publishing” offered an interesting perspective on R&D applied to the publishing industry. The first project presented, FREME (open Framework of E-services for Multilingual and semantic Enrichment of digital content), aims to build an open framework of e-services for the multilingual and semantic enrichment of digital content with data available on the Web: through open-source APIs, users will be able to exploit open technologies such as named entity recognition (to recognise, link and classify entities in multilingual texts, drawing on entity datasets from Linked Data and Open Data), translation suggestions, and a cloud authoring tool exporting to ePub format with SEO optimizations for content discovery. The general idea behind FREME is to make it easier for content creators to use technologies still considered complex owing to a lack of awareness and skills, and to integrate them with content and metadata, thus creating a link between research and industry.
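To give a concrete feel for named entity recognition and linking of the kind FREME packages as e-services, the sketch below runs a local NER model (spaCy) and builds DBpedia-style resource URIs naively; it is only an illustration of the underlying idea, not the FREME API itself.

```python
import spacy  # requires: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

def recognise_and_link(text):
    """Yield (entity text, entity type, naive DBpedia-style URI) for each entity found."""
    for ent in nlp(text).ents:
        uri = "http://dbpedia.org/resource/" + ent.text.replace(" ", "_")
        yield ent.text, ent.label_, uri

sentence = "The Frankfurter Buchmesse takes place in Frankfurt every October."
for name, label, uri in recognise_and_link(sentence):
    print(name, label, uri)
```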
The second project, Storyblink, is a proof of concept for content-based exploration through linked data. Concepts are extracted from full-text content with Natural Language Processing technologies, classified and connected to Linked Open Data sources such as DBpedia. This process enriches the semantic relations between books by discovering new and unexpected connections, with possible applications to recommendation engines – not only for books but also for movies, music and any entity with a linked data representation – as well as to recommendation systems based, for instance, on Facebook users’ interests.
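A toy version of this kind of concept-overlap recommendation could look like the following: each book is reduced to the set of linked concepts extracted from its text, and other books are ranked by how many concepts they share. The titles and concept sets are invented for illustration.

```python
# Each book is reduced to the set of linked concepts extracted from its text;
# the titles and concept sets below are invented for illustration.
BOOK_CONCEPTS = {
    "Book A": {"dbpedia:Venice", "dbpedia:Renaissance", "dbpedia:Painting"},
    "Book B": {"dbpedia:Venice", "dbpedia:Opera"},
    "Book C": {"dbpedia:Quantum_mechanics", "dbpedia:Painting"},
}

def jaccard(a, b):
    """Share of concepts two books have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(title, top_n=2):
    """Rank the other books by concept overlap with the given title."""
    source = BOOK_CONCEPTS[title]
    scores = [(other, jaccard(source, concepts))
              for other, concepts in BOOK_CONCEPTS.items() if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

print(recommend("Book A"))  # -> [('Book B', 0.25), ('Book C', 0.25)]
```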
The conference concluded with a happy celebration: on the occasion of the 50th anniversary of standard book numbering, Stella Griffiths, executive director of the International ISBN Agency, briefly sketched the story of the most popular book identifier: from the former wartime code-breaker who devised a 9-digit code that met with immediate success in the UK industry, to its international standardization as 10 digits, and finally to the current 13-digit form defined in 2007, the ISBN has brought unquestionable benefits to the supply chain, now counting 151 registration agencies, more than 1.5 million publishers and about 60 million ISBNs assigned worldwide. A success story celebrated at the end of a conference (curiously held one day ahead of World Standards Day) in which standards for the publishing sector deservedly played the leading role.