Ontotext

Sector: ICT for academic, educational and non-fiction publishing
Target Users: Academic, educational and non-fiction publishers
Country: Bulgaria, United Kingdom and USA
Dimension: International
Nature of the initiative: Private
Contact: Jarred McGinnis, Head of London Office, jarred.mcginnis@ontotext.com

Summary

As humans, we are interested in things in the world: people, places, events. Content management systems try to approximate these interests by putting content into broad topics and categories like Sport, Politics, History, et cetera, but these are coarse mechanisms that do not fully represent what the content is about, whether it be a traditional book, website or digital product. The semantic approach enables users to explore the content in a more intuitive way and at a finer granularity of detail.

Ontotext was one of the first organizations in the world to recognize the potential power of semantic-driven technology. Ontotext is unique among its competitors in bundling a robust enterprise semantic database with natural language processing (NLP) software, both notoriously complex technologies.

The semantic database of Ontotext, called OWLIM, organizes data via a description of things in the world and their relationships. As the world changes, so can the database’s description of the world and the associated data. This gives the users of semantic databases a flexibility and dynamism that were not possible before.

Semantic databases require a significant amount of expertise and commitment to create the descriptions of the world, i.e. the ontology, and to migrate data and content to the structured formats required to populate the database. There are very few organizations that have the resources and know-how to achieve this. Ontotext developed software so that publishers could deal with the inherent messiness of natural language content.

By combining natural language processing algorithms with a semantic database, publishers create a virtuous cycle of enriching content, improving the precision and recall of the NLP pipeline and increasing the wealth of data within the knowledge base. The result is an increasingly automated system performing knowledge tasks that previously required the attention of expensive knowledge management experts. This expertise can now be diverted to adding value to content rather than spending time doing maintenance and curation.

The use of semantic technology is a new but proven approach for publishers to address this issue.

Ontotext creates software and services that use semantic technology to bring together metadata and content to search, navigate and analyze information in more productive ways. Its clients include world-renowned media agencies like the BBC and the Press Association, top pharma companies such as AstraZeneca, important government agencies including the US DoD, the UK National Archives and US Medicare, and cultural institutions like the British Museum.

Business needs

The implementation of semantic technologies answers a number of business needs in publishing companies:

Automating 3rd party content use and automated rights management. A client comes to a publisher of history books with a brief, or the publisher itself knows there is demand for a 'History of the Roman Empire' title. To fill the brief, the publisher will have to commission some brief biographies of Roman emperors, campaign maps, etc. Rather than spending money 'reinventing the wheel' by rewriting this material, it would be better to make use of content that already exists. Some of this content the publisher will have internally. Other material already exists, but with another non-fiction publisher. The current process for obtaining it depends on personal relationships between the organizations. Semantics could enable content to be shared at a level of detail where one publisher can make use of content from others. That publisher can then take advantage of a commercial opportunity, and the other publishers can monetize archival content.

This approach can be seen in the Newz project, commissioned by the members of NDP Nieuwsmedia, a trade association of the major Dutch newspaper publishers, to share content and harmonize the semantic metadata across their industry. It enables them to share the burden of disambiguating new entities and improving NLP pipelines, to create new products and to enable 3rd parties to use (and pay for) their pooled content.

Real-time knowledge-based content integration. Time and effort must be spent mapping external content and its format to internal formats. Every time there is a change (or an error) on either side that effort must be repeated. Semantics provide the opportunity to standardize around what the content is about and how the consortium and their customers are using it rather than what input format a particular system expects.

For example, a business and financial publisher will be getting information from analysts and exchanges, each with its own, probably XML-based, representation of the same things (e.g. companies, board members and financial instruments). With the use of Ontotext's solution and semantic disambiguation, data about the same concept can be federated regardless of whether it is referred to by its NASDAQ ticker symbol, its legal name or even a nickname (e.g. IBM is also known as Big Blue).
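The federation idea in this example can be sketched in a few lines of Python. This is a minimal illustration, not Ontotext's implementation: the alias table, the canonical identifier "company:IBM", the helper names and all field values are invented for the example, and a real system would resolve mentions against a knowledge base rather than a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical alias table: every known surface form maps to one canonical identifier.
ALIASES = {
    "ibm": "company:IBM",
    "international business machines corporation": "company:IBM",
    "big blue": "company:IBM",
}


def canonical_id(mention):
    """Return the canonical identifier for a mention, or None if it is unknown."""
    return ALIASES.get(mention.strip().lower())


def federate(records):
    """Merge records from different feeds under the entity they describe."""
    merged = defaultdict(dict)
    for record in records:
        entity = canonical_id(record["name"])
        if entity is None:
            continue  # unknown mention; a real pipeline might queue it for review
        for key, value in record.items():
            if key != "name":
                merged[entity][key] = value
    return dict(merged)


# Three feeds naming the same company differently (all values are made up).
feeds = [
    {"name": "IBM", "ticker_feed_price": 100.0},
    {"name": "Big Blue", "analyst_rating": "hold"},
    {"name": "International Business Machines Corporation", "registry_status": "active"},
]
result = federate(feeds)
```

Here `result` holds a single entry for "company:IBM" that combines the fields of all three feeds, which is the effect the paragraph above describes.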

Automating knowledge management and reducing content silos. Organizations that are large enough to support divisions will inevitably have content silos and a lack of communication across those divisions. Just as this approach simplifies exchange of content between organizations, a large publisher can benefit from facilitating the exchange, use and monetization of internally created content. Inevitably, a publisher will have content hidden in databases or stored in a markup language used only locally, as well as geographical, cultural and managerial divisions. All of these make it hard for one part of an organization to utilize or repurpose content created somewhere else in the organization.

For example, an Australian subsidiary of a large publisher of educational textbooks has produced materials that also satisfy the learning outcomes of its US subsidiary for a given subject. Without a semantic representation of those concepts and how they relate to learning outcomes, it is impossible, without considerable manual effort, to judge whether the Australian material is appropriate for the American market. With Ontotext's semantic knowledge base and text analysis, it is possible to enable faceted searches for the concepts important to the organization, or tools that highlight relevant content regardless of formatting or source.
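As a rough sketch of how concept-level metadata makes this matching mechanical, the Python below tags each piece of content with the concepts it covers and checks those tags against the concepts a learning outcome requires. Every identifier, concept name and the `candidates` helper is hypothetical; the sketch assumes the concept tags already exist, for instance as the output of text analysis.

```python
# All identifiers and concept names below are invented for illustration.
CONTENT = [
    {"id": "au-unit-7", "region": "AU", "concepts": {"photosynthesis", "cell-structure"}},
    {"id": "au-unit-9", "region": "AU", "concepts": {"plate-tectonics"}},
    {"id": "us-unit-2", "region": "US", "concepts": {"photosynthesis"}},
]

# Each learning outcome is described by the set of concepts it requires.
OUTCOMES = {
    "us-bio-outcome-1": {"photosynthesis", "cell-structure"},
}


def candidates(outcome_id, exclude_region=None):
    """Return ids of content whose concepts cover everything the outcome requires."""
    required = OUTCOMES[outcome_id]
    return [
        item["id"]
        for item in CONTENT
        if required <= item["concepts"] and item["region"] != exclude_region
    ]


# Which material from outside the US satisfies the US outcome?
matches = candidates("us-bio-outcome-1", exclude_region="US")
```

The subset test (`required <= item["concepts"]`) is the whole trick: once both sides are described with the same concepts, the source format and the originating subsidiary no longer matter.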

Solutions

One example of Ontotext technology providing a solution to the business needs described above is the work with the BBC.

Four years ago, the BBC used Ontotext technology to deliver a new approach to publishing and managing their content, an approach described as revolutionary by their Chief Architect, John O'Donovan, who has recently been appointed as CTO of the Financial Times. Initially, Ontotext technology was used to deliver dynamic content for their World Cup site, which already had more index pages than the rest of the BBC Sports website. Since that initial success, different divisions of the organization have adopted this approach for their own content.

This has culminated in the development of the BBC's Linked Data Platform (LDP) in January 2013. The LDP, which has Ontotext technology at its core, is the next generation for dynamic semantic content. The BBC as an organization has the strategic vision to integrate content and products across the entire BBC. This is not a trivial feat for an organization with £5 billion in revenue and 23,000 employees, but the BBC has recognized the significant strategic value in being able to deliver products and content in an intelligent way and at a finer grain of detail than currently possible. Rather than being limited to one division or genre of content such as sport, TV programs or music, Ontotext technology has enabled the organization to create products, reuse content across divisions and enhance the visibility of BBC content through better SEO while reducing production costs. The innovation of the LDP and the use of Ontotext's software is the move from an organization thinking it publishes pages to the understanding that it publishes content about things in the world such as people, places and events. This innovation is not to be underestimated in a world where for 500 years content appeared solely in the form of physically published printed pages.

The BBC is one example of the many clients that Ontotext has helped push semantics, sometimes described as Web 3.0, to the tipping point of adoption. Many technological forecasters such as Gartner have reported that that tipping point has been reached. It is a testament to Ontotext's foresight, ingenuity and engineering rigor that a relatively small Bulgarian software company has had such a huge impact on the global Internet.

The role of technology

Ontotext's Semantic Database OWLIM. Databases have been around in one form or another since the 1960s. The technology behind relational databases has gone through countless iterations of refinement, improvement and optimization. It is only relatively recently that performance limitations have pushed organizations to look for alternatives such as the use of semantics.

The semantic database of Ontotext, called OWLIM, organizes data via a description of things in the world, namely the things that interest users, and the relationships among them. Through this description of the world and the organization of data to conform to that description, the database is able to infer new information, automatically and dynamically. Additionally, as the world changes, so can the database's description of the world and the associated data. This gives the users of semantic databases a flexibility and dynamism that were not possible before via traditional approaches.
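To make "infer new information" concrete, here is a minimal sketch of rule-based inference over subject-predicate-object triples, written in plain Python. It is not OWLIM's API or rule set: the `infer` function, the predicate names "subClassOf" and "type", and the sample facts are assumptions chosen to mirror two common RDFS-style rules.

```python
def infer(triples):
    """Forward-chain two RDFS-style rules until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p != "subClassOf":
                continue
            for (s2, p2, o2) in facts:
                # Rule 1: subClassOf is transitive.
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, "subClassOf", o2))
                # Rule 2: an instance of a class is also an instance of its superclass.
                if p2 == "type" and o2 == s:
                    new.add((s2, "type", o))
        if new <= facts:
            return facts  # fixed point reached: nothing further can be derived
        facts |= new


# The "description of the world" (first two triples) plus one piece of data.
stated = [
    ("Footballer", "subClassOf", "Athlete"),
    ("Athlete", "subClassOf", "Person"),
    ("player1", "type", "Footballer"),
]
closure = infer(stated)
```

Nobody stated that player1 is a Person, yet `closure` contains that triple. Changing the description, for example by adding a new subClassOf triple, and rerunning `infer` yields new conclusions without touching the data, which is the flexibility the paragraph above refers to.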

Semantic Tagging and Extraction. Semantic databases require a significant amount of expertise and commitment to create the descriptions of the world, i.e. the ontology, and to migrate data and content to the structured formats required to populate the database. There are very few organizations that have the resources and know-how to achieve this. Ontotext recognized the need for a software framework marshaling a number of components to process text and produce useful semantic metadata.

The process is not unlike the refinement of oil. The raw resource, human-readable text, is put through a number of text-enrichment steps until it produces a number of refined products that have a calculable value to the organization. The refinement of text includes the identification of the parts of speech, the identification of entities (e.g. people, places, organizations, etc.), the extraction of relationships between entities and algorithms for disambiguation. The output is semantic metadata, which can drive products and services such as improved discoverability and a finer-grained understanding of the content an organization produces.
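The staged refinement described above can be illustrated with a toy two-stage pipeline in Python. It is a sketch under strong assumptions: entity identification is a dictionary (gazetteer) lookup and relation extraction is a single pattern, whereas a production pipeline would use part-of-speech tagging, trained models and proper disambiguation. The gazetteer entries, identifiers, verb list and function names are all invented for the example.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, canonical identifier).
GAZETTEER = {
    "Augustus": ("Person", "person:augustus"),
    "Rome": ("Place", "place:rome"),
    "Roman Senate": ("Organization", "org:roman-senate"),
}

# Hypothetical list of verbs that the relation stage recognizes.
VERBS = {"ruled", "addressed"}


def find_entities(text):
    """Stage 1: locate known entity mentions and record their character offsets."""
    found = []
    for surface, (etype, ident) in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            found.append(
                {"id": ident, "type": etype, "start": match.start(), "end": match.end()}
            )
    return sorted(found, key=lambda e: e["start"])


def find_relations(text, entities):
    """Stage 2: link neighbouring entities when a single known verb sits between them."""
    relations = []
    for left, right in zip(entities, entities[1:]):
        between = text[left["end"]:right["start"]].strip().lower()
        # Ignore the article "the" so that "addressed the" still matches "addressed".
        words = [w for w in between.split() if w != "the"]
        if len(words) == 1 and words[0] in VERBS:
            relations.append((left["id"], words[0], right["id"]))
    return relations


def enrich(text):
    """Run both stages and return the semantic metadata for the text."""
    entities = find_entities(text)
    return {
        "entities": [e["id"] for e in entities],
        "relations": find_relations(text, entities),
    }


metadata = enrich("Augustus ruled Rome. Augustus addressed the Roman Senate.")
```

The relations in `metadata` are already shaped as subject-predicate-object triples, so they could be loaded into a semantic database alongside the ontology.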

This also enables organizations to make use of the raw text that exists out in the 'wilds' of the Internet. By the addition of semantic metadata, it is possible to ingest and incorporate information from a variety of sources in a reliable and unambiguous way.

A number of Ontotext clients have used this approach to create new revenue streams, but it is when semantic tagging and a semantic database are used in conjunction that the technology's advantages are maximized.

Semantic Database + Semantic Tagging = Publishing Platform. By combining natural language processing algorithms with a semantic database, organizations create a virtuous cycle of enriching content, improving the precision and recall of the NLP pipeline and increasing the wealth of data within the knowledge base. The result is an increasingly automated system performing knowledge tasks that previously required the attention of expensive knowledge management experts. This expertise can now be diverted to adding value to content rather than spending time doing maintenance and curation.


Results obtained

The theoretical underpinnings of the semantic web were developed as early as the late 1970s. However, the adoption of Web 3.0 has been slow despite being recognized as the inevitable next developmental stage of the Internet. Ontotext technology has been key in enabling a number of organizations with global reach to take advantage of semantics. The company has taken a technology largely seen as too academic and ungainly and shown, through sound engineering and long-term commitment, that the semantic web is possible and has business value.

This is having a profound effect on the technological landscape, but it is also changing work patterns for publishers. Semantics is enabling traditional publishers to navigate the new digital environment and automate knowledge tasks that were too expensive or time-consuming. With semantics, more time is spent creating content, and therefore value, and less time is spent managing and processing it.

Ontotext is building upon its tradition and reputation for innovation. As the Semantic Web continues its exponential growth, the problems of big data are becoming more urgent. Ontotext is ensuring its portfolio of semantic products is able to deal with the volume, variety and velocity of metadata that its customers expect. Just as Web 2.0 was the social web, the semantic web is considered Web 3.0, and it will be vital that there is continuity in the technology to support this evolution.

Most recently, a number of publishers have taken advantage of the benefits of Ontotext's knowledge base, semantic-driven technology and natural language processing (NLP).
