Insights from the ALPSP International Conference


ALPSP CEO Audrey McCulloch welcomed representatives from sixteen countries at this year’s ALPSP (Association of Learned and Professional Society Publishers) International Conference (the eighth), held at the Park Inn, Heathrow, 9-11 September. Delegates also included members of other trade associations, among them The Publishers Association and the Publishers Licensing Society (which was one of the sponsors).

Keynote: Anurag Acharya, Google Scholar: What happens when your library is worldwide and all articles are easy to find?

The ubiquity of specialist information

Acharya, co-creator of Google Scholar, took as his topic ‘What happens when your library is worldwide and all articles are easy to find?’ He compared the library of 1990, when all documents were in print format and what the shelves could hold was what the user could read, with the library of 2015, when almost all journals are available online, a large proportion of archives are also online and ‘anyone, anywhere’ can find specialist information.
He said the ‘distribution pyramid’ has flattened. Relevance ranking allows all articles to rise in frequency of use by merit alone, and full-text indexing ensures that every part of an article is accessible. ‘If you can get online, you can find it.’

The impact on researcher behaviour

These sea-changes in availability and accessibility have had a major impact on how researchers behave. There has been tremendous growth in queries, reflecting not only many more users but also many more queries per user. The largest growth of all has been in keyword or concept queries. The average length of a query has increased from one or two words to four or five words. Users have to know more precisely what they are looking for in order to get the right results. Most queries are unique and many are no longer limited to the researcher’s own area of specialism. Relevance ranking therefore now embraces a mixture of expert and non-expert queries. There has been a sustained growth in related-area queries.

Users prefer full text, which means that abstracts offering links to full-text documents are selected much more frequently (even if the user doesn’t actually go on to read the full text). PDF files are extremely popular when the user wants the full text, because they replicate the book on the shelf and reassure the user that, if downloaded, the information will still be available the next day (whereas online documents might ‘disappear’).

The impact on citation patterns

Citation patterns have also changed. Not all of the most cited articles are now from the most elite journals. Google has made a study of articles published over the last 10, 15 and 20 years across 261 subject categories, and concluded that older articles are often cited more frequently (which offers a powerful argument to publishers for digitisation of archived and backlist material). These findings suggest that increasingly every article stands or falls on its own merits, since all are equally easy to find.

The broader context

Acharya emphasised that his comments should be taken in context. The top 10 journals still publish 75% of the top articles, but there is a clear, consistent trend to embrace the long tail. ‘The elite are still elite, but less so.’ This gives authors more choice: they may decide, for example, that they prefer to be published in a journal that can get their work out quickly rather than in one that is prestigious but much slower.

He was also eloquent in emphasising the need for accurate and well-articulated abstracts. He said that abstracts were part of the essential filtering process that users have to engage with now that so much material is available. ‘Forcing full text on early-stage users is not useful.’ He argued that usage analytics should concentrate more on user workflows, rather than on analysis of full-text usage, Open Access, etc. Abstracts have long been written for a broader audience than specialists and have a unique role to play now that so much research is cross-disciplinary.

Conclusions

Finally, he concluded that we are lucky to live in an era of information plenty; lucky to live in an era of connectivity; and lucky to live in an era of rapid change. He said that Google Scholar only has one goal: to make it easier for people solving difficult problems to do more.

This keynote talk was impassioned in places and certainly opened the conference with éclat. However, there were some large moral and financial issues which were not addressed, either during the presentation or in the questions that followed it – in part because Acharya was very clear about the areas in which he was prepared or qualified to engage in debate.

Keynote: Kuansan Wang, Microsoft Research: From web publishing to knowledge web publishing

Wang, Director of the Internet Service Research Centre at Microsoft Research, noted that this year marked the 25th anniversary of the launch of the World Wide Web and the 15th anniversary of the ‘semantic web’. He went on to explain the differences between the semantic web and the more recent ‘knowledge web’.

The difference between the semantic web and the ‘knowledge web’

The semantic web distinguishes between human-readable and machine-readable content; humans define the standards for the data formats and models used; and an explicit and precise specification of knowledge representation has to be agreed upon by everyone involved.

The knowledge web enables machines to read human-readable content; the machine learns to conflate different formats of the same thing; and the machine is able to interpret ‘latent and fuzzy’ representations of knowledge by mining big data. This has major implications for the application of search technology in the academic domain. One way to train future intelligence (i.e., machine intelligence) to read is by training it to write.

When online research is being carried out, ways have to be found to recommend ways of completing queries that were seldom or never foreseen; to rank suggestions; and to avoid making suggestions that will lead to poor results or none. Programming computers to do this is complex.

The rise of the proactive research experience

Wang said that up to this point what he had been discussing was the ‘reactive research experience’. Microsoft’s Cortana is a system that allows proactive suggestions on Windows, Android and iOS. Windows 10 allows Microsoft to monitor user behaviour (he emphasised that this application can be turned off). Essentially, it restores the ‘personal assistant’ that was a feature of earlier versions of Windows. It can help groups of professionals – e.g., doctors – to find the journal articles that they should typically read for the particular condition they’re interested in. Wang added that sometimes at this point the user might encounter a paywall, and made a plea to publishers to help with this. He said that, taken to its logical conclusion, this facility could empower every person and every business to achieve more.

Microsoft’s areas of focus

He went on to pinpoint the areas in which Microsoft is most interested in creating business cases and making money. These are ‘imagined productivity’ (which includes academic publishing); more personalised computing (e.g., Cortana Academic, which creates a user profile); and ‘Most Intelligent Cloud’ (for example, ‘Project Oxford’, which has developed APIs for vision and speech).

The disconnect between the plea to publishers to make more material available free and the frank description of areas in which Microsoft hopes to make more money did not escape the notice of at least some members of the audience.

Wang concluded by describing the Microsoft Academic Graph (MAG). More information about this can be obtained from orcas@microsoft.com.

Windows onto different worlds

The audience responded warmly to Wang, who was an engaging speaker. Unsurprisingly, he was questioned about some of the practices that he described, especially the monitoring of user behaviour by Microsoft. He said that ‘take-down’ notices were always acted on immediately by the company and could be issued by anyone, not just celebrities.

These keynotes were fascinating and disturbing in equal measure. The audience was grateful to both speakers for sharing their thoughts and plans so candidly; this revealed that the mindset that prevails within their huge organisations is quite different from the one ‘normally’ held by academic publishers.

Key points from the keynotes:

  • The ‘distribution pyramid’ has flattened. Relevance ranking allows all articles to rise in frequency of use by merit alone, and full-text indexing ensures that every part of an article is accessible.
  • There has been tremendous growth in queries in Google Scholar, reflecting not only many more users but also many more queries per user.
  • Most queries are unique and many are no longer limited to the researcher’s own area of specialism.
  • Citation patterns have also changed. Not all of the most cited articles are now from the most elite journals.
  • Abstracts are part of the essential filtering process that users have to engage with now that so much material is available.
  • The semantic web distinguishes between human-readable and machine-readable content; humans define the standards for the data formats and models used; and an explicit and precise specification of knowledge representation has to be agreed upon by everyone involved.
  • The knowledge web enables machines to read human-readable content; the machine learns to conflate different formats of the same thing; and the machine is able to interpret ‘latent and fuzzy’ representations of knowledge by mining big data. This has major implications for the application of search technology in the academic domain. One way to train future intelligence (i.e., machine intelligence) to read is by training it to write.
  • When online research is being carried out, ways have to be found to recommend ways of completing queries that were seldom or never foreseen; to rank suggestions; and to avoid making suggestions that will lead to poor results or none.
  • Windows 10 allows Microsoft to monitor user behaviour.
  • Areas in which Microsoft is most interested in creating business cases and making money are ‘imagined productivity’ (which includes academic publishing); more personalised computing; and ‘Most Intelligent Cloud’.

This article was originally published on APLPC Brief, the Publishers Association’s academic and professional newsletter.
