Big Data. Googling this term gives us about 222,000,000 search results. It is one of the most widely used buzzwords of recent years, often appearing in phrases like “Big Data is the oil of the future!” or “Big Data is the Holy Grail”. But what is Big Data all about? And how big is big?
Billions of people are connected via the Internet, consuming and generating huge amounts of data: Facebook messages, YouTube clips, Snapchat pictures, etc. In the minute it takes you to read this paragraph, people upload 300 hours of video to YouTube. During that same minute, Netflix streams more than 77,000 hours of video to its customers, and people all over the world like over 4 million Facebook posts.
Recently, machines have found their way onto the Internet as well, via the Internet of Things or IoT (another buzzword). A rapidly growing number of machines generate ever-increasing amounts of data: a single wind turbine can generate almost 600 GB of sensor data per day.
These examples give us an idea of the volumes that make up Big Data. Big Data is typically characterized by “the four V’s”: Volume, Variety, Velocity and Veracity. Volume refers to the scale of the data, as illustrated above. We are no longer talking about megabytes or gigabytes, but about petabytes (10¹⁵ bytes), exabytes (10¹⁸ bytes) and even zettabytes (10²¹ bytes, a one followed by 21 zeros). Variety refers to the diversity of the forms of data: numerical sensor data, textual data (e.g. tweets), video data, etc. are gathered, combined and analysed.
Velocity refers to the sheer speed at which data is generated: 50,000 GB per second is the estimated rate of global internet traffic by 2018! Lastly, Veracity refers to the trustworthiness of data. Poor data quality leads to distrust of the information used to make decisions, which in turn can lead to substantial financial losses.
Handling Big Data requires new methods of storing, processing and analysing data. It requires scalable, distributed architectures with considerable computing power that perform well on these huge amounts of data.
Big Data is not a goal in itself. The goal is to generate useful insights and to learn from the data. To give an example: Netflix, the video streaming service, analysed its customers’ viewing behaviour and concluded that people who liked movies starring Kevin Spacey tended to like movies directed by David Fincher (The Social Network), and also liked the original British “House of Cards” series. This led the company to create a new “House of Cards” series starring Kevin Spacey and directed by David Fincher. House of Cards became one of Netflix’s biggest success stories.
So what can Big Data do for the book publishing and retail industry? Quite a few things actually.
The best-known example of a player in the book industry that uses Big Data is Amazon. Its book recommendation system is based on analysis of customers’ shopping behaviour. Big players like Amazon have the advantage of scale: more data makes their recommendation algorithms more accurate and therefore more valuable.
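The article does not describe how Amazon’s recommender actually works. To make the general idea of “recommendations based on shopping behaviour” concrete, here is a minimal sketch of one well-known approach, a “customers who bought X also bought Y” co-occurrence count. This is an assumption for illustration, not Amazon’s method; the customer IDs and titles are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(purchases):
    """Count, for every pair of books, how many customers bought both.

    purchases: dict mapping a (hypothetical) customer ID to the set of books bought.
    """
    counts = defaultdict(Counter)
    for books in purchases.values():
        # Every unordered pair of books in one customer's history counts once
        for a, b in combinations(sorted(books), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, book, k=3):
    """Return up to k books most often bought together with `book`."""
    return [title for title, _ in counts[book].most_common(k)]


# Hypothetical purchase histories
purchases = {
    "c1": {"Dune", "Foundation", "Hyperion"},
    "c2": {"Dune", "Foundation"},
    "c3": {"Dune", "Neuromancer"},
    "c4": {"Foundation", "Hyperion"},
}

counts = build_co_purchase_counts(purchases)
print(recommend(counts, "Dune"))  # "Foundation" ranks first (bought together twice)
```

This also shows why scale matters: with only four customers, most pairs are seen once and the ranking is fragile, whereas millions of purchase histories make the counts far more reliable. Production systems typically also normalise the counts so that bestsellers do not dominate every list; this sketch omits that.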
Digital reading also offers opportunities to gain insights into reading behaviour. Kobo sells e-readers and digital books. The e-readers are equipped with software that keeps track of readers’ behaviour: which books are bought but remain on the shelf? When, where and for how long do people read? In which chapter do they abandon a certain book?
All this data is stored (with the customer’s consent) and analysed to generate useful business insights. One of the results Kobo’s data scientists look at is the following: books with a high engagement level (a high percentage of readers finish the book fairly quickly) but low sales figures get a marketing boost in order to improve sales.
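The selection rule described above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the field names, the idea of counting readers who finish “quickly” within some time window, and the threshold values are hypothetical, since the source does not say how Kobo defines or tunes them.

```python
from dataclasses import dataclass


@dataclass
class BookStats:
    title: str
    readers_started: int        # readers who opened the book
    readers_finished_fast: int  # readers who finished it within a hypothetical time window
    copies_sold: int


def engagement(stats):
    """Share of readers who finished the book quickly (0.0 to 1.0)."""
    if stats.readers_started == 0:
        return 0.0
    return stats.readers_finished_fast / stats.readers_started


def marketing_boost_candidates(books, min_engagement=0.6, max_sales=1000):
    """Titles with high engagement but low sales.

    Both thresholds are hypothetical defaults, not values used by Kobo.
    """
    return [
        b.title
        for b in books
        if engagement(b) >= min_engagement and b.copies_sold <= max_sales
    ]


books = [
    BookStats("Hidden Gem", readers_started=200, readers_finished_fast=150, copies_sold=400),
    BookStats("Bestseller", readers_started=5000, readers_finished_fast=3500, copies_sold=80000),
    BookStats("Slow Burn", readers_started=300, readers_finished_fast=60, copies_sold=350),
]

print(marketing_boost_candidates(books))  # ['Hidden Gem']
```

“Bestseller” is highly engaging but already sells well, and “Slow Burn” sells poorly but few readers finish it, so only “Hidden Gem” matches both conditions.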
Even small bookshops can benefit from data analysis, albeit on a smaller scale. Keeping customer records in CRM software enables them to run marketing campaigns tailored to their customer segments. Especially in small markets, book retailers and publishers can benefit from working together, e.g. by investing in a joint platform where data from the different participants is gathered, combined and analysed to create business intelligence that all participants can then use. As Aristotle wisely stated in ancient Greece: “The whole is greater than the sum of its parts.”