E-science: Context of big data and analytics for knowledge societies

15 Jun 2017 11:00h - 13:00h

Event report


The moderator opened the session by highlighting the prominence of data in many key documents, such as the Global Sustainable Development Report, published in 2016. The world has become extremely data-hungry, but we are not necessarily data-savvy.

Michelle Woods (World Intellectual Property Organization (WIPO)) explained the links between intellectual property and big data. Big data is usually digitised data, such as scientific databases, which can be copyrighted. In addition, cultural information could be put into digital format in order to preserve cultural heritage, which raises the issue of control over traditional knowledge, for example. In the digital context, discussions on copyright are starting to take on different angles. Some believe that copyright should follow a system that requires registration and renewal. On this view, protection that lasts for the author's lifetime plus 50 or 70 years after death does not seem adequate in the current digital scenario. Governments are also discussing whether publicly funded research should be made available, together with the underlying data, under open licences. Exceptions and limitations to copyright could be used to foster access to databases, digitisation, data mining, uses by educational institutions and access by libraries.

Maria Fasli (University of Essex) opined that there are many ways in which big data analytics can help with implementing the sustainable development goals (SDGs), not only in monitoring, but also in policy development and evidence-based policy-making. Nevertheless, the field has significant conceptual misunderstandings that should be clarified:

  • Data science is a new multidisciplinary discipline.
  • Big data analytics is the use of advanced statistical and quantitative methods to understand data.
  • Big data is structured and unstructured data in volumes so large that they cannot be processed or analysed with traditional methods. It should be remembered, however, that big data is not knowledge. Information has meaning and shape, and comes from the analysis of data.

The right frameworks, standards and institutions need to be in place to support open e-science, in particular for the SDGs. There are specific platforms and toolkits that help in the process of data mining, and the infrastructure for analysing data is starting to be shared among organisations. Data scientists need to have the necessary skills, tools and methods, which range from statistical methods to computer science and cultural understanding.

Big data also needs to be used with caution. First, its use needs to be aligned with human rights: data could be used, for example, to target minorities or to breach privacy rights. Second, while correlations need to be drawn between datasets, we need to remember that correlation is not causation.

Robert Jones (CERN) provided examples of open e-science and the challenges it faces. The first challenge is how to share data. In 2016, CERN’s main machine recorded 50 petabytes (PB) of data, which corresponds to about 11 million DVDs. This data needs to be shared with the global population of physicists. However, they need access not only to the data, but also to the software and tools required to read it. In addition, there is considerable ‘insider’ knowledge among the group of people involved in the experiments that is hard to communicate. The CERN open data portal has been created as a starting point for sharing data. In 2015, 40 terabytes (TB) of data were available in the portal; by 2016, the figure had increased to 310 TB.
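The figures quoted by Jones can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes a single-layer DVD capacity of 4.7 GB and decimal (SI) prefixes, neither of which was stated in the session, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the data volumes quoted in the session.
# Assumptions (not from the session): a single-layer DVD holds 4.7 GB,
# and decimal (SI) prefixes are used throughout.

GB = 10**9
TB = 10**12
PB = 10**15

DVD_CAPACITY_BYTES = 4.7 * GB  # assumed single-layer DVD capacity


def dvds_needed(total_bytes: float) -> float:
    """Return how many DVDs of the assumed capacity would hold total_bytes."""
    return total_bytes / DVD_CAPACITY_BYTES


def growth_factor(before_bytes: float, after_bytes: float) -> float:
    """Return the ratio between two data volumes."""
    return after_bytes / before_bytes


# 50 PB recorded in 2016, expressed in millions of DVDs.
print(f"{dvds_needed(50 * PB) / 1e6:.1f} million DVDs")  # 10.6 million DVDs

# Open data portal: 40 TB in 2015 versus 310 TB in 2016.
print(f"{growth_factor(40 * TB, 310 * TB):.2f}x growth")  # 7.75x growth
```

Under these assumptions, 50 PB comes to roughly 10.6 million DVDs, consistent with the figure of 11 million, and the volume of data in the portal grew almost eightfold in one year.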

Challenges to open data include technical aspects (e.g. process and organisation, funding and support), human aspects (e.g. skills, incentives, mindset) and the need for data to be interoperable (the FAIR approach). FAIR means that information should be:

  • Findable by humans and by computer systems for the purpose of automation.
  • Accessible and preserved over a long time.
  • Interoperable, able to migrate across IT systems over time.
  • Reusable.
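As a rough illustration of how the four FAIR facets might translate into dataset metadata, the sketch below maps each facet to a single metadata field and reports which ones are missing. This is a deliberately crude simplification and not how CERN or any FAIR assessment tool works: the `DatasetRecord` class, its field names and the one-field-per-facet mapping are all hypothetical, and the identifier and URL are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    persistent_id: Optional[str] = None  # e.g. a DOI, supports findability
    access_url: Optional[str] = None     # where the data can be retrieved
    file_format: Optional[str] = None    # an open, documented format
    licence: Optional[str] = None        # terms under which reuse is allowed


def missing_fair_facets(record: DatasetRecord) -> List[str]:
    """List the FAIR facets for which the record lacks the assumed field."""
    checks = {
        "findable": record.persistent_id,
        "accessible": record.access_url,
        "interoperable": record.file_format,
        "reusable": record.licence,
    }
    return [facet for facet, value in checks.items() if not value]


record = DatasetRecord(
    title="Example dataset",
    persistent_id="doi:10.0000/example",      # placeholder, not a real DOI
    access_url="https://example.org/data/1",  # placeholder URL
)
print(missing_fair_facets(record))  # ['interoperable', 'reusable']
```

A record like this one can be found and accessed, but without a documented format and a licence it would be hard to migrate or reuse, which is the gap the FAIR approach is meant to expose.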

Jones mentioned that the European Commission had launched the idea of an Open Science Cloud. There is also a need for a hybrid cloud model to ensure that data is stored in a cloud with procurement and governance approaches that suit both the private sector and publicly funded research. Data commons and Zenodo are interesting projects.

Paolo Casini (European Commission) called attention to the fact that big data is not ubiquitous. Large big data firms are located in just a handful of countries, mainly Vietnam, Thailand, China and the US. In terms of revenue, the order changes: the first country in the list is the US, and the second is the Cayman Islands. Casini provided examples of obstacles to sharing data. In the market, trust among players in the production chain is crucial for sharing data. Firms need to have control of their data for the data market to grow.

More control requires, for example, application programming interfaces, distributed ledger technologies, persistent identifiers for data sets, clearer international frameworks, data portability and guidance for SMEs. Most negotiations about data happen bilaterally. Usually there is one big firm and several smaller ones, which are often unable to understand the legal consequences of what they are signing. Casini proposed some takeaways:

  • It is important to understand market forces in order to be able to use big data to foster SDGs.
  • Data is not ubiquitous.
  • There should be no overreach: open data is not a viable option in some industries, and we need to consider these cases.
  • Not all data value chains are the same.

A representative from the private sector emphasised that every company will become, or has already become, a data company. The Marriott hotel chain, for example, is nowadays a data-based company: it has outsourced the administration of its hotels to franchisees and only manages the brand.

Rajinder Jhol (UNESCO) emphasised the role of market forces and government forces in achieving the SDGs. He gave the example of how data about differences in salaries between men and women could create change, by exposing those actors that pay an unfair salary. Companies make money out of our data: they understand the habits of their customers and offer them ads based on the big data they have collected. For these companies, the market incentives are clear. For governments, the question is how to use data to provide better service delivery.


by Marilia Maciel