WS #77 The construction of collective memory on the Internet

19 Dec 2024 07:45h - 09:15h

Session at a Glance

Summary

This panel discussion at the Internet Governance Forum focused on the challenges of preserving collective memory in the digital age. Experts highlighted how the internet has fundamentally changed how memories are created, stored, and accessed. Key issues raised included the ephemeral nature of online content, with studies showing a significant percentage of web pages becoming inaccessible over time. Panelists emphasized the political and economic aspects of digital memory preservation, noting that curation decisions reflect power dynamics and monetary interests. The digital divide was identified as a major concern, with many countries, especially in the Global South, lacking robust internet archiving capabilities. Speakers discussed various initiatives to address these challenges, such as Brazil’s Grauna Project for archiving threatened websites. The discussion touched on the impact of emerging technologies like AI on collective memory, raising questions about data sovereignty and the authenticity of AI-generated historical content. Panelists stressed the need for more inclusive approaches to digital preservation that consider marginalized communities and indigenous languages. The conversation highlighted the complex interplay between memory, technology, and societal power structures, emphasizing the urgent need for comprehensive strategies to preserve diverse digital heritage for future generations.

Keypoints

Major discussion points:

– The challenges of preserving collective memory in the digital age, including issues of data storage, accessibility, and curation

– The unequal distribution of internet archiving efforts globally, with most concentrated in the Global North

– The political and economic aspects of memory preservation, including questions of whose memories are preserved and why

– The impact of new technologies like AI on collective memory and information retrieval

– The need for more inclusive approaches to digital memory preservation, especially for marginalized communities

The overall purpose of the discussion was to explore the complex challenges and implications of preserving collective memory on the internet, considering technological, social, political and ethical dimensions.

The tone of the discussion was largely academic and analytical, with speakers providing in-depth perspectives on various aspects of digital memory preservation. There was an underlying sense of concern about current inequalities and challenges, but also cautious optimism about potential solutions and the importance of addressing these issues. The tone became slightly more urgent towards the end as speakers emphasized the need for action on these topics.

Speakers

– Bianca Correa: Board member of the Brazilian Internet Steering Committee, PhD in law and technology

– Marielza Oliveira: Chair of the advisory board of the e-Government Institute at the United Nations University, former director of UNESCO Communications and Information Sectors Division

– Juliano Cappi: Manager of the Brazilian Internet Steering Committee Advisory Team, PhD in communications

– Ricardo Medeiros Pimenta: Coordinator of teaching and research at the Brazilian Institute of Information Science and Technology, professor at Federal University of Rio de Janeiro

– Samik Kharel: Journalist and researcher from Nepal

– Carlos Alberto Afonso: Director of NUPEF Institute in Rio de Janeiro, co-founder of Brazilian Internet Steering Committee

Additional speakers:

– Jean-Carlos Ferreira dos Santos: (role not specified)

– Tatiana Jereissati: (role not specified)

– Juliana Holmes: (role not specified)

Full session report

Revised Summary of Panel Discussion on Preserving Collective Memory in the Digital Age

This panel discussion at the Internet Governance Forum explored the complex challenges of preserving collective memory in the digital era. Experts from various fields discussed technological, social, political, and ethical dimensions of digital memory preservation.

Key Challenges in Digital Memory Preservation

1. Ephemeral Nature of Online Content: Bianca Correa highlighted the rapid disappearance of online content, emphasizing the need for robust archiving systems.

2. Selective Digitization: Marielza Oliveira pointed out that high costs lead to selective digitization and storage, potentially excluding important information.

3. Global Disparities: Carlos Alberto Afonso noted the lack of internet archiving capabilities in Global South countries, particularly in South America, the Caribbean, and Mexico.

4. Government Accountability: Ricardo Medeiros Pimenta discussed the issue of broken links and vanishing government websites, which poses challenges for maintaining public records and accountability.

5. Technological Obsolescence: Oliveira highlighted the problem of obsolete storage formats, emphasizing the need for continuous technological updates in preservation efforts.

6. Indexing and Searchability: Oliveira mentioned the challenges of making preserved content easily searchable and accessible.

7. Political Transitions: Afonso pointed out the risk of content disappearance due to political changes, particularly in government websites.

8. Real-time Backup: Afonso emphasized the challenge of real-time backup for archiving projects, especially for rapidly changing content.

Political and Economic Aspects of Digital Memory

1. Political Agenda: Pimenta framed memory preservation as a political agenda, suggesting that decisions about what to preserve reflect power dynamics.

2. Curation as a Political-Economic Process: Oliveira emphasized that curation decisions are shaped by political and economic factors, raising questions about whose memories are being preserved and why.

3. Monetization of Data: Oliveira noted that the monetization of data often drives preservation efforts, potentially skewing priorities.

4. Forensic Evidence: Afonso highlighted the potential use of archived content as forensic evidence in legal and historical contexts.

5. Government Accountability: Samik Kharel discussed the need for accountability in data collection and use by governments.

Emerging Technologies and the Future of Collective Memory

1. AI and Language Models: Kharel explored how AI and large language models are reshaping memory construction and access.

2. Algorithmic Governmentality: Pimenta raised concerns about the challenges of algorithmic governmentality in social existence and its impact on memory formation.

3. Rapidly Changing Technologies: Oliveira discussed the challenges of preserving memory in the context of constantly evolving digital technologies.

Proposed Solutions and Action Items

1. Developing technologies to mine the Common Crawl for preserving collective memory in Global South countries.

2. Building capacities of individuals to preserve their own meaningful memories online.

3. Increasing efforts to digitize older content still in paper formats or obsolete digital formats.

4. Improving indexing and searchability of preserved digital content.

5. Considering data sovereignty issues in storing and accessing preserved memories.

6. Creating a dedicated institution in Brazil for digital preservation, as suggested by Alex Moura.

7. Exploring the possibility of NIC.br taking on the challenge of creating a Brazilian Internet Archive, as proposed by Carlos Afonso.

Additional Points of Discussion

1. The Grauna project: Afonso discussed this Brazilian initiative for archiving websites at risk of disappearing.

2. Preserving multilingualism: Oliveira emphasized the importance of maintaining linguistic diversity in digital preservation efforts.

3. The Tempora tool: Pimenta mentioned this platform for analyzing temporal aspects of digital content.

4. Public vs. Internet Memory: An audience question raised the issue of mismatch between public memory and what’s preserved online.

The discussion concluded with a recognition of the urgency and complexity of preserving collective memory in the digital age. Panelists emphasized the need for multifaceted approaches that consider technological, social, and ethical dimensions, as well as the importance of inclusive and equitable preservation efforts that represent diverse perspectives and experiences.

Session Transcript

Bianca Correa: Welcome to the workshop, the construction of collective memory on the Internet. As the IGF draws to a close, I believe this has been an intense and productive week for debates on Internet governance. You must be tired, but we have a very interesting discussion ahead that’s sure to energize and inspire you. I would like to introduce myself. My name is Bianca Correa. I’m a board member of the Brazilian Internet Steering Committee, and I hold a PhD in law and technology. And I would like to thank the audience, both online and in person, here in Riyadh. A special thanks to the expert panelists who have kindly agreed to share their ideas and thoughts on this topic today. The workshop, titled The Construction of Collective Memory on the Internet, will last for 90 minutes. To make the most of our time, we will follow this discussion format. Each speaker will have 10 minutes to present their ideas. After that, we’ll move to a question-and-answer session, prioritizing interaction with both the in-person and online audiences. Finally, the panelists will deliver their closing remarks. So let’s get started. Memory is a vast and complex topic. It becomes even more complex when we think about the relationship between memory and the Internet, in preserving memory, promoting social memory, and constructing memory itself. This workshop aims to foster a debate on the challenges of preserving memory in the digital environment. It seeks to explore how the Internet and digital technologies can serve as tools for preserving, promoting, and constructing online memory, especially in a context where much of our culture, social and political processes are mediated by and even originate on the Internet. Memory preservation on the Internet involves tackling issues such as preserving the integrity of information, countering disinformation, protecting the right to information, promoting underrepresented cultural heritage, preserving multilingualism and more.
We often say, naturally, that everything is on the Internet, but is everything on the Internet? Feeling frustrated at not being able to find information online seems to be becoming more and more common, whether it is a news page, a blog, a tweet, etc. Content on the Internet can disappear for different reasons. Online materials can be deleted. Vanishing information is a reality, and a study conducted by the US-based think tank Pew Research Center suggests that a quarter of all web pages that existed at one point between 2013 and 2023 are no longer accessible as of October 2023. In most cases, this is because an individual page was deleted or removed on an otherwise functional website. For older content this trend is even starker. Some 38% of web pages that existed in 2013 are not available today, compared to 8% of pages that existed in 2023. Also, 23% of news web pages contain at least one broken link, as do 21% of web pages from government sites. News sites with a high level of site traffic and those with less are about equally likely to contain broken links. Local-level government web pages, those that belong to city governments, are especially likely to have broken links. So, given this context, this workshop aims to address some questions. What are the challenges brought by the Internet and the digital platforms to the preservation of collective memory? How do these new challenges relate to the promotion of information integrity, the protection of rights to information, the promotion of underrepresented cultural heritage and other issues traditionally debated in the Internet governance field? We’ll start the discussion with Marielza Oliveira. She is online.
She is the chair of the advisory board of the e-Government Institute at the United Nations University, a former director of the UNESCO Communications and Information Sectors Division for Digital Inclusion, Policies and Transformation, where she led the support to member states to strengthen capacities for access to information, digital inclusion, digital transformation and protection of documentary heritage. Before I call Marielza, I would like to introduce our moderator. Unfortunately, I won’t be able to be here for the whole panel due to conflicting agendas with other panels of the IGF, but I would like to introduce a very important person for us who will be moderating this panel on my behalf. He is Juliano Cappi. He holds a master’s and a PhD in communications from the Pontifical Catholic University of Sao Paulo. He is the manager of the Brazilian Internet Steering Committee Advisory Team. He coordinated the creation of the Center for Studies on Information and Communication Technologies, Cetic.br, the UNESCO Regional Center for Studies on the Development of the Information Society, and the Brazilian School on Internet Governance, the EGI. And, last but not least, I would like to mention by name Jean-Carlos Ferreira dos Santos, Tatiana Jereissati, and Juliana Holmes, without whom this panel would not be able to exist. Thank you so much for your hard work on this topic. Marielza, thank you so much for being with us, our dear friend, so the floor is yours.

Marielza Oliveira: Thank you very much, Bianca. Can you all hear me well? I hope so. Yeah, we do. Okay, great. Thank you. It’s so nice to see you again, and it’s nice to be with the CGI colleagues and all the colleagues around the room and on the internet that are participating and watching this panel. I think that this is one of the most absolutely relevant topics that we could be discussing because the internet is really changing the way we think about and record and recall our own memories. It’s changing it completely, and it has done that from the very beginning. When we started accumulating some information online and when the first browsers came around, we stopped really thinking about memorizing things because we could always find it on the internet. You know, it was just about, you know, oh, you can Google it. You know, the browsers literally became our collective memory of what was happening, except that the browsers, you know, and the internet itself, don’t have everything, you know, that we have in our own minds. We digitize very selectively, but we’ve been less selective over time. And the internet actually changed the way that we actually record things. And artificial intelligence made a huge change in the process as well. The first step that we had was essentially that we put things online. We created content, digital content online, and we digitized material. But now we actually went beyond just digitizing to actually datafying content so that we could actually start searching and using content in a different way than before. There’s a huge gap, a lot of disparity on the internet in terms of who has compute capacities and who has the skill sets and who actually can access the internet. In the beginning, it was even worse. Nowadays, we have about 70% of humanity online already.
It’s 5.6 billion, if I’m not mistaken, out of the 8 billion that exist. In the beginning, we had far fewer people online. And then, therefore, the content that was online was essentially the content that came from northern countries, from the US, from Europe, essentially, and with a lot less content being recorded by other countries that had less compute capacity, less access to the internet, and so on. So we end up with, for example, nowadays, 46% of the content that we have on the internet is actually in English, and very little content is in other languages. We have 7,061 languages in existence in the world, in use in the world, and actually less than 300 of those are in use online. And of course, we’re seeing quite a lot of effort to increase that number of languages that are active, that we can actually translate from one language to another. But still, we see the vast majority of content that the internet has memorized in its 15.3 million websites. It’s essentially from a subset of the countries that exist. But we digitize and we digitize with a lot of disparity as well, like I was saying, because not all countries have the capacity, but also the digitization process itself is a costly process. And in the beginning, we ended up using technologies that are nowadays quite obsolete already. So for example, I don’t know about you guys, but raise your hand if you have CDs. If you have CDs, do you have CDs? I have 400 CDs and no CD players anymore. It used to be that computers came with a CD player. And nowadays, if you ask for one, people go, why do you need that? Essentially, we moved on from a technology that existed before and no longer exists. And a lot of the storage that this technology had, that is, the capacity to restore it, was left behind, and a lot of the archives that were digitized already, you know, were lost.
Were lost, you know, because this is no longer an accessible format, you know, for most computers. And just like that kind of format became obsolete, there are quite a lot of different formats that became obsolete as well from the very beginning. You know, my first personal computer that I used, you know, it recorded things on tape, you know, and, you know, so those are gone. And, you know, so we lost, you know, quite a lot of what was memorized, you know, and recorded in this kind of archive. And that’s not the only gap that exists, you know, in terms of storage. In terms of storage, you know, collectively, we store in data centers less than 10%, you know, of what we actually produce, you know, in terms of information or content. Now, I’m not going to even call it information because a lot of it is not necessarily information, it’s content that we put on the internet. In 2010, already, you know, about 15 years after the first browser, you know, was made available, we had two zettabytes of data online, a zettabyte being one trillion gigabytes, essentially. Now, we had in 2010, two zettabytes. In 2020, we had 64.2 zettabytes online. And in 2025, five years later, we are expected to have 181 zettabytes of content. So in 10 years, we went from two zettabytes to 64. And now in less than five years, we are going to multiply that by three. So the amount of content that we produce with the number of people online is growing at a pace that is incredible. But storing this content is highly expensive and very selective. So what we have online is not necessarily what we have in storage in terms of data centers, for example. And those are very expensive, very expensive technologies.
And not only expensive in terms of the creation of the tech itself, the infrastructure itself that is very costly, but also environmentally costly in terms of the water that it drinks to cool the data centers, the energy that it consumes to power these data centers for them to continue working. So digitization is a process that is incredibly expensive. So selection of what ends up stored is a process that happens on a continuous basis, with a lot of what we produce being discarded. And that discarding is not necessarily done by us. It’s not a process that we select to do. It’s the organizations and the platforms that we use that end up making that kind of selection. What is worth keeping? And what is getting thrown away on a daily, on a continuous basis? So for every byte that we have to store nowadays, another byte has to be thrown away. So, how do we select that? Organizations make that choice, and we end up not having access to a lot of information. We have the broken links that were mentioned in the beginning by Bianca. We have a lot of this loss of content that we used to store in the cloud, or in different types of systems that end up obsolete, and discarded, and so on and so forth. But it’s beyond that. Digitization is this costly process, but datafication is a costly process as well. We need to be able to actually search this content, and the vast amounts of content that exist have to be searchable, to be accessible. They have to become more than just a record. They have to be a searchable record, and being searchable is a complex process as well. You actually have to datafy, create, index this kind of information, this kind of content, so that this content can then be accessed in different ways. The process of indexing is very complex as well. It used to be, for example, that when we scanned text, for example, we scanned a book, that we took a picture of that book. Essentially, it was a digital Xerox copy of that book. It’s not a searchable mechanism.
You just have this kind of a picture. Then we started using OCR technology, you know, optical character recognition technology, that actually converted a page, you know, from being just a picture to reading the text and absorbing it. But now it’s, even that, you know, it became at some point the heart. So you actually have to index it in different ways, finding, you know, keywords, for example, for text and et cetera. So who decides those keywords? Who decides on what basis you access information on the internet? It is the kind of thing that when we start looking, we find all kinds of issues with that. For example, I don’t know whether, you know, you’re familiar with the ImageNet, you know, dataset, which was a dataset created, I think it was Harvard, you know, that created this dataset, you know, and started, you know, putting a big set of pictures, you know, together, and somebody had to figure out a way of making sure that this data was searchable. And so they started labeling the pictures, and the labeling became an issue that brought in all kinds of biases and discriminations, you know, and, you know, for example, you know, it would look at the faces of, you know, people that are black or brown, you know, or faces that are not the typical blonde, blue-eyed Northern, you know, and label them, you know, in many derogatory ways, you know, hypersexualizing, you know, women, for example, women of color, you know, or labeling men of color with criminality-linked associations and so on and so forth. So that’s the kind of thing that ended up happening. And then when we search for these images, when we try to recall the memories that these images encode, you end up bringing these biases in as well. So you have all kinds of issues with digitization. Then you have the process of datafication. And then you actually try to generate.
Nowadays, we use this vast amount of data to generate applications. For example, using it to build generative AI, artificial intelligence, large language models, diffusion models, and so on. And those encode these datafication mechanisms that are quite biased, quite disrespectful, actually, of different cultures, and are keeping content from cultures that are not necessarily representative of all the cultures of the world. So we end up with generative AI, a set of collective memories on the internet, and particularly in data centers, that are not the memories that we put in, the content that we put in. And then we end up with this content that is coming out that is not necessarily. It’s a kind of pasteurized, amalgamated, average content that is not the memory of the world, you know, but it’s the memory of everyone, and it’s not respectful of cultural heritage and cultural precedence. But, you know, this is what we have online. And of course, generative AI, it actually generates content as well. And the generation of content by generative AI actually creates tremendous issues for the memory that we collectively have on the internet. First, it hallucinates. You know, it creates information or content about things that never happened, that don’t exist. You know, it doesn’t have any links to reality or to facts. It simply predicts, you know, the next image or the next picture or the next word. And, you know, so it predicts, you know, those and ends up creating, you know, citations of books that don’t exist, pictures of events that never happened, and so on. And actually, historians, many of those are actually, you know, generating images to illustrate episodes in history that had no photograph, you know, of them happening, you know, before photography was invented. So you actually now have pictures, you know, that never existed of an event. And, you know, and those pictures are incredibly biased as well.
For example, generative AI, one of the things that is interesting, it just generates on the basis of what exists. There are quite a few tests, for example, you know, trying to generate images of black doctors treating white children in hospitals. It happens every day. But generative AI has enormous difficulties creating this kind of image. But it makes it easy for you to create, for example, images of Indians in the US, wearing traditional clothing and sitting around negotiating treaties with cowboy-dressed white men in the 16th century. So it’s not accurate. And we end up with these images polluting our environment as well. So we have hallucination, which is the unintended creation of what I call fact-free content. Then you have actual intentional creation of content that is also fact-free. It’s not linked to reality. And then you have actually malignant kind of distribution of this, which is misinformation, disinformation, which is actually created with the intention to deceive. And so we actually spit it all out on the internet again. And we keep polluting our information environment to the point that now, in the process of digitization, datafication, and usage of, you know, this information online, the biggest skill that we need to have is actually the skill to verify, you know, to say, is this real? You know, is this true? And how you do that is becoming more and more difficult, exactly because of the broken links and the disappearing behind paywalls of the content that is trustworthy, such as content from media organizations that have to charge, you know, for the presentation of this content in order for them to survive, you know, instead of what platforms do in presenting information to us that they monetize through ads and other means such as that.
So, yeah, you know, we live in a completely different world, you know, from when you could just Google it, when actually search engines are using generative AI to hallucinate results and offer them to us, you know, including as a first option. So, we don’t have the memories of humans anymore. We have, you know, content generated by computers being presented to us as, you know, the collective memory of the world. We need to be very, very cognizant, you know, we need to really understand the impact that this has on everything we do, you know, the valuing of science, for example. You know, if facts can be mixed up with non-facts, you know, with fact-free content such as that, what is the value of the trustworthy organizations that used to generate content for us, you know, science, media, you know, authorities, and so on? They’re becoming less trustworthy as well, you know, simply because we cannot differentiate between content that is part of our collective memory, that is fact-based, evidence-based content, and, you know, something that is being, you know, put on the internet by, you know, some artificial entity. So, just some provocation to start, because I think that this is one of the most important topics that we have. How do we preserve the, you know, the validity, the reliability of our information environment? This is the question that we have, you know, for the next few years. It’s the most important question that we could, you know, be discussing. Thank you.

Juliano Cappi: Thank you so much, Marielza. I’m assuming that we should now pass to the next speaker, who is Ricardo Pimenta. So, Ricardo Pimenta, the floor is yours.

Ricardo Medeiros Pimenta: Thank you. Thank you, Juliano. So, good morning. I’d like to begin by thanking CGI for the invitation and also the Ministry of Science, Technology and Innovation of Brazil for this honor of representing it. So, to begin with, let me share a popular Yoruba saying from Brazil’s Afro-Brazilian culture that says, Eshu killed a bird yesterday with the stone he threw today. So, Eshu, we know, is a figure of movement and transition in Yoruba mythology who bridges the human and divine, enabling communication and connecting them. This notion of interconnectedness reminds us that maintaining and developing connections in our digitized world is both our responsibility and a challenge, even when the connection is between past and present. In fact, in our current digital reality, these connections generate immense data and information, raising pressing questions. What should be preserved and how do we distinguish the essential from the superfluous? The challenge of maintaining collective memory has grown exponentially. We now face a flood of disorganized and even lost data stored across countless devices, complicating retrieval and comprehension. For public policy, this issue is particularly urgent, given the unprecedented speed and volume of data production in the past three decades. So, memory, as highlighted in the Yoruba saying, isn’t just about the past. It is actively constructed in the present. Remembering today shapes our understanding of yesterday. And memory itself is updated and rewritten in real time. In Brazil, the time has come to think about yesterday’s bird. So, memory, it’s a political agenda, not just a cultural one, which should primarily unite public and third sector institutions, so that it doesn’t end up being driven mostly by the market, leading to what Andreas Huyssen has described as a disneyfication of memory, which through its overexploitation would also invite us to greater collective and irremediable forgetfulness.
This has profound implications for public and collective memory in the digital age. We must approach it ethically, curating what is preserved, while recognizing that not everything can be saved. Social platforms like Instagram or Facebook, for example, add complexity as the content they host belongs to their owners, including disinformation and toxic narratives. This threatens the representations of our past and present, and meanwhile Brazil’s more than 5.3 million internet domains contribute every day to the vastness of this challenge. To tackle this, initiatives by institutions like IBICT, the Brazilian Institute of Information Science and Technology, where I am a researcher and currently the teaching and research coordinator, provide some valuable examples. First of all, I can say something about the Tainacan software. Tainacan is a software that digitizes and systematizes cultural collections from IPHAN, that is, the National Institute of Historic and Artistic Heritage, and IBRAM, that is, the Brazilian Institute of Museums. So Tainacan could ensure access to museums and memory institutions. This is one example that is developed at IBICT inside the Ministry of Science, Technology and Innovation. The second is the Cariniana Network. That is a network that preserves over 700 open-access electronic journals, automating processes like storage and validation. The third is the Arquivo.gov. It’s a kind of pilot project that archived nearly all Brazilian government websites in 2021, with plans for user-driven website collection and preservation, inspired by models like the Arquivo.pt and the Internet Archive, but the experience of Arquivo.pt is the main reference for us. And the last, the Tempora. Tempora is a digital tool developed in a digital humanities laboratory at IBICT.
It is a platform for archiving and visualizing digital information in the form of a timeline. During the 2022 presidential elections, we started collecting publications from fact-checking agencies, with the intention of creating a timeline of disinformation events and contributing to the memory of that event in the midst of the disinformation fever we are experiencing globally. So, these efforts showcase IBICT’s potential leadership in preserving Brazilian Internet memory, but broader challenges will remain, particularly regarding who preserves the entirety of Brazil’s online presence and how storage limitations are addressed. The issue recalls the Argentine writer Jorge Luis Borges, who wrote Funes the Memorious, where the desire to remember everything always leads to paralysis. Memory thrives in balance between remembering and forgetting, recovery and erasure. The technological promise to store everything is illusory. We must curate what defines the memory of the Internet, shaping what is remembered and what is not. To do this, two challenges stand out. The first, in my perspective, is about management. The challenge of memory today is its management, its control, in a scenario where space and time are atomized and the volume of information expands entropically. It invites us to feel a kind of Freudian death drive that pushes us to confront, to innovate, and generally toward the vibrant creation of means, techniques, strategies, policies and practices capable of making us overcome it one day at a time. The second is about governance, good governance: one that is capable of encompassing different actors, able to decide what to preserve and who makes those decisions. This isn’t just a technical issue, but a political and institutional one, requiring ethical, collaborative solutions.
Furthermore, if the object we are looking at is the Internet, how will any proposal to preserve its memory be able to progress without thinking about the mechanisms that need to be aligned with the devices, actors and institutions that regulate it? So, in my perspective, governance could play a singular role in maintaining proper access to information and freedom of expression, without major ethical complications, and a transgenerational public and collective memory mediated by the information and communication technologies that are now in different parts of our private and public daily lives. In closing, I return to the Yoruba saying: the actions we take today to preserve the Internet’s memory will determine whether or not the bird was indeed killed yesterday. So thank you.

Juliano Cappi: Thank you so much, Ricardo. I just gave the floor to Ricardo without presenting him. Ricardo, I’m sorry, and I’m doing it just now. Ricardo is currently the coordinator of teaching and research in information science and technology at the Brazilian Institute of Information Science and Technology, and he’s a permanent professor in the postgraduate program in information science at the Federal University of Rio de Janeiro. Ricardo has been a full researcher at the Brazilian Institute of Information Science and Technology since February 2013. Sorry, Ricardo, and thank you so much for your insightful thoughts. Now I give the floor to Samik Kharel. Samik is a journalist and researcher from Kathmandu, Nepal, with over a decade of experience reporting on contemporary issues for national and international media. He has contributed to leading research institutions focused on technology, ethics and human rights. Kharel has received multiple international fellowships and grants, and he teaches critical thinking at university while exploring electronic music. Kharel, thank you for your participation. The floor is yours.

Samik Kharel: Hi, can you hear me? Yeah. So yeah, thank you very much. Hello to everyone at the IGF in Riyadh, from myself, enjoying a sunny winter afternoon in Kathmandu, at least as of a couple of minutes back. I’m overwhelmed to be a part of this esteemed panel. I would like to thank CGI for this wonderful opportunity to talk about collective memories in digital realms. I think it’s collective memories that have actually brought us together, our past activities that are on the internet. And so, although this is a very deep and awesome topic to dive into, I would like to start very general and then narrow my focus towards my own expertise and probably my geographic region as well. So, just an anecdote to start with. When I was very young, I was given a chalk and a slate by my parents, a formidable technology at that time for writing and learning my first alphabets. It was not very long ago, about three decades back. And I thought it was the most convenient tool because I could scribble on it, write anything, and if I didn’t like it, I could erase it as well, because this tool was very ephemeral, you know. So I don’t remember what I scribbled then, not many memories of it, except writing a few alphabets and maybe scribbling some Mickey Mouses and Donald Ducks. But passing this phase, I was given a notebook and a pencil. Now I was told to have more structure, you know, like write between the lines, do this, do that, be more disciplined and only erase the errors. A little bit later, a few years after the pencil, I was given a pen.
It was a more permanent tool, an idea which gave me more permanence, and what I scribbled stayed a little bit longer. There were no traces of my chalk and slate experiences, but even now, although I don’t find anything else in my parents’ basement, I still find some scribbles of whatever I did with the pen and pencils. So that’s how I would like to start: these were my memories, and they were kept in shoe boxes in my parents’ basement, and probably many of you can relate to these contexts as a collective memory yourselves. We have a tendency to save and retrieve our memories as desired, and memories play a huge role in the construction of our identities. So fast forward to my teens. We got a computer with a little bit of access to the internet, and a little bit later it was more restricted and diverse. I was being watched by my guardians, told to go there, not to go there, probably being logged and having my history checked. Compared to the more analog past I had, the internet seemed to make everything present, you know. Even the past was so well woven with the present that everything now felt like a block. This is likely because at present our memory function is increasingly organized via media systems, specifically digital media, which have become very integrated. This integrated media system internalizes the main functions of cultural memory, and it has become a focal point of the documentation system of the past and the present. For example, now I use Google Photos and it gives me, you know, memories from seven years back: you were at the ocean then, and today you’re at the ocean.
So, I mean, you’re doing well, or I don’t know. Yeah, this is how you tag your memories with these tokens. So coming again to an adage, what they say is that the internet never forgets, but people do. And when people do, the internet actually reminds you again, so that you have not forgotten. So now, with digital technologies, and in particular internet and web-based information and communication technologies, our collective memories are formed and shaped in the digital era. Internet systems have enabled a kind of democratization of memory, in that everyone with basic technology and internet devices can produce their own content and promote it on the web. But the big problem is that many have been left behind: lacking even basic technologies and infrastructures, they are not able to do so. My country, the region and the majority world chronicle these digital divides, which measurably still affect already vulnerable populations and the marginalized ones in our region and my country. This region also witnesses, in particular, a patriarchy on the internet, as the majority of narratives and discourses are still male dominated. All these narratives and discourses coming from political institutions, parties and universities are still very patriarchal. That’s what I feel. The same population which did not have cameras, books, access to libraries, information, newspapers, access to education and basic health care is the same population that doesn’t have access to the internet, which is really sad. Their memories have never been documented. Rather, sometimes they’ve been part of subaltern narratives which have been seen by others and brought out to the world. While this divide is closing in the data, with more access to technology, the debates on what we call meaningful, uninterrupted access still linger.
That’s where we stand in this region, particularly Nepal, India, Bangladesh, Sri Lanka, and the rest of South Asia. Where we lag is while social media helps forming collective communities. You have people who play games, different interest groups who don’t have to be face-to-face. These vulnerable populations are still left out of discourse. They don’t know what’s happening, where they are, where they stand in this technological world, which was supposedly principled under democratization and participation of collective memory making. Coming forward, the process of creating, storing, managing, removing, manipulating digital data. Let’s talk about public data and collective memory. Where we stand right now in the digital age, collective memory are often intertwined with the data we generate, from the photos we post online to the interactions we have in social media. This raises concerns about who controls our public collective memory, how it is used, whether it is a subject to manipulation. Most likely, we are very vulnerable when it comes to the government using our data. With the lack of very comprehensive data policies, mostly in this part of the world and elsewhere too, there’s a lack of accountability. While the governments have been proactively using available technologies to collect data, data from the citizens, there has been less accountability of where this data is being used, where it is being stored, for what purposes, for how long, how it will be used, in what cases it will not be used. There has been no accountable answers to these. There have been several breaches and leakages of data and personal information, personal data example. I would like to give a few examples of the election data that was collected but that was breached and used for other things. The government yet to realize people’s, the value of people’s data and being accountable for it. 
Also data being collected for one purpose and being used for another one, like you use for national, you know, like a national demographic population data for something else, you give it to like some marketeers, corporate houses, for their own benefits. So that is another problem. Also data being, also other sensitive cases of data collection and retrieval being procured to other countries because we don’t have the expertise to manage our own data, which is very, which keeps us very vulnerable position in the absence of comprehensive data law. Then again, there’s like the trustworthiness of social media platforms. Which have been pretty active in most of these countries. While we’re using social media platforms in our day-to-day activities, from our information sources to businesses, we tend to use all this like big social media as vital tool for our information, for even for our businesses. But no one questions the trustworthiness of it. The government has tried to grip the social media companies in this part of the world and other places as well. Asking them to work in coordination, filter harmful data against national integrity and national interest. And also establish a focal communication person for the, so that they can. actually been in touch with these companies. Few companies like TikTok, which was banned in Nepal and less some other countries in South Asia have also adhered to the government’s proposal, established a focal person, worked with the government for data breaches, but still it’s in a very nascent phase. TikTok was banned by government of Nepal, which has been lifted after they agreed to set up their centers and go accordingly. 
Also the rife of misinformation in the platforms is ever increasing, political parties and political wings are using the internet and social media to change narratives that have been abundant, which is like everywhere, especially during the crisis, which is a crisis during elections or some natural disasters or the pandemics. Whitewashing, smear campaigning, conspiracy theories, an area of these enforced to collective memories. At the same time, memories shared publicly in social media have also been very crucial during natural disasters and pandemics. I’m not saying all is bad, there’s good things as well. The recent floods, the use of social media and like the posts made by citizens actually helped rescue many people as well. Also coming back, like as a journalist, I need to bring this together, like the best example of consolidated open source, what we say is Wikipedia, which does not conform to historical recording practices. However, internet as a whole and social media are also a great tool for open source. Now, as a journalist reporting with limited resources from this country, not being able to travel everywhere on foot, I think open source has been very crucial to my coverage on very sensitive issues. It gives me multiple perspectives, angles, diverse ideas, and approach to report. I think it’s a marvel for modern news journalism if you know how to use it. So yeah, the future, I know like I have been closely following the LLMs and as Mariella also pointed out, how it’s gonna herald new ecosystem for the collective memory. Is it gonna be the future of collective memory is a question, particularly generative AI seems to have taken a technological leap and with building new infrastructures for memory, while it also enables combination of various diverse encounter memories. Now LLMs are being used to memorialize chat with historical figures and philosophers, bringing them from past life. 
There’s this Silicon Valley thing of talking about long-termism and, yeah, memorializing someone. So you can talk to Rousseau, even if he died, I don’t know, hundreds of years back. The Rousseau chatbot becomes more dynamic in engaging with public memory through all its interactions with other people. These are quite exciting times; even saturated discourses are likely to be dynamic again. So while AI could be the future of collective memories, it will be crucial to ensure the participation of marginalized communities from the Global South in progress towards inclusion, multilingualism and multiculturalism. That’s what I think. We cannot be left behind, with our already vulnerable communities getting more vulnerable for lack of internet and connected infrastructures. So I would like to end there, and I would like to discuss more. Thank you.

Juliano Cappi: Thank you so much, Samik. As we are approaching the close of the session, I go straight to Carlos Afonso. Carlos Afonso has a master’s degree in economics from the University of Toronto, and also pursued doctoral studies in social and political thought at the same university. He has worked in the human development field since the early 70s. He is a co-founder of the Association for Progressive Communications, APC. He coordinated the Eco-92 Internet project with APC and the United Nations. He is a member of the United Nations Working Group on Internet Governance. He is a special advisor of the Internet Governance Forum. He was, in 2007, a member of the UNCTAD Expert Group on ICT and Poverty Alleviation. He was a member of the UNCTAD Working Group on Enhanced Cooperation. He was a member of the Multistakeholder Advisory Group of the IGF. He is a co-founder and member of the Brazilian Internet Steering Committee. He is a co-founder and chair of the Brazilian Chapter of the Internet Society. Finally, he is a director of the NUPEF Institute in Rio de Janeiro. The floor is yours.

Carlos Alberto Afonso: Good morning. Can you hear me?

Juliano Cappi: Yes, we hear, but with a little bit of noise, but yes.

Carlos Alberto Afonso: Let me see if I can switch.

Juliano Cappi: I’m sorry, we are having difficulty hearing you.

Carlos Alberto Afonso: Yes, can you hear me now?

Juliano Cappi: Yes, yes, perfect, perfect, great.

Carlos Alberto Afonso: Okay, thank you. Thank you. Well, good morning, good morning, right? You are, no, it’s still morning there? No, it’s not. It is, yes. So, it’s five in the morning here, so. Well, you probably are looking at a map that is from Wikipedia, which I posted there. And the map, as most maps are, is distorted, benefiting the Northern Hemisphere. So, the Northern Hemisphere shows much bigger than the Southern Hemisphere. But the important thing is, the countries painted green are the countries which have significant Internet archiving services, like the Internet Archive, like many other efforts to archive the Internet. Countries below the equator, which takes most of South America, and also the Caribbean and Mexico, there is no indexing, no indexing of the Internet in those countries. When I say there is no indexing, I say there is no significant indexing, which is worth mentioning. There are experimental ones. We are a small institute. We are doing a project like that. But it’s too small to figure in the map, no? In Africa, you have only one country with a… an important indexing service, internet indexing, web indexing service, which is Egypt. And why Egypt? Because they have the Alexandria Library, which does internet archiving. Wonderful, no? But it’s only Egypt in the entire Africa, no? In the Southern Hemisphere, you have only Australia and New Zealand doing significant internet archiving. So this is a major challenge for the southern countries in the so-called Global South, no? And we need to address that because we are losing a lot of information because as other speakers mentioned, the information on the internet is anything but eternal. It disappears and many government sites disappear when political issues arise, no? And this happened recently in Brazil, several sites almost disappeared. We are trying in Brazil, there are initiatives, but are not at the scale which could be present in that map. 
But there are initiatives trying to do something. And one of them is at our small institute, which we call the Grauna Project. Grauna is a bird, a bird with a tremendous resistance to environmental challenges and so on. And it’s also a character in a famous cartoon in Brazil, which represented impoverished people in the northeast of Brazil. So we use the name Grauna to represent our project of trying to do indexing of the internet in Brazil. And it has two components. One of them is indexing based on the technologies used by the Internet Archive and by Arquivo.pt, which is the major indexer in the Portuguese language but does not index Brazil, only Portugal, and by several others which use open source and reproducible technology to index the Internet. The Grauna project also includes a local server, a very small server, which is a small box that you can carry with you anywhere. It has a copy of many information systems, to be used in remote communities which have poor or no connectivity to the Internet. So they have a reproduction of Wikipedia in Portuguese, for instance, in this box, and several other information facilities. So this is also part of the Grauna project, no? And what we are doing right now is running the project in an experimental phase, trying to protect content relevant to democratic processes which is a potential target of hacker attacks, censorship or political pressure, or which cannot be backed up satisfactorily, no? The Grauna Archive stores websites selected using a methodology that prioritizes qualitative interviews and analysis of the political scenario. It is very experimental. In this experimental phase, some priority areas are defined, like environment, health, culture,
human rights, but we have defined in principle 18 thematic areas to index. The challenges we are confronting are quite interesting, and we had to do this to understand why people are not indexing the Internet. Now we know it’s very difficult; it’s a big, big challenge. We have created several interesting features in the system for archiving, like the ability to belong to a group of users, for example if a research group wants to have multiple users creating archives for the same project in the system; the ability to schedule recurring archiving to maintain different versions; and display of the archiving date and time, which is typical of the major Internet archives. And we have defined, to begin, 18 themes, from culture to government, racial equality, gender, elections, communication, etc. In recent years, there have been several cases of removal or alteration of the content of public information, as well as deliberate attacks on web pages. There are also frequent reports from civil society about greater difficulty in accessing previously available public information. Despite some relevant experiences in the academic field (for instance, at the Federal University of Rio Grande do Sul there is an indexing initiative), Brazil still lacks permanent projects aimed at archiving the web on a scale compatible with the breadth and reach of the Internet in Brazil. Disappearance of information in all elections due to poor management or incorrect application of electoral law is an issue which has to be considered. Grauna started in 2018, and we managed to get some funding from the Open Society, the Media Democracy Fund, and others to help us start the project. We have support from NIC.br with equipment and from the National Research Network, which provides connectivity to our project. And we conducted about 60 interviews about threatened websites, relevant content, and the security of their own websites.
And we also had a legal context document prepared by our lawyer regarding archiving of content which may be challenged by the actual owners of these contents. And this is a challenge that has to be contemplated in these projects. In 2022 and 2023, we improved the infrastructure to ensure the necessary conditions for the system to run securely. And part of it is to provide almost real-time backup of the system, which is one major challenge if the main data center fails. You have to have a backup to run immediately, almost immediately. And this is also a challenge that has to be contemplated. We initially had 18 themes active with 227 archived sites and more than 100.gov.br sites, government sites, archived. That was especially important because there was a political transition in Brazil in which many of these government sites were challenged or disappeared. The scale of indexing is much smaller than the Internet Archive and others, and at this stage of the project is specifically aimed at preserving content at risk for several reasons. It’s also an experiment that seeks to address the challenge of indexing content that is publicly available, but often extremely difficult to capture. There are several reasons, use of increasingly complex technologies, frequent changes in the technologies used, huge databases, sites with multiple depth levels, many other challenges. There is the possibility of archiving that is not made public, is one of the features that we managed to install in the system, which is useful, for example, for the storing of sites that promote disinformation, and that we do not want to multiply, but can preserve. There is still controversy, however, about the use of archiving as forensic evidence. Preservation of dialogues in Brazil about preservation of web contents are happening in the sessions, in dialogues, and meetings of academics and other interest groups in Brazil since at least 2019, and we are discussing this now here at the IGF. 
And in the IGF, there is, I understand, an intersessional initiative, a policy network dedicated to highlighting best practices for preserving and creating local content. And a major challenge, for instance, for the original idioms, languages, which are a challenge in our region especially. We are finishing the first version of the software of the system, the RULA, and will be available on GitHub for free development using application by other organizations and also by the public authorities. And we are organizing a permanent curation team or committee to preserve more sites and review archiving criteria, which is a big challenge. The criteria for archiving, which was mentioned here by, I think, Marielza. And advanced research on public debate on the formats to archive, which have to be compatible with several library and other standards. Advanced debate on the authenticity of archives in WARC format so that they can constitute evidence. Establish support partnership to advance the development of the project. And train people to perform archiving. Hire a team to perform more complex or large volume archiving. Further improvement in usability of the tool, which is already online, by the way. Keep the system up to date in light of the constant transformation of the web and expand the infrastructure to increase processing storage capacity. Preserving content in any language is a complex challenge. Brazil currently has more than 300 indigenous ethnic groups with more than 270 languages, all of which are at risk of disappearing. And with them, an entire culture disappears. Similar challenges occur in Latin America and other countries, in the Portuguese-speaking countries and so on. How can internet resources be used to support the preservation and continuity of these languages and cultures is a big challenge. That’s it. Thank you. I talk too much. Here, the address of our institute is nupef.org.br. I will put this in the chat. 
And the address of the Grauna project is grauna.org.br. I’ll put it there as well. Thank you.

Juliano Cappi: Thank you so much, Carlos Afonso. We have one question from the online audience. I would ask if someone here in the room would like to ask a question. We have a question here.

Audience: Hi, I am a researcher based in Germany. I would like to ask Ricardo, can you hear me? You mentioned the link between memory and political agenda, or collective memory and the elections in Brazil. Can you give us some examples to elaborate on how collective memory has been used in, or has impacted the results of, the elections in Brazil? I also have a question for the journalist from Nepal. Can you give us some examples of the relation between whitewashing and collective memory? And if possible, could you give us examples from Nepal? And I would like to ask, I’m sorry, but I forgot the name of the first speaker, the only female speaker in the room. Okay, okay. If you can hear me, actually I’m also working on collective memory. When it comes to the people who died in famines and natural disasters in the past, I couldn’t find the number of dead female bodies, just because in the past only the dead bodies of household males were counted. What would you suggest I do to counter this challenge? And I think that is also a question for all the panelists. There was no data on certain issues in the past, and when you apply for funding, when you talk to your editors, when you talk to your bosses to convince them of your research proposals, they will ask you: where is your data, where do you get the data from? In cases where there is no data due to historical injustices, what would you do? Thank you.

Marcelo Ferreira da Costa Gomes: Hi, I’m Marcelo Ferreira from the Oswaldo Cruz Foundation and CGI.br. Thank you for the very interesting interventions. I was thinking about them: you mentioned public institutions, NGOs and civil society institutions looking after memory, and open source initiatives, which is also very interesting. But what I found is a lack of business interest in memory, from companies. Compare that with what Marielza said about how expensive the technology and the indexing are: in business today we see people saying that cloud storage and cloud processing are cheap. So what I feel after this is that we have technology that is cheap for business interests, for producing products and private services, and expensive for memory. I’d like you to comment on that, because we see technology for private interests that is cheap and available, but when you think of the public interest, there is no market interest. And I’m not thinking only of states, but of the public, the common goods and the public interest. We don’t have investments from states or even from business. I’d like you to comment on this difference between the access to and availability of technology for private interests and for public interests like memory. We have this hard way to do that.

Alex Moura: Hi, I am Alex Moura, originally from Brazil. I work here in Saudi Arabia currently in the Cal State University. And I have a question for Carlos Afonso. As I previously worked at RNP, the Brazilian academic network, I am aware of the challenges in the science and education area, where people also struggle to keep data for scientific and educational purposes, in universities and in research institutions. And this reminded me that this is an open problem in Brazil: we don’t have a specific institution dedicated to storage or digital preservation. So how are you tackling the storage capacity part of the problem for the Grauna project? And what are your thoughts on how Brazil and other countries can address the problem of storage capacity for many purposes, not only for internet memory, but also for science, education, culture, the arts, etc.?

Bianca Correa: We have one question that we received from the online audience, I think I will also pose this question because then we can give the floor to all the speakers to answer them. So Dr. T. V. Gopal from Anna University in Chennai, India, he asks, he says, public memory is short, internet memory is seldom so. Any solutions for the mismatch hazard in the geopolitical space?

Juliano Cappi: Well, we have very good questions and very little time. So I would ask the panelists to make their final remarks, trying to address the questions, which are very important and interesting, but I also have to ask you not to go beyond three minutes, because we will have to close the session very soon. We could go in reverse order, starting with Carlos Alberto Afonso, then Samik, then Marcelo, Ricardo Pimenta, and then Marielza. Please, Carlos Afonso, the floor is yours.

Carlos Alberto Afonso: Thank you. I’ll be very brief, a good question. I recall that the Grauna project is an experiment, still an experiment, exactly to measure the difficulties which you mentioned, among others, like for instance, backing up in real time is a tremendous challenge. The cost of doing that is already very expensive in a big scale. That’s why we restricted the breadth of the information that the project can capture. to mostly civil society organizations’ web information. And on the basis of this experiment, we will try to progressively expand. But, of course, considering that this means more storage, more memory, and more backup, which is tremendous, the challenge is tremendous. Our idea with the project is also to provide a sort of a small reference, but a useful reference for an organization that could tackle the challenge in full and really do a Brazilian Internet Archive. And I have to say that one of the organizations that has these resources to do that, especially technical resources, is NIC.br. And we do hope that they consider this in the future. Thank you.

Juliano Cappi: Samik, please.

Samik Kharel: Hi, thank you. I would just like to address the question from the lady from Germany. She asked about parties, memory, and whitewashing, I think. It has become a common trend for major political parties in Nepal and the region to deploy what we call a cyber army. What they do is look around the Internet and, if there is any criticism of them or any critical discourse about them, they document it and produce a counterargument to improve their image. It’s very common for them to do that these days, and to inject some populist ideas and go against whatever is trendy. So, that’s how it works. Finally, speaking of collective memory: with the ubiquity of the Internet, the way we access, store, discover, and retrieve these collective memories has changed with emerging technologies. The way we interact with our memories has changed, and I think it will keep on changing with the advent of LLMs and generative AI. Social media and platformization, mainly, have also opened up new ways of approaching these memories by allowing us to actively contribute to them, making collective memories more interactive and collaborative. However, we should be careful to ensure everyone has equal access and infrastructure for this. There should be accountability for our data, and the future should be shared, equal, and democratic, and bring together all marginalized and vulnerable populations of the majority world. Thank you.

Juliano Cappi: Thank you so much, Samik. Luis, Thiago, Ricardo Pimenta, I’m sorry.

Ricardo Medeiros Pimenta: Okay, so I’ll try to answer briefly. About the question on elections: memory is always a place of struggle, political struggle. Many people talk about the cultural side of memory, and that is okay, it exists, obviously. But even the cultural side of memory, if we can talk about it, is a result of political struggles, struggles about power. So its impact is precisely related to the fact that memory can potentially be violated or rewritten as a field of dispute by those who seek to dispute the discourse on truth, the political past, science, and so on. And let me tell you something about the tool I was talking about, the Tempora. This digital tool was created during the COVID-19 pandemic in Brazil. From 2019 until mid-2022, we collected with it almost 6,000 news items from Brazilian media, stories about how COVID spread in Brazil, and, obviously, only from Brazilian media that didn’t have a paywall. In the process, most of the news stories produced by the Ministry of Health in Brazil and on other Brazilian government bodies’ websites had their links broken. In 2019, this happened very quickly, and we realized it back then. We tried to develop the system so that we could save an image, a kind of PDF of the website, and also scrape the entire corpus of news that tended to disappear soon. So, how could the question of memory impact the elections, for example? Elections are a place of struggle and dispute about discourse, about the past, the near past, and about projects for the future. So we can’t afford this kind of thing, and we need to develop some strategies to prevent this kind of discourse from staying in the hands of some political groups that could do all the bad things that we already know they do.
So the other thing that I think I can answer, about the question of data and so on, is from a perspective of algorithmic governmentality, a kind of new regime of truth. About data, I think our biggest danger is the automation of social existence. I think we all talk about that: the automation of social existence through computational processes deployed in online media. The memory that comes from it will not be a memory preserved by the demands and conflicts of social groups, institutions, or cultural practices. Rather, this kind of memory will be mathematically elaborated by algorithmic devices that are in turn programmed by groups such as those in the acronym GAFAMI, that is, Google, Apple, Facebook, Amazon, Microsoft, and IBM. So in the end, the political surveillance that we are talking about here today is something we know we must fight against. But surveillance by the market is something many of us let happen. So is this correct? I think there is a kind of reification of the practice when we just give GAFAMI, for example, our data in exchange for visual and informational kinds of consumption. Look, I don’t agree with any kind of surveillance, but it’s a fact that we all practice it on different scales: the culture of following on social networks and the culture of attention that we all share, a little bit more or a little bit less, are also our own surveillance practice, one that we carry out intimately and in a valid way. So I find this paradigm difficult to overcome. Right now, the answer is that I don’t know how we can solve this problem. But I know that in stats, these types of questions are important.

Juliano Cappi: Ricardo, thank you so much. And then we close the session with Marielza Oliveira. Please, Marielza, the floor is yours.

Marielza Oliveira: Thank you, Juliano. Well, it was a fabulous exchange. Thank you very much for this. I’m going to close with a very simple thing. Curation is a political economic process. It’s as simple as that. We have to ask: whose memory is being preserved? Why do we care to preserve memory now if we didn’t care so much before? The proof that we didn’t care so much before is simply that physical archives are being left to rot. You go into warehouses of documents that are exposed to floods, fires, mildew, and simple neglect, and we haven’t digitized everything that should have been digitized from the beginning. One of the simplest statistics is that about 40% of the birth records of people above 60 years of age are still on paper and not digitized. That is simply because there is a tremendous backlog of content that was never digitized to begin with, and we don’t get to it. We keep looking at the creation of new digital records, the birth records of the children who are born now, but we forget that we haven’t done it equally for everyone, that we left behind the older generations, for example, since they didn’t start with digital archives. So we have to really ask this question. The reason why we care so much about digitization and the preservation of memory right now is that we found it has a value, a monetary value. The point is not to preserve memory; the point is to create data that then feeds into generative AI and other such mechanisms, which can then be monetized and drive the economy for other purposes. We are looking at the AI, not at the memory.
But we need to really ask about the memory. How do we preserve it? As a matter of fact, there are very simple things that need to be done. First, we need to digitize more and close the digitization divide that exists. For example, the content for older generations is still on paper or, if it has been digitized, is in outdated, obsolete formats that need to be brought into new formats. Then we need to look at the issue of indexing that was mentioned before: we need to really think about how we create the mechanisms for searchable data, because just creating data is insufficient. There is also the cleaning up and separation of quite a lot of content that is literally toxic. We already have the Common Crawl, which crawls the entire internet about once a month. So for the Global South, developing the technologies and methodologies to really mine the Common Crawl and extract the collective memory of particular countries would be a huge thing. But we also need to build the capacities of people to preserve their own memories, because we can’t just preserve memory for somebody else. We need to give them the tools to preserve their own memories, the ones that are meaningful for them. Because, actually, the terrible thing is that only 2% of what gets on the internet gets preserved hardcore, and only about 10% gets preserved overall. That is increasing, because more data centers are being built, and companies such as the GAFAM are investing to the point that they are building nuclear reactors to power these data centers. They value it so much. But we in the Global South need to value our own memories as well.
And we need to think about where we store it, in terms of data sovereignty as well. How do we keep access to these memories and to this content if it ends up being switched off somewhere else? That degrades the quality of our own collective memories in different countries. So I’m going to stop here and say thank you for the chance to have this fabulous conversation.

Juliano Cappi: Thank you so much, Marielza. Thank you, all panelists, Ricardo, Samik, Carlos Afonso. We had a great panel, and this is a first initiative to debate the challenges related to collective memory online. I hope that we can have further discussions on this debate, which is at the core of internet governance. Thank you, and we now finish the session. Thanks a lot to everyone. Thank you, everyone. Bye bye. Bye bye.

Bianca Correa

Speech speed

128 words per minute

Speech length

906 words

Speech time

422 seconds

Rapid disappearance of online content

Explanation

Bianca Correa highlights the issue of online content disappearing quickly. She cites a study showing that a significant portion of web pages from the past decade are no longer accessible.

Evidence

A study by Pew Research Center found that 25% of web pages from 2013-2023 are no longer accessible as of October 2023. For older content, 38% of web pages from 2013 are unavailable today.

Major Discussion Point

Challenges of preserving collective memory online

Agreed with

Marielza Oliveira

Carlos Alberto Afonso

Ricardo Medeiros Pimenta

Agreed on

Challenges in preserving online content

Marielza Oliveira

Speech speed

123 words per minute

Speech length

3421 words

Speech time

1659 seconds

Selective digitization and storage due to high costs

Explanation

Marielza Oliveira discusses the high costs associated with digitization and storage of online content. This leads to selective preservation of information, with much of what is produced being discarded.

Evidence

Less than 10% of produced content is stored in data centers. The amount of online data has grown from 2 zettabytes in 2010 to an expected 181 zettabytes in 2025.

Major Discussion Point

Challenges of preserving collective memory online

Agreed with

Bianca Correa

Carlos Alberto Afonso

Ricardo Medeiros Pimenta

Agreed on

Challenges in preserving online content

Differed with

Carlos Alberto Afonso

Differed on

Approach to preserving online content

Dominance of English and Northern countries’ content online

Explanation

Oliveira points out the disparity in online content representation, with a dominance of English language and content from Northern countries. This results in an unequal representation of global perspectives and languages online.

Evidence

46% of online content is in English. Out of 7,061 languages in the world, less than 300 are in use online.

Major Discussion Point

Biases and inequalities in digital memory preservation

Agreed with

Samik Kharel

Agreed on

Biases and inequalities in digital memory preservation

Obsolescence of storage formats

Explanation

Oliveira discusses the problem of obsolete storage formats leading to loss of digitized content. She emphasizes that as technology evolves, older storage formats become inaccessible, resulting in loss of archived information.

Evidence

Example of CDs becoming obsolete as storage medium, with computers no longer including CD players.

Major Discussion Point

Technological challenges in memory preservation

Carlos Alberto Afonso

Speech speed

113 words per minute

Speech length

1783 words

Speech time

945 seconds

Lack of internet archiving in Global South countries

Explanation

Carlos Alberto Afonso highlights the disparity in internet archiving services between Global North and South countries. He points out that many countries in the Southern Hemisphere lack significant internet indexing services.

Evidence

Map showing countries with significant Internet archiving services, with most of South America, Africa, and parts of Asia lacking such services.

Major Discussion Point

Challenges of preserving collective memory online

Agreed with

Bianca Correa

Marielza Oliveira

Ricardo Medeiros Pimenta

Agreed on

Challenges in preserving online content

Differed with

Marielza Oliveira

Differed on

Approach to preserving online content

Loss of indigenous languages and cultures

Explanation

Ricardo Medeiros Pimenta

Speech speed

104 words per minute

Speech length

1616 words

Speech time

929 seconds

Broken links and vanishing government websites

Explanation

Ricardo Medeiros Pimenta discusses the issue of broken links and disappearing government websites. He highlights how this affects the preservation of important public information and historical records.

Evidence

During the COVID-19 pandemic in Brazil, many news stories and information from government websites had their links broken, especially in 2019.

Major Discussion Point

Challenges of preserving collective memory online

Agreed with

Bianca Correa

Marielza Oliveira

Carlos Alberto Afonso

Agreed on

Challenges in preserving online content

Memory preservation as a political agenda

Explanation

Pimenta argues that memory preservation is inherently political. He emphasizes that the process of preserving or rewriting memory is a result of political struggles and power dynamics.

Major Discussion Point

Political and economic aspects of digital memory

Challenges of algorithmic governmentality in social existence

Explanation

Pimenta discusses the concept of algorithmic governmentality and its impact on social existence. He argues that this new regime of truth poses dangers to how memory is preserved and accessed.

Evidence

Mentions the role of tech giants like Google, Apple, Facebook, Amazon, Microsoft, and IBM in programming algorithmic devices that shape our online experiences and memories.

Major Discussion Point

Emerging technologies and future of collective memory

Samik Kharel

Speech speed

147 words per minute

Speech length

2251 words

Speech time

914 seconds

Exclusion of marginalized communities from digital discourse

Explanation

Kharel discusses how emerging technologies like AI and large language models are changing the way we interact with and construct collective memories. He emphasizes the need for equal access to these technologies.

Major Discussion Point

Emerging technologies and future of collective memory

Potential of generative AI in memorializing historical figures

Explanation

Kharel mentions the use of generative AI to create interactive experiences with historical figures. This technology allows for new ways of engaging with and preserving historical memories.

Evidence

Mentions the ability to ‘talk’ to historical figures like Rousseau through AI chatbots.

Major Discussion Point

Emerging technologies and future of collective memory

Need for inclusive participation in AI-driven memory preservation

Explanation

Kharel emphasizes the importance of ensuring participation from marginalized communities and the Global South in AI-driven memory preservation efforts. He argues for inclusion and multilingualism in these technological advancements.

Major Discussion Point

Emerging technologies and future of collective memory

Alex Moura

Speech speed

97 words per minute

Speech length

156 words

Speech time

95 seconds

Lack of storage capacity for scientific and educational data

Explanation

Alex Moura raises concerns about the lack of storage capacity for scientific and educational data in Brazil. He points out that this is an ongoing problem for universities and research institutions.

Major Discussion Point

Ricardo Medeiros Pimenta

Selective digitization and storage due to high costs

Memory preservation as a political agenda

Unexpected Consensus

Impact of emerging technologies on memory preservation

Marielza Oliveira

Samik Kharel

Ricardo Medeiros Pimenta

Obsolescence of storage formats

Impact of AI and large language models on memory construction

Challenges of algorithmic governmentality in social existence

Despite coming from different backgrounds, these speakers all addressed the significant impact of emerging technologies on memory preservation, highlighting both challenges and opportunities. This consensus suggests a growing recognition of the transformative role of technology in shaping collective memory across various contexts.

Overall Assessment

Summary

The main areas of agreement among speakers included the challenges of preserving online content, biases and inequalities in digital memory preservation, the importance of cultural and linguistic diversity in digital archives, and the impact of emerging technologies on memory construction.

Consensus level

There was a moderate to high level of consensus among the speakers on the key challenges and issues surrounding digital memory preservation. This consensus implies a shared understanding of the complex nature of preserving collective memory in the digital age and the need for multifaceted approaches to address these challenges. However, there were some variations in the specific focus areas and proposed solutions, reflecting the diverse backgrounds and perspectives of the speakers.

Differences

Different Viewpoints

Approach to preserving online content

Carlos Alberto Afonso

Marielza Oliveira

Lack of internet archiving in Global South countries

Selective digitization and storage due to high costs

While both speakers acknowledge the challenges in preserving online content, Afonso focuses on the geographical disparity in archiving services, particularly in the Global South, while Oliveira emphasizes the economic constraints leading to selective preservation.

Unexpected Differences

Role of AI in memory preservation

Samik Kharel

Ricardo Medeiros Pimenta

Impact of AI and large language models on memory construction

Challenges of algorithmic governmentality in social existence

While both speakers discuss AI’s impact on memory, their perspectives differ unexpectedly. Kharel sees potential benefits in AI for memory preservation, while Pimenta expresses concerns about algorithmic governmentality’s impact on social existence and memory.

Overall Assessment

Summary

The main areas of disagreement revolve around approaches to content preservation, the role of technology in memory construction, and the political implications of digital memory.

Difference level

The level of disagreement among speakers is moderate. While there is general consensus on the importance of preserving digital memory, speakers differ in their focus areas and proposed solutions. These differences reflect the complexity of the issue and the need for multifaceted approaches to address the challenges of preserving collective memory online.

Partial Agreements

Both speakers agree that memory preservation has political implications, but they differ in their focus. Pimenta discusses it as a broader political struggle, while Kharel provides specific examples of how political parties actively shape online narratives.

Ricardo Medeiros Pimenta

Samik Kharel

Memory preservation as a political agenda

Use of “cyber armies” by political parties to shape online narratives

Similar Viewpoints

Both speakers emphasized the importance of preserving diverse cultural perspectives and languages in digital memory, particularly focusing on indigenous and marginalized communities.

Carlos Alberto Afonso

Samik Kharel

Loss of indigenous languages and cultures

Need for inclusive participation in AI-driven memory preservation

Both speakers highlighted the political and economic aspects of memory preservation, emphasizing that decisions about what to preserve are influenced by costs and power dynamics.

Marielza Oliveira

Ricardo Medeiros Pimenta

Selective digitization and storage due to high costs

Memory preservation as a political agenda

Takeaways

Key Takeaways

Preserving collective memory online faces significant challenges including rapid content disappearance, selective digitization due to high costs, and lack of archiving infrastructure in Global South countries.

There are major biases and inequalities in digital memory preservation, with dominance of English and Northern countries’ content, and exclusion of marginalized communities.

Technological challenges include obsolescence of storage formats, difficulties in capturing complex web content, and need for robust backup systems.

Memory preservation is inherently political and economic, with curation processes shaped by power dynamics and monetization incentives.

Emerging technologies like AI and large language models are reshaping how collective memory is constructed and accessed online, raising new challenges and opportunities.

Resolutions and Action Items

Develop technologies and methodologies to mine the Common Crawl for preserving collective memory in Global South countries

Build capacities of people to preserve their own meaningful memories online

Increase efforts to digitize older content still in paper formats or obsolete digital formats

Improve indexing and searchability of preserved digital content

Consider data sovereignty issues in storing and accessing preserved memories
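
The first action item above, mining the Common Crawl for a particular country's web, could begin with something as simple as filtering crawled URLs by country-code top-level domain. What follows is only a minimal sketch under stated assumptions: the input is a plain Python list of URLs (the actual Common Crawl index format is not modeled here), the helper name `filter_by_cctld` is hypothetical, and the sample URLs are invented for illustration. A country-code domain is also only a rough proxy, since much national content lives on generic domains such as `.com` or `.org`.

```python
from urllib.parse import urlparse


def filter_by_cctld(urls, cctld):
    """Return the URLs whose hostname ends with the given country-code TLD.

    Hypothetical helper for illustration; not part of any archiving tool.
    """
    suffix = "." + cctld.lower().lstrip(".")
    matches = []
    for url in urls:
        # hostname is None when the string has no network location
        host = urlparse(url).hostname
        if host and host.endswith(suffix):
            matches.append(url)
    return matches


# Invented sample data standing in for URLs taken from a crawl index.
sample_urls = [
    "https://www.example.org.br/noticias/1",
    "https://example.com/about",
    "http://arquivo.example.br/",
    "not a url",
]

brazil_urls = filter_by_cctld(sample_urls, "br")
```

A real pipeline would need further signals, such as page language or hosting location, to recover national content that does not sit under the country's own domain.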

Unresolved Issues

How to address the digital divide in memory preservation between Global North and South

How to ensure preservation of underrepresented languages and cultures online

How to balance privacy concerns with the need for comprehensive archiving

How to fund large-scale digital preservation efforts, especially in developing countries

How to mitigate biases in AI-driven memory preservation and retrieval systems

Suggested Compromises

Focusing preservation efforts on select high-priority content given limited resources

This comment cuts to the heart of the issue by framing digital memory preservation as a political and economic process. It raises critical questions about power, representation, and the motivations behind preservation efforts.

Impact

It prompted a deeper examination of the underlying forces shaping digital memory, encouraging participants to consider issues of data sovereignty, representation, and the economic drivers of digital preservation.

Overall Assessment

Explanation

This question addresses the potential consequences of the disparity between how long information persists online versus how long it remains in public consciousness.

Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.