Connecting open code with policymakers to development | IGF 2023 WS #500

11 Oct 2023 00:00h - 01:15h UTC

Event report

Speakers and Moderators

Speakers:
  • Cynthia Lo, Private Sector, Western European and Others Group (WEOG)
  • Carolina Pierafita, Intergovernmental Organization, Latin American and Caribbean Group (GRULAC)
  • Mike Linksvayer, Private Sector, Western European and Others Group (WEOG)
  • Helani Galpaya, Civil Society, Asia-Pacific Group
  • Henri Verdier, Ambassador for Digital Affairs Ministry for Europe and Foreign Affairs of France
Moderators:
  • Cynthia Lo, Private Sector, Western European and Others Group (WEOG)


Disclaimer: It should be noted that the reporting, analysis and chatbot answers are generated automatically by DiploGPT from the official UN transcripts and, in case of just-in-time reporting, the audiovisual recordings on UN Web TV. The accuracy and completeness of the resources and results can therefore not be guaranteed.

Full session report

Helani Galpaya

Accessing timely and up-to-date data for development objectives presents a significant challenge in developing countries. It can take up to three years to obtain data after a census, leading to outdated and insufficient data. This lag in data availability hampers accurate planning and decision-making as population and migration patterns change over time. Additionally, government-produced datasets are often inaccessible to external actors like civil society and the private sector. This lack of data transparency and inclusivity limits comprehensive and integrated data analysis.

Another issue is the lack of standardisation in metadata across sectors, such as telecom and healthcare, especially in developing countries. This lack of standardisation creates challenges in data handling and cleaning. The absence of interoperability standards in healthcare sectors further complicates data utilisation and analysis.

Cross-border data sharing also faces challenges due to the absence of standards. This absence hinders the secure and efficient exchange of data and impedes international collaboration and partnerships. Developing more standards for cross-border data sharing is crucial for overcoming these challenges.

Working with unstructured data also poses challenges, particularly when it comes to fact-checking. There is a scarcity of credible sources, especially in non-English languages, making it difficult to identify misinformation and disinformation. Access to credible data from government sources and other reliable sources is essential, but often limited.

Efficient policy measures and rules are necessary to govern data usage while preserving privacy. GDPR mandates user consent for sharing personal data, highlighting the importance of differentiating between sharing weather data and personal data based on different levels of privacy violation.

The usage of unstructured data by insurance companies to influence coverage can have negative implications, potentially resulting in unfair risk classification and impacting coverage options. Ensuring fairness and equality in data usage within the insurance industry is crucial.

To address these challenges, building in-house capabilities and utilising open-source communities for government systems is recommended. Sri Lanka’s success in utilising its vibrant open-source community and building in-house capabilities for government architecture exemplifies the benefits of this approach.

The process of data sharing is hindered by the incentives to hoard data, as it is seen as a source of power. The high transaction costs associated with data sharing, due to capacity differences, also pose challenges. However, successful data partnerships that involve a middle broker have proven effective, emphasising the need for sustainable systems and case-by-case incentives for data sharing.

The evolving definition of privacy is an important consideration, as the ability to gather information on individuals has surpassed the need to solely protect their personal data. This calls for a broader understanding of digital rights and privacy protection.

In conclusion, accessing timely and up-to-date data for development objectives is a significant challenge in developing countries. Government-produced datasets are often inaccessible, and there is a lack of standardisation in metadata across sectors. The absence of standards also hampers cross-border data sharing. Working with unstructured data and fact-checking face challenges due to the scarcity of credible sources. Policy measures are necessary to govern data usage while protecting privacy. Building in-house capabilities and utilising open-source communities are recommended for government systems. The government procurement system may need revisions to promote participation from local companies and open-source solutions. Data sharing requires sustainable systems and incentives. The definition of privacy has evolved to encompass broader digital rights and privacy protection.

Audience

During the discussion, the speakers explored various aspects of open source, highlighting its benefits and concerns. One argument suggested incentivising entities to share data as a way to counteract data hoarding for competitive advantage. It was noted that certain organisations hoard data as a strategy to gain a competitive edge, but this practice hampers the accessibility and availability of data for others. Creating incentives for entities to share data, therefore, was emphasised as a vital step in promoting data openness and collaboration.

Conversely, the potential negative effects of open source were also discussed. The speakers raised concerns regarding the need to verify open source code and adhere to procurement laws. They specifically mentioned the French procurement law, expressing apprehensions about the ability to effectively verify open source code and ensure compliance with regulations. These concerns highlight the necessity for thorough scrutiny and robust governance measures when relying on open source solutions.

Building trust in open source was another significant argument put forth. In Nepal, for instance, there was a lack of trust in open source, hindering its widespread adoption across different sectors. The speakers stressed the importance of establishing mechanisms that enable the verification of open source code, ensuring its reliability and security to build trust among stakeholders. They also emphasised the need for capacity building to enhance knowledge and expertise required for verifying and utilising open source code effectively.

Overall, the sentiment surrounding the discussion varied. There was a negative sentiment towards data hoarding as a strategy for competitive advantage due to its restriction of data availability and accessibility. The potential adverse effects of open source, such as the need to verify code and comply with regulations, were also viewed negatively because of the associated challenges. However, there was a neutral sentiment towards building trust in open source and recognising the necessity for capacity building to fully leverage its benefits.

Mike Linksvayer

Mike Linksvayer, the Vice President of developer policy at GitHub, is a strong advocate for the connection between open source technology and policy work. He firmly believes that open source plays a crucial role in making the world a better place, and he supports measuring open source activity and informing policymakers about developments in the open source community. Linksvayer expresses enthusiasm about the potential of sharing aggregate data to address privacy concerns. He sees promise in technologies like confidential computing and differential privacy for data privacy, and recognises the importance of balancing privacy considerations while still making open source AI models beneficial to society.

Mike Linksvayer emphasises the crucial role of archiving in software preservation and appreciates the contributions of Software Heritage in this field. He highlights that preservation can be decoupled from making data openly available. Linksvayer sees code as unstructured data and acknowledges the importance of data collection in research on programming trends and cybersecurity. Collaboration in software development is facilitated by platforms like GitHub, which provide APIs and an open 'all events' feed, enabling the sharing of aggregate data. Linksvayer believes that digital public goods, including software, data, and AI models, can be effective tools for development and sovereignty, addressing various Sustainable Development Goals (SDGs).

Promoting and supporting open source initiatives is essential, according to Linksvayer, as they drive job creation and economic growth. He cites a study commissioned by the European Commission estimating that open source contributes between €65 to €95 billion to the EU economy annually. Linksvayer also stresses the importance of cybersecurity in protecting open source code and advocates for coordinated action and investment from stakeholders, including governments.

In summary, Mike Linksvayer’s advocacy for open source technology and its connection to policy work underscores the potential for positive global change. He emphasizes the importance of sharing aggregate data, advancements in data privacy technologies, and the promotion of digital public goods. Linksvayer also highlights the economic benefits of open source and the critical need for investment in cybersecurity.

Cynthia Lo

During the discussion, several key points were highlighted by the speakers. Firstly, Software Heritage was praised for its commendable efforts in software preservation. It was mentioned that the organization is doing an excellent job in this area, but there is consensus that greater investment is needed to further enhance software preservation. This recognition emphasizes the importance of preserving software as an essential component of data preservation.

Another significant point made during the discussion was the support for assembling data into specific aggregated forms based on economies. This approach was positively received, as it provides a large set of data that can be analyzed and utilized more effectively. The availability of aggregated data based on economies allows for better understanding and decision-making in various sectors, such as the public and social sectors. This aligns with SDG 9: Industry, Innovation and Infrastructure, which promotes the development of reliable and sustainable data management practices.

One noteworthy aspect discussed by Cynthia Lo was the need to safeguard user data while ensuring privacy and security. Lo mentioned the Open Terms Archive as a digital public good that publicly records successive versions of online services' terms. This highlights the importance of maintaining data integrity and transparency. The neutral sentiment surrounding this argument suggests a balanced consideration of the potential risks associated with user data and the need to protect user privacy.

Furthermore, the discussion touched upon the role of the private sector in providing secure data while ensuring privacy. Cynthia Lo raised the question of how public and private sectors can collaborate to release wide data sets that guarantee both privacy and data security. This consideration reflects the growing importance of data security in the digital age and the need for collaboration between different stakeholders to address this challenge. SDG 9: Industry, Innovation and Infrastructure is again relevant here, as it aims to promote sustainable development through the improvement of data security practices.

In conclusion, the discussion shed light on various aspects related to data preservation, aggregation of data, user data safeguarding, and the role of the private sector in ensuring data security. The acknowledgement of Software Heritage’s efforts emphasizes the importance of investing in software preservation. The support for assembling data into specific aggregated forms based on economies highlights the potential benefits of such an approach. The focus on safeguarding user data and ensuring privacy demonstrates the need to address this crucial issue. Lastly, the call for collaboration between the public and private sectors to release wide data sets while ensuring data security recognizes the shared responsibility for protecting data in the digital age.

Henri Verdier

In this comprehensive discussion on data, software, and government practices, several significant points are raised. One argument put forth is that valuable data can be found in the private sector, and there is a growing consensus in Europe about the need to promote knowledge and support research. The adoption of the Digital Services Act (DSA) serves as evidence of this, as it provides a specific mechanism for public research to access private platform data.

Furthermore, it is argued that certain data should be considered too important to remain private. The example given is understanding the transport industry system, which requires data from various transport modes and is in the interest of everyone. The French government is working on what is called ‘data of general interest’ or ‘Données d’intérêt général’ to address this issue.

The discussion also highlights the importance of data sharing and rejects the idea of waiting for perfect standardization. It is noted that delaying data sharing until perfect standardization and good metadata are achieved would hinder progress. Instead, it is suggested that raw data should be published without waiting for perfection. This approach allows for timely access and utilization of data, with the understanding that standardization and optimization can be addressed subsequently.

The protection of data privacy, consent, and the challenges of anonymizing personal data are emphasized. The European General Data Protection Regulation (GDPR) is mentioned as an example of legal requirements that mandate user consent for personal data handling. It is also noted that anonymization of personal data is not foolproof, and at some point, someone can potentially identify individuals despite anonymization attempts.

Open source software is advocated for government use due to its cost-effectiveness, enhanced security, and contribution to democracy. France has a history of utilizing open source software within the public sector, and there are laws mandating that every software developed or financed by the government must be open source. The benefits of open source software align with the principles of transparency, collaboration, and accessibility.

The discussion also addresses the need for skilled individuals in government roles. It is argued that attracting talented individuals can be achieved through offering a mission and autonomy, rather than relying solely on high salaries. The bureaucratic processes of government organizations are criticized as complex and unappealing to skilled workers, indicating a need for reform to attract and retain talent.

In conclusion, this discussion on data, software, and government practices emphasizes the importance of a collaborative and transparent approach. It highlights the value of data in both the private and public sectors, as well as the need for data sharing, open source software, and data privacy protection. The inclusion of skilled individuals in government roles and the promotion of a substantial mission and autonomy are also seen as essential for effective governance. Ultimately, this comprehensive overview underscores the significance of responsible data and software practices in fostering innovation and safeguarding individual rights.

Session transcript

Cynthia Lo:
us this morning. It’s a little bit early for individuals on site. Today we’re talking on a workshop on connecting open code with policymakers to development. And an agenda that we have here today, we’re going to go through a round of introductions and then overview of connecting open code with policymakers. Then we’ll move directly to our panel discussion and then a Q&A. Feel free to ask questions also to our online participants as well. I’m going to hand it over first to Mike Linksvayer to do an introduction. One second. We have slight technical difficulties. And so one moment while I fix that. I’m going to hand it over

Mike Linksvayer:
to Helani and never mind, we have Mike now. Hey, thanks a lot for resolving those technical difficulties. I'm sorry I can't be there in person. I'm Mike Linksvayer. I'm the VP of developer policy here at GitHub. Former developer myself who's now been doing policy work focused on making the world better for developers and helping developers make the world a better place. Open source is a big part of the way that happens. And I'm really excited about measuring it and informing policymakers about what's going on. And so I'm really excited about this panel. Great. And I'll pass it over to our speakers here, Helani and Henri.

Helani Galpaya:
Is there a specific question? Just to introduce yourself. Okay. I'm Helani Galpaya. I'm the CEO of LIRNEasia. It's a think tank that works across the Asia-Pacific on broadly infrastructure, regulation and policy challenges, but with a huge focus on digital policy. Thank you.

Henri Verdier:
Hello. Good morning. I’m Henri Verdier, French ambassador for digital affairs. Just to mention that I’m not a career diplomat, I was a French entrepreneur a long time ago and I used to be the state CIO for France.

Cynthia Lo:
Great. Thank you. So we’re going to move directly to our panel talk. And to start, let’s talk a little bit more about challenges from unmet data needs. So let’s start with Helani here. What are some of the challenges that you’ve seen over the years on unmet data needs?

Helani Galpaya:
I mean, from a development perspective, understanding where we are in whatever those development objectives, that’s the starting point of any kind of development. And that’s a problem if there is no data. And particularly when it comes to developing countries, which is where I come from, this is a particular challenge, right? So traditionally we’ve relied on government-produced data sets, take for example the census. Every 10 years it’s supposed to happen. And low levels of digitization has traditionally meant it takes about three years after the census to actually get some data out in many countries, by which time the population has changed, the migration patterns have changed, and so on. But we know now there are obviously lots of other proxy data sets that we can use. But the timeliness is one concept. that we worry about in development because the data is slow to come by, even when it is available. The second unmet need is if you’re outside of government, is the availability of data to actors outside government. And frankly, within government, sometimes the data that’s collected by one department or ministry is not even available to others, right? So there’s a very low level of data access possible within government, and certainly for civil society and private sector outside government to access data. Many governments have signed on to open data, charters and all of those things, but really the data that they put out is sometimes not what most people need. It’s not usually in machine readable format, so you spend enormous amounts of time digitizing it and data-fying it. So these are sort of basic challenges and basically, I mean, from the government point of view, governance and regulation in particular, the oxygen that feeds that engine is data because there’s a huge data asymmetry between the government and the regulators versus the governed entity. Take telecom operators, for example, right? How are they doing? They have a lot more information about their operations than the regulators or the governing party would. So there’s really multiple data challenges that we have, and increasingly, the conversation is that the private sector data can act as a proxy to inform development, but negotiating that and accessing that is particularly hard. So there’s multiple data challenges in developing countries, particularly from our point of view as a research organization sitting outside government and outside private sector.

Henri Verdier:
Thank you for the question. 15 years ago or something like this, governments understood that open government data was very important. And together, we did work a lot to open our data, and then maybe later our source code. And we learned some lessons: that those data could create much more value if more people can use it, that it was a matter of transparency, democracy, but also economic development, efficiency, and maybe citizenship. And more and more, we understood that governments don't have the monopoly of general interest. And some very important data are in the private sector. So it's time, probably, and it's the moment to start thinking deeply, even philosophically, about the private sector data. In Europe, first, there is a growing consensus that, first, we need to help research and to promote knowledge. There are a lot of topics where we have to know. I can speak about disinformation, some impact of social networks, but also climate change, or some important topics. We need more knowledge. And for example, if you look at the DSA that we did adopt last year, we do organize a specific access to private data for public research. Of course, I know that there are important issues, privacy, intellectual property, sometimes security, because if you share everything, you can allow reverse engineering and hacking, et cetera. But we can fix it. And for example, there is an important field of research regarding confidential computing. You can use the data without taking the data. So this is a growing consensus. And probably, we will have, collectively, this kind of consensus in the international community to make public research stronger and to organize ourselves to be able to understand important mechanisms. But then there are also other actors that need access to those data. And for France, for example, first we do encourage the private sector to be more responsible. Let's think, for example, about the transport industry. If you don't have all the data, you have nothing. If you don't have buses and taxis and personal cars and motorcycles and metro and train, you don't understand the system and you cannot take good decisions. And this is in the interest of everyone, the public decision-maker, private actors. Everyone needs a good comprehension, a good knowledge of the system itself. So we do encourage cooperation, sharing the data, et cetera. Then we think that we can go further. Maybe you know the French economics Nobel Prize winner, Jean Tirole; he did publish a lot about the economy as a common good. And we consider that it's time to conceive some incentive to make the private sector share some important data. And this year, in the French government, we are starting to work deeply on what we call data of general interest, the Données d'intérêt général. Because as I said, governments don't have the monopoly of general interest. And some data should be considered as too important to be allowed to remain private. Of course, this is complex because we need a legal framework to give a status to this kind of data. But really, we did open the case to create a status for some very important, impactful data and to decide that those data have to be open, even if they come from the private sector.

Cynthia Lo:
Perfect. I do also wonder, you mentioned one thing about policy and standards. A lot of metadata has very clear standards in financial markets, in healthcare, for instance. I’m curious to know, are there any unmet needs within the standards of metadata? It’s currently governed quite well, but is there anything, is there a certain standard that isn’t out there that could be?

Helani Galpaya:
I mean, the data we deal with, my data scientists deal with, telecom, mobile telecom network, big data, basically call detail records, for example, that we get from base stations, trillions of call detail records. These are not standardized by any means, right? In fact, the team spent four to six months cleaning up the data, because when you get data from four to six telecom operators, they're actually not standardized, how the numbers are. So there are no interoperability standards. There are many, many sectors where there aren't interoperability standards. And of course, some of the coolest stuff that comes out is from unstructured data anyway, like social media data and so on. I think the financial sector has traditionally been well ahead in this, but many other sectors haven't. Health, I think, in developing countries is less developed in terms of interoperability standards. And certainly for cross-border data sharing, this is a fundamental problem, right? Like when you look at taxation data, all of that, there's a lot more work that needs to be done, I think, particularly when it comes to developing economies.

Henri Verdier:
Yes. I did join the French government 10 years ago to lead the open data policy. The lesson learned is that… If you wait for perfect standardization and good metadata, you will never do anything. When I did join the French government, we wanted to index every dataset through an index that was conceived during the Middle Ages for the National Archives, with 10,000 words. So it was quite impossible to publish a dataset, because you had to go back to Philippe le Bel to decide where it went. So I take from the open data movement the idea that you share your raw data as they are and don't wait. But it doesn't mean that standards don't matter. Of course they do. But let's start by publishing. The second lesson is that maybe the API-fication process is more important than the indexation or the metadata themselves. So first, for maybe five years, we did publish everything. But it was not always very useful, especially for data that has to be refreshed very frequently, and you need the latest data and not just any data. So for this, we did then take three years to organize a proper API ecosystem. And again, people told me, first you have to conceive a good architecture of the API system. I said, no, let's build APIs, and then we will optimize the API system. So my lesson, and my personal experience, is don't wait for perfect standardization, because you will never reach this goal. This is a moving target. So don't wait.

Cynthia Lo:
Thank you. And I think that brings us to our next point quite well. You both highlighted this as well on private sector data for development purposes. And I know Mike also has some thoughts on that. But I’d love to know. You mentioned on private sector data, a lot of times it’s a little unstructured, but that’s interesting because you have, it’s wider. You can take a look and analyze that in an easier way. Tell us a little bit more on that, what has been a surprising find? On private sector data? Yes, and some unstructured data.

Helani Galpaya:
Some unstructured data that we work with include, for let’s say for misinformation and disinformation identification, automatic identification of mis and disinformation that spread across platforms in languages outside of English in particular. And there, I think, well, there’s sort of two types of problems. One is just the low levels of data. So I mean, even assuming you have all the language resources, like a language corpus that is needed to identify this on, you know, natural language processing, you at some point you’re going to need a fact base to check against, right? So there, the unstructured data is, well, structured or unstructured data comes from government resources and maybe other sort of credible sources, right? So you’re dealing with two types of data. To fact check numbers, you’re looking at usually trying to find government, and to fact check other things, you’re looking at reports and so on. And there’s a serious lack of data. So for example, if you look at like the big popular English language models, they are trained on millions of articles. We tried this in Bangladesh and Sri Lanka to fact check. We’re down to 3,000 articles that are credible, you know, sort of data sources that we can use to fact check against unstructured, you know. So we are working with a very, very limited universe of credible data that’s actually out there because there’s very little out there so I think that’s for us the biggest challenge.

Henri Verdier:
Sorry, it's a very complex question. First, I was thinking that completely unstructured data are very rare, because usually someone did produce the data and did pay something. So a data set is the answer to one certain question, but usually it's not your question. So they have a structure, usually. Of course, in the world of the Internet of Things and some sources, you have more and more not-quite-structured data, but if you observe, we are living in a world of data with purpose, so they have a structure. So the question again is to think about interoperability and to build bridges. One other question with unstructured data, or with a minimum of structure, is that if you want to share the data, to give them as much value as they can have, you also have to protect other important things, like, again, privacy, but not just privacy, also interoperability. And if you don't really understand what is within the data, you are not sure that you are protecting everything you have to protect. That's why I pay more and more attention to the field of research, as I said, of confidential computing. We have to learn to work with the data, to train AI models, to ask questions. For example, in France, as you know, we have an ancient and very structured social security system. So there is one database, the social security, with every prescription that every French doctor made during the last 20 years. Can you imagine this? 70 million people, every prescription made by a doctor during 20 years. And then we make a statistical archive, so you take 1%. Here, of course, you have a lot of knowledge and science. You can discover new drugs, because you can discover that, I don't know, someone that had a lot of headaches at the age of 20 doesn't have Alzheimer's 40 years later. And you can discover a new principle of some drug and a lot of things like this. But you cannot just open this kind of data, because this is pure privacy. This is my health and your health. But you can organize a technical strategy to access this data without showing it. And if you do this, you can control a bit the people that are using this data. And if they don't respect some laws or principles, you can disconnect them. So this is probably an important field. So again, I'm not looking for a perfect standardization. But we can organize the ecosystem of how to access the data, when, why, and give another relation between knowledge and data.

Helani Galpaya:
And I agree with the minister. Some of the solutions are technical. We've certainly worked with differential privacy methods when we use call data records, to still have the data be usable to inform policy, but without revealing where an individual might actually be or what that person's number is and all of that. The other part of the solution, I think, is policy: to have some kind of governing structure to make sure that we are able to use it while preserving privacy, and having some sort of rules around what the data is used for, like in the health care system, so that insurance companies cannot use it and then drop, you know, private insurance companies cannot drop coverage, because they have so much more information about a set of users. Even if they're not individually identifiable, once you're in an insurance pool, you can identify that this is a much higher risk. So there's sort of, you know, policy as well as technical solutions there.
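To make the differential privacy idea mentioned here concrete, below is a minimal, illustrative sketch of adding calibrated Laplace noise to aggregate counts before release. It is not the speakers' actual pipeline; the dataset, column names, and epsilon value are assumptions, and a real deployment would need careful sensitivity analysis and privacy accounting.

```python
import numpy as np
import pandas as pd

# Hypothetical call-record aggregate: people counted per cell-tower area.
# Column names and values are illustrative only.
counts = pd.DataFrame({
    "tower_area": ["A", "B", "C"],
    "subscribers_present": [1250, 430, 87],
})

def laplace_noise(sensitivity: float, epsilon: float, size: int) -> np.ndarray:
    """Draw Laplace noise calibrated to the query sensitivity and privacy budget."""
    scale = sensitivity / epsilon
    return np.random.laplace(loc=0.0, scale=scale, size=size)

# Each person contributes to at most one area count, so sensitivity = 1.
EPSILON = 1.0  # illustrative privacy budget; smaller means stronger privacy
noisy = counts.copy()
noisy["subscribers_present"] = (
    counts["subscribers_present"] + laplace_noise(1.0, EPSILON, len(counts))
).round().clip(lower=0).astype(int)

# The noisy aggregates can inform mobility or planning analysis without
# exposing any individual's presence at a given tower.
print(noisy)
```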

Cynthia Lo:
I think on the privacy part, I’m very curious to also hear from Mike on what are your thoughts on privacy and private sector data. I’d love to know your thoughts on that, too, or anything to add.

Mike Linksvayer:
Okay, yeah. Well, I first would say that I should have said in my introduction, and shouldn't assume, that people know what GitHub is, where I work. It's the largest platform where software developers from around the world come to develop software collaboratively, a lot of it open source. And there are a lot of themes I'm hearing that, I mean, software development is kind of a very specific thing, but I think there are a lot of themes we've talked about on structured data, APIs, and privacy where maybe I can paint a little bit of a picture about how it works with data about code development, and the code that programmers are writing is data itself. And indeed, you could think of it as unstructured data. It's a text file, but also each programming language has its own structure, because it needs to be able to parse the individual statements. So it's really a matter of how much work you want to do and what questions you have about, for example, software development. And then APIs are another aspect. If you want to crawl all of the code, what we call repositories is where a particular project on GitHub or similar platforms is collaborated on, and if you want to crawl all the code in the world, that will take you a long time and be very resource intensive. However, GitHub and similar platforms also make APIs available. So I think that's another kind of common theme; we can look at how exactly that looks with code, where you can both do queries to ask questions about kinds of projects that you're interested in, or you can try to ingest all of the activity as it comes out, because GitHub has a very open kind of everything, 'all events' feed, but that also is extremely expensive to do. And some researchers who do research around programming trends, or, I don't know, cybersecurity, there's a bunch of different research areas where you can look at GitHub data. A lot of them spend a lot of their time gathering data before they can even answer, or validate whether they're asking, the right questions. So one approach to that, and to dealing with privacy, is publishing aggregate data that will be helpful for some use cases. And that's what we've done with a new initiative we have at GitHub. We're calling it the Innovation Graph, which is basically longitudinal data per economy, on a roughly country basis, about various kinds of activity. And we did it particularly to inform policy makers and international development practitioners who want to use that data to understand things like digital readiness within their sphere of influence. And by publishing aggregate data, we were able to satisfy some of these use cases, or at least allow people to explore the aggregate data to figure out what they want to make an investment in, you know, crawling more. It also sort of neatly deals with the fundamental privacy questions, that you don't want to identify, you know, individuals and things like that. You can do that by, you know, thresholding: a certain number of people have to be doing an activity within a country in order to report aggregate statistics on it. So that covers a lot of different themes I think we've heard covered there. And I think there's a ton of promise in, you know, a range of technologies like confidential computing, differential privacy.
And I'm excited about them all because developers are building them and a lot of the research slash R&D is open source. But I guess I'll just highlight here that a very simple, very low-tech approach, as a first step at sharing data, can be just sharing aggregate data that doesn't have any privacy concerns. That's actually very much kind of to Henri's point about sharing data before you do all of the standards work, because otherwise you might be waiting forever. Also, sharing aggregate data is a way to take that first step, share data that's gonna be useful to a range of stakeholders, and then work on the harder part that might be pending more advanced technology to deal with the harder issues.
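As a minimal sketch of the thresholding idea described above (publish an aggregate statistic for an economy only when enough distinct people contribute to it), the snippet below uses hypothetical event records and a made-up threshold; it is not GitHub's implementation.

```python
import pandas as pd

# Hypothetical activity events: one row per (developer, economy, git push).
events = pd.DataFrame({
    "developer_id": [1, 2, 2, 3, 4, 5, 5, 6],
    "economy":      ["FR", "FR", "FR", "FR", "LK", "LK", "LK", "NP"],
})

MIN_DEVELOPERS = 3  # illustrative threshold; real thresholds are a policy choice

agg = events.groupby("economy").agg(
    pushes=("developer_id", "size"),         # total activity
    developers=("developer_id", "nunique"),  # distinct people behind it
)

# Only publish rows where enough distinct developers are represented,
# so no small group (or single person) is identifiable from the aggregate.
publishable = agg[agg["developers"] >= MIN_DEVELOPERS].drop(columns="developers")
print(publishable)
```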

Henri Verdier:
Oh yeah, please. A small answer. First, we know and cherish GitHub. When I was state CIO, France was the second public contributor, government contributor, on GitHub. And I don't know if you know, but by French law, every software that the government develops has to be open source and free software. So open source and freely reusable. And more than this, every time the government uses an algorithm to take a decision, it has to publish the source code, but also to tell the citizens that we are using algorithms, and to be able to explain in simple words how it works. So that's an important and coherent policy. And regarding structured or unstructured data, what I learned from my open data experience, as I said, is that the first duty is to share data as they are. And then some people will structure them. And if we think again about GitHub, so as I said, we cherish GitHub, but we work a lot within GitHub. And then, for example, in France, I don't know if you know the Software Heritage project. Here, some researchers from INRIA decided to build the biggest possible archive of every software, taking GitHub, but also some dead forges like the Google one. And they are working hard to structure it now, to be able to track the genesis of a piece of software. So they are working. But we did allow this because we did publish unstructured software. And then some people can continue. And maybe someone will do better, I don't know. But we will have a variety of experiences. So my lesson is to separate: first publish, and then structure. And you can have a diversity of attempts to structure if you have a common ground of raw data or software.

Cynthia Lo:
I think you mentioned a really interesting. Sorry, please, Mike.

Mike Linksvayer:
Yeah, I just wanted to add to that. Thanks for cherishing GitHub. I definitely cherish Software Heritage. And really, archiving is almost a third part that is also extremely important and, I think, under-invested in. So I think in the software preservation space, Software Heritage is doing an amazing job. But I think preservation of data is something that can be decoupled from making it available unstructured. And I think it's extremely important to think about.

Cynthia Lo:
Yeah, absolutely. I think we actually have a slide here as well on the Innovation Graph that Mike had mentioned. And I also saw in the audience here, we have Mala Kumar, who helped on the standardized metrics research, because we wanted to understand exactly what type of data would help and what type of data the public sector or the social sector would require. And as you mentioned, we have the API, which is that large set of data that Henri mentioned first. And then now we've gathered all of the data sets into specific aggregated data based on economies, just in the pattern that Henri had mentioned. I'm not sure, Mike, if you want to mention anything on there. I think you also may be able to share your screen if you'd like. But also, a huge thank you to Mala Kumar, who led that standardized metrics research, who's joining us online.

Mike Linksvayer:
Yeah, I can share my screen briefly, if it would be useful. I'm not sure if folks will be able to see in the room. So maybe I'll share and you can tell me whether you can actually see it in a useful way. Okay. Can you see anything on the screen? Yes. Yes. Okay, great. I think I'm sharing a window that has the page for France in the Innovation Graph. So this is just to show that we have a bunch of data on a per-economy basis. Some of them are fairly technical. Git pushes is basically code uploads to GitHub, and you can see that summer vacation actually happens. And repositories, as I was saying, this is the kind of unit of a project on GitHub, and similar platforms use the same concept. Developers, those are people actually writing the code or in some cases doing design around a software project. Organizations, which is kind of a larger unit of organizing projects on GitHub that sometimes correspond to a real world organization, sometimes do not. Programming languages, this can be very useful for thinking about skilling within a country. And licenses are about copyright. Oh, and then topics: this is currently very unstructured. Basically maintainers on GitHub can assign keywords to their projects, so it's kind of very noisy data, but it can be helpful in really diving into, like, identifying a set of projects that you want to study more. And one thing that I'm excited about, so you can tag with any kind of text. So even going forward, people might tag that, you know, your project is relevant to a particular sustainable development goal. And so you'll be able to kind of navigate the tags in that way or the topics in that way. And finally, perhaps most interesting and new is this kind of trade flow diagram. You can see economies that France is collaborating with, where developers are sending code back and forth. So you see U.S., Germany, Great Britain, Switzerland. It's unsurprising that those are some of the top ones. You can also combine all the member states. And this is kind of a first release. There's obviously a lot of other exciting analysis that can be done. The data is actually open in the repository. You can see the data here. And, you know, at the end of the day, data can be extremely boring. This is literally a CSV file. But that boringness is fantastic because it means that, you know, you can use your tool of choice, whether it's a spreadsheet, a Jupyter notebook, or something fancier to analyze the data. And then I'll just show really quick the reports that Cynthia and Mala mentioned, worked on. And that kind of really drove our requirements for this project, looking at what kinds of data about software development would actually be useful for international development, public policy and economics practitioners. So we did a lot of, you know, discussions with entities that are part of the data development partnership, for example, to help design this. And then I also pulled up Software Heritage because I'm a big, big fan of it. They have a page on here that I can't find immediately kind of showing all the different projects that they indexed. But I cherish that, too. So anyway, I'll stop sharing. If people later have questions about a particular country or metric, happy to share again.
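Because the published data is plain CSV, exploring it needs nothing fancier than a spreadsheet or a few lines of pandas, as in the illustrative sketch below. The file name and column names here are assumptions for the sake of the example, not the repository's documented schema.

```python
import pandas as pd

# Hypothetical file and column names; the real repository may organise
# its CSVs differently, so check its layout before relying on this.
df = pd.read_csv("git_pushes.csv")  # assumed columns: economy, year, quarter, git_pushes

# Quarterly git pushes for one economy, as a quick longitudinal view.
france = df[df["economy"] == "FR"].sort_values(["year", "quarter"])
print(france[["year", "quarter", "git_pushes"]])

# Top economies by total pushes in the latest available year.
latest = df[df["year"] == df["year"].max()]
print(latest.groupby("economy")["git_pushes"].sum().nlargest(10))
```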

Henri Verdier:
Yes, thank you. Very, very promising. We did agree, apparently, that the best policy is to first publish and think later. But we also have to think and to understand. I observe that we are more and more living in a world of interdependent free and open source software. And there are dependencies and security issues. If we don't understand a bit the very structure of the software ecosystem we are living in, we have to face important concerns. We can remember Log4j, for example. We can observe that sometimes when we discover a security failure, because we don't know the story of the evolution of the code, the forks, et cetera, we are not able to correct everything, because we don't have a proper vision of the history and the evolution of the code. And probably that's a very important new frontier. We have to build new tools and new approaches to understand and control this very complex system of software. Do you agree?

Helani Galpaya:
Yeah, I completely agree. I think Sri Lanka, just one example, has a really vibrant open source community. So this kind of data, if they are using GitHub primarily, could be really interesting for understanding the evolution of that community, as one thing. But just on that: many countries are technology takers and product takers when it comes to e-government systems, so they don't have the luxury of saying everything will be open. They're buying software from big companies, which will certainly not make the code open, right? Not even APIs; a very closed, tightly licensed system is what they're buying. And I think as countries go along that technology maturity road, like Sri Lanka at some point came to the point where there was enough capacity with the CTO, with the government agency, who was able to say, okay, we will build some of this in-house, I will use the open source community who's working around the world to build some of these tools to set up the basic government architecture. But that takes a bit of time, I think, to get to this stage, because the easiest thing is to get some donor money and to do a procurement of a closed system. And that's really problematic, yeah.

Henri Verdier:
Small comment. When I was in charge, the budget for buying software in France was four billion euros a year. Half of it was consumer products, like, I don't know, Windows. So for this, of course, we cannot negotiate. But half of this, two billion, were proper back-end systems. And here, you can decide by law that in the procurement, the software has to be open. We tried to do this, and now that's quite a standard for French procurements.

Cynthia Lo:
I have many thoughts on that, because I’m very curious, we’ve been talking a lot during IGF about digital public goods and how that could be discovered a little bit more. But that is maybe a little bit off course, but maybe think a little bit about that, I think.

Mike Linksvayer:
Well, actually, if I could interrupt, it's actually not off course, at least maybe I can tie it in. Maybe I'll share my screen again really, really quick. I mean, this might have been something we were planning to talk about later, but I think it's a good opportunity. So what I'm sharing now is the Digital Public Goods Registry. Digital public goods could be software, could be data, could be AI models, could be a lot of different things, but it's mostly software. In fact, you can see the breakdown here between software, data and content. And you can see that they're all tagged in relation to a particular SDG. A big part of the motivation here is to find and share solutions, you know, to progress on the various SDGs. The same kind of concept can be useful: basically, curation of information about open projects is its own data project in a way, and can be very helpful in not reinventing the wheel, finding that, you know, a government or civil society institution is already serving a particular need, and that software was developed in country A and people in country B can maybe take it and use it or customize it. And so they have a little bit more, I guess, sovereignty or autonomy, to use those words that are quite popular now. And the way it's really tied together, I think, is that, yes, these are tools that can be helpful for development, for SDG attainment, for sovereignty, but doing this kind of organization is also a data project, which is its own effort. And I'll stop sharing now.

Cynthia Lo:
No, thank you, Mike. I did also wanna highlight the Open Terms Archive, which I believe is a digital public good incubated with the government of France. Linking back to what you mentioned on security, having ways to publicly record every version of a specific set of terms, I think it does tie in very well with security. And I was a little curious to go to the next slide about our topic on data, privacy and consent, and then also, more widely, security. I would love to know some of your thoughts on how to really safeguard all the data that impacts users. How should the public or private sector provide data that is secure and ensures privacy? It is a big question, and there's no perfect answer, of course, but another way to think about it is: if there's one suggestion for private sector actors that are thinking of releasing data sets, if they release a wide set, is there anything they should keep in mind before doing so?

Henri Verdier:
Yes, that's a very complex question, and there is no silver bullet. In Europe, we started with principles, so the GDPR, which started in France in 1978, decided that regarding personal data, data speaking about you, the consent of the user is needed, so it's mandatory. So then we had to… You can conceive legal approaches or technological approaches, and for example, I'm very interested in an Indian project, the Data Empowerment and Protection Architecture, that technically organizes a way to check consent, in a way that tries to be an infrastructure to unleash innovation. This is not a burden, this is an infrastructure for innovation. So you can implement it through various approaches, and some are better than others, but there is a strong principle there. And for example, just to mention it, there is also a legal controversy between France and Anglo-Saxon countries, because we consider personal data as something like your body. You are not the owner of your body. You cannot decide anything regarding your body, and you cannot decide anything regarding your personal data. There are some fundamental rights. In the world of copyright, this is a different approach, and that's great. We can extend. But in France, we have a strong commitment that you cannot treat personal data as ordinary data.

Helani Galpaya:
I think many countries are taking this approach, seeing sharing weather data, for example, as very different from sharing personal data. I think we talked about it earlier as well. I think what the minister is talking about is sort of the policy and legal side, and then we talked about some of the technical solutions. And I think at a practical level, there's private data, but there's also commercially sensitive data. So our approach, for example, was to say we will not work with one telecom operator's data, because that's highly commercially sensitive: where the base stations are, which direction each is facing, the power on those base stations, et cetera. We said we'll go into this sort of data and analytics to understand where people live, where people move. All of that is possible with mobile network data, but we will only do it if we have more than one company contributing data, and then we sort of anonymize at a company level. Like, for the base stations it is not known whether it's company X or Y. So the more data that you pool, that brings another level of protection on commercially sensitive data in our case, yeah.

Henri Verdier:
Yes, of course. Statistical anonymization can be useful for some purposes. If you want to do epidemiology, for example, if you want to understand where the population goes in case of natural disaster, if you even want to check whether France or Germany respected the lockdown more during Covid. Do you know that we respected the lockdown more than the Germans? Yes, we learned this through operators' data, because of course everyone, including me, would have bet that the Germans would have been more strict. So you can have a very important use of statistical data. But apart from this approach, I think that you can never really anonymize personal data, the data describing one person. You can delete the name, the age; at some point someone will find you. So if you want to build knowledge regarding one person, someone, here you need other approaches, like confidential computing, technological solutions.

Helani Galpaya:
I agree and I think sort of it depends on the situation and what the company is releasing data for, right? I think what we’re saying is at aggregate level there’s a lot of use you can make out of it. You don’t need anything that’s even remotely identifiable, you can talk about groups of people. But I mean Covid was a classic example to understand movement that was good enough. Facebook check-in data was being used in some governments to see where people are, but at some point if you’re looking at an outbreak and then you’re trying to contact trace using data, then that’s a very different level of privacy violation and you need the legal backing to say okay this is a national emergency and I’m now going to actually identify who owns that cell phone because we need to know where that person may have spread, you know, moved and then spread the virus. So it depends on the question you’re asking really, what company data can do and what the safeguards should be.

Cynthia Lo:
Thank you. I also want to make sure, give an opportunity to Mike, if you have any thoughts on safeguards and privacy and consent on private sector data being released.

Mike Linksvayer:
I think really all of the key points have been covered already. So I don't think I have anything substantive directly on point to that, but I feel it's related to another thing that's happening now that's kind of related to open data and open code, which is a debate around how open, quote, open source AI has to be. And the reason why there's a link is because a lot of times data can't be fully opened, for privacy and other reasons. And yet society can still benefit from having some of the outputs of that training, often called the model. And so there's kind of a debate about what kinds of sharing of the data that's being used to train an open AI model make it open or not. To some extent, this is a very academic debate, but at the same time, it could end up being, you know, reflected in law, because it's often recognized that open source might need special treatment because of its non-proprietary nature. There are a bunch of different things you can share: for a data corpus that's used to train an AI model, the raw data is extremely useful, obviously, but there are other things that can be useful as well. For example, a description of the schema of all the data that you're using, so that other people can bring their own data and replicate the model; if, like, two parties have access to similar private data sets, then they can be close substitutes for each other. So I think that's like a burgeoning area that all these issues kind of come back together around.

Henri Verdier:
So Mike, this is not just an academic issue, it's a question of which data you did use to train the model. First, you are in California, I feel. I have read that one of the important reasons for the screenwriters' strike was generative AI, because they wanted to be sure that their work will be respected, so it can have a very concrete and important impact. And if we don't pay attention to this, first we will delete all the international architecture of intellectual property, then we'll create new imbalances and inequalities, because some big companies will take the profit of every creation of all humankind, because they will take everything, everything we did dream, write, learn, publish, share, and they will use it to train some big monopolistic models. So from my perspective, this is not just an academic controversy, this is one of the most important topics of these days, and we have to be sure, and we can also think about security issues, security concerns. So the traceability, if I may, of how this model was educated is a very, very important issue, and we don't have proper answers today, because you can…

Mike Linksvayer:
I agree with you. Just to clarify the academic comment, it was about exactly what you can call open or not; it's that part that's somewhat academic, but the fundamental issues are of extreme importance. And I really appreciate the, you know, French government's direction around open source AI. It is extremely important.

Helani Galpaya:
No, I mean, just to say there are like a million conversations about training data and the problems of using certain data for training. I don't think this is the forum for it. Women, people of color, developing country people are at the receiving end of decisions made by models that were trained on data that does not talk about them. So, you know, that's a whole other field. So I don't think we need to talk about it. Just to say that I completely agree the issues around training data are very real and huge. Another important concern is the definition of privacy itself. Because 10 years ago, to protect my privacy, I just had to protect my personal data and I was protected. Today, I can know a lot about you without knowing anything about you, because I will educate a model and it will predict something about you. So I cannot protect myself just by protecting my personal data. And not living in the digital world is no longer a safeguard against being profiled. You can profile me even if I have no email address, no presence online.

Cynthia Lo:
On privacy, I think it is fascinating how layering different data sets together can, as a result, produce a profile of a person. Looking at the time, I want to move on to our last point on promoting and supporting open code initiatives. Considering all of the topics we have talked about, including security, safeguards and privacy, what is the best way to promote open code initiatives, and how can member states do so?

Henri Verdier:
So first, there are more and more approaches, and that’s great. You have a strong European policy, for example. You have a network of open source officers in European governments. You have the French law I mentioned, La Loi pour une République Numérique (the Law for a Digital Republic), which requires the government to publish everything it develops as open source and fully reusable. We are promoting a European foundation for digital commons, because we want Europe to take its responsibility and help finance the commons that are important for freedom, sovereignty and self-determination. So there are a lot of initiatives, but the more I work in this field, the more I observe that financing is not enough, and maybe it is not even the most important part. What matters is really using free and open source software, contributing, allowing your public servants to contribute, and paying attention. For example, when we prepared the DSA, we very nearly killed Wikipedia, because we said that companies with more than, I don’t remember, 400,000 monthly connections in more than seven European countries must have a legal representative in every European state. For a big tech company, that is not very expensive, but for Wikipedia, it is. So we need conviviality, proximity, constant interaction and mutual understanding, and this is maybe the most difficult part today.

Helani Galpaya:
I want to add just two things to this. One is capacity. The public sector has very low technical capacity in many majority-world countries, and the expectation that anyone beyond a handful of public sector officials will be able to contribute code is, frankly, a dream for many countries. That’s great if you can do it, and it is the aspirational stage you want to reach. So instead, another solution is to build the communities, because the private sector is a lot more evolved and highly skilled. I keep going back to Sri Lanka: a really vibrant open source community, with one of the highest numbers of contributions to Apache, for example. That comes from people in high-paid, export-oriented software companies, but also from a couple of people really pulling that community together. So how can they participate in government-related work? I think that needs two things. One is that community building. The other is that they can’t easily participate in government procurement; that’s really hard. Government procurement puts out a bid and gives points to a company that has done this ten times before in five reference countries. For a group of people who come together without those references, it is very hard to signal that they can do this. So there is a problem there. Then at a practical level, if you don’t want to go all out but at least give some preference to open source, what some governments do is allocate five to ten extra points out of a hundred, which you get as a bonus if you are proposing an open system. And there are variations on open (completely open, free and open, open APIs, et cetera), so a graduated set of marks in the procurement. Then different types of companies can at least have a hope of participating and competing against the large firms. This is the same strategy that governments in the South have used to promote local companies in government procurement of IT systems. It is very hard to compete: for example, when I was in government I procured pension systems, and a big company will come and say, I have done pension systems in five countries. It is very hard for a local company. So then we say: if you at least have a local partner in the first year for technical support, and in the second year for actual deployment, you get five marks. In the same way, you can build up a legacy of open source by allocating marks over time in procurement systems.
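One hedged sketch of the “bonus marks” mechanism described above; the weights, categories and cap are invented for illustration, and any real procurement rubric would differ.

```python
# Illustrative sketch (all weights invented) of procurement scoring with bonus
# marks for openness and for having a local partner, graduated by how open the
# proposal actually is.

OPENNESS_BONUS = {
    "fully_open_source": 10,   # code released under an OSI-approved licence
    "open_apis_only": 5,       # proprietary core, but open, documented APIs
    "closed": 0,
}

def score_bid(technical: float, financial: float,
              openness: str, local_partner: bool) -> float:
    """Base score (technical out of 60, financial out of 40) plus bonuses."""
    base = min(technical, 60) + min(financial, 40)
    bonus = OPENNESS_BONUS.get(openness, 0) + (5 if local_partner else 0)
    # Cap so bonuses differentiate bids rather than dominate the evaluation.
    return min(base + bonus, 100)

# A local open source consortium without long reference lists can now close the
# gap on an incumbent with a stronger track record but a closed system.
print(score_bid(technical=48, financial=35,
                openness="fully_open_source", local_partner=True))
print(score_bid(technical=55, financial=38,
                openness="closed", local_partner=False))
```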

Henri Verdier:
I totally take the point. And that is interesting, because if you observe the history of governments, they had the technical skills to build bridges, roads and railways. There is something different in the history of IT, maybe because the story started in the military era, as you know, with projects to launch rockets from a submarine. From the beginning it was very big procurement, very expensive, with very strange rules for conducting projects. Governments should learn to work with ecosystems, as you say; to be maybe a bit more humble; to learn about agile methodology; to agree to start with an imperfect project and improve it, with a constant improvement policy. So this is a cultural change. And just to finish, because maybe it will soon be time to conclude: that is why, from my perspective, there is a strong connection between open source movements, open government movements and state modernisation, because you need to learn humility and to be one actor within a network of actors; and maybe also the new democracy we need, with collective intelligence, citizen engagement, participation and contribution. You cannot work on just one of these three topics; you need to cross all three.

Cynthia Lo:
Perfect, thank you. Looking at the time, we are almost out of time, but before we go to Q&A I want to check, Mike, whether you have any thoughts as well on this topic of promoting and supporting open code initiatives.

Mike Linksvayer:
Sure. Everything already said has been great, and I have too many thoughts, so I’ll just say one thing. What doesn’t get measured doesn’t get paid attention to. It is fantastic that we now have free and open source software advocates within government, but a much broader set of policymakers needs to appreciate the role that open source plays in the economy and in development. That is one of the motivations for the Innovation Graph that we launched: if you want to see numbers tuned to your jurisdiction, you can look at those, even if you don’t have a fundamental appreciation of open source, and understand that it is a really big driver of jobs and economic growth. People have used GitHub data to show that policies that foster open source lead to more startup formation, more jobs, and things like this. There is a really important study commissioned by the European Commission several years ago that put a floor on the contribution of open source to the EU economy; I believe the range was about 65 to 95 billion euro a year. So it is quite significant, and I would love to see that replicated in other jurisdictions in a way that is legible to policymakers who have no affinity for open source and don’t necessarily know anything about technology. Making it legible is super important.
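A hedged sketch of “making it legible”: summarising jurisdiction-level developer counts from a locally downloaded CSV. The file name and column names (economy, year, quarter, developers) are assumptions about how Innovation Graph style data dumps might be structured, so they would need to be checked against the published schema.

```python
# Hedged sketch: summarise open source activity for a policymaker briefing.
# Assumes a CSV exported from a dataset such as GitHub's Innovation Graph has
# been downloaded locally; the column names used here are assumptions.
import csv

def developers_per_year(path: str, economy: str) -> dict[int, int]:
    """Return the latest quarterly developer count per year for one economy."""
    latest: dict[int, tuple[int, int]] = {}   # year -> (quarter, developers)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["economy"] != economy:
                continue
            year, quarter = int(row["year"]), int(row["quarter"])
            devs = int(row["developers"])
            if year not in latest or quarter > latest[year][0]:
                latest[year] = (quarter, devs)
    return {year: devs for year, (_, devs) in sorted(latest.items())}

if __name__ == "__main__":
    # "developers.csv" and the economy code "FR" are placeholders.
    for year, devs in developers_per_year("developers.csv", "FR").items():
        print(f"{year}: {devs:,} active developers")
```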

Cynthia Lo:
Thank you, Mike. Before we move to our Q&A: on open source in the social sector in particular, there are a lot of organisations working in the social sector that are also open source. We mentioned digital public goods, and there is also research in India, Kenya and Mexico looking at the drivers for social sector open source organisations: how are they funded, and what are their initiatives? I think in another session we can explore open source in the social sector further. I believe Mala Kumar was instrumental in leading that research. As we move to our Q&A section, I am opening the floor to anybody who has questions here in person, please.

Audience:
Hi, good morning, everyone. My name is Sumanna Shrestha, I am a parliamentarian from Nepal, and I was very curious to attend this because I have a lot of questions. The first one is: how do you incentivise these entities to actually share data? When you think about the different sectors that exist to improve society, you have the private sector, obviously, you have government, and you have very influential INGOs and the UN. So what are some of the ideas, what has worked, maybe in Sri Lanka or other parts of the world, to incentivise these different actors to actually share data in whatever format, whatever privacy setting? The reason I ask is that in my previous life, before becoming a parliamentarian, what I saw is that there is a massive incentive to hoard data and then come up with insights to present and say, okay, I have some advantage over everybody else, which then warrants funding for me to go out and do something. It could be distributing relief material when there are earthquakes or disasters, for example. That is one. And then it would be really great to understand a bit more about this French procurement law you mentioned, which requires a certain percentage to be open source. In Nepal we have a very big distrust of anything that is open; people think anything that is free is not good quality, et cetera. So we tend to procure, and you are smiling maybe because you see the same problem. That it is exactly the contrary, that if you have a closed system you don’t know if there are back doors, I understand. But how did you go about building that level of trust in open source? Was there something fundamental you did? I think it also pertains to capacity: how many people in Nepal actually have the capacity to check the open source code and see whether there are back doors? So what are some of the in-built assumptions you have, and what very focused attention did you pay to strengthen those pillars and bring about this level of trust in open source? Let’s start with that.

Helani Galpaya:
Okay, I will take the data part. The superficial answer is that it is actually very difficult to get the incentives right for data sharing. Data is power, and therefore the incentive is to hoard it, whether you use it or not; that is the interesting part. We have spent the past year looking at public-private data partnerships across Africa, Asia, Latin America, the Middle East and the Caribbean, mapped over 900 different partnerships around data, and done some in-depth case studies, and we see a couple of things. One is that data sharing is a really high-transaction-cost activity, because capacities are different. Particularly if you are dealing with a large company and trying to get some data, you don’t even know who to reach, because there are regional managers, marketing managers, somebody in San Francisco, et cetera. So it is high transaction cost, and that privileges the really large companies, because they can come and negotiate with the government, spend the money, and also enter a market and subsidise something with data with a very long-term view. Microsoft, for example, can go and do something in a country that is in the early stages of digitisation, because in ten years, when everyone gets a computer, that operating system is more likely to be a Microsoft one. They can make those kinds of long-term investments in data partnerships; many small players can’t. So partnership building around data is really difficult, which is why I said the easy answer is that it is difficult. The incentives have to be set up. We often talk about this example: you can get data from Uber, say in Nepal, but I will talk about Sri Lanka, where it has some market share. Uber can give it to government or civil society to understand where people are, or something like that. But if you combine it with data from two other local taxi companies and share the combined data back with Uber and everybody else in a non-commercially-sensitive way, it is suddenly much more useful to Uber, useful to the local operators, and useful to the transport planner in government as well. So you find the incentive system that makes it worthwhile for the large and the small operators to come and play. Then you set up the technical infrastructure for data sharing, of course, and you give them the confidence that you are not going to share sensitive data, as in the telecom example I gave. You also put legislation around it; for telecom data in particular, we really had to make sure the telecom regulators didn’t have a problem. So you need research, public policy or journalistic exceptions in data sharing, particularly when it comes to sensitive data. Bridging those transaction costs and getting the incentives right are the broad principles, but really finding the incentives is a case-by-case exercise. We find the successful partnerships are often those where a middle broker is involved in getting them going: somebody who can convene multiple parties. A classic example: in Indonesia, the UN had the now-defunct Pulse Lab Jakarta, part of the UN’s data innovation work. They would sit in the middle and convince government that it needs to play in this data game and use private sector data.
They would develop that capacity, because government doesn’t automatically say, I will use private sector data; and sometimes governments can’t say that either, because the census department often has a rule that thou shalt conduct national surveys, not use call detail records for population projection. So the brokers don’t give up; they work with government. Then they bring, say, five different private sector players together. Sometimes it involves paying for the data; sometimes it is about setting up the incentive systems. The Global Partnership for Sustainable Development Data in Africa brought in the Group on Earth Observations, which made satellite data about Africa available as a block to any country that wanted it. So data brokerage also plays a role. I am not saying government can’t be a data broker, but that role of a data broker is really important, because otherwise what you have is one-off data transactions. During COVID, everyone managed to get some Facebook data to understand where people were. That is not really useful any more, because COVID is over and none of that data is flowing to government or civil society. Setting it up in a sustainable way, so that you can understand development and use that data, requires a bit more.
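A minimal sketch of one technique that can sit behind “non-commercially-sensitive” sharing in a partnership like the taxi example above: aggregating individual trip records into coarse zone counts and suppressing small cells before the combined data goes back to all partners. The thresholds and zone labels are invented.

```python
# Sketch: aggregate raw trip records from several operators into coarse,
# non-sensitive zone counts before sharing. Thresholds and zones are illustrative.
from collections import Counter

MIN_CELL_COUNT = 10   # suppress any zone/hour cell smaller than this

def aggregate_trips(trips):
    """trips: iterable of (operator, origin_zone, hour) tuples from all partners."""
    counts = Counter((zone, hour) for _op, zone, hour in trips)
    # Drop operator identity entirely and suppress small cells so that neither
    # individual riders nor any single operator's market share is exposed.
    return {cell: n for cell, n in counts.items() if n >= MIN_CELL_COUNT}

# Synthetic example: one international operator plus two local ones.
sample = ([("uber", "zone_3", 8)] * 12
          + [("taxi_a", "zone_3", 8)] * 4
          + [("taxi_b", "zone_7", 9)] * 3)
print(aggregate_trips(sample))   # only (zone_3, 8) survives, with 16 trips
```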

Henri Verdier:
Thank you for your very precise and important questions. First, as you said, most people in power have an instinct to hide data. But this is the old approach, and it is obviously not the best overall organisation, as you can easily see in the bureaucracy. When I joined the French government ten years ago to lead the open data policy, sometimes four different administrations were sharing the same data, with mistakes, and they spent a lot of time and money selling data between administrations of the same government. It was useless, expensive and slow. I discovered that, because it was expensive, some administrations were using very old data sets, because they bought them from a neighbouring administration only every four years, for example, with the same money, because we are one state. So this is not the best overall organisation, and maybe not the best strategy. What I have learned from the story of the digital economy is that platform strategies are better. If you have data and you share it, you become the centre of the ecosystem and you have more influence: maybe less direct power, but much more soft power. And the story of, I don’t know, Microsoft, Google, Amazon, is a story of people sharing their data, not hiding it. So yes, hoarding is a natural instinct, but we have to fight it, because hiding your data is a poor strategy.

Then, regarding the controversies around open source: yes, in France we usually consider that open source is the best security approach, because you can check, you can contribute, and if you discover something, you can fix it. It is funny: if you look at the history of European countries, everything is converging now, but twenty years ago the French public sector used a lot of open source and free software and the private sector did not, while in Germany it was the contrary: German companies used a lot of free software and the German government did not. So you also have national histories, of course; it depends on your… But in France it is probably also political. Most public decision-makers consider that open source is less expensive. And even when it is not, because sometimes it has costs, of course, you spend your money paying national workers, not profits in Seattle. So it is a better use of your public money, because you create value in your country, and usually it is less expensive. Better security, and maybe better democracy: in the Declaration of Human Rights of 1789, we say that the government has to be accountable, that every citizen has the right to understand what the government is doing and to check whether it is the most efficient approach. Now most governmental action is carried out through big and complex systems. If you don’t have the right to understand the black box, you are not a complete democracy; you have to rely on someone who claims to be doing their best, but you don’t know. So the mix of cost, security and democracy means that in France this is not a controversy anymore; most people in the public sector encourage this approach.

As for the strategy you asked about: the first easy step is public procurement. I am not speaking about buying software; I am speaking about buying services. I remember, ten years ago, the city of Paris wanted a network of self-driving cars. They wrote into the procurement: I will have access to all the data, and I will share it as open data. The companies didn’t want to, but the city said: that is my market, my procurement. If you don’t accept, I will take another solution. So for water, for transport, whenever you buy a service or delegate a public service, just think about writing one clause saying: I will take the data, and I will share the data. That is not so difficult if you have a competitive market. The second thing, of course, is to explain, to exchange, to build an ecosystem. To be frank, I don’t think these strategies can work if you don’t have any ecosystem. It can be an ecosystem of open source software; it can be an ecosystem of startups or a big tech company. I don’t care, but you need to work with civil society or the private sector, outside of the government. If you cannot rely on outside skills, competencies, energy, innovation and creativity, it is very difficult.

Regarding La Loi pour une République Numérique, to be precise, we wrote that every piece of software that the government develops, or pays to have developed, has to be open source. It was built on the premises of the law on free access to information: during the seventies, we wrote that citizens have the right to ask for any information regarding government action. How did you pay? Where did the money go? We built on those premises. So of course, when we buy a consumer product, we don’t ask for open source; but when we finance the development of a product, or develop it ourselves, it is mandatory.

Regarding competencies, as you said, this is very often a problem. But you don’t really need very, very skilled people, because we are speaking about simple IT. A funny story: ten years ago I also created the job of chief data officer for the French government, and I hired great data scientists to build good public policies. I hired brilliant people, and we helped maybe a hundred administrations improve their public policies. After four years, they came to me and said: this job is a bit boring; we just use Excel and linear regression. Because government has very structured data and very simple questions, you don’t need generative AI on big data to fix 80% of the problems; you need capable people with simple software, very focused on having an impact. And very often we built things ourselves. For example, the French ID system, France Connect, is now used by 40 million people every week. We are a small country compared to India, so 40 million people is something in France. I built it with six developers in six months; the total cost was 600,000 euros. Of course, if I had decided to buy it from some big companies that you can imagine, it would have cost, I don’t know, 30 million euros. But when you do it yourself, with simple principles and the agile methodology I mentioned (make a first minimum viable product and then improve it), it is not so expensive, and you don’t need a Nobel Prize winner, if I may; you just need good, serious developers.

And maybe one last thing. I was there when we decided this law, and some people had concerns, so we included a cybersecurity exception: if the cybersecurity agency says that publishing the code is dangerous, we won’t publish it. That was five years ago, and it has never happened; they have never found a case where publishing the code was dangerous. It was a safety valve to make people comfortable, and it was never needed.
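A tiny illustration of the “Excel and linear regression” point above: the kind of question a small ministry data team can often answer with a plain least-squares fit on well-structured administrative data. The numbers and the question are made up.

```python
# Toy example (made-up numbers): a plain linear regression on structured
# administrative data, e.g. "do benefit claims get processed faster in offices
# with more staff?"
import numpy as np

staff_per_office = np.array([4, 6, 8, 10, 12, 14, 16], dtype=float)
avg_processing_days = np.array([31, 27, 24, 20, 19, 15, 13], dtype=float)

# Ordinary least squares fit: days ~ slope * staff + intercept
slope, intercept = np.polyfit(staff_per_office, avg_processing_days, deg=1)
print(f"Each additional staff member is associated with "
      f"{abs(slope):.1f} fewer processing days")
print(f"Predicted days for a 9-person office: {slope * 9 + intercept:.1f}")
```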

Helani Galpaya:
Let me just add one quick point. I think this is quite amazing. One small challenge, depending on the structure of your civil service, is attracting people with the skills to do this kind of development; you have to look at what other options they have. Particularly in South Asia, they can work for a global IT firm, usually for five to ten times the government salary, and that is a real incentive problem. So the way some countries deal with it is to have other structures, like a government-owned private company that does a lot of this IT development and does not have to abide by government pay scales. That suddenly makes it attractive for somebody who wants to do civic tech and public technology without having to accept a low government salary.

Henri Verdier:
If I can say something, because this is very important: most of the people who came to work with me divided their salary by two. You can have very skilled and dedicated people if you give them a mission and autonomy. But not if you ask them to divide their salary and also to obey a long hierarchical chain and respect a stupid and very complex framework. You have to give them a real mission, like fighting unemployment or improving education, and a kind of autonomy. That is why we have to change the way we organise the bureaucracy. But it is not impossible; a lot of countries have done it, and more and more, I feel. And always with people coming from the private sector, or from the big, important open source ecosystem: it can also be Wikipedia, GitHub, OpenStreetMap. In France, we work a lot with the OpenStreetMap community, Linux, Debian. It is not always private firms, but it is outside of the government.

Cynthia Lo:
Thank you. Looking at our virtual attendees, we have a question on whether there are government tools for securing data. Let me just double check. I think let’s start with that first, if there are any thoughts on it; if not, we do have another question as well.

Mike Linksvayer:
I have a small comment that might not address it directly, but I want to highlight how important basic cybersecurity is for protecting data. If you have a breach through an exploit, your data is exposed no matter what other measures you have taken. And I want to tie that back to the previous discussion. The idea that open source is more secure because everybody can audit it, see exploits and fix them is sort of true, but it is also a bit of a double-edged sword, and it is very pertinent in policy conversations now. One analogy is that open source is free, but it is free like a puppy that you have to take care of. Due to incidents like Log4j, policymakers’ attention has been focused on the fact that open source is part of our societal infrastructure, and it is something we cannot rely solely on the developers of individual projects to secure adequately. So there needs to be investment from a range of stakeholders, including governments, in making sure that the ability for everybody to review the code and make fixes is actually acted on. Germany is really a leader in this with the Sovereign Tech Fund, and there are others, like the Open Technology Fund in the US, and more brewing. That is a really important point: the potential for open source to be more secure needs to be actioned, and that requires coordinated action. And in another way this loops back on itself: for those decisions about where to invest, about which open source code is actually critical for power plants, for elections or whatever, you need data to identify where to make those investments; otherwise you are boiling the ocean. So it is somewhat tangential, but basic cybersecurity is absolutely crucial for protecting data.
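A hedged sketch of the “you need data to target investment” point: ranking an inventory of open source dependencies by a rough criticality score. The CSV columns and weights are invented; real efforts such as the OpenSSF criticality score use far richer signals.

```python
# Hedged sketch (all field names and weights invented): rank open source
# dependencies by a rough "criticality" score so that security investment can
# be targeted instead of boiling the ocean.
import csv

WEIGHTS = {"dependents": 0.5, "critical_deployments": 0.4, "maintainers": -0.1}

def criticality(row: dict) -> float:
    """Higher = more widely relied upon and/or more thinly maintained."""
    return (WEIGHTS["dependents"] * float(row["dependents"])
            + WEIGHTS["critical_deployments"] * float(row["critical_deployments"])
            + WEIGHTS["maintainers"] * float(row["maintainers"]))

def top_candidates(path: str, n: int = 10) -> list:
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return sorted(rows, key=criticality, reverse=True)[:n]

# Expects a CSV like: project,dependents,critical_deployments,maintainers
# (an inventory a government or funder would have to assemble first).
if __name__ == "__main__":
    for row in top_candidates("dependency_inventory.csv"):
        print(row["project"], round(criticality(row), 1))
```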

Henri Verdier:
You’re completely right. Open source creates the possibility to check, but someone has to do it. I have another funny experience. In France, we have an interesting suite of free office tools, alternatives to Word and Excel, from an organisation named Framasoft. During COVID, the Ministry of Education decided, and said publicly, we will use Framasoft. And the people from Framasoft protested loudly and said: are you crazy? Are you really considering putting one million teachers and ten million students on my infrastructure without giving me anything? I will die. You have to finance infrastructure and servers, or you will kill me. It was funny because it could have been seen as a big victory: the French Ministry of Education is one of the biggest administrations in the world, bigger than the Red Army. It could have been seen as a victory, but it was the kiss of death. So we have to be serious, and to nurture, protect and finance this ecosystem, or we will kill it. There is no such thing as a free lunch, even with free software; someone has to pay a bit.

Cynthia Lo:
Thank you. I know we are at time, but I want to double check whether anybody has any questions here in the audience or online. All right. Well, thank you so much, everybody, for attending. Any concluding thoughts from our speakers? No? Not a problem. Thank you so much to everybody for attending this very early morning session in Japan, and we look forward to any other thoughts you have on open code and development. Thank you.

Speech statistics

Audience: speech speed 176 words per minute; speech length 440 words; speech time 150 secs
Cynthia Lo: speech speed 157 words per minute; speech length 1275 words; speech time 488 secs
Helani Galpaya: speech speed 177 words per minute; speech length 3638 words; speech time 1234 secs
Henri Verdier: speech speed 155 words per minute; speech length 5032 words; speech time 1950 secs
Mike Linksvayer: speech speed 160 words per minute; speech length 3113 words; speech time 1165 secs