To share or not to share: the dilemma of open source vs. proprietary Large Language Models

29 May 2024 17:00h - 17:45h


Disclaimer: This is not an official record of the session. The DiploAI system automatically generates these resources from the audiovisual recording. Resources are presented in their original format, as provided by the AI (e.g. including any spelling mistakes). The accuracy of these resources cannot be guaranteed.

Full session report

Experts debate the future of open-sourcing large language models at industry panel

In a dynamic panel discussion, industry experts from the Linux Foundation, Meta, the Future of Life Institute, Google, and the Wikimedia Foundation engaged in a deep dive into the complexities of open-sourcing large language models (LLMs). The conversation was framed around the benefits, challenges, and ethical considerations of open source versus proprietary AI models.

Jim Zemlin of the Linux Foundation opened the discussion by highlighting the integral role of open source in the development of modern computing systems, with a significant majority of code being open source. He pointed out the societal cost savings and innovation spurred by open source, emphasizing the need for clear standards to define what constitutes an open LLM. Zemlin also addressed the issue of market consolidation and the importance of open models in fostering trust and collective innovation.

Melinda Claybaugh from Meta presented the company’s nuanced approach to open sourcing, which is not a binary choice but rather a spectrum. She discussed Meta’s responsible open sourcing strategy, which includes releasing model weights but not training data, and the provision of responsible use guides and technical safeguards. Claybaugh also mentioned Meta’s support for the ecosystem through initiatives like the Llama Impact Grants, which encourage the development of localized AI solutions.

Isabella Hampton of the Future of Life Institute underscored the ethical implications of the open versus proprietary debate, advocating for a case-by-case approach to decision-making. She suggested that open source should be viewed as a tool rather than an end goal and called for creative thinking in developing alternatives when models are deemed too risky to open.

Melike Yetken Krilla from Google shared examples of Google’s transformative open source contributions, such as the transformer architecture and AlphaFold. She stressed the need for a responsible and cautious approach to openness, balancing innovation with risk management, and highlighted the importance of collaboration in creating standardization for AI.

Chris Albon of the Wikimedia Foundation spoke about the role of open source language models in expanding access to knowledge. He emphasized the importance of credit and sustainability for volunteer-driven platforms like Wikipedia and the value of transparency and adaptability in open source models. Albon also shared concerns about the potential for AI-generated content to disconnect users from the original sources of information.

The panelists discussed the integration challenges of open source LLMs, touching on accuracy, bias, and content moderation. They explored governance frameworks and policies to ensure responsible development and sharing of LLMs, agreeing that regulation should not stifle innovation but should be placed on those best equipped to manage it.

In conclusion, the panel highlighted the multifaceted nature of the open source debate, recognizing the global impact of open source AI, the democratization of technology, and the need for responsible governance. The discussion underscored the importance of transparency, collaboration, and nuanced regulation in fostering a sustainable open source AI ecosystem, balancing the benefits of innovation and cost reduction with ethical considerations and risk management.

Session transcript

Bilel Jamoussi:
Thank you very much for the great introduction, and good afternoon. I really admire your energy to stay with us this late. It must be a really important topic. Moving on to the next topic, to share or not to share, in terms of open source and large language models. So, as the panel has been introduced, we’ll go straight into the questions. And I’ll start with Jim Zemlin, CEO of the Linux Foundation. Open source has been a cornerstone of the Linux Foundation philosophy. Can you share your perspective on the benefits and challenges of open source large language models? How has the open sourcing of AI models benefited startups, academic institutions, and developers in resource-constrained environments? Yes, I think you can speak.

Jim Zemlin:
There we go. Thank you for having the Linux Foundation here today. And certainly, you know, we’re proud to be the home of the largest shared technology investment in the world, which is Linux, Kubernetes, PyTorch. Open source has been a free, fundamental building block for all modern technology systems. 80%, 90% of the code in any modern computing system is open source code. Large language models wouldn’t exist without open source tools like PyTorch. It has been a way to provide, according to Harvard University, $9 trillion in savings for society by having this free building block for innovation. In large language models, we believe that they should be open in order to be able to examine these models, in order to be able to build trust in these models, in order to be able to collectively innovate in a wide range of things for all the good. The challenges are varied. Part of the challenge is just simple market consolidation. There are layers of the way that you build large language models that are owned by very few. The more we can get tools like that in the hands of many, the better. And so we think that that’s something that’s really important and is a lesson that we can learn. The final challenge, I think, is an important challenge where standards can really help. Because while I can give you a very clear definition of open source software, no one can really give you a very good answer for what an open large language model is. And when you kind of describe it as open source, practitioners of large language models get upset about it. So one of the things that the Linux Foundation has been doing is creating something we call the open model framework. This is a new way to describe openness so that we can address the idiosyncrasies, from the GPU all the way up to the data, of what would be required for various degrees of openness. And so starting with those kinds of standards, we think, is very important.

Bilel Jamoussi:
Great. Just a quick follow-up, Jim, if I may. Would open source LLMs help organizations like the ITU or other UN organizations to foster open, collaborative exchange of data among different institutes like academia, startups, industry, and government?

Jim Zemlin:
They already do. They already do. So when Meta open sourced Llama 3, within days, that tool was immediately used to advance large language model technology and different AI benefits in health care. I suspect you have people at the ITU using it right now and training models to share data and do good work, and you may not even know it. And so I think that it absolutely can provide high value. I think we have to look at all layers of the large language model supply chain beyond just the LLM, the weights, the data, but also at the underlying tools and even safety mitigation tools, which are now open source. Things like our C2PA project, which you just heard about from OpenAI, things that can detect intersectional bias. Those are open source tools now. So you’re already benefiting from open source. You may just not know it. Your engineers definitely know it.

Bilel Jamoussi:
Since you mentioned Meta, I’ll go to Melinda and ask: Meta has made significant contributions to both open source and proprietary AI. How does Meta decide which projects to open source and which to keep proprietary? What factors influence these decisions? And what is actually open sourced: algorithms, weights, or training data?

Melinda Claybaugh:
Great. Thanks for the question. So Meta has been a really strong proponent of open source throughout its history, but in particular recently around our large language model, Llama. The most recent model we released was Llama 3. And so I think although we’ve been a strong proponent of open source, what we really want to convey is that this is not a binary, which you were getting at. I think we do ourselves a disservice if we think of things as open versus closed. There’s actually a real spectrum, and I think we can get distracted by the debate of what is open or closed. But what’s important is the nuance. And so when we think about open sourcing something at Meta, we think a lot about the risks. We think a lot about business reasons, other reasons. And we can’t sit here and say today that we will open source everything in the future. One interesting aspect to consider, I think, is maybe a limited open release to researchers. You know, open to researchers and then opened more fully after there’s been some pressure testing and further risk assessment. So I think it’s really important to keep that nuance in mind. And the other thing I want us to keep in mind for this conversation is that open source does not mean no safeguards, no protections. And so at Meta, we take an approach of responsible open sourcing. We do release our model weights as part of our open approach. We don’t release the training data. That’s how we have interpreted it. That’s our open approach. But others take different approaches. And so for us, a responsible open approach is all of the kind of testing and mitigations that are done from the data collection stage: filtering data, doing risk assessments and mitigations along the way. At the release point, releasing a responsible use guide for developers who are going to be building on our model. And then also open sourcing safeguards and other technical protections that developers can put in place when building on our model. And then finally and most importantly, I think, is providing channels for developers to give feedback on the model, including risks and bugs, back to us so that we can mitigate and improve the model in the future.

Bilel Jamoussi:
Thank you very much. Isabella, the Future of Life Institute, from your perspective, from an ethical standpoint, what are the implications of keeping large language models proprietary versus open source? And how do these decisions impact society? And what considerations should guide organizations in this space?

Isabella Hampton:
Thank you for the question. So the key consideration that I think organizations should make is framing, perhaps, that open source is a means to an end and not the end itself. Open source is a tool that we can leverage to accomplish our goals. We’ve discussed many of those goals today. We want to ensure that AI models are transparent, ensure that the landscape remains competitive, and that we don’t have a strong consolidation of power. We also want to, you know, just ultimately ensure that these models are safe and trustworthy. And so far, I think that the open source ecosystem has been really effective in accomplishing many of these goals. But looking ahead, I think it’s important that we maintain our focus on the goals and not commit ourselves to a single solution to accomplish them. I think open source will continue to be useful for the things that I mentioned before. But as, you know, an enormous amount of resources is being poured into advancing the capabilities of these systems, for which we have relatively immature risk frameworks, we’re going to have to come up with creative solutions to develop new tools beyond just open source. I echo the sentiment that was just mentioned: if we were on this stage to make a decision right now that we should open everything or close everything, I think that would be the wrong decision. Instead, I think that we should make our ability to evaluate capabilities and determine the impact on society more sophisticated, and then make the decision to open models based off of that.

Bilel Jamoussi:
Great. Thank you. Google has a history of both open source contributions and proprietary developments. Are we already seeing the benefits, or the advantages and disadvantages, of open versus closed source models?

Melike Yetken Krilla:
Thanks very much for the question. As people may know, Google has been an open software and open science company since 2005, and that continues now with our generative AI and large language models. It won’t surprise you to know that at Google we are tech optimists, so I’m going to start here with the benefits of our open source and open approach. First is something called the transformer architecture, which we released in 2017 and which serves as the basis for all current LLMs. Second is a 3D protein structure prediction model that we released called AlphaFold, which has predicted the structures of over 200 million proteins, released to researchers and scientists in a free, open database that has allowed for technological advances and discoveries in cancer treatments, in malaria vaccinations, and in enzyme capabilities, really transforming and catalyzing future benefits for all of society. Next, just last week we released an iteration of Gemini, our LLM, called Gemma, available for developers to look through, to access, to use, to make calls into, to create new models, and to test certain technological advances. In fact, there is a group of developers in India that was able to create a new Gemini-based AI model using 15 native Indian languages, so that people are now able to communicate with, use, and test the model in those languages. So I think there are a number of areas where sharing this open material is really benefiting technology, but we need to balance that with being cautious about the risks and the capabilities and what exactly that openness looks like.

Bilel Jamoussi:
Great, thank you. Chris, the Wikimedia Foundation, as an organization that thrives on open content, how do you view the role of open source language models in broadening access to knowledge and what potential do you see for these models in enhancing or endangering platforms like Wikipedia?

Chris Albon:
Sure, so thank you for having me. I think, you know, our role in the world at Wikipedia is to share knowledge as much as possible, right? And so Wikipedia is one of the best things the Internet ever created, a huge pool of information created by humans, millions of hours of human time, that is freely available to anybody. And the reason that it is so popular as a training data source for large language models is because it is high-quality data created by real people. That is Wikipedia, and we don’t hide the information. If you wanted to download all of Wikipedia, we have a page where you can click and a zip file comes down, and now you have all the content on Wikipedia. That’s how easy we make it. And so when we see large language models building tools off of Wikipedia, we are super happy. It’s getting information out there, it’s helping spread research. I don’t track it super closely, but every week there are, like, two or three new papers that come out using Wikipedia as a source of information, so we’re, you know, pushing science forward. That is awesome. For us, as a platform that provides information, our worry is not that, you know, a wacky artist makes something using Wikipedia data or a big company makes something using Wikipedia data. Our worry is around credit. So the information on Wikipedia is created by people. People who have jobs and have kids, and after they put their kids to bed and after they have a glass of wine, they sit down and they go and they produce information for no payment for themselves; they just put it out into the world and give it to all of you. And I won’t have you raise your hands, but I bet you if I asked you who had edited Wikipedia, a huge number of people in this room would have made at least one edit to Wikipedia. And this pool of knowledge, we want this to continue for generations. That is our goal. And so, you know, our sort of ask for people who end up using Wikipedia is not to not use it. Absolutely use it. Our ask is that you help us make it sustainable, and that could be providing links. So if you use Wikipedia data when you do some AI summary, provide a link back to us, right? Provide a link back to Wikipedia so that the next generation of kids can come on and see something that they think is wrong and then open their textbooks and say, hey, that fact is wrong. I’m going to put that fact in there. And then they become the editor for the next 20 years, right? So that’s what we want to build: a world where there are lots of cool things being created on AI, but there’s also this piece of information that is just this, you know, wonderful part of the Internet that has been going on for 20 years, that is owned by all of us and created by all of us in 330 languages around the world, and we continue doing that. And the way we continue doing that is making sure that users of it help us out and provide credit to the volunteers who do it, through a link back or through supporting it in other ways.

Bilel Jamoussi:
A great introduction to the value of Wikimedia and Wikipedia and how it’s generated. But in terms of open sourcing large language models, do you see that as enhancing the value proposition, or do you see any dangers with it?

Chris Albon:
Yeah. So at Wikipedia right now, since I’m in charge of this at Wikipedia, we have over 400 machine learning models in production, live, serving users. They’re not all large language models, but there are 400 models. All of those models right now are open source. So you can go on. You can see all the code we use to train them. You can see our evaluations of the models. You can see our scores for the models. You can see our internal documentation on how we built them. Everything we do is transparent. You can actually go on Tuesdays and see my team decide what to work on that week around ML in production. You can actually go into a chat room, which is my team’s internal chat room that we use to discuss any kind of server error, any kind of thing, and join in the conversation and talk to us about it. It’s live and open to the public. And that sort of transparency is how Wikipedia works. And so when it comes to open source, there are lots of levels of open source. There are open weights, and other people have discussed that. But there is so much benefit, and I really want to supplement what Jim said. So much of the world right now is built on open source software that I worry we’re forgetting the value that we’re all reaping from this: that we, as an organization that doesn’t have a lot of resources, don’t write most of the code that goes into live production, because we only use open source software. Everything on Wikipedia is open source. Either we wrote it ourselves, or we use open source libraries written by other people, often thousands of people, so that when they fix a bug on their system, we benefit. And then when we fix a bug and push it to everyone else, they benefit. And that has led to a huge expansion in the ability of low-resource organizations to scale to, well, one of the biggest websites in the world, all for very little money. And that part of it is totally done because it’s open source. So when I see open source models, the value is that I can see everything about them. I can see the weights of the model. I can test it. I can experiment with it. I can adapt it to my individual use case. I can decide that I’m just not going to bother, it doesn’t look good. I can make changes, whatever. But transparency is always good in these things. Seeing how things work is always good. Not having to look at someone and say, hey, I just have to trust you because I can’t see what’s inside the black box. Forget the black box. Open it up to everyone else and show what’s happening.

Bilel Jamoussi:
Great. Thank you. So we’re going to go for a second round of questions. Thank you. So, Jim, back to you with the Linux Foundation. Given the rapid advancement in AI, how can the open source community address concerns related to security, misuse, and the ethical deployment of large language models?

Jim Zemlin:
So this is something where I agree with my colleague from Meta in that detail and nuance really matter here. You know, open source as a movement is very good at solving immediate problems. And, you know, we can conjecture about, you know, a theoretical AGI at some point in the future. But we’ve got immediate issues we need to deal with right now when it comes to large language models. And let me just give a couple of examples. First of all, market consolidation. So you have to know at every layer of the stack where that’s happening. At the GPU layer, the lowest layer, CUDA is an API that everyone writes to that is owned by NVIDIA. And NVIDIA has done an amazing job with that. There is also an open alternative called UXL that is a more neutral, open source alternative that could allow for more access in the future to GPU technology. At the next layer, all the fundamental building blocks to build large language models are, as you’ve heard, already open source. At the next layer, large language models, some of the data is open, some of it is not. Some of the weights are open, some are not. And understanding that is important. But having those open models, like Llama 3, allows for amazing benefits to society and levels the playing field. It’s at the data layer where I think we have the biggest challenge in terms of openness. You already heard Wikimedia has the best open data set for human knowledge. But we’re also trying to come up with ways to make data more seamlessly shared, more data liquidity, so to speak. We created an open source data license, an open data license agreement, that addresses some of the legal problems around data sharing. We funded an organization called Overture Maps, which is about a $30 million effort to take publicly and privately available geospatial data and license it for free, normalize it, and allow it to train models in a way that is a many-to-many data sharing model. These are things that you can do to prevent consolidation of power and lower cost. Open source can also be incredibly helpful in the immediate areas around safety. First, in provenance and things like non-consensual sexual imagery and deepfakes. I already mentioned C2PA as a provenance tool. We’ve been working on that for years. But there are also open source tools that the good folks in the open source community use to detect intersectional bias, to examine large language models in a more impactful way. So the open source community is really, really good at this stuff. I hate to say that the answer to problems created by tech, in this case LLMs, is more tech in the form of open source tools that enable good things. But in this case, the open source communities actually can add a lot of value there.

Bilel Jamoussi:
Great, thank you. Back to you. In terms of Meta, what have been the tangible benefits and potential downsides of open sourcing some of your AI technologies? Can you share specific examples where open sourcing has significantly impacted Meta’s AI projects or the broader community?

Melinda Claybaugh:
Yeah, so I think we’ve all given some examples of benefits of open source technology. And I’ll point you to a paper that came out last week from the Columbia Convening on Openness and AI that really laid out a ton of use cases. I mean, everywhere you look, there are really compelling use cases around open source large language models in particular. And a lot of these have to do with the ability of people in countries all around the world, particularly in under-resourced areas and populations, to build with and on high-quality technology for very individualized use cases. So, for example, I was in Korea last week for the AI Summit, and people were talking about an example there of a local company built on Llama for very personalized math tutoring help, basically calibrated to the Korean math curriculum. So very local. Or allowing women in India to get very high-quality information about child care and health care for their kids. And so there are examples everywhere you look in medicine and education, health care, biology, agriculture. And so every day, we’re hearing more and more of these use cases. And I think what is really unique about open source that enables this is that the cost comes down. You don’t need to have a model. You can build on a model. You can use open source tools to build on the model. So it’s open source layers of the stack. And you can produce something that is very, very tailored to your specific needs that are under-resourced and not getting attention. So I think the use cases are very compelling. I also think that, aside from the examples there, I just want to point out that Meta has funded Llama Impact Grants so that we are actively looking for developers with compelling use cases, working with them to refine them and put them into production, and giving grants to some finalists who have really great ideas. So this is an ongoing project of cultivating an ecosystem and resourcing an ecosystem of developers who can build on open source models. And I think we’re going to see the rewards pay off more and more.

Bilel Jamoussi:
Thank you very much, Melinda. Isabella, are there other ways of pursuing the open source objectives while keeping models closed source?

Isabella Hampton:
Yeah, thank you for the question. So I want to reiterate that I think this decision should be made on a case-by-case basis. So I certainly don’t think that open source plays no role in this ecosystem for solving some of the problems that we’re mentioning here today. But in the case of models that in the future we would deem too risky to open to the public, I’m excited about some of the work that we have coming up at the Future of Life Institute. We’re going to be launching grants around power concentration to think creatively about what other solutions there might be beyond open source for solving some of these problems. I’m also really excited about NAIRR, the National AI Research Resource, which will give developers and researchers access to compute and data and other resources that they need to do safety research. But again, I think there needs to be a lot of creative thinking on alternatives in the case where we would choose to close a model.

Bilel Jamoussi:
Great, thank you. Melike, maybe I’ll come back to you on this: how should we decide if models should be open or closed source?

Melike Yetken Krilla:
Thanks very much for the question. As Melinda said earlier, I think it is not a matter of open or closed. This is not a binary issue, and there are benefits to different versions of this. The idea is: what are the stacks, what are the models, what are the standards and safeguards? I think, as somebody said on this panel earlier as well, open source does not mean no safeguards. And so for us, when we’re looking at openness, we’re thinking about: is it gradually open, fully open, API access, partially open? There are a lot of different gradients that you can have. And with Gemma, which I was talking about earlier, we’re trying to take a very responsible and thoughtful approach to how we are looking at releasing these models and to whom. And so first, we’re looking at safety testing in advance. So what does that look like for ourselves? How are we red teaming before we launch it? What is the high bar for evaluating things, testing processes, running rigorous processes, and then identifying how and at what level to release? Gemma is released at an open level, but not open source. And then on the other side, any researchers or scientists or developers that are using it have to commit to their own set of safeguards and standards by committing to avoiding harms overall. So that’s one component of it. And another is collaboration on standardization. ITU does this very well, but something we’re working on with governments, civil society, and businesses is the creation of standardization. And a good example of this is actually based on the US Executive Order on Artificial Intelligence. The National Telecommunications and Information Administration, NTIA, is doing a really strong job convening all three of these pillars of actors to talk about these complex issues.

Bilel Jamoussi:
Thank you. And thank you for those kind words about the ITU and the standards effort. Chris, I’d like to come back to you now. You talked a lot about the benefits and the transparency. What challenges have you encountered or do you foresee in integrating open source language models into platforms like Wikipedia? And how do you plan to address issues such as accuracy, bias, and content moderation?

Chris Albon:
Sure. So I’ll just take the second part of the question first. One of the things that people often ask me is, well, is AI going to come and ruin Wikipedia? And the answer is that there is no barrier right now to any of you editing Wikipedia. Any of you can go open a page and edit a page, delete a word, add a fact. And that has been true for 20 years. And the model, the human model of how Wikipedia works, is that you’re allowed to do that. But then as soon as you do that, someone else comes and says, oh, you’re not so right on that fact. Here’s a different source. And someone else jumps in and says, oh, no, I got this other source. And that model works amazingly well in a world of AI. Because it doesn’t matter if an AI comes and tries to spam the site. We have tens of thousands of humans who are helping out to catch that data, to catch that entry, to mark it as vandalism, to revert it, and to find other sources. When it comes to integrating tools, we’ve actually been using AI more to give tools to those human editors to help them work faster, to predict which edits are probably vandalism, which edits are probably ones that don’t use reliable sources, and that kind of stuff. But our worry in a world of AI is not that the model of Wikipedia, where everybody can edit in a free and open way and anybody can use that data, is going to stop being valuable. I’ll use a term of art here. A new term of art is slop. Slop is, if you’re really, really deep in the practical side, AI-generated content that isn’t good. So it would be fake reviews on products or fake recipes on recipe sites. Our worry is not that there will be slop all over Wikipedia. Our thinking is probably that as the internet accumulates more and more slop over time, users will actually come to sources of information that they know they can trust, including Wikipedia. And so we know that in this very moment, the work that those human volunteers do is more critical than ever. Our worry is that in this world where Wikipedia’s role, where humans generating knowledge, is so critical, there’ll be a disconnect between the people who are generating knowledge and the people who are consuming it, and where they’re consuming it. So if an LLM produces an answer to a question that you have about Geneva, and it doesn’t cite Wikipedia when it uses Wikipedia’s information, there’s a disconnect between how you receive that information and how that information was generated. And that part is really worrying, because we want to make sure that people are able to actually go to the original sources, can go and see the study that was used to produce that information, can actually go and maybe even become an editor if they want to become an editor. And so that kind of stuff is really the focus of our work, which is focusing on giving tools to those human editors, and you’re all welcome to become one, to make the site even better, to help them use their time more efficiently, to help make editing the site more fun and joyful and effective. And open source AI tools, we found, are very, very useful. We’ve been using them for over 10 years in our ML tools, and the community has been using them long before us, to figure out what they should be editing at any given time, categorizing articles, sorting things, writing descriptions, summarizing text, translating text. It’s been a very common part of what we end up doing.

Bilel Jamoussi:
Fantastic. Thank you. Thank you, Chris. Final round of questions, and we have about 14 minutes left. And I know the moderator of the next session wants to start on time. I happen to moderate the next session. So to all of you: considering the potential for both positive and negative impacts of open sourcing large language models, what governance frameworks or policies would you recommend to ensure the responsible development and sharing of large language models? Think also of the European AI Act. Maybe we can start with Jim and go around.

Jim Zemlin:
Sure. On the previous panel, Stuart, who I’m a huge fan of, talked about regulation in the airline sector. So all of the jets and air traffic control systems that are regulated for our safety run on open source software. But the regulatory burden is on the airline industry itself. Because, and this is my point, put the regulatory burden on those who are most equipped to handle it. Upstream open source developers create, as we’ve heard over and over again, incredible innovation. And that innovation should be left open. The regulatory burden should be downstream. Open source can be a very useful tool to prevent market consolidation. It can be a very useful tool to co-develop safeguards, to provide trust and transparency. But I would put the burden of regulation not on banning open source or closing down this $9 trillion shared asset. I would put it on those who are best equipped to be responsible.

Bilel Jamoussi:
Great. Thank you.

Melinda Claybaugh:
I agree. Along those lines, though, I think what’s important, if we’re talking about this nuance and this spectrum from open to closed, is really understanding all of the players in the AI value chain and what is in their control to do. And putting the responsibilities, I wouldn’t say only on the downstream folks, but there are certain things downstream folks can do that an open source model developer cannot do, and vice versa. So I think we really need to avoid a kind of blanket approach to regulation. I have been a bit discouraged by what has happened globally so far, in that there is not enough space for or recognition of the reality of the open source ecosystem, and there is an attempt to kind of regulate all AI the same, regardless of whether it’s open or closed. And so I would like to see more nuance there. I think there are some helpful signs, though, in that regard. So there’s the one conversation that’s happening at the very high global level of the voluntary commitments, whether in the US or in the UK and Korea. And there are the G7 principles and code. And that’s all kind of circling the same set of issues. And we’re seeing, I think, some nice convergence there. Then on a more pragmatic, concrete level, there’s a conversation that’s happening in industry groups like the Partnership on AI, in the Frontier Model Forum, in the AI Alliance, which is a group of open source companies that we’re a part of, and in something like the NTIA and the US government policymaking efforts, where they’re really trying to dig into what is open source, what is closed, where should the responsibility lie, what is possible, what is not. And so I think we need all of these things. We need the high-level principles, and we need the convergence around that. But then we also really need the evidence base and the concrete solutions as well.

Bilel Jamoussi:
Thank you. Isabella?

Isabella Hampton:
Yeah, so I touched on this a bit in my last answer. But I am optimistic about resources like NAIRR. But I am also optimistic about opportunities to maintain this public-private connection. I think it’s important that members of civil society are working closely to understand what is happening in the development of models, open source or otherwise, so that we can assist in developing these strategies to mitigate risks. And so we’re excited to be part of the NIST working groups that are doing a bit of that, and I hope to see more of it in the future.

Bilel Jamoussi:
Great, thank you. Melike?

Melike Yetken Krilla:
Thanks very much. I think AI is captured in three words: bold, responsible, and together. And we would say AI is way too important not to regulate. But how do we regulate it? We need to figure out the 21st century solutions to these 21st century challenges. And I’ll tell a personal story about how I think about this. I’m Turkish, and in the Ottoman Empire back in the 1400s and 1500s, at the time of the invention of the Gutenberg printing press, the Ottomans were innately very skeptical. And many of the calligraphers came and lobbied the Ottoman officials to say, we’re very concerned about misinformation. We need to control the writing of the books, the Koran, et cetera. And we want to ensure that this printing press is not allowed. And so the Ottomans banned the printing press for 200 years, while other societies were developing economically and educationally, and the masses were able to read. It really played an influential role in the future of the Ottomans. And so I say this because there is a balance needed in regulatory action between embracing and allowing some of this innovation while ensuring competition, and doing so together.

Bilel Jamoussi:
Thank you.

Chris Albon:
I think when it comes to regulation, I agree with Jim. I would love to see space for people, particularly people working on open source models, to be able to innovate. Some of the wildest stuff that you’re seeing, such creative, cool activity around creating models, fine tuning models, and evaluating models, is being done by small groups of volunteers in little Discord channels that you’ve never heard of. And I really want to make sure that we live in an environment where we have that safety, but also provide a space for these folks. They’re not sitting on panels like this. They’re sitting behind their screens, arguing about teeny little pieces of code that are making these things possible. And that group of people, they don’t appear here, but they are the people who are making the next generation of open source tools we’ll all use. One of them will figure out some great way of evaluating a model. They’ll put it on GitHub. It’ll start to spread. It’ll start to grow. Companies will take it up. And then we’ll all be talking about it in a few years. And I want to make sure that that group of people doesn’t have to have $50,000 in lawyers’ fees to get started, or doesn’t have to navigate five regulations in five different countries because it’s an international group of people. I want to make sure that that little group gets to flourish, and maybe they succeed, and maybe they fail, and maybe they publish, and maybe they don’t. But if it’s going to be open, it’s going to help everybody. If it’s going to be in a place where we’re all going to benefit from that work, and they’re going to share what they’re doing, let’s give them a space. Let’s let them run and see what happens.

Bilel Jamoussi:
Thank you very much. It’s been a fantastic panel. I’ve learned a lot. And thank you for sharing with all of us the benefits and possible disadvantages of open sourcing large language models, and also for decomposing the problem: it’s algorithms, it’s weights, it’s data. It’s not one size fits all; you have to make those decisions on a case-by-case basis. And generally speaking, the overall economic value of open source is something that has helped us get here and will be an important tool going forward. And maybe a final word: our next panel is going to look at international standards and the interplay between open source and standards. I think you alluded to that, Jim. So please, a big round of applause for our panelists, and let’s invite the next panel.

Speaker statistics

Speaker                  Speech speed (words/min)   Speech length (words)   Speech time (secs)
Bilel Jamoussi           150                        873                     348
Chris Albon              211                        2209                    629
Isabella Hampton         163                        401                     147
Jim Zemlin               166                        1278                    463
Melike Yetken Krilla     169                        852                     302
Melinda Claybaugh        166                        1414                    510