Safe and Responsible AI at Scale: Practical Pathways

20 Feb 2026 16:00h - 17:00h


Session at a glance: summary, keypoints, and speakers overview

Summary

Opening the panel, Shalini Kapoor highlighted that enterprises and governments hold vast amounts of information in fragmented PDFs and digitised documents, creating an “information divide” that limits AI’s ability to provide accurate answers [1-7][8-15][16-18]. She illustrated this with the example of a Nagpur entrepreneur unable to locate a biotechnology subsidy because the relevant government notification remained hidden in a siloed document, where LLMs could not retrieve it [7-13][14-15].


Rohit Bardawaj argued that before data can be considered AI-ready, the ecosystem must agree on a clear definition and a shared framework that includes cataloguing, machine-readable metadata, context files and business glossaries [33-46][160-205]. He emphasized that such a framework should be open and federated, avoiding a single data owner and ensuring a data steward orchestrates the ecosystem [181-184][185-194].


Prem Ramaswami described Google’s Data Commons as an open-source platform that transforms diverse datasets into a machine-readable knowledge graph, enabling an AI search layer that can combine global statistics with local queries [55-63][64-66][69-71]. He noted that the system is designed to be bottom-up, allowing users to overlay their own CSV data onto existing public datasets, thereby reducing risk for small businesses making location decisions [277-283][298-302].


Ashish Srivastava added that real-world solutions suffer from data fragmentation, requiring interoperability, contextualisation through glossaries, and verification of declared data to be truly AI-ready [92-102][103-108][124-130]. He advocated for reusable policy artifacts (DPIs/DPGs) that can be automatically enforced at the API level, preventing reliance on manual human enforcement [228-236].
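Ashish’s point about automatically enforced policy artifacts can be made concrete with a minimal sketch: a machine-readable policy checked at the API layer instead of by a human reviewer. Every field and function name below is illustrative, not a published DPI/DPG schema.

```python
import json

# A hypothetical machine-readable policy artifact; the field names are
# illustrative, not an official DPI/DPG specification.
POLICY = json.loads("""
{
  "dataset": "district_health_stats",
  "allowed_purposes": ["research", "public_planning"],
  "pii_fields": ["patient_name"],
  "max_rows_per_request": 1000
}
""")

def enforce_policy(policy, request):
    """Reject a data request that violates the declared policy,
    rather than relying on manual human enforcement."""
    if request["purpose"] not in policy["allowed_purposes"]:
        return False, "purpose not permitted"
    if any(f in policy["pii_fields"] for f in request["fields"]):
        return False, "request touches PII fields"
    if request["rows"] > policy["max_rows_per_request"]:
        return False, "row limit exceeded"
    return True, "ok"

ok, reason = enforce_policy(POLICY, {
    "purpose": "research", "fields": ["district", "anaemia_rate"], "rows": 200,
})
print(ok, reason)  # True ok
```

Because the policy travels with the dataset as data, the same artifact can be reused and enforced at every API that serves it.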


The participants agreed that LLM outputs can be unstable, prompting the need for benchmarks that test consistency across models and repeated queries [80-84][85-88]. A brief debate emerged over whether making alternative data AI-ready is primarily a governance issue or a technical one, with Rohit ultimately framing it as a governance challenge that requires standards and stewardship [162-170][176-180].


Shalini introduced the “data boarding pass” concept, a checklist-based certification that would allow organisations to certify data as AI-ready and facilitate secure, on-demand access [353-360][361-363]. She also referenced the GIVE framework, which ties guarantees, incentives, value and exchangeability together to sustain a formal data economy [390-398][399-401].


The panel concluded that while building AI-ready data infrastructures is a long-term journey, collaborative standards, open tools and incentive mechanisms are essential to unlock the massive potential of data for both India and the Global South [408-410].


Keypoints

Major discussion points


The fundamental problem of fragmented, non-AI-ready data - Enterprises and governments hold massive amounts of information in PDFs, legacy systems, and siloed databases that lack trust, safety, and interoperability, preventing LLMs from delivering accurate answers. Examples include an entrepreneur in Nagpur unable to find a biotechnology subsidy because the notification is stuck in a document [7-15] and the massive compliance-query load of 3,000 entities handling 5 million new queries per year [23-27].


Need for a shared, institutional framework to make data “AI-ready” - Panelists stress that institutions (e.g., MOSB/NSO) must define standards, create federated governance, and provide catalogues, metadata, context files, and business glossaries so data can be safely reused. Rohit proposes a consensus framework and a “core + aspirational” AI-readiness model [33-46]; later he outlines concrete steps: machine-readable JSON catalogues, metadata, context files, and knowledge-graph glossaries [160-224].


Practical use-cases that illustrate the value of AI-ready data - When data is structured and linked, it can power diverse applications: government-level statistical analysis, MSME location-risk modelling, agricultural decision support, education-domain translation via glossaries, and health-worker tools. Prem describes how Data Commons can de-risk a shop-owner’s location choice by overlaying private sales data with roughly 50,000 public datasets [298-302]; Ashish highlights a journey-centric solution that enforces data policies automatically [228-235].


Trust, consistency, and benchmarking challenges - LLMs can return different answers for the same query, raising concerns about reliability. Rohit cites a study where identical prompts produced divergent analyses [75-82]; Shalini notes ongoing work on a benchmark to measure answer stability across LLMs and users [84-88]; Ashish stresses the need for guardrails, human-in-the-loop risk assessment, and verification of public data [125-130].


Building a sustainable data economy with incentives - The panel proposes mechanisms such as a “data boarding pass” checklist, the GIVE model (Guarantee, Incentive, Value, Exchangeability), and differentiated licensing (free for research, paid for commercial use) to motivate data contribution and ensure long-term funding. Shalini outlines the boarding-pass concept and incentive framework [353-361][391-399]; Rohit clarifies public funding and commercial licensing for NSO data [380-388].


Overall purpose / goal


The discussion aimed to diagnose why large-scale data in India remains “AI-unready,” to propose institutional and technical standards that make data safe, trusted, and interoperable, and to illustrate how such standards can unlock high-impact applications for government, MSMEs, and the broader public sector while laying the groundwork for a formal data economy.


Tone of the discussion


– The conversation opens with a concerned, problem-identifying tone, highlighting data silos and trust gaps.


– It shifts to a collaborative, solution-focused tone as participants outline frameworks, open-source tools, and federated governance.


– Mid-session the tone becomes cautiously critical, emphasizing inconsistencies in LLM outputs and the need for benchmarks and guardrails.


– Toward the end it turns optimistic and promotional, showcasing concrete use-cases, the “data boarding pass,” and calls to action for audience engagement.


Overall, the tone evolves from problem-statement to constructive planning, tempered by realism about technical limits, and concludes with an encouraging call for adoption and partnership.


Speakers

Ashish Srivastava


– Area of Expertise: AI, data interoperability, contextualization, verification, agentic AI, education and health solutions.


– Role/Title: Practitioner; leads the AI Innovation for Inclusion Initiative (A4I) Lab – a collaboration between Microsoft and IIIT Bangalore; former head of a Gen AI company. [S1]


Prem Ramaswami


– Area of Expertise: Data Commons, knowledge graphs, AI-ready data, open-source data platforms, AI-driven search.


– Role/Title: Google – Lead for the Data Commons project (open-source stack, knowledge-graph integration). [S2]


Shalini Kapoor


– Area of Expertise: AI-ready data governance, data economy, policy, trusted and safe AI deployment.


– Role/Title: Chief Strategist, XSTEP Foundation. [S4]


Rohit Bardawaj


– Area of Expertise: AI-readiness frameworks, data standards, governance, metadata and cataloguing.


– Role/Title: Representative of Mosby, the statistical agency that calculates India’s GDP using source data down to the village/taluka level. [transcript]

Speaker 1


– Area of Expertise: (not specified)


– Role/Title: Moderator/host (unspecified). [S7]


Audience


– Area of Expertise: Varied (participants asking questions on data platforms, business models, etc.).


– Role/Title: Audience members / questioners. [S10][S11][S12]


Additional speakers:


(None identified beyond the list above)


Full session report: comprehensive analysis and detailed insights

The panel opened with Shalini Kapoor (Shalini) describing a fundamental bottleneck: enterprises and governments hold vast quantities of information in fragmented PDFs, legacy systems and isolated silos. Because artificial intelligence, especially large language models (LLMs), “thrives on data” [2] and much of this data is “digitised but stays where it is” [6], AI cannot retrieve the answers users need. She illustrated the problem with a concrete case: an entrepreneur in Nagpur looking for a biotechnology-plant subsidy is unable to locate the relevant government notification because it is hidden in a siloed document, and her queries to LLMs and conventional search tools return nothing [7-13][14-15]. The “information divide” is compounded by a lack of trust in sharing data with AI systems [5-6].


The scale of the challenge was underscored by an example of an organisation that serves 3,000 entities and must handle five million new compliance queries each year [23-27]. Such a volume of “new compliances” generated by multiple government bodies creates a massive problem that can only be bridged if the data is made interoperable, useful and AI-ready [28-29].


Rohit Bardawaj (Rohit) then shifted the discussion to the need for a shared institutional definition of AI-readiness. He asked whether the ecosystem already has a “uniform definition” and argued that a consensus framework, comprising a “core + aspirational” model, is essential [33-46]. According to Rohit, AI-ready data must be accompanied by a machine-readable catalogue, rich metadata, a context file and a business glossary; without these artefacts the data cannot be safely and reliably consumed by AI [160-205]. He further stressed that any framework should be open and federated, avoiding a single data owner, with a designated data steward orchestrating the ecosystem [177-184]. Rohit also described the MCP server, a lightweight connector that lets any LLM plug into a catalogued dataset via a standard URI, analogous to a USB-C socket, enabling seamless integration without leaving the user’s workflow [221-240].
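The artefacts Rohit lists (catalogue entry, metadata, context file, glossary) can be pictured as a single machine-readable record with a core-readiness check on top. This is a sketch under assumed field names, not MOSB’s or the NSO’s actual catalogue schema.

```python
import json

# Illustrative AI-readiness catalogue entry; every field name here is an
# assumption for the sketch, not an official statistical-office schema.
catalogue_entry = {
    "uri": "stat://nso/asi/2023/factories",   # stable identifier a connector could resolve
    "title": "Annual Survey of Industries, 2023",
    "metadata": {
        "frequency": "annual",
        "geography": "district",
        "units": {"output": "INR lakh"},
    },
    "context_file": "asi_2023_context.md",    # methodology, caveats, known gaps
    "glossary": {"GVA": "Gross Value Added at basic prices"},
}

def is_ai_ready(entry):
    """Core readiness check: catalogue URI, metadata, context file and
    glossary must all be present and non-empty."""
    required = ("uri", "title", "metadata", "context_file", "glossary")
    return all(entry.get(k) for k in required)

print(is_ai_ready(catalogue_entry))  # True
print(json.dumps(catalogue_entry["metadata"], indent=2))
```

Publishing such records as JSON is what makes the catalogue machine-readable: an LLM connector can fetch the entry, read the context file it points to, and resolve glossary terms before touching the data.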


Prem Ramaswami (Prem) presented Google’s Data Commons as a concrete, open-source realisation of that vision. Data Commons ingests diverse public datasets, converts them into a structured, machine-readable knowledge graph, and layers an AI search engine on top, thereby improving the chance that an LLM can answer a query correctly [55-58][59-60]. The platform is deliberately federated: each organisation retains local governance of its data while still contributing to a common graph [61-64]. Prem highlighted a bottom-up use case: a small retailer can upload its own CSV of store-level sales, which then automatically overlays with roughly 50,000 public datasets already in Data Commons, allowing the retailer to model location risk and de-risk decisions that would otherwise be a costly shot in the dark [277-283][298-302]. He also noted that AI can be statistically safer than human-only decisions, citing road-traffic-death statistics [144-147].
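The overlay Prem describes can be sketched as a toy join: private store-level sales merged onto public district statistics by a shared key. The figures and field names below are invented for illustration; a real integration would go through the Data Commons import and query APIs.

```python
import csv
import io

# Stand-in for a slice of public statistics (numbers invented for the sketch).
public_stats = {
    "Nagpur": {"population": 2405665, "avg_income_inr": 210000},
    "Pune":   {"population": 3124458, "avg_income_inr": 265000},
}

# The retailer's own store-level sales, as they might arrive in a CSV upload.
private_csv = """district,monthly_sales_inr
Nagpur,480000
Pune,910000
"""

def overlay(private_csv_text, public):
    """Join private rows onto public statistics by district, deriving a
    simple sales-per-capita signal for location decisions."""
    rows = csv.DictReader(io.StringIO(private_csv_text))
    out = {}
    for r in rows:
        d = r["district"]
        if d in public:
            merged = dict(public[d])
            merged["monthly_sales_inr"] = int(r["monthly_sales_inr"])
            merged["sales_per_capita"] = merged["monthly_sales_inr"] / merged["population"]
            out[d] = merged
    return out

result = overlay(private_csv, public_stats)
print(round(result["Pune"]["sales_per_capita"], 3))  # 0.291
```

The point of the join key (here, district) is exactly what a knowledge graph standardises: once both sides agree on entity identifiers, the overlay is mechanical.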


Ashish Srivastava (Ashish) added a practitioner’s perspective on why data must be more than just structured. He described how fragmented health and education datasets impede integrated decision-making, and argued that AI-ready data must be interoperable, contextualised (through domain-specific glossaries), and verifiable because many public surveys are merely “declared data” without independent validation [92-102][103-108][124-130]. In his own work, Ashish combines glossaries with LLMs to improve translation of domain-specific terminology [102-106][112-118].
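Ashish’s glossary-plus-LLM approach can be approximated as follows: curated domain terms are shielded with placeholders so a generic translator cannot mangle them, then the vetted target-language terms are restored. `fake_translate` is a stand-in for a real LLM call, and the glossary entries are illustrative.

```python
# Toy domain glossary: English physics terms mapped to curated Hindi terms.
GLOSSARY = {"photosynthesis": "प्रकाशसंश्लेषण", "refraction": "अपवर्तन"}

def fake_translate(text):
    # Placeholder for an LLM translation call; here it just tags the text.
    return f"[hi] {text}"

def translate_with_glossary(text, glossary, translate=fake_translate):
    """Shield glossary terms behind placeholder tokens, translate, then
    restore the curated target-language terms."""
    placeholders = {}
    for i, (term, target) in enumerate(glossary.items()):
        token = f"__TERM{i}__"
        if term in text:
            text = text.replace(term, token)
            placeholders[token] = target
    out = translate(text)
    for token, target in placeholders.items():
        out = out.replace(token, target)
    return out

print(translate_with_glossary("Explain refraction of light.", GLOSSARY))
# [hi] Explain अपवर्तन of light.
```

The design keeps the user transparent to contextualisation, as Ashish puts it: the general-purpose model handles the surrounding language while the glossary guarantees the domain vocabulary.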


The panel then turned to the reliability of AI outputs. Rohit recounted a recent paper by two undergraduates that showed identical prompts fed to the same LLM on the same dataset produced two different analyses, underscoring the need for benchmarks [75-82]. Shalini confirmed that her team is developing a benchmark to test answer stability across multiple LLMs and repeated queries, noting that “the same question … asked multiple times … can give different answers” [84-88]. Ashish reinforced this concern, stating that LLMs should be treated as a small component (10-15 % of a solution) and that robust guardrails, human-in-the-loop risk assessment and verification are essential to maintain trust [125-130].
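The answer-stability benchmark Shalini mentions could, in its simplest form, repeat one question across models and score how dominant the modal answer is. The stubbed models below are placeholders; a real benchmark would call actual LLM endpoints and normalise answers before comparing them.

```python
from collections import Counter

def stability_score(ask_fns, question, repeats=5):
    """Ask the same question `repeats` times per model and return the share
    of answers matching the most common one (1.0 = perfectly stable)."""
    answers = [ask(question) for ask in ask_fns for _ in range(repeats)]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

# Stub models: one always answers "42", one occasionally drifts to "41".
flaky = iter(["42", "42", "41", "42", "42"])
models = [lambda q: "42", lambda q: next(flaky)]
print(stability_score(models, "What was district GDP in 2023?"))  # 0.9
```

A score below 1.0 flags exactly the failure mode Rohit cited: identical prompts over the same data yielding different analyses.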


When asked whether making alternative, secondary data AI-ready is a technical or governance problem, Rohit conducted an audience poll and concluded that it is primarily a governance issue that requires standards, a federated stewardship model and clear policy before any technical solution can succeed [162-170][176-180]. He reiterated the need for a data steward, potentially the National Statistics Office (NSO), to catalogue datasets in machine-readable JSON, attach metadata and context files, and standardise codes and dimensions [181-221].


Shalini highlighted the tension between Retrieval-Augmented Generation (RAG) architectures and pure LLM approaches, emphasizing that data sovereignty and the need to keep sensitive data under local control prevent a single-model solution [310-322].


She introduced the “data boarding pass”, a checklist-based certification that signals a dataset has met AI-readiness criteria (catalogue, metadata, context, glossary). Once certified, the dataset can be instantly onboarded by B2B users, policymakers or researchers [353-363]. Shalini also presented the GIVE framework (Guarantee, Incentive, Value, Exchangeability) as a model for a sustainable data economy, arguing that data owners need incentives to contribute and that value can be monetised while ensuring exchangeability [380-389][391-399]. Rohit clarified that the NSO is publicly funded, so research-use data is free, but commercial use is subject to a policy-driven pricing structure [380-388].
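A minimal sketch of the boarding-pass gate, assuming the certification criteria are the four artefacts discussed on the panel (the exact checklist is not specified in the session):

```python
# Hypothetical boarding-pass checklist; the items mirror the panel's four
# artefacts, but the real certification criteria are an assumption here.
CHECKLIST = ("catalogued", "has_metadata", "has_context_file", "has_glossary")

def issue_boarding_pass(dataset):
    """Certify a dataset only if every checklist item is satisfied,
    reporting which items are missing otherwise."""
    missing = [item for item in CHECKLIST if not dataset.get(item)]
    return {"certified": not missing, "missing": missing}

ready = {"catalogued": True, "has_metadata": True,
         "has_context_file": True, "has_glossary": True}
print(issue_boarding_pass(ready))
# {'certified': True, 'missing': []}
```

The value of the pass is that downstream consumers need only check `certified` to onboard a dataset instantly, rather than re-auditing the artefacts themselves.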


An audience member raised concerns about the business model for maintaining high-quality data platforms. The response highlighted that public funding covers research use, while commercial licences generate revenue, and that the GIVE model provides a “formalised” mechanism for pricing and incentives [372-376][380-388][391-399]. Shalini further noted that without a clear incentive structure, “the data economy is actually running without a formal mechanism” [390-399].


The discussion also revealed points of agreement and divergence. Both Shalini and Rohit agreed that statistical data collection is bottom-up, i.e., gathered at the field level rather than imposed centrally [269-276]. Prem argued that, despite imperfections, AI can be statistically safer than human-only decisions [144-147]; Ashish warned that AI should remain a minor (10-15%) component of any solution, requiring extensive human oversight [125-130]. Technically, Rohit’s checklist-centric approach (catalogues, metadata, context files) differed from Prem’s emphasis on a knowledge-graph-centric, federated stack [184-221][55-64].


In conclusion, the panel converged on five pillars for AI-ready data: (1) a common, detailed definition that includes cleaning, linking, safety, trust, machine-readable catalogues, metadata and context files; (2) a governance-first, federated stewardship model to avoid single-point ownership; (3) the necessity of benchmarks and human-in-the-loop guardrails to ensure trustworthy AI outputs; (4) the importance of domain-specific glossaries or knowledge graphs for contextualisation; and (5) a sustainable data-economy model that aligns incentives, value and exchangeability. Action items include drafting the AI-readiness framework slide deck (Rohit), publishing machine-readable catalogues and glossaries (Rohit), extending Data Commons with contextualisation features (Prem), formalising the data-steward role and commercial licensing policy (NSO/Rohit), developing the answer-stability benchmark (Shalini), and promoting the data-boarding-pass and GIVE mechanisms to catalyse a formal data market (Shalini). The discussion closed with an invitation to visit the exhibition booth for a live demonstration and a reminder that building AI-ready data infrastructures is a long-term journey that must begin now to avoid future “holes in the rails” [408-410][401-406].


Session transcript: complete transcript of the session
Shalini Kapoor

Deep work on working on fragmented data silos. As you all know that AI, it thrives on data. And today, most of the LLMs, what they have done is, they’ve definitely scraped internet and they’re doing really well. But the value of the work or what an answer an LLM would give is present based on what it can fetch from the actual data, which means in enterprises and organizations, there’s a wealth of information. There’s a wealth of information stuck in PDFs, stuck in documents, which people have a fear of not giving it to AI. So there is a fear, there’s lack of trust today, and that data… data, it stays where it is, like digitized. So, for example, there is there could be an entrepreneur, say in Nagpur, wanting to know about the scheme applied for the biotechnology plant that she wants to put up in Nagpur.

Now, if you see the MSME industry has a scheme for her, for women, for biotechnology. And, you know, it’s very good subsidy that that’s available. But where is it stuck? It’s stuck in a government notification which came out, which she’s not aware of. And what she is doing is she’s actually going to LLMS and asking that question and she’s not getting it. She’s also searching it on various places. She doesn’t get it. So that’s the divide, the information divide, which is existing. And the information which is there has which which is there stuck in documents or in even digitized form. That has to be AI ready. so that in a safe, trusted, and these two are very important, safe and trusted manner, the data can be linked, made useful and then made available.

Now, this is a long journey. This is a long journey. It’s not an easy journey because the data journey is about how you clean the data, you make it ready, you link it, you make it relevant, you make it useful and then present it in a manner so that the choice, and you want to have a choice of various elements of, you know, I mean, we live in the age of choice, right? We don’t want to be locked into anything particular. So that’s the data problem that we have in front of us. The opportunity is humongous because there is, I’ll give you an example, 3,000, I’m talking to an organization which does 3,000. Thank you. entities and the 3,000 entities actually manage 5 million new compliances in a year.

They have those kind of queries, 5 million queries on new compliances. Forget existing compliances because there are new compliances which get generated by the government, by various bodies and then they have to search. So the problem is humongous and it can be bridged. It can be bridged but we have to think about how to make data interoperable useful and AI ready. So with that background, I’d like to get into our panel and talk to some of the experts that we have today. My first question is to Rohitji who is from Mosby. So India generates a vast amount of statistical and administrative data. Mosby actually for all of you, it calculates a GDP for India. they have the source of all the data at village and taluka level so the data is there but as you think about making data AI ready what do you think is the responsibility of institution how and yours is an institution to make the data trusted safe and available to all

Rohit Bardawaj

thank you Shalini ji good morning everyone so trusted safe and ready for everyone AI ready my I like all of you to take a step back on this and just let us understand do you have uniform definition of what is AI readiness at this point in time do we have and I’ll not say that it’s not there in the ecosystem it’s there in the ecosystem that but do we have an agreement about it so there are two issues we need to understand when we talk about AI readiness of data. One is that, so let me just go back to today’s conversation I had with one of my colleagues over WhatsApp group, you know, we all are very active there.

So one of my papers has just been accepted in one of the largest conference and it’s about AI readiness of data and he asked me what’s so great about it. So I asked why, what is not so great about it? So he asked me, I put Bangla into ChatGPT and it completely understands. So what’s new you are doing? So the point I’m trying to make is people are not aware what it takes to make data AI ready. We all understand and then he asked me that no, but it’s not understanding and he talked about some of the dialect of this country and we have a huge number of dialects and Shalini ji, I asked him and he asked me that how do I train ChatGPT on this dialect?

I said, it’s not my job, it’s Sam Holtzman’s job. So the issue here. is that we don’t know. And that is the biggest responsibility of our institutions like MOSB to make people aware what AI readiness is all about. And then AI readiness means if I start, you know, talking about there should be a context file, there should be semanticity, there should be metadata, but many of us sorry about that, many of us it looks it would not make sense. So the first idea is to create a framework agreed framework, say people not only me, it’s not about my way or highway, me all of us work together create that framework, put it up for people to know.

I would do, the first thing I would do and I plan to do it literally is try to create a slide deck saying what AI can see and what human can see. So my folder, if it has 10 versions of budget 1, 2, 3, 4, 5, 6 and if I ask a question from that folder budget some answer will come from budget one some answer will come from budget two because unlike human where i am focused on this question ai is designed to take scan the entire thing available so it’s a big difference between human and ai i can be focused ai when once i give a thing to ai it will just scan everything it has in its domain so i would say starting point and just you know not taking much of a time uh starting point should be that let us create this framework let us have a shared understanding let us have a core ai readiness part and an aspirational ai readiness part and work on

Shalini Kapoor

yeah i think that’s very relevant because you cannot leapfrog into everything you have to be like i mean you can have aspirational but the foundation is very very important and and everybody joining that foundation that that that foundation exercise is is really important um i’ll go to you Prem uh and talk about let’s talk about data Data Commons aims to make public data more accessible and usable. You’re from Google, and you have put all this in open source. You’ve been working on US Census data being available. Tell us some more about your experiments and how Data Commons is kind of ready or prepared to work on this challenge.

Prem Ramaswami

Thank you for having me here on this panel today. I think one of the areas I’ll start with is the importance of coming to that understanding on AI-ready data, but understanding that the field itself is moving quite quickly at the same time. So whatever agreements we come to today in six months, it feels like we’re dealing with a brand new technological landscape that we’re staring down. What Data Commons tried to do was say that if we can get… If we can get our data in that machine-readable format, which means… structured, which means machine-readable metadata also, and a format where that format specification is not stuck behind a 500-page PDF, right? Can we make that in a way that the machine can understand it, interpret it, and then use it?

Our theory behind this is that idea of a knowledge graph from that data combined with the large language model gives you a much better chance of success to answer your question. So at Data Commons, what we try to do is we try to bring multiple data sets globally together in a common knowledge graph and then put an AI search engine on top of it so that you can quickly access that data. You can play with this yourself at datacommons.org. But what we did is we open-sourced the entire stack because this idea that that data is centralized with one source is also the dangerous part, and it shouldn’t be, right? The data should be federated.

It should be located at every organization and governed locally by the organizations that are… using it. And so one of the things we’ve done by open sourcing that stack is allowed, for example, the United Nations, the United Nations Statistical Department to use data commons as their back end. And so, you know, UNSDGs, WHO data, ILO data, so on and so forth, is all stored in this common interoperable database now, where instead of a data analyst spending 80% of their time renaming column headers, they can actually focus on the data analysis so that we can get the impact and the outcomes we want to see. Hope that helped answer the question.

Shalini Kapoor

Yes, yes, no, absolutely. I’ll poke you a little bit more to understand on data commons, what’s a vision you have?

Prem Ramaswami

So a very simple vision, right, which is make data aware decision making the easy answer to take. Today, right now, the majority of the world is flying blind, whether you’re one of those 74 million MSMEs in India. you can’t afford a bevy of computer scientists and data scientists that you can hire you pay a tax to play with any data if you’re a policymaker thinking about climate change poverty education health these are holistic problems it’s no longer i can go to one ministry pull one spreadsheet and solve poverty i need to endemically understand how does education how does health outcomes how does income and economy how do all of these affect poverty locally right and that’s the problem we have today that the world is a multi-dimensional problem the other problem is our brains are not inherently multi-dimensional our brains are great in three dimensions you add a fourth dimension which is time and we’re okay right like look at climate change you add time and it’s greater than our lifetime we can’t think about it which is why we’re not solving it right but the majority of problems are 50-60 dimensional problems machines are really good at this but by the way.

And humans are good at using tools that are good at doing things we’re not. And this is where we have to approach AI as a tool we can use. Not as the answer, but as a tool we can use to derive the answer, to supplement our brains in the areas we’re

Shalini Kapoor

I’ll poke you a little bit more, but later on.

Rohit Bardawaj

Shalini ji, I just want to take a second stab on that. And just a quick interjection on that. I’m a statistician. So I’ll be very happy if some of my work can be done by AI, you know, all those large language models. I just read a paper today in the morning. It’s been written by two undergraduates from a Canadian university. And they said, and they proved it, that if you give same prompt to AI with the same data set, it gives you two types of analysis. So this is something I just wanted to flag. That we should not be really gung-ho about things, which is still untested. But yes, I would be the first to accept adopt an AI and use it for my work, but it needs to be, as you rightly put it, trustworthy.

Shalini Kapoor

Yeah, I just comment on this, the stability of an answer, that’s what you’re talking about. We are actually working to create a benchmark onto this because the same thing we are doing, like Amul AI was launched today in the morning, I mean, by the Prime Minister, and the same thing applies to Bharat Vistar, and we are actually working to see that the same question if you ask, multiple times across LLMs, and also to one LLM many times by different farmers, both options, you get different answers. And that, can we make it as a benchmark? That’s what we are working at also because this is a benchmark which is needed really on the ground, right? So that’s a part, so I wanted to comment.

I’ll go to Mr. Ashish. You’re from the industry, and you work with IIIT Bangalore. Tell us more about the research in the data area, plus how institutions can help build it all together.

Ashish Srivastava

Right. So I think my perspective is more as a practitioner because the last almost three decades I’ve been a solution builder. So I have seen data not from the data side, but from the solution side, trying to exploit it, trying to use it for the solutions. And I’ll come to the institution part of it. But, you know, when I look at the data and the challenges which are there associated with it, then for the last 10, 12 years, I’ve been for AI, for social problems or digital, like women in child health. I worked for almost a decade. Now, one of the problems that I realized is that the world is fast moving where you don’t manage a transaction.

You manage a journey. OK, and that is the agentic AI and all those things that we are talking about. Now, when I was working a few years back on the women and child data. I realize how fragmented it is the two main data sets if you look at a child’s health his anthropometric data, his nutrition data is with women and child development through their Anganwadi program if you look at the birth data, the immunization data and a lot of other data it is with the health and family welfare department and if you have to have an integrated decision making across for that child what needs to be done and then you have to look at both the data but that burden of orchestration comes on to the person who is solution making the data does not by itself flows through the workflow and that is one of my biggest problem that we have to solve that we look at data sets in isolation but we don’t look at how it flows through the process the second thing which I said the contextualization we all have read the book at least some of them that raw data is an oxymoron data always resides in a particular context and with some standardization associated with it so that you can make some sense out of it.

Now with education, when we are working recently, we realized that LLMs are becoming increasingly good, at least with the main languages, not with all the dialects, but in good translation. The moment they hit any domain -specific vocabulary, that’s when they start failing. Even the class 6th physics question, all these frontier models, is not able to properly translate. So we came up with a solution of using a glossary combined with the LLM so that it does a decent job in terms of overall translation. The user is transparent to contextualization. And the third thing which I faced a lot is that when we talk of public data, a lot of it is declared data and not verified.

Not verifiable data. Especially when a lot of planning depends on surveys. and lot of survey data is actually declared data whether you have a hypertension or not yes, no, whether you have this problem yes, no, what is the verification no doctor has actually verified that and you are going to make a decision based on that so in my opinion the AI ready data has to solve these three big problems it has to be interoperable it has to be contextual and it should actually the third problem that I was saying that you know verifiable, it should be verifiable and governable as an extension of that

Shalini Kapoor

very relevant I think you have posed the right challenge so Prem I am going to come to you right what is how let’s just pick one of them which is contextualization because I am increasingly seeing that domain information is needed and people are creating these glossaries to add like even in Agri when we had to roll out like we are going to do like we are going to do Mahavista, we actually created glossary of 5000 terms which is it is in Marathi so it has to be in Marathi and those terms being used and I know we did some experiments and we have created a sandbox environment you have done it for India so why don’t you explain that how contextualization and domain can be added to Google Data Commons and how it can be helpful

Prem Ramaswami

I think this idea of contextualization and localization is very important. At the end of the day, these are large language models, language being the key word there; they're not data models. So, to what Mr. Bhardwaj said earlier, what you want to be able to do is use them to write code to manipulate data, because code is language, but you don't necessarily want them to be producing data on their own. And one of the problems you have today is that those large language models are essentially trained largely off the web, which has its own biases inherent in it, both language- and locality-wise. And then on top of that, there's the example you used of the full folder of all the budgets, right?

The example I like to use for this: if you ask a large language model about a celebrity that recently had a breakup, it'll tell you they're together, because it doesn't know what just happened over the last month, right? It's very sad. And so this is where you can use the combination of, you know, you called it a glossary, I always call it a knowledge graph. What is that factual basis of information that I can put together? Now, it's always going to be a subset of the whole, right? I might be able to cover maybe 0.1% of the world's information with a knowledge graph. But if I can ground it in those facts, can I then utilize the intelligence of the large model to help me produce some knowledge from those facts, or fill in the gaps in those facts?

And so this, I think, is an opportunity that we actually have in the technology to move it forward. This is one of the areas that we’re actively working on as a team. But again, to do that, you first need that glossary of facts, right? This is where having that knowledge graph of statistical data, even if imperfect at this moment, because it is survey collected. It is dependent on the quality of the question asked, the error bar shown, the quality of that metadata, so on and so forth. But it is a starting point from which you can get more information and use that intelligence to potentially even find those outliers or areas that don’t match what you might be hearing on the ground.

So that’s the opportunity I think that we have.
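The grounding pattern Prem describes, a knowledge graph or fact store as the "factual basis" and the LLM reasoning only over it, can be sketched minimally like this. The fact store, entity names, and figures here are illustrative toy data, not real Data Commons output.

```python
# Sketch: grounding an LLM answer in a small fact store (a toy "knowledge
# graph" of (entity, property, year) -> value triples; values are illustrative).

FACTS = {
    ("Maharashtra", "population", "2011"): 112374333,
    ("Maharashtra", "literacy_rate", "2011"): 82.3,
}

def grounded_context(entity: str) -> str:
    """Collect every known fact about an entity into a prompt preamble.
    The model would then be asked to answer ONLY from these facts."""
    lines = [f"{e} {prop} ({year}) = {val}"
             for (e, prop, year), val in FACTS.items() if e == entity]
    if not lines:
        return "No grounded facts available; the answer must say so."
    return "Answer strictly from these facts:\n" + "\n".join(lines)

ctx = grounded_context("Maharashtra")
```

When no facts exist, the preamble instructs the model to say so rather than improvise, which is the behavioural difference between a grounded system and a bare LLM.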

Ashish Srivastava

I absolutely agree with you, but I will say it in more direct terms. Sometimes we feel that LLMs, or in the previous generation, AI models, are the solution. They are not the solution. They are only one of the inputs to the solution, and they comprise 10-15% of what you're trying to do. It is the rest, the 85%, that matters. Yes, an LLM will give different answers; how are you compensating? With guardrails, human in the loop, risk assessment? These are the tools which are available today. Because at the end of it, it's a probabilistic model, come what may. I was talking to a mathematician from MIT and he explained why it will never become perfect; that fact is grounded in mathematics. It cannot ever become as perfect, as consistent every time, as we want it to be, because then you are taking the main source of its creativity away from it. So what you have to focus on is outside, not inside. That's all I ever wanted to say.
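One "outside the model" guardrail of the kind Ashish mentions is a consistency check: query the probabilistic model several times and only accept an answer a clear majority agrees on, escalating to a human otherwise. This is a generic sketch with a stubbed, deliberately noisy model; every name here is hypothetical, not a real API.

```python
# Sketch: a majority-vote consistency guardrail around a probabilistic model.
from collections import Counter
import random

def stub_model(question, rng):
    # Stand-in for an LLM call: usually answers "42", occasionally "41".
    return rng.choices(["42", "41"], weights=[0.9, 0.1])[0]

def consistent_answer(question, runs=9, threshold=0.6, seed=0):
    """Return the majority answer if agreement clears the threshold,
    else None, signalling 'route to a human in the loop'."""
    rng = random.Random(seed)
    votes = Counter(stub_model(question, rng) for _ in range(runs))
    answer, count = votes.most_common(1)[0]
    return answer if count / runs >= threshold else None

result = consistent_answer("What is 6 x 7?")
```

Repeated sampling does not make the model deterministic, but it makes the instability measurable, which is what the guardrail needs.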

Prem Ramaswami

I agree with you completely, and I started by saying it's a tool, right? And we use tools to supplement ourselves, not to replace ourselves; to supplement our knowledge, not to replace our knowledge. So I do agree with you, it's a tool. But we have to be careful about throwing the baby out with the bathwater here, in the sense that the tool now makes things available to the average person. It upskills the average person in a way that they couldn't themselves before.

So if we immediately go to put guardrails, prevent access, things like that, we're shutting out a large part of society. And I'll say, as somebody who worked on Google Search for many years, there were many arguments in Google Search that we, for example, shouldn't put health information on Search, because the average person isn't smart enough to deduce information about their own health from Google. But the average person can't afford a doctor either, right? There are endemic problems in society that prevent you from doing that. So does the answer to that question do harm, or does it do less harm and give people a pathway that they can learn from? That's an important question to ask ourselves here as we think about AI, which is, yes, it is imperfect at this moment.

Can we understand? Can we educate? Can we work inside the system that exists? We can't ignore it either. We can't say it made one mistake, therefore I will not use it. And I will also call out that the imperfection of us as humans is also very much there, right? There are many times we look at these systems, we look at, say, a Waymo autonomous vehicle, and we say, look, it had six accidents last year. Compare that to the 30,000 deaths from car accidents in the U.S. a year, right? Statistically speaking, this is still much safer. And so these are the sorts of examples that we have to look at, to understand where to apply it, how to apply it, and what the overall societal good is from using it.

Shalini Kapoor

Yeah. No, thanks. I think it's a very relevant discussion we are having, and there's always a fight between whether we should have a RAG architecture or just give everything to the LLM to do, because it has more capacity and more GPU. But either/or is not possible. And then there is so much of the world where you don't want to give your data away; you may want to keep the data, and sovereignty comes in. This has been a discussion in the last two days in most of the panels I have been on: that you want to keep your data.

Countries want to keep the data with themselves, and they actually don't want to train, because with the choice of LLMs you want a lot of options and you want to use them here, there, everywhere. So I'll come back to you, Rohitji. We talked about administrative data and you talked about a framework. My question is: how can alternate data, secondary data beyond administrative data, also be brought in? And for the foundational framework you talked about, if that framework is adopted by industry, one, is it possible, and two, what kind of data economy can it start?

Rohit Bardawaj

So, it is early morning; let me take an audience poll on it. How many of you think that what Shalini asked is a governance issue? Just raise your hand if you feel it's a governance issue. Anyone? And how many of you feel it's a technological issue? What she asked was: how to make alternative data ready for AI. So how many of you feel it's a technology issue? There are no prizes and no punishment, so feel free to raise your hand the way you think. Okay. So I am with that gentleman; I feel it's a governance issue.

And I'll also work on it. So what are we talking about? We are talking about data generated from different sources, be it alternative data sources or administrative data sources. My co-panelist just talked about getting data from different sources not aligned to each other. So it's a governance issue which we need to understand first. And of course, I completely agree with Shalini when she said that we need a federated model; perhaps Prem said that. There cannot be one sole owner for the data of this country, or for that matter any country. But somebody needs to play the role of data steward; somebody needs to orchestrate this data ecosystem. Being from the NSO, I have my own biases, so I'll say the NSO can do it, but of course that's for the people to decide. Now let's understand what we need when we need AI-ready data. First, a cataloguing of it; I'm just going to take one minute on it. You should have everything catalogued: any industry, any government organization, this is my data set, these are the indicators, these are the definitions, and so on and so forth. You need a catalogue of your data. Second, that catalogue should not be a PDF; that catalogue should be, as she was saying, machine-readable, a JSON file probably. There are many other formats, but let's talk about JSON.

Second point, you should have metadata for it. If you don't have metadata, and the other day I was on another panel with Prem where I said the thing which irritates me the most is the lack of metadata, then I'm driving blind. I don't know what the word frequency means; it may mean hundreds of things. So you should have metadata, and again not in PDF. Whatever I'm talking about, whether JSON or XML, there are so many ways, but it should be machine-readable. Let's put it that way. Third, you should have a context file. So now the machine has read it, but it wants to know: where do I find the meaning of frequency?

So the machine should have a context file where the source is written: you go there and you will find the meaning of frequency. The metadata will not carry the meaning of frequency; it will only write frequency = quarterly. The machine now needs to understand what that frequency means. That's what she was talking about, and Prem again was talking about: that brings us to the need for a business glossary. He also talked about a knowledge graph, which is just a sophisticated version of a business glossary. That we need to have. Once we have sorted this out, we need to look at what type of codes we are working with.

The gentleman just beside me talked about two data sources using different codes for the same thing. So then we have to standardize those codes. And lastly, we have to structure our data. Data needs to go in a structured database. It should be defined, and that's nothing new I'm saying: it should be defined by dimensions, by attributes, by its role. So time means temporal; you can't write "time" and expect an LLM to understand what time means. You have to say time means temporal. And once you have these ready and available, there are two use cases. One is: am I using it for my own use case?

Am I training my own model for it? Then I can put all these in one file and feed it to my model. But if I'm expected to create an MCP for my database, then I have to create separate files and put them up at a URI or URL where any model can go; the connector can direct the model to that resource, and then things happen. And I'm talking all of this from personal experience, and Shaliniji knows about it, from when we developed our own MCP server.
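Rohit's checklist (catalogue, machine-readable metadata, context file, declared dimensions and roles) can be sketched as a single JSON artifact. The field names and the glossary URL below are illustrative, not an existing standard or a real endpoint.

```python
# Sketch: a machine-readable catalogue entry instead of a PDF.
# Field names and the context-file URL are illustrative only.
import json

catalog_entry = {
    "dataset": "consumer_price_index",
    "indicators": ["cpi_all_items"],
    "metadata": {
        "frequency": "quarterly",                             # the raw value...
        "context_file": "https://example.org/glossary.json",  # ...its meaning lives here
    },
    "dimensions": [
        {"name": "time", "role": "temporal"},   # role declared, not left implicit
        {"name": "state", "role": "spatial"},
    ],
}

machine_readable = json.dumps(catalog_entry, indent=2)
```

Per Rohit's two use cases, the same artifact can either be bundled into a training/prompting pipeline or published at a stable URI for an MCP connector to fetch.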

Shalini Kapoor

Loving it, the amount of reach-out which has happened to use those data sets. You can actually ask a question like: how has the price of moong dal moved over the last year, or quarter by quarter, or month-wise? That capability is there now. And it has happened because the data was always there; they do the calculation of the wholesale price index, the commodity price index. It's just that now it is AI-ready for people to consume, take, and ask, and it is connected to Claude and ChatGPT. Ashish, I'll go to you, building on what Rohitji stopped at, which is the use cases, since you come from the solution part of it. How do you visualize and imagine solutions and use cases combining, say, administrative data and alternate data? I'm not going into personal data, because there's a lot of consent there, but at least the many secondary sources of data which are available: how do we combine them and make them more powerful?

Ashish Srivastava

I think, as you rightly pointed out, I come from the solution perspective, and now with agentic AI coming in we look at every solution in the form of a journey. We are going past the mechanism of a point solution, where you ask and it reverts back with the answer. Now the use case has to decide, at each part of the journey, what data you need, and that will dictate whether it is additional data sets from outside or a public data set. The only challenge which I see here is: who is accountable for that data? That accountability has to sit in the solution at the API level, at the policy engine level, with policies that actually travel along with the solution, and it should be enforceable automatically.

If you are thinking that a human being will actually enforce that policy, it will break, and in no time. So that is what we are trying to do: create those reusable artifacts, as DPIs or DPGs, it will fall into one of those categories, which allow those policies to be set for a data set in an easy, reusable way, so that everybody doesn't have to recreate those kinds of policies from scratch. That's the way to move forward.
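The idea of a reusable policy artifact enforced automatically at the API layer, rather than by a human remembering the rules, can be sketched like this. The dataset name, policy fields, and function are all hypothetical illustrations of the pattern, not a real policy engine.

```python
# Sketch: declarative data-use policies gated automatically per request.
# Dataset name and policy fields are illustrative.

POLICIES = {
    "health_survey_2024": {"commercial_use": False, "aggregation_only": True},
}

class PolicyViolation(Exception):
    pass

def enforce_policy(dataset, purpose, row_level):
    """Check every data request against the dataset's declared policy
    before any data is served; no human in the enforcement path."""
    policy = POLICIES.get(dataset)
    if policy is None:
        raise PolicyViolation(f"No policy registered for {dataset}")
    if purpose == "commercial" and not policy["commercial_use"]:
        raise PolicyViolation("Commercial use not permitted")
    if row_level and policy["aggregation_only"]:
        raise PolicyViolation("Only aggregated access permitted")
    return True

ok = enforce_policy("health_survey_2024", purpose="research", row_level=False)
```

Because the policy is data, not code scattered across applications, the same artifact can be attached to many data sets and shipped as a reusable building block.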

Shalini Kapoor

You mentioned your lab; I'm sorry, I just roped you into that. Tell us more about your lab. What more work are they doing?

Ashish Srivastava

So that's my current job. Previously I was heading a GenAI company, by the way, and I will talk separately later about the PDF challenge, which we thought we had solved; we didn't fully, but we were on the way. The current lab, which is very exciting, is a collaboration between Microsoft and IIIT Bangalore. A4I stands for AI Innovation for Inclusion Initiative. The idea is not to run pilots, where we do a small thing here and diagnose; not that. It should be population scale, and we want to launch it as a DPG so that it can be adopted widely. So we are working in the school education area.

We are working with teachers in terms of making their life easy. We are working on accessibility: how blind children can actually be taught STEM so that they can hope to become a physicist or a mathematician. Today it's very difficult; how do they even read a book? And the third one is working with last-mile health workers. Our current solution is a RAG-based AI combination, but we are looking at exactly that problem you mentioned, that it is either this or that. I think there are plenty of answers which are in between. That is what we are exploring.

Shalini Kapoor

Thank you. Thank you so much. Prem, I'll again build on the concept we were discussing of use cases. I just want you to paint a picture: if the data is there in knowledge graphs, like what you mentioned, and Data Commons is present, what more use cases can be possible with secondary data? How can India, and not just India but the Global South, benefit from this? And please feel free to paint the use cases which you have built in the sandbox environment that you have; you can just take those examples.

Prem Ramaswami

Yeah, I'll give two very different examples here. These might not be exactly where the sandbox is today, but where it could go tomorrow, right? One is: at the end of the day, the Ministry of Statistics does a lovely job collecting as much information as they can. The whole ministry does; the government does. But it's top-down data collection.

Shalini Kapoor

I'm sorry, I'll just interrupt you. I think Rohitji will say it's not top-down. Actually, at the field level, it's bottom-up.

Prem Ramaswami

That’s fair, that’s fair.

Shalini Kapoor

He will say that it's bottom-up.

Prem Ramaswami

That's fair, that's fair. You're correct, it's bottom-up. That said, we have alternate data sources also that are there. Sometimes they supplement and further show, yes, the data collected is correct. At times they disagree. And those disagreements are also interesting to understand, to the point of: where is the survey question flawed, or where does civil society see something, or have visibility into something, that we don't have access to? The more of these data sets that come together, the more points of friction, and again, this is where the human intelligence comes in. Show me the points of friction. I have a haystack full of needles; which needles do I pay attention to? So this is one example, if I'm at the government or Ministry of Statistics level.

Now let's go to the completely opposite end. I'm a small business owner setting up a physical shop. Where should I set it up? Where I set it up depends on mobility traffic, on the demographics and affordability in that space, on all types of things, right? It's a large data question. But that MSME owner is often ill-equipped to answer any of those questions and is often taking a shot in the dark. And that shot in the dark is a costly one if they're wrong, because they are taking the full risk of that decision. Now with the Data Commons that we're building, the question becomes: can we reduce that risk for that individual?

Can we help them model, understand, de-risk the decision they're making, based on the audience they want, the footfalls they want, the location that they're choosing? That's a very specific example. But these are two very opposite examples of how bringing all of this data together, which we often think about as aligned more towards international organizations or a government ministry, is actually usable on the ground by an individual too.

Speaker 1

Tell us a bit more about, like if suppose someone wants to put up a Data Commons instance, how can they get started?

Prem Ramaswami

It's actually quite simple. It's easy enough that I can do it myself, which means you can. Datacommons.org is an open-source platform. We have a 20-minute guide to get started. You can set the whole thing up on your computer, bring in your CSV data set, and once you bring one data set in, it overlays with all the data sets already in Data Commons. This creates a sort of network effect between the data, right? So if I am a chain store in India trying to figure out that next store location, and I bring in all my per-store sales revenue data once, then suddenly I can compare that to, and overlay it on, the 50,000 data sets that are already in Data Commons.

Before, if I wanted to do this as a chain store in India, I would normally have my people come up with maybe 10 or 12 different hypotheses, because then I have to get those 10 or 12 different data sets and build as many different data transforms so they're all in the same format. That prevents us from having the level of creativity we want, where we can look across the entire landscape of the problem set. And so this is one of the things.
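The "overlay" Prem describes is, at its core, a join between a private data set and public statistics on a shared place key. Here is a minimal stdlib sketch of that idea; the districts, revenues, and income index values are invented, and real Data Commons custom instances handle this mapping at far larger scale.

```python
# Sketch: overlaying a private per-store CSV onto a public place-keyed
# statistic. All values are made up for illustration.
import csv
import io

private_csv = io.StringIO(
    "store,district,revenue\n"
    "S1,Pune,90\n"
    "S2,Nagpur,60\n"
)
# Stand-in for a public dataset, e.g. a (hypothetical) median income index.
public_stat = {"Pune": 7.2, "Nagpur": 4.6}

overlaid = []
for row in csv.DictReader(private_csv):
    row["income_index"] = public_stat[row["district"]]  # the join on place
    overlaid.append(row)
```

Once every data set is keyed to the same place identifiers, each newly added data set can be compared against all the others without bespoke transforms, which is the network effect being described.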

Rohit Bardawaj

Right. And it was a matter of trust for the NSO also: people were getting different answers for data created by the NSO. That made us look toward an MCP server. A, it is open, so it makes our data interoperable with almost all AI systems; I am not saying all. Otherwise, what would happen? Every LLM has its own standards of API, so you would create those APIs first and then somehow get the LLM to approach that API. With this connector, it's like the USB-C socket for a phone charger, if I may use the parallel: you can just plug anything into it.

That's what MCP is. The data comes, and the LLM comes and plugs into MCP, and it allows any LLM to connect. What you have to do now is connect that small tool with your LLM. That's a one-minute job, and it's available on our website: go to www.mospi.gov.in and the offerings section, everything is available. You can do it in one minute, maybe two minutes at the most. Anyone. But there is still one challenge, which I must tell you: we somehow need to ensure that this becomes a default tool, so the user does not have to add it. Somebody forgets it, and then the same situation starts happening again.
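The "one-minute" step of attaching an MCP server to an LLM client is typically a small configuration entry. The exact schema varies by client, and the server name and URL below are hypothetical placeholders, not MoSPI's actual endpoint; this only illustrates the shape of such a registration.

```python
# Sketch: the kind of client-side config that registers a remote MCP
# data server. Schema and URL are illustrative; check your client's docs.
import json

client_config = {
    "mcpServers": {
        "national-statistics": {
            # Hypothetical endpoint; the real URL comes from the provider.
            "url": "https://example.gov/mcp",
        }
    }
}

config_json = json.dumps(client_config, indent=2)
```

The USB-C analogy holds because the client only needs this one generic entry; it does not need to know anything about the provider's bespoke APIs.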

So right now people have to add it to their tools. But the biggest advantage I see is that people don't have to come out of their workflow. If I have paid for a costly Claude Pro, I don't have to come out of it and go to a portal to get the data analysis. I can keep using the intelligence of Claude or ChatGPT, I don't have a preference, with the verified data, as he talked about, the verified data of MoSPI. And the use cases on the web are now innumerable; people have just lapped it up. My favourite involves a Tamil song which talks about a lot of grains.

So one of the messages I got, and I'll share the link, it's on Twitter, I mean X now: somebody created a CPI, the consumer price index, which basically measures inflation, for all the grains mentioned in that song. They just took the grains out of the song, you know, wheat and so on, created a CPI index for them, and named it something like the P index, after the song's name; I'm not very conversant in Tamil, pardon me for that, but I'll share that link. That's my favourite use case. What I mean to say is that people can use the data the way they like. That's the bottom line, and that's the NSO's idea.

Shalini Kapoor

That is the most interesting use case I have seen, and I really want to see it; yeah, I'll have a look at it. One more thing I want to tell the audience: like the use case Rohitji mentioned, where someone can just pick up the data, we have created a concept called the data boarding pass. This is for AI-ready India.

This is a physical copy, but the concept is that once your data is ready and it passes a set of checklists, then as a B2B player, whether you are a policymaker, a researcher, or a market player wanting to build on top of it, you can take this data boarding pass and get onboarded for data usage, so that you can pick the data and start using it in your applications. So, say at a district level, and I'm just painting a scenario, you have a data commons where the knowledge graph and the data have all been combined together, created with the right context and everything.

And some organization now wants to access it, say an automobile MSME manufacturer who wants to give information to dealers as to where scooters are being sold, where motorcycles are being sold, and what the income of that region has been over a period of time. That can be possible now. The data boarding pass enables it, makes it possible. And if you want to physically see how this exactly works, visit our booth at the Step Foundation in Hall 3 on the first floor. My team will be there to show you the actual generation of the data boarding pass. I think we have covered a lot, and we have less time, but I want to take a couple of questions from the audience.

So feel free to ask. We have four minutes, so we can take two or three questions from the audience. I saw that hand first, sorry, and then I saw you, next to you. Yeah, please go ahead. Can someone give him a mic, please? Otherwise, I'll hand over mine.

Audience

Thank you very much. I wanted to ask you about the business models of these platforms, because it is obviously extremely important to have high-quality data, but high-quality data is also expensive to collect and to maintain over time. So have you worked on how these kinds of platforms can be sustained over time? Does it have to be, I don't know, publicly paid, or whatever models you may have? And the question is for everybody, I think.

Shalini Kapoor

Go ahead, then I'll also add.

Rohit Bardawaj

Just a quick clarification on that. The National Statistics Office of India is fully funded by the Government of India. I mean, as we all know, national statistics offices all over the world are publicly funded, through public money, so it's our job to create data and make it available to the public. At the same time, one quick disclaimer: open data is not free data. Somebody has paid for it. So depending on the use, we provide the data. If the use is research and things like that, I'm not getting into the details, then it's free.

But if the use is commercial, then of course there is a system, there is a policy for it, and people have to pay accordingly.

Shalini Kapoor

Yeah. I'll also answer it, because we have done a good amount of work here. I would encourage you to see a paper I've put up on our People Plus AI website, which talks about the GIVE model for data. G is guaranteed trust, and we talked about it. I is incentive: why should I bring the data, what will I get from it? V is value: if the data has no value, nobody is interested. And E is exchangeability: can I share the data? I'll focus on the I, the incentive. There has to be an incentive for someone to bring the data and an incentive for someone to use the data, and that value will be monetized; that is the data economy. If you ask me, this data economy is actually already running without a formal mechanism: there's a good amount of money in selling data, buying data, lead generation; a huge amount is happening. This formalizes it. But what will the price be? The data economy has to stabilize; that has to happen at the region level, with the private sector. We have been working in that direction, so that the incentive model is clear, but the actual price is a discovery mechanism.

Audience

It's very interesting to hear all this; that's amazing. One key scenario we see every day, and which troubles us a bit, is a road getting made and then dug up after a few days. It might not feel good, but that's how it is, and it feels like a disconnection somewhere in the data, or in the policy-making decisions. So do we have some way to get these kinds of pieces applied in, say, the tender ecosystem, so that you don't have a road made and then dug up for a pipeline after a very short window?

Shalini Kapoor

Yeah, maybe I'll answer it. See, India has put the whole digital public infrastructure in place: the DPI thinking, whether UPI, Aadhaar, DigiLocker, or DigiYatra. Those were digital rails which were put together, and this data infrastructure that we talked about today is going to be those rails. Is it going to be dug up? Are there going to be holes in it? Maybe; no promises. But I think it's a journey, and if we don't start it now, it's going to hit us later on. So no promises, but yes. Rohit, do you have anything to add on that?

Rohit Bardawaj

I just wanted to add that we need to keep working on these data sharing platforms and all the philosophies we just talked about, like accessibility, sharing, analysis, use of AI, and things will improve slowly but steadily, I’m very sure about it.

Shalini Kapoor

Time is up, and the next session is going to start. So thank you so much for listening in to the AI-ready data session, and please visit the booth to see it actually in action. Thank you.

S22
Nri Collaborative Session Data Governance for the Public Good Through Local Solutions to Global Challenges — Indigenous data sovereignty and Pacific context Legal and regulatory | Development | Infrastructure Nancy identifies d…
S23
Digital politics in 2017: Unsettled weather, stormy at times, with sunny spells — Policy silos are reducing the effectiveness of digital policy. As the issue of data governance (Trend 5) shows, it is di…
S24
https://dig.watch/event/india-ai-impact-summit-2026/how-small-ai-solutions-are-creating-big-social-change — When we build LLM, we benchmark them, we evaluate the performance on benchmarks. And we have seen, like, there are only …
S25
Building the Next Wave of AI_ Responsible Frameworks & Standards — “The second most important element in this framework is to ensure these safety benchmarks are co -created with the indus…
S26
UNSC meeting: Artificial intelligence, peace and security — Switzerland:Thank you, Madam President. We are grateful to the Secretary General, Antonio Guterres, for participating in…
S27
Main Session on Artificial Intelligence | IGF 2023 — Seth Center:IAEA is an imperfect analogy for the current technology and the situation we faced for multiple reasons. One…
S28
Multistakeholder Partnerships for Thriving AI Ecosystems — How to address data fragmentation and silos that exist even within individual enterprises
S29
How to make AI governance fit for purpose? — All speakers recognize that AI’s global nature requires international cooperation and coordination, though they may diff…
S30
Open Forum #27 Make Your AI Greener a Workshop on Sustainable AI Solutions — Adham Abouzied emphasized the need for comprehensive governance structures that encourage data and intellectual property…
S31
AI is here. Are countries ready, or not? | IGF 2023 Open Forum #131 — Audience:Thank you very much. My name is Auke Aukepals, and I work for KPMG in the responsible AI practice. And first of…
S32
WS #145 Revitalizing Trust: Harnessing AI for Responsible Governance — Pellerin Matis: I think government can really learn from the private sector because there is lots of technologies and …
S33
Open Forum #64 Local AI Policy Pathways for Sustainable Digital Economies — Given the strategic importance of data for both AI and the digital economy, a collaborative approach involving multiple …
S34
Strategy — – Make better decisions – AI can provide timely analytics and data-driven insights to make better decisions, for example…
S35
Democratizing AI Building Trustworthy Systems for Everyone — “Because if I had to point to anything that’s holding back AI today, it’s not capability, it’s reliability, right?”[62]….
S36
Who Watches the Watchers Building Trust in AI Governance — “and also the multi -turn nature of AI”[9]. “They can still be jailbroken with enough effort or in edge cases, and it’s …
S37
Connecting open code with policymakers to development | IGF 2023 WS #500 — Helani Galpaya:Okay, I mean I’ll go on the data part I think. Sort of the superficial answer is it’s actually very diffi…
S38
AI shows promise in supporting emergency medical decisions — Drexel University researchers studied howAI can aid emergency decisions in pediatric traumaat Children’s National Medica…
S39
Research shows AI complements, not replaces, human work — AI headlines often flip between hype and fear, but the truth is more nuanced. Much research is misrepresented, with task…
S40
How AI Drives Innovation and Economic Growth — Kremer argues that while there are forces that may widen gaps, AI has significant potential to narrow development dispar…
S41
Comprehensive Discussion Report: AI’s Existential Challenge to Human Identity and Society — This scenario encapsulates the broader dilemma facing humanity: when AI consistently provides superior performance, what…
S42
Indias AI Leap Policy to Practice with AIP2 — The speakers demonstrated strong consensus on fundamental prerequisites for AI diffusion: skills development, clear gove…
S43
From Technical Safety to Societal Impact Rethinking AI Governanc — Both speakers support government involvement but disagree on scope – Ioannidis wants to keep core technology development…
S44
AI outperforms humans in debate persuasiveness — AI can be morepersuasivethan humans in debates, especially when given access to personal information, a new study finds….
S45
Importance of Professional standards for AI development and testing — Despite coming from different perspectives, both speakers agree that ethics should be flexible and contextual rather tha…
S46
Global AI Policy Framework: International Cooperation and Historical Perspectives — Despite coming from different backgrounds (diplomatic/legal vs academic), both speakers advocate for patience and carefu…
S47
Driving Indias AI Future Growth Innovation and Impact — But there was also a lot of fear around AI about trust factors, about privacy, data, sovereignty, multiple issues about …
S48
Data first in the AI era — This provided a unifying framework for understanding all the various tensions discussed – between convenience and privac…
S49
The Foundation of AI Democratizing Compute Data Infrastructure — And they could be partly technological and partly policy -based or protocol -based. And a combination of this will ensur…
S50
Adoption of agentic AI slowed by data readiness and governance gaps — Agentic AI is emerging as a new stage ofenterprise automation, enabling systems to reason, plan, and act across workflow…
S51
WS #288 An AI Policy Research Roadmap for Evidence-Based AI Policy — Alex Moltzau: Yes, thank you so much. My name is Alex Maltzau. And I work as a second national expert in the European AI…
S52
Panel #1 : « La gouvernance du numérique au service de l’inclusion : enjeux, freins, et opportunités » — Au lieu que les États pensent avoir toutes les compétences pour résoudre les problèmes, il faut adopter une approche inv…
S53
https://dig.watch/event/india-ai-impact-summit-2026/safe-and-responsible-ai-at-scale-practical-pathways — Yeah, I’ll give two very. These might not be exactly where the sandbox is today, but where it could go tomorrow. Right. …
S54
Data Governance in the Context of Emerging Technologies: Promoting Human-Centred and Development-Oriented Societies   — In the context of this data-driven economy, the governance of this key asset should be tackled in a multilayered way. On…
S55
Data governance — The fact that free flow of data across national and corporate borders facilitates economic development and contributes t…
S56
Why science metters in global AI governance — Finally, let us be clear. Science informs, but humans decide. Our goal is to make human control a technical reality, not…
S57
Safe and Responsible AI at Scale Practical Pathways — “guardrails human in the loop risk assessment these are the tools which are available today …”[95]. “If we immediately…
S58
Can we test for trust? The verification challenge in AI — **Chris Painter** highlighted the need for standardization of frontier safety policies and dangerous capability evaluati…
S59
Democratizing AI Building Trustworthy Systems for Everyone — The participant points out that trustworthiness depends on system responsiveness, accessibility and reliability at the e…
S60
Elections and the Internet: free, fair and open? | IGF 2023 Town Hall #39 — Data needed for policy making needs to reflect their specific local contexts
S61
Overcoming the fragmentation of the digital governance: what role for the Global Digital Compact and e-trade rules? (South Centre) — The analysis explores ongoing negotiations surrounding global digital governance and highlights the need for increased e…
S62
Closing remarks – Charting the path forward — Al Mesmar emphasizes the importance of unified policy approaches that can adapt to technological changes while maintaini…
S63
Open Forum #14 Data Without Borders? Navigating Policy Impacts in Africa — Data fragmentation within countries hinders effective data integration and utilization for decision-making, which needs …
S64
Safe and Responsible AI at Scale Practical Pathways — -Data Fragmentation and Silos: The discussion highlighted how valuable information remains trapped in PDFs, documents, a…
S65
Nri Collaborative Session Data Governance for the Public Good Through Local Solutions to Global Challenges — Examples include delays in issuing birth certificates in Papua New Guinea due to lack of coordinated data systems, and F…
S66
AI is here. Are countries ready, or not? | IGF 2023 Open Forum #131 — Data is extremely siloed and still available in paper format in many situations
S67
AI and Digital in 2023: From a winter of excitement to an autumn of clarity — At thetechnical level, data needs standards in order to be interoperable. Here, the work of standardisation and technica…
S68
Collaborative AI Network – Strengthening Skills Research and Innovation — This comment provides a systematic framework for thinking about data preparation for AI, moving beyond generic discussio…
S69
From Innovation to Impact_ Bringing AI to the Public — Audience questions and Sharma’s responses highlight specific applications: agricultural models that can analyse visual d…
S71
WS #145 Revitalizing Trust: Harnessing AI for Responsible Governance — Pellerin Matis: I think government can really learn from the private sector because there is lots of technologies and …
S72
How Small AI Solutions Are Creating Big Social Change — The fourth one is safety. When we build LLMs, usually we do some safety alignments with reinforcement learning, but thes…
S73
Town Hall: How to Trust Technology — The discussion revolves around the topic of artificial intelligence (AI) and large language models (LLMs). One viewpoint…
S74
Connecting open code with policymakers to development | IGF 2023 WS #500 — Helani Galpaya:Okay, I mean I’ll go on the data part I think. Sort of the superficial answer is it’s actually very diffi…
S75
Open Forum #64 Local AI Policy Pathways for Sustainable Digital Economies — Develop marketplace mechanisms for incentivizing data contributors through revenue sharing models
S76
Defending the Cyber Frontlines / Davos 2025 — The discussion began with a serious, concerned tone as panelists outlined cyber threats and challenges. As the conversat…
S77
AI and Human Connection: Navigating Trust and Reality in a Fragmented World — The tone began optimistically with audience engagement but became increasingly concerned and urgent as panelists reveale…
S78
Day 0 Event #256 Truth Under Siege: Tools to Counter Digital Censorship — The discussion maintained a serious, concerned tone throughout, reflecting the gravity of the challenges being discussed…
S79
AI and Digital Developments Forecast for 2026 — The tone begins as analytical and educational but becomes increasingly cautionary and urgent throughout the conversation…
S80
Comprehensive Report: Cyber Fraud and Human Trafficking – A Global Crisis Requiring Multilateral Response — The tone began as deeply concerning and urgent, with speakers emphasizing the gravity and scale of the problem. However,…
S81
Revamping Decision-Making in Digital Governance and the WSIS Framework — The discussion maintained a constructive and collaborative tone throughout, with speakers building upon each other’s poi…
S82
Smart Regulation Rightsizing Governance for the AI Revolution — The discussion began with a notably realistic and somewhat pessimistic assessment of global cooperation challenges, but …
S83
Impact & the Role of AI How Artificial Intelligence Is Changing Everything — The discussion maintained a cautiously optimistic tone throughout, balancing enthusiasm for AI’s potential with realisti…
S84
Transforming Agriculture_ AI for Resilient and Inclusive Food Systems — The tone was consistently optimistic yet pragmatic throughout the conversation. Speakers maintained an encouraging outlo…
S85
Regional experiences on the governance of emerging technologies NRI Collaborative Session — The overall tone was collaborative and solution-oriented. Participants shared insights from their regions in a construct…
S86
Afternoon session — The discussion began with a collaborative and appreciative tone as various stakeholders shared their visions and commitm…
S87
Final plenary session and adoption of the interim report — The necessity to monitor red lines while finding agreement outside these lines was highlighted.
S88
Webinar session — The discussion maintained a diplomatic and constructive tone throughout, with participants demonstrating nuanced thinkin…
S89
Pathways to De-escalation — The overall tone was serious and somewhat cautious, reflecting the gravity of cybersecurity challenges. While the speake…
S90
Building the AI-Ready Future From Infrastructure to Skills — The tone was consistently optimistic and collaborative throughout, with speakers expressing excitement about AI’s potent…
S91
Using AI to tackle our planet’s most urgent problems — The tone is passionate and advocacy-driven throughout, with the speaker maintaining an urgent, morally-charged perspecti…
S92
Business Engagement Session: Sustainable Leadership in the Digital Age – Shaping the Future of Business — The discussion maintained a consistently collaborative and optimistic tone throughout. It began with academic framing bu…
S93
Closing remarks — The tone is consistently celebratory, optimistic, and forward-looking throughout the discussion. It maintains an enthusi…
S94
Upskilling for the AI era: Education’s next revolution — The tone is consistently optimistic, motivational, and action-oriented throughout. The speaker maintains an enthusiastic…
S95
Is the AI bubble about to burst? Five causes and five scenarios — Behind the diminishing returns are conceptual and logical limitations of Large Language Models (LLMs), which cannot be r…
S96
Steering the future of AI — 2. **Persistent memory**: Current LLMs cannot maintain long-term memory across interactions. 3. **Reasoning capabilitie…
S97
How AI Is Transforming Diplomacy and Conflict Management — He argues that relying solely on large language models is problematic because their fluency is not verifiable in interna…
S98
Large Language Models on the Web: Anticipating the challenge | IGF 2023 WS #217 — The analysis discussed various aspects of language models (LLMs) and artificial intelligence (AI). One key point raised …
S99
Building Inclusive Societies with AI — Arundhati Bhattacharya, Chairperson and CEO of Salesforce India, emphasized that India’s scale demands digital solutions…
S100
WSIS Action Line C7: E-Agriculture — Garba advocated for integrated policy frameworks and emphasised that private sector telecommunications providers require…
S101
WS #97 Interoperability of AI Governance: Scope and Mechanism — Yik Chan Chin: Thank you, Olga. So, I speak on behalf of the PNAI because I’m the co-leader of the subgroup on the inte…
S102
Skilling and Education in AI — “Five second response, I think the one action that we need to take is improve the trust infrastructure and make sure tha…
S103
Law, Tech, Humanity, and Trust — Samit D’Cunha: Thanks, Joelle. That’s a really fair and, I think, necessary question. Maybe I’ll actually answer this qu…
S104
Government notices · GoewermentskennisGewinGs — There have been concerns raised, however, about the efficacy of the current structures. These stem mainly from ICASA’s…
S105
Operationalizing data free flow with trust | IGF 2023 WS #197 — David Pendle:as we aim to build trust? Thanks Tamim. So I sit on Microsoft’s law enforcement national security team whic…
S106
Laying the foundations for AI governance — Xue explains that there is a shared uncertainty about future risks and problems, with both regulators and companies lack…
S107
Can (generative) AI be compatible with Data Protection? | IGF 2023 #24 — Armando José Manzueta-Peña:Well, thank you, Luca, for the presentation. I’m more than thrilled to be present here and to…
S108
Building Population-Scale Digital Public Infrastructure for AI — These key comments fundamentally shaped the discussion by progressively deepening the analysis from technical implementa…
S109
WS #214 AI Readiness in Africa in a Shifting Geopolitical Landscape — Mlindi Mashologu: As the country, South Africa, we assume the G20 presidency and I think it’s important to note our bann…
S110
Collaborative Innovation Ecosystem and Digital Transformation: Accelerating the Achievement of Global Sustainable Development Goals (SDGs) — A significant portion of the discussion focused on the need for cross-border collaboration and harmonized policy framewo…
S111
Accelerating Structural Transformation and Industrialization in Developing Countries: Navigating the Future with Advanced ICTs and Industry 4.0 — Very low level of disagreement. The speakers were largely aligned on goals and strategies, with differences mainly in em…
S112
Pre 10: Regulation of Autonomous Weapon Systems: Navigating the Legal and Ethical Imperative — Elena Plexida: Thank you, Wolfgang. Thank you very much. Hello everyone. Yes, exactly. As you said, I work for one of th…
S113
High-level AI Standards panel — Paul Gaskell: Thank you, Bilel. So, I mean, as a government, we recognize that digital standards really matter. So we’re…
Speakers Analysis
Detailed breakdown of each speaker’s arguments and positions
S
Shalini Kapoor
9 arguments · 128 words per minute · 2572 words · 1200 seconds
Argument 1
Data trapped in PDFs and lack of trust hampers AI use (Shalini Kapoor)
EXPLANATION
Shalini points out that a large amount of valuable information resides in PDFs and other documents that organisations are reluctant to share with AI systems. This mistrust prevents AI from accessing and leveraging that data effectively.
EVIDENCE
She notes that information is “stuck in PDFs, stuck in documents” and that there is “a fear, there’s lack of trust today” which keeps the data where it is, even though AI thrives on data [5-6].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S1 highlights that valuable information remains locked in PDFs and fragmented silos, and a lack of trust prevents organisations from sharing data with AI systems, underscoring the need for interoperable, trusted data.
MAJOR DISCUSSION POINT
AI‑readiness of data & fragmented silos
Argument 2
AI‑ready data must be cleaned, linked, safe, trusted and interoperable (Shalini Kapoor)
EXPLANATION
She emphasizes that for data to be AI‑ready it must be cleaned and linked, then presented in a safe and trusted manner. Interoperability and proper structuring are essential to make the data useful for AI applications.
EVIDENCE
She states that “the data has to be AI ready… safe, trusted manner, the data can be linked, made useful and then made available” and later describes the process of cleaning, linking, making data relevant and useful [17][20-22].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S1 stresses that AI‑ready data must be safe, trusted, cleaned and linked to become useful, while S13 provides a systematic framework that breaks down these preparation steps.
MAJOR DISCUSSION POINT
AI‑readiness of data & fragmented silos
AGREED WITH
Rohit Bardawaj, Prem Ramaswami
Argument 3
Trustworthy, safe and publicly accessible data is a core institutional responsibility (Shalini Kapoor)
EXPLANATION
Shalini asks the panel what responsibility institutions have to ensure data is trustworthy, safe, and openly available. She frames this as a duty of public bodies to make data usable for AI while protecting its integrity.
EVIDENCE
She poses the question to Rohit: “what do you think is the responsibility of institution how and yours is an institution to make the data trusted safe and available to all” [31-32].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S1 and S14 discuss the importance of trust, provenance and public accessibility of data as institutional duties for enabling AI.
MAJOR DISCUSSION POINT
Role of institutions & governance frameworks
Argument 4
Example of a 5,000‑term Marathi glossary for agricultural AI use‑cases (Shalini Kapoor)
EXPLANATION
Shalini describes a large domain‑specific glossary created in Marathi to support agricultural AI applications. The glossary contains thousands of localized terms that enable contextual understanding by AI models.
EVIDENCE
She mentions that “we actually created glossary of 5000 terms which is it is in Marathi” and that it was used in agricultural AI experiments [84-86].
MAJOR DISCUSSION POINT
Contextualisation & domain‑specific glossaries
AGREED WITH
Ashish Srivastava, Prem Ramaswami
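The glossary approach described here can be sketched as a simple term-resolution step applied before a query reaches a model. The Marathi terms and English meanings below are illustrative stand-ins, not entries from the actual 5,000-term glossary, and `contextualize` is a hypothetical helper name.

```python
# Hypothetical sketch: annotating localized domain terms from a glossary
# so a downstream AI model receives consistent, well-defined vocabulary.
# Terms and translations are illustrative only.
GLOSSARY = {
    "शेतकरी": "farmer",
    "पीक विमा": "crop insurance",
    "अनुदान": "subsidy",
}

def contextualize(query: str, glossary: dict) -> str:
    """Append glossary meanings to any known local-language term
    found in the query, leaving unknown text untouched."""
    for term, meaning in glossary.items():
        query = query.replace(term, f"{term} ({meaning})")
    return query

print(contextualize("पीक विमा साठी अनुदान कसे मिळेल?", GLOSSARY))
```

In a production pipeline this lookup would feed a retrieval or prompting layer rather than string substitution, but the core idea is the same: the glossary supplies the domain context the model lacks.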
Argument 5
MSME compliance queries involving millions of annual data points (Shalini Kapoor)
EXPLANATION
Shalini highlights the massive scale of compliance obligations faced by micro, small and medium enterprises (MSMEs), noting millions of new compliances each year. This illustrates the data volume challenge that AI‑ready solutions must address.
EVIDENCE
She cites an organization handling “3,000 entities” that manage “5 million new compliances in a year” and the associated query load [23-27].
MAJOR DISCUSSION POINT
Practical use cases & applications
Argument 6
“Data boarding pass” concept to onboard and monetize AI‑ready datasets for B2B use (Shalini Kapoor)
EXPLANATION
Shalini introduces a “data boarding pass” framework that certifies datasets as AI‑ready through a checklist, enabling businesses and policymakers to access and monetize the data. It aims to streamline data onboarding and create a market for trusted data assets.
EVIDENCE
She describes the concept as a physical and digital checklist that, once passed, allows B2B players to “pick the data and then start using it in your applications” and gives a concrete scenario involving automobile MSME manufacturers [353-362].
MAJOR DISCUSSION POINT
Practical use cases & applications
Argument 7
Sustainable data economy needs incentives, clear value, and exchangeability for contributors (Shalini Kapoor)
EXPLANATION
Shalini outlines a GIVE model (Guaranteed trust, Incentive, Value, Exchangeability) to motivate data providers and users. She argues that a clear incentive structure is essential for a functional data economy.
EVIDENCE
She references a paper on the “GIVE” model, explaining each component (trust, incentive, value, exchangeability), and stresses that incentives must be clear for contributors and users [391-399].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S1 outlines the GIVE model—guaranteed trust, incentive, value, exchangeability—as essential for a sustainable data economy.
MAJOR DISCUSSION POINT
Business models & sustainability of data platforms
AGREED WITH
Audience, Rohit Bardawaj
Argument 8
Technology alone cannot solve data silos without accompanying governance (Shalini Kapoor)
EXPLANATION
Shalini argues that purely technical solutions are insufficient; effective governance frameworks are required to address data silos. She stresses the need for policies that balance data sovereignty with AI capabilities.
EVIDENCE
She remarks that “you don’t want to give you maybe want to keep the data and the sovereignty comes in” and that “countries want to keep the data with themselves” highlighting the governance dimension [149-152].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S21 notes that technology must be paired with proper governance structures, and S23 warns that policy silos hinder effective data governance.
MAJOR DISCUSSION POINT
Governance vs. technology debate for alternative data
AGREED WITH
Rohit Bardawaj, Prem Ramaswami
Argument 9
Developing benchmarks to measure answer stability across LLMs and users (Shalini Kapoor)
EXPLANATION
Shalini notes ongoing work to create benchmarks that assess whether repeated queries to LLMs produce consistent answers across models and users. Such benchmarks are intended to improve trust in AI outputs.
EVIDENCE
She explains that they are “working to create a benchmark” by testing if the same question yields the same answer across LLMs and multiple users, using examples like Amul AI and Bharat Vistar [84-88].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S1 points out variability in LLM answers and the need for benchmarking; S24 and S25 discuss the creation and co‑creation of benchmarks to improve reliability.
MAJOR DISCUSSION POINT
Trust, stability and benchmarking of AI outputs
AGREED WITH
Rohit Bardawaj, Ashish Srivastava
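A consistency benchmark of the kind described above can be approximated with a simple agreement metric over repeated runs. The responses below are invented for illustration, and `stability_score` is a hypothetical helper; in practice each LLM would be queried many times with the same question and the scores compared across models.

```python
from collections import Counter

def stability_score(answers):
    """Fraction of responses matching the most common answer.
    1.0 means perfectly stable; lower values flag inconsistency."""
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

# Hypothetical responses from three repeated runs of one question
# against a single model.
runs = [
    "Scheme X covers 50% of cost",
    "Scheme X covers 50% of cost",
    "Scheme X covers 40% of cost",
]
print(stability_score(runs))  # 2 of 3 runs agree
```

Exact-string matching is the crudest possible notion of agreement; a real benchmark would normalise answers or use semantic similarity, but the reporting structure (same question, many runs, an agreement score per model) carries over.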
R
Rohit Bardawaj
9 arguments · 185 words per minute · 2308 words · 746 seconds
Argument 1
Institutions need a shared definition and framework for AI readiness (Rohit Bardawaj)
EXPLANATION
Rohit argues that without a common definition of AI readiness, institutions cannot coordinate efforts to prepare data for AI. He calls for a consensus framework that outlines the necessary standards and processes.
EVIDENCE
He questions whether a “uniform definition of what is AI readiness” exists and stresses the need for an “agreed framework” that institutions can adopt [33-46].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S1 emphasizes the lack of a uniform AI‑readiness definition and calls for a consensus framework among institutions.
MAJOR DISCUSSION POINT
AI‑readiness of data & fragmented silos
AGREED WITH
Shalini Kapoor, Prem Ramaswami
Argument 2
AI‑ready data requires machine‑readable catalogs, metadata and context files (Rohit Bardawaj)
EXPLANATION
Rohit outlines the technical components needed for AI‑ready data: a machine‑readable catalog (preferably JSON), comprehensive metadata, and a context file that explains domain‑specific terms. These elements enable AI systems to interpret data correctly.
EVIDENCE
He details the need for a “catalog of your data” in JSON, accompanying “metadata” and a “context file” that clarifies meanings such as “frequency” [184-205].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S13 provides a systematic approach that includes machine‑readable catalogs, rich metadata and domain context files as core components of AI‑ready data.
MAJOR DISCUSSION POINT
AI‑readiness of data & fragmented silos
AGREED WITH
Shalini Kapoor, Prem Ramaswami
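The catalog, metadata, and context-file trio Rohit describes can be sketched as a single JSON document. Every field name and value below is an assumption for illustration, since the session does not cite a published schema; the sketch also folds in the standardized dimensions and codes he discusses next.

```python
import json

# Illustrative sketch of a machine-readable dataset catalog with
# metadata, standardized dimensions, and a context file resolving
# ambiguous domain terms. All names and values are assumptions.
catalog = {
    "dataset": "district_crop_yield",
    "format": "csv",
    "metadata": {
        "source": "state agriculture department",
        "updated": "2025-12-01",
        "license": "open",
    },
    "dimensions": [
        {"name": "time", "meaning": "temporal, yearly frequency"},
        {"name": "district", "codes": "LGD district codes"},
    ],
    "context": {
        "frequency": "reporting cycles per year, not a signal frequency",
        "yield": "tonnes per hectare",
    },
}

print(json.dumps(catalog, indent=2))
```

The point of the context block is exactly the ambiguity Rohit raises: without it, a model has no way to know whether "frequency" is temporal or physical, or what units "yield" carries.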
Argument 3
Structured data with standardized codes and dimensions is essential (Rohit Bardawaj)
EXPLANATION
Rohit emphasizes that data must be standardized, with uniform codes and clearly defined dimensions, to be usable by AI. Without such structure, AI models cannot reliably interpret fields like time or frequency.
EVIDENCE
He discusses standardizing codes, defining dimensions and attributes, and clarifying that “time means temporal” to make data machine-understandable [208-221].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S13 stresses the need for standardized codes, clear dimensions and attributes to make data machine‑understandable for AI.
MAJOR DISCUSSION POINT
AI‑readiness of data & fragmented silos
Argument 4
MOSB should lead creation of an agreed AI‑readiness framework (Rohit Bardawaj)
EXPLANATION
Rohit states that the Ministry of Statistics and Programme Implementation (MOSB) has a key responsibility to spearhead the development of a national AI‑readiness framework. This leadership would help align stakeholders around common standards.
EVIDENCE
He says “the biggest responsibility of our institutions like MOSB to make people aware what AI readiness is all about” [43].
MAJOR DISCUSSION POINT
Role of institutions & governance frameworks
Argument 5
A data steward and federated model are needed to govern alternative data sources (Rohit Bardawaj)
EXPLANATION
Rohit proposes appointing a data steward and adopting a federated data model to manage alternative data sources. This approach ensures that no single entity owns all data and that governance is distributed.
EVIDENCE
He mentions the need for a “federated model” and a “data steward” to orchestrate the ecosystem, noting that “there cannot be one whole sole owner for a data” [181-184].
MAJOR DISCUSSION POINT
Role of institutions & governance frameworks
AGREED WITH
Shalini Kapoor, Prem Ramaswami
Argument 6
Identical prompts can yield different analyses; consistency is a trust issue (Rohit Bardawaj)
EXPLANATION
Rohit highlights research showing that the same prompt given to AI with the same dataset can produce divergent analyses, undermining trust in AI outputs. He warns against being overly enthusiastic before the technology is reliably tested.
EVIDENCE
He references a paper where “the same prompt to AI with the same data set, it gives you two types of analysis” [80-82].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S1 reports that the same prompt can produce divergent analyses across LLMs, highlighting a trust problem that benchmarking (S24) aims to address.
MAJOR DISCUSSION POINT
Trust, stability and benchmarking of AI outputs
AGREED WITH
Shalini Kapoor, Ashish Srivastava
Argument 7
Integrating alternative data is primarily a governance challenge, not just a technical one (Rohit Bardawaj)
EXPLANATION
Rohit asserts that the main obstacle to incorporating alternative data lies in governance rather than technology. He encourages the audience to view it as a policy and stewardship issue.
EVIDENCE
He conducts a poll asking whether the issue is “governance” or “technology” and concludes it is a governance issue [160-170].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S21 and S23 argue that governance, not technology, is the main barrier to incorporating alternative data sources.
MAJOR DISCUSSION POINT
Governance vs. technology debate for alternative data
AGREED WITH
Shalini Kapoor, Prem Ramaswami
Argument 8
Federated stewardship and policy frameworks are essential before technical solutions (Rohit Bardawaj)
EXPLANATION
Rohit reiterates that before deploying technical tools, a federated stewardship model and clear policy frameworks must be established to manage data responsibly. Governance sets the foundation for any technical implementation.
EVIDENCE
He emphasizes that “we need a federated model” and that “we need to understand first” the governance aspects before technical work [177-184].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S21 stresses the need for federated stewardship models and clear policy frameworks as prerequisites for any technical implementation.
MAJOR DISCUSSION POINT
Governance vs. technology debate for alternative data
Argument 9
NSO is publicly funded; commercial use may be charged under a policy (Rohit Bardawaj)
EXPLANATION
Rohit explains that the National Statistics Office (NSO) receives public funding, making its data free for research but potentially chargeable for commercial applications. A policy governs the pricing for commercial use.
EVIDENCE
He states that “NSO is fully funded by the Government” and that “if the use is commercial, then there is a policy and people have to pay accordingly” [380-388].
MAJOR DISCUSSION POINT
Business models & sustainability of data platforms
AGREED WITH
Shalini Kapoor, Audience
Prem Ramaswami
7 arguments · 188 words per minute · 2119 words · 672 seconds
Argument 1
Data Commons offers an open‑source, federated stack that lets organisations govern data locally (Prem Ramaswami)
EXPLANATION
Prem describes Data Commons as an open‑source platform that aggregates data globally while allowing each organization to retain local governance. This federated approach prevents centralised control and supports diverse data owners.
EVIDENCE
He notes that Data Commons “open-sourced the entire stack” and is used by entities like the United Nations Statistical Department, enabling local governance of data [55-64].
MAJOR DISCUSSION POINT
Knowledge graphs, Data Commons and open‑source solutions
AGREED WITH
Shalini Kapoor, Rohit Bardawaj
Argument 2
Data Commons aggregates global datasets into a knowledge graph with an AI search layer (Prem Ramaswami)
EXPLANATION
Prem explains that Data Commons combines multiple datasets into a common knowledge graph and places an AI‑powered search engine on top, allowing rapid data discovery and analysis.
EVIDENCE
He states that Data Commons “bring multiple data sets globally together in a common knowledge graph and then put an AI search engine on top of it” [59-60].
MAJOR DISCUSSION POINT
Knowledge graphs, Data Commons and open‑source solutions
Argument 3
Open‑sourcing prevents single‑point ownership and enables local governance (Prem Ramaswami)
EXPLANATION
Prem argues that by open‑sourcing the stack, no single entity can monopolise the data, and organisations can manage their own data locally. This decentralisation enhances trust and adaptability.
EVIDENCE
He mentions that open-sourcing “prevents single-point ownership” and that the UN uses Data Commons as a backend, illustrating distributed governance [61-64].
MAJOR DISCUSSION POINT
Knowledge graphs, Data Commons and open‑source solutions
Argument 4
Knowledge graphs can ground LLMs, fill gaps and improve answer accuracy (Prem Ramaswami)
EXPLANATION
Prem proposes that a knowledge graph provides factual grounding for large language models, allowing them to fill missing information and generate more accurate responses. The graph acts as a factual substrate for AI reasoning.
EVIDENCE
He explains that a knowledge graph can be used “to ground it in those facts” and then leverage LLM intelligence to fill gaps, improving answer quality [112-118].
MAJOR DISCUSSION POINT
Knowledge graphs, Data Commons and open‑source solutions
AGREED WITH
Shalini Kapoor, Ashish Srivastava
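The grounding pattern Prem describes can be sketched minimally: retrieve facts for an entity from a graph and prepend them to the question before it reaches the LLM. This is not the Data Commons API; the graph, entities and values below are invented placeholders.

```python
# Toy knowledge graph: subject -> {predicate: object}. Entities and
# values are placeholders, not real Data Commons content.
GRAPH = {
    "RegionX": {"population": "120,000", "median_income": "INR 3.2 lakh"},
    "RegionY": {"population": "80,000", "median_income": "INR 4.5 lakh"},
}

def ground_prompt(question, entity):
    """Prefix a question with facts pulled from the graph, so the LLM
    answers from grounded statistics rather than free recall."""
    facts = GRAPH.get(entity, {})
    lines = [f"- {entity} {pred}: {obj}" for pred, obj in sorted(facts.items())]
    header = "Answer using only these facts:\n" + "\n".join(lines)
    return f"{header}\n\nQuestion: {question}"

print(ground_prompt("Which region has more people?", "RegionX"))
```

In a real deployment the lookup would query the knowledge graph's API, and the LLM would fill gaps the graph cannot answer directly.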
Argument 5
Overlaying a company’s own data with the commons creates network effects for richer analysis (Prem Ramaswami)
EXPLANATION
Prem illustrates that when an organisation uploads its own dataset to Data Commons, it automatically integrates with thousands of existing datasets, creating synergistic insights and reducing the need for multiple data transformations.
EVIDENCE
He describes a scenario where a chain store adds its sales data and instantly “overlays with the 50,000 data sets” already in Data Commons, enabling broader analysis [304-317].
MAJOR DISCUSSION POINT
Knowledge graphs, Data Commons and open‑source solutions
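The overlay Prem describes amounts to joining private rows onto public statistics by a shared key. A minimal sketch, with made-up regions, figures and field names standing in for a commons slice and a company's CSV:

```python
# Public statistics keyed by region (a stand-in for datasets already
# in a commons); all figures are invented for illustration.
public = {
    "R1": {"population": 120_000, "footfall_index": 0.7},
    "R2": {"population": 80_000, "footfall_index": 0.9},
}

# The organisation's own data, e.g. parsed from its sales CSV.
own = {"R1": {"sales": 1_500}, "R2": {"sales": 2_200}}

def overlay(own_rows, public_rows):
    """Join private rows onto public statistics by a shared region key,
    mimicking how an uploaded CSV combines with existing datasets."""
    return {key: {**public_rows.get(key, {}), **row}
            for key, row in own_rows.items()}

combined = overlay(own, public)
# combined["R2"] now holds sales alongside the public demographics.
```

The network effect comes from the join key: once the private data uses the same entity identifiers as the commons, every existing dataset becomes joinable at once.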
Argument 6
Imperfect AI can still be statistically safer than human‑only decisions (Prem Ramaswami)
EXPLANATION
Prem compares AI‑driven systems to human performance, noting that despite imperfections, AI can reduce overall risk compared to human‑only approaches, as illustrated by accident statistics.
EVIDENCE
He cites the “30,000 deaths from car accidents” in the U.S. under human driving, which far exceed AI-related incidents, suggesting AI can be statistically safer [144-147].
MAJOR DISCUSSION POINT
Trust, stability and benchmarking of AI outputs
Argument 7
Decision support for small business location planning using Data Commons (Prem Ramaswami)
EXPLANATION
Prem provides an example where a small business owner can use Data Commons to evaluate factors such as mobility, traffic, demographics, and income to decide where to open a new shop, thereby de‑risking the investment decision.
EVIDENCE
He narrates a scenario of an MSME owner needing data on “mobility, traffic, demographics, affordability” and how Data Commons can provide that insight [289-301].
MAJOR DISCUSSION POINT
Practical use cases & applications
Ashish Srivastava
4 arguments · 151 words per minute · 1240 words · 491 seconds
Argument 1
LLMs struggle with domain vocabularies; glossaries/knowledge graphs improve performance (Ashish Srivastava)
EXPLANATION
Ashish observes that large language models perform well on general language but falter on domain‑specific terminology. Introducing glossaries or knowledge graphs helps bridge this gap and improves translation and understanding.
EVIDENCE
He explains that “LLMs are becoming increasingly good… but the moment they hit any domain-specific vocabulary, that’s when they start failing” and that they solved it by “using a glossary combined with the LLM” [102-106].
MAJOR DISCUSSION POINT
Contextualisation & domain‑specific glossaries
AGREED WITH
Shalini Kapoor, Prem Ramaswami
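The glossary-plus-LLM fix Ashish describes can be sketched as a pre-processing step: detect domain terms in the prompt and prepend their definitions. The two-entry glossary below is hypothetical (the panel mentions a 5,000-term Marathi glossary in production).

```python
# Hypothetical glossary; a production version would hold thousands of
# domain terms in the relevant local language.
GLOSSARY = {
    "kharif": "crop sown at the onset of the monsoon",
    "rabi": "crop sown in winter and harvested in spring",
}

def add_glossary_context(prompt):
    """Prepend definitions for any glossary terms found in the prompt,
    so domain vocabulary is resolved before the LLM sees the question."""
    hits = sorted(t for t in GLOSSARY if t in prompt.lower())
    if not hits:
        return prompt
    defs = "\n".join(f"{t}: {GLOSSARY[t]}" for t in hits)
    return f"Glossary:\n{defs}\n\n{prompt}"

print(add_glossary_context("What subsidies apply to kharif sowing?"))
```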
Argument 2
AI should be a tool that augments human solutions, not the sole answer (Ashish Srivastava)
EXPLANATION
Ashish stresses that AI models constitute only a small portion of a solution and must be combined with human oversight, guardrails, and risk assessment. AI is a supplement, not a replacement for human judgment.
EVIDENCE
He notes that “LLMs or AI models are the solution… they are only one of the inputs to the solution” and that guardrails and human-in-the-loop are necessary [125-130].
MAJOR DISCUSSION POINT
Contextualisation & domain‑specific glossaries
Argument 3
Human‑in‑the‑loop guardrails and risk assessment are required for reliable AI (Ashish Srivastava)
EXPLANATION
Ashish argues that to ensure trustworthy AI outputs, systems must incorporate human oversight, guardrails, and systematic risk assessments. This mitigates the variability and potential errors of AI models.
EVIDENCE
He references the need for “guardrails, human in the loop, risk assessment” as essential tools for reliable AI [125-130].
MAJOR DISCUSSION POINT
Trust, stability and benchmarking of AI outputs
AGREED WITH
Shalini Kapoor, Rohit Bardawaj
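One common shape for the human-in-the-loop guardrail Ashish calls for is a routing layer: release a model output automatically only when a confidence signal clears a threshold, otherwise escalate it to a reviewer. The confidence score and threshold below are assumptions for illustration, not a described system.

```python
def guarded_answer(answer, confidence, threshold=0.8):
    """Route a model output: serve it automatically only when the
    (hypothetical) confidence signal clears the threshold; otherwise
    send it to a human reviewer."""
    route = "auto" if confidence >= threshold else "human_review"
    return {"route": route, "answer": answer}

# A low-confidence output is escalated rather than served.
result = guarded_answer("Subsidy X applies", confidence=0.55)
print(result["route"])  # human_review
```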
Argument 4
Education, health and inclusion solutions powered by AI and data (Ashish Srivastava)
EXPLANATION
Ashish describes projects that leverage AI for social sectors such as women and child health, education, and inclusion of marginalized groups. These initiatives aim to improve outcomes by providing data‑driven decision support.
EVIDENCE
He mentions a decade of work on “AI for social problems or digital, like women in child health” and later discusses work on education, health, and inclusion through AI [95-98][250-257].
MAJOR DISCUSSION POINT
Practical use cases & applications
Audience
1 argument · 172 words per minute · 200 words · 69 seconds
Argument 1
High‑quality data collection is costly; platforms must explore public‑private pricing mechanisms (Audience)
EXPLANATION
An audience member raises concerns about the sustainability of data platforms, noting that collecting and maintaining high‑quality data is expensive and may require mixed public‑private financing models.
EVIDENCE
The audience asks about “business models of these platforms” and whether they need “publicly paid or whatever models” given the cost of high-quality data [372-376].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
S1 discusses the necessity of clear incentive models and sustainable financing for high‑quality data, while S15 highlights public infrastructure investments that can complement private funding.
MAJOR DISCUSSION POINT
Business models & sustainability of data platforms
AGREED WITH
Shalini Kapoor, Rohit Bardawaj
Speaker 1
1 argument · 136 words per minute · 23 words · 10 seconds
Argument 1
Request for guidance on launching a Data Commons instance (Speaker 1)
EXPLANATION
Speaker 1 asks the panel for practical advice on how an organization can set up its own Data Commons instance, indicating interest in adopting the discussed technology.
EVIDENCE
The speaker asks, “Tell us a bit more about, like if suppose someone wants to put up a Data Commons instance, how can they get started?” [303].
MAJOR DISCUSSION POINT
Business models & sustainability of data platforms
Agreements
Agreement Points
A shared understanding and technical framework for AI‑ready data is essential, including cleaning, linking, safe and trusted handling, machine‑readable catalogs, metadata and context files.
Speakers: Shalini Kapoor, Rohit Bardawaj, Prem Ramaswami
AI‑ready data must be cleaned, linked, safe, trusted and interoperable (Shalini Kapoor)
Institutions need a shared definition and framework for AI readiness (Rohit Bardawaj)
AI‑ready data requires machine‑readable catalogs, metadata and context files (Rohit Bardawaj)
Data Commons offers an open‑source, federated stack that lets organisations govern data locally (Prem Ramaswami)
Knowledge graphs can ground LLMs, fill gaps and improve answer accuracy (Prem Ramaswami)
All three speakers stress that data must be prepared through cleaning, linking and standardisation, and that a common, machine-readable definition (catalogues, metadata, context files) is needed to make data trustworthy and usable by AI systems [17][20-22][184-205][55-58][112-118].
POLICY CONTEXT (KNOWLEDGE BASE)
This aligns with the ‘Data first in the AI era’ framework that calls for machine-readable catalogs, metadata and safe handling to enable trustworthy AI [S48]; similar recommendations appear in the Foundation of AI Democratizing Compute Data Infrastructure report emphasizing democratized data infrastructure [S49]; and the adoption barriers identified for agentic AI stress data readiness and governance gaps [S50]; multilayered data governance guidance also supports technical standards for AI-ready data [S54].
Governance, not just technology, is the primary hurdle for integrating fragmented and alternative data sources; a federated stewardship model and clear policy frameworks are required.
Speakers: Shalini Kapoor, Rohit Bardawaj, Prem Ramaswami
Technology alone cannot solve data silos without accompanying governance (Shalini Kapoor)
Integrating alternative data is primarily a governance challenge, not just a technical one (Rohit Bardawaj)
A data steward and federated model are needed to govern alternative data sources (Rohit Bardawaj)
Data Commons offers an open‑source, federated stack that lets organisations govern data locally (Prem Ramaswami)
The panel agrees that policy and governance structures (federated model, data steward) must precede technical solutions for data interoperability [149-152][160-170][177-184][55-64].
POLICY CONTEXT (KNOWLEDGE BASE)
The need for federated stewardship mirrors India’s AI Leap policy which prioritises clear governance frameworks for AI diffusion [S42]; the debate on the scope of government involvement in AI governance underscores the centrality of policy coordination [S43]; multilayered governance approaches for emerging technologies highlight governance as the key challenge over technology [S54]; and analyses of digital governance fragmentation call for unified policy to integrate fragmented data sources [S61][S62][S63].
Trustworthiness of AI outputs requires benchmarking, consistency checks and human‑in‑the‑loop guardrails.
Speakers: Shalini Kapoor, Rohit Bardawaj, Ashish Srivastava
Developing benchmarks to measure answer stability across LLMs and users (Shalini Kapoor)
Identical prompts can yield different analyses; consistency is a trust issue (Rohit Bardawaj)
Human‑in‑the‑loop guardrails and risk assessment are required for reliable AI (Ashish Srivastava)
All three highlight the need for systematic evaluation (benchmarks) and safeguards to ensure reliable AI results [84-88][80-82][125-130].
POLICY CONTEXT (KNOWLEDGE BASE)
Professional standards for AI development stress benchmarking and human-in-the-loop oversight as essential safeguards [S45]; safe and responsible AI pathways describe guardrails and risk-assessment tools that operationalise these principles [S57]; calls for standardisation of trust verification highlight the need for consistent safety policies across the industry [S58]; and broader trustworthiness frameworks emphasise reliability, accessibility and human oversight [S59][S56].
Domain‑specific glossaries or knowledge graphs are needed to contextualise data and improve LLM performance.
Speakers: Shalini Kapoor, Ashish Srivastava, Prem Ramaswami
Example of a 5,000‑term Marathi glossary for agricultural AI use‑cases (Shalini Kapoor)
LLMs struggle with domain vocabularies; glossaries/knowledge graphs improve performance (Ashish Srivastava)
Knowledge graphs can ground LLMs, fill gaps and improve answer accuracy (Prem Ramaswami)
The speakers concur that adding structured domain knowledge (glossaries, knowledge graphs) bridges the gap between raw data and LLM understanding [84-86][102-106][112-118].
A sustainable data economy requires clear incentives, value creation and appropriate pricing models for public and commercial use.
Speakers: Shalini Kapoor, Audience, Rohit Bardawaj
Sustainable data economy needs incentives, clear value, and exchangeability for contributors (Shalini Kapoor)
High‑quality data collection is costly; platforms must explore public‑private pricing mechanisms (Audience)
NSO is publicly funded; commercial use may be charged under a policy (Rohit Bardawaj)
All three acknowledge that financing high-quality data and defining incentive structures (GIVE model, public-private mix, commercial licensing) are essential for a viable data market [391-399][372-376][380-388].
POLICY CONTEXT (KNOWLEDGE BASE)
Policy analyses of AI-driven economic growth argue that targeted incentives and pricing models are needed to harness AI for development and reduce disparities [S40]; the role of free data flows in fostering economic development underscores the importance of pricing mechanisms for public and commercial data use [S55]; and the ‘Data first’ framework stresses collective benefit and value creation in a data-centric economy [S48].
Similar Viewpoints
Both stress that institutions must define clear standards and processes to make data trustworthy and usable by AI [17][20-22][33-46].
Speakers: Shalini Kapoor, Rohit Bardawaj
AI‑ready data must be cleaned, linked, safe, trusted and interoperable (Shalini Kapoor)
Institutions need a shared definition and framework for AI readiness (Rohit Bardawaj)
Both view AI as an augmenting tool that, despite imperfections, can improve decision‑making when combined with human oversight [125-130][144-147].
Speakers: Prem Ramaswami, Ashish Srivastava
AI should be a tool that augments human solutions, not the sole answer (Ashish Srivastava)
Imperfect AI can still be statistically safer than human‑only decisions (Prem Ramaswami)
Both highlight the variability of AI outputs and the necessity of safeguards to maintain trust [80-82][125-130].
Speakers: Rohit Bardawaj, Ashish Srivastava
Identical prompts can yield different analyses; consistency is a trust issue (Rohit Bardawaj)
Human‑in‑the‑loop guardrails and risk assessment are required for reliable AI (Ashish Srivastava)
Both advocate a federated, decentralized approach to data stewardship before technical deployment [181-184][55-64].
Speakers: Prem Ramaswami, Rohit Bardawaj
A data steward and federated model are needed to govern alternative data sources (Rohit Bardawaj)
Data Commons offers an open‑source, federated stack that lets organisations govern data locally (Prem Ramaswami)
Unexpected Consensus
Both a statistician (Rohit) and a solution‑builder (Ashish) agree that AI output variability is a critical trust issue requiring guardrails, despite their different professional backgrounds.
Speakers: Rohit Bardawaj, Ashish Srivastava
Identical prompts can yield different analyses; consistency is a trust issue (Rohit Bardawaj)
Human‑in‑the‑loop guardrails and risk assessment are required for reliable AI (Ashish Srivastava)
The convergence of a data-centric researcher and a practitioner on the need for human oversight and consistency checks was not anticipated given their distinct roles [80-82][125-130].
Overall Assessment

The panel shows strong consensus on five pillars: (1) a common, technically detailed definition of AI‑ready data; (2) governance‑first, federated stewardship of data; (3) the necessity of benchmarks and human guardrails for trustworthy AI; (4) the role of domain‑specific glossaries/knowledge graphs; and (5) the need for incentive‑based data economy models.

High consensus across technical, policy and economic dimensions, indicating that future work should prioritize coordinated standards, federated governance structures, and sustainable financing mechanisms to unlock AI‑driven development.

Differences
Different Viewpoints
Characterisation of data collection approach (top‑down vs bottom‑up)
Speakers: Prem Ramaswami, Shalini Kapoor
Prem states that the data collection by the Ministry of Statistics is bottom-up [269-271]
Shalini initially describes the data collection as top-down before being corrected [272-276]
Prem describes the government data pipeline as originating from the field (bottom-up), whereas Shalini first frames it as a top-down process, indicating a mismatch in how the flow of statistical data is perceived [269-276].
POLICY CONTEXT (KNOWLEDGE BASE)
A French panel on digital governance advocated an inverse, community-driven (bottom-up) approach to policy design, contrasting with top-down models [S52]; the Ministry of Statistics example illustrates challenges of a top-down data gathering strategy [S53]; and multilayered data-governance literature discusses balancing top-down standards with bottom-up participation [S54].
Extent to which AI can replace or supplement human decision‑making
Speakers: Prem Ramaswami, Ashish Srivastava
Prem argues that, despite imperfections, AI can be statistically safer than human-only decisions and can be used to de-risk choices [144-147]
Ashish stresses that AI models constitute only 10-15 % of a solution, requiring guardrails, human-in-the-loop and risk assessment, and should not be treated as the sole answer [125-130]
Prem sees AI as a tool that can, in many cases, outperform human judgment, while Ashish cautions that AI should remain a minor component of solutions, emphasizing the need for extensive human oversight and guardrails [144-147][125-130].
POLICY CONTEXT (KNOWLEDGE BASE)
Studies in emergency medicine show AI can support but not replace clinicians, highlighting augmentation rather than substitution [S38]; broader workplace research confirms AI complements human work rather than displaces it [S39]; philosophical discussions on AI’s existential challenge to human expertise provide context on concerns about replacement [S41]; and evidence that AI can outperform humans in specific tasks (e.g., debate persuasiveness) adds nuance to the replacement debate [S44].
Preferred technical architecture for making data AI‑ready
Speakers: Rohit Bardawaj, Prem Ramaswami
Rohit outlines a technical stack centred on machine-readable catalogs, rich metadata and context files, plus standardised codes and dimensions [184-221]
Prem promotes an open-source, federated knowledge-graph stack with an AI search layer that aggregates global datasets while allowing local governance [55-64][59-60]
Rohit focuses on cataloguing, metadata and context files as the core of AI-readiness, whereas Prem advocates a knowledge-graph-based, federated platform as the primary solution, reflecting divergent technical priorities [184-221][55-64].
POLICY CONTEXT (KNOWLEDGE BASE)
Recommendations for democratized compute and data infrastructure aim to avoid new dependencies and support interoperable AI-ready architectures [S49]; and the identified governance and data-readiness gaps for agentic AI stress the need for scalable, standards-based technical solutions [S50].
Unexpected Differences
Differing views on AI’s capacity to outperform human decision‑making
Speakers: Prem Ramaswami, Ashish Srivastava
Prem claims AI can be statistically safer than human-only decisions, citing accident statistics [144-147]
Ashish warns that AI is only a small part of a solution and must be coupled with extensive human oversight and guardrails [125-130]
Given Prem’s background in large-scale data platforms, he expresses far more confidence in AI’s safety than Ashish, whose stance is markedly more cautious; the gap is unexpected given that both operate in AI-focused environments [144-147][125-130].
POLICY CONTEXT (KNOWLEDGE BASE)
The same body of evidence that AI can augment human decisions in healthcare and work contexts [S38][S39] is balanced by analyses of AI’s potential to surpass human performance in certain domains, raising questions about the future role of expertise [S41][S44].
Contrasting perception of statistical data flow (top‑down vs bottom‑up)
Speakers: Prem Ramaswami, Shalini Kapoor
Prem describes the data pipeline as bottom-up, originating from field-level collection [269-271]
Shalini initially frames it as a top-down process before being corrected [272-276]
The mismatch in describing the direction of data flow was not anticipated, revealing differing mental models of how governmental statistics are generated [269-276].
POLICY CONTEXT (KNOWLEDGE BASE)
The inverse, bottom-up governance approach advocated in digital policy discussions contrasts with traditional top-down statistical data collection models, as highlighted in the French governance panel [S52] and the Ministry of Statistics case study [S53]; multilayered governance frameworks also address this tension [S54].
Overall Assessment

The panel shows moderate disagreement centred on technical implementation choices (catalog vs knowledge‑graph) and the role of AI relative to human decision‑making, while there is broad consensus on the need for governance frameworks, trust, and federated stewardship. The disagreements are substantive but not polarising, indicating that collaborative standard‑setting and pilot projects could reconcile the differing viewpoints.

Moderate – differing technical preferences and philosophical stances on AI’s authority, but shared commitment to governance, trust and open‑source solutions, suggesting that coordinated policy and technical work can bridge gaps.

Partial Agreements
Both concur that institutions must adopt a common framework to ensure data is trustworthy, safe and accessible, even though Rohit stresses the need for a formal definition first [31-32][33-46].
Speakers: Shalini Kapoor, Rohit Bardawaj
Shalini asks institutions to make data trusted, safe and publicly available [31-32]
Rohit calls for a shared, agreed definition and framework for AI-readiness [33-46]
Both recognise variability in AI outputs as a trust problem and agree on the necessity of benchmarking to improve reliability [80-82][84-88].
Speakers: Rohit Bardawaj, Shalini Kapoor
Rohit cites research showing identical prompts can yield divergent analyses, highlighting trust issues [80-82]
Shalini mentions ongoing work to create benchmarks that test answer stability across LLMs and users [84-88]
Both support a federated, decentralized governance model for data, differing only in the concrete implementation details [181-184][55-64].
Speakers: Rohit Bardawaj, Prem Ramaswami
Rohit proposes a federated stewardship model with a data steward to govern alternative data sources [181-184]
Prem describes an open-source, federated Data Commons stack that lets each organisation govern its data locally [55-64]
Takeaways
Key takeaways
AI‑ready data must be cleaned, linked, safe, trusted, interoperable and presented in machine‑readable formats (catalogs, metadata, context files).
A shared, agreed‑upon definition and framework for AI‑readiness is needed; institutions like MOSB/NSO should lead its creation.
Federated stewardship and local governance are essential to avoid single‑point ownership while enabling data sharing.
Open‑source stacks such as Google Data Commons can aggregate diverse datasets into a knowledge graph with an AI search layer, creating network effects when organisations overlay their own data.
Domain‑specific glossaries or knowledge graphs are required to contextualise LLM outputs, especially for local languages and sector vocabularies.
Trust and stability of AI answers are concerns; benchmarking across LLMs and human‑in‑the‑loop guardrails are being explored.
AI should be treated as a tool that augments human decision‑making, not as a complete solution.
Governance challenges (policy, stewardship, incentives) outweigh pure technical challenges when integrating alternative data sources.
Sustainable business models need clear incentives, value propositions and exchangeability for data contributors; public funding covers research use, while commercial use may be monetised.
Resolutions and action items
Rohit Bardawaj will draft a slide deck and an agreed‑upon AI‑readiness framework (core + aspirational components).
Create machine‑readable data catalogs (JSON/XML) with metadata and context files for public datasets.
Standardise codes and dimensions, and create business glossaries/knowledge graphs for domain vocabularies.
Prem Ramaswami will continue development of contextualisation features (glossary‑grounded LLMs) within Data Commons and provide guidance for setting up a Data Commons instance (20‑minute guide).
NSO will formalise a data‑stewardship role and publish policies for commercial access to public data.
Develop a benchmark suite to measure answer stability across LLMs and repeated queries (as mentioned by Shalini).
Promote the “data boarding pass” concept to onboard AI‑ready datasets for B2B consumption.
Ashish Srivastava’s lab will prototype reusable policy artifacts (DPIs/DPGs) for automated data‑governance in solutions.
Unresolved issues
No consensus yet on a precise, industry‑wide definition of “AI‑readiness”.
How to systematically integrate and govern alternative/secondary data sources beyond administrative data.
Exact funding and pricing mechanisms for a sustainable data economy; how incentives will be calibrated.
Scalable process for creating and maintaining domain‑specific glossaries across many languages and sectors.
Implementation details for automatic enforcement of data‑use policies at the API level.
How to ensure consistent LLM outputs in practice; the benchmark is still under development.
Privacy and consent handling for personal or sensitive data when building AI‑ready repositories.
Suggested compromises
Adopt a hybrid approach: combine Retrieval‑Augmented Generation (RAG) with LLM capabilities rather than relying solely on one technique.
Use open‑source, federated architectures (e.g., Data Commons) to balance data sovereignty with broad accessibility.
Apply guardrails and human‑in‑the‑loop checks while still leveraging AI’s speed and scalability.
Accept that AI outputs will be imperfect but can be statistically safer than human‑only decisions; focus on risk assessment rather than perfection.
Provide both free access for research/public good and a paid tier for commercial use to fund platform maintenance.
Thought Provoking Comments
Do we have a uniform definition of what AI readiness is? People are not aware what it takes to make data AI ready, and we need an agreed framework and a slide deck showing what AI can see versus what a human can see.
Highlights a foundational gap – the lack of a shared definition of AI‑ready data – and proposes creating a common framework, which is essential before any technical work can proceed.
Shifted the discussion from abstract problem statements to the need for standardization. It prompted subsequent speakers (Prem, Ashish) to talk about metadata, catalogs, and governance, and set the stage for Rohit’s later detailed checklist.
Speaker: Rohit Bardawaj
If we can get our data in a machine‑readable format (structured, with metadata) and put it into a knowledge graph, then layering a large language model on top gives a much better chance of answering questions correctly.
Introduces the concrete technical architecture (knowledge graph + LLM) and the principle of federated, open‑source data commons, moving the conversation from problem description to a viable solution model.
Steered the dialogue toward practical implementation, influencing Rohit’s later points about cataloging and APIs, and prompting Ashish to discuss contextualization and glossaries.
Speaker: Prem Ramaswami
Data is not a transaction; it is a journey. We need interoperable, contextual, and verifiable data. LLMs fail on domain‑specific vocabularies, so we should combine a glossary with the LLM to improve translation and understanding.
Broadens the perspective from static datasets to dynamic data flows and stresses the importance of context and verification, while offering a tangible remedy (glossary) for LLM limitations.
Deepened the conversation about data quality and contextualization, leading Prem to elaborate on knowledge graphs as factual bases and prompting Rohit to discuss metadata and business glossaries.
Speaker: Ashish Srivastava
We are seeing that the same prompt to an LLM with the same dataset can give two different analyses. We need benchmarks to measure stability of answers across models and repetitions.
Raises the critical issue of reproducibility and trust in AI outputs, calling for systematic benchmarking—a step toward responsible AI deployment.
Prompted Shalini to mention ongoing benchmark work (Amul AI, Bharat Vistar) and reinforced the need for evaluation frameworks throughout the panel.
Speaker: Rohit Bardawaj
LLMs are only 10‑15 % of what you need for a solution; the rest is guardrails, human‑in‑the‑loop, risk assessment. Probabilistic models will never be perfectly consistent, and we must focus on external controls, not just the model itself.
Challenges the hype around LLMs by emphasizing their limited role and the necessity of governance, risk, and human oversight.
Shifted the tone from optimism to caution, influencing Prem’s later remarks about using AI as a tool and reinforcing Rohit’s governance emphasis.
Speaker: Ashish Srivastava
We need a catalog of data in a machine‑readable JSON (or XML) file, with metadata, a context file, a business glossary, standardized codes, and structured storage – otherwise AI cannot reliably consume it.
Provides a concrete, step‑by‑step checklist for making data AI‑ready, translating abstract concepts into actionable items.
Served as a practical roadmap that other participants referenced (e.g., Prem’s knowledge graph, Ashish’s policy engine), and set up the later discussion of the “data boarding pass” concept.
Speaker: Rohit Bardawaj
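Rohit's checklist can be made concrete as a single catalog entry. The sketch below shows one possible shape; every field name and value is hypothetical, not a published schema.

```python
import json

# One hypothetical catalog entry following the checklist: metadata,
# a context file, a business glossary, and standardised codes.
entry = {
    "dataset": "district_subsidy_notifications",
    "format": "csv",
    "metadata": {
        "publisher": "example-ministry",
        "updated": "2026-01-15",
        "license": "open-government",
    },
    "context_file": "context/subsidies.md",
    "glossary": {"DBT": "Direct Benefit Transfer"},
    "codes": {"district": "standard-district-code-list"},
}

# Serialise so crawlers and AI agents can consume the catalog directly.
catalog_json = json.dumps(entry, indent=2)
```

Publishing many such entries as one machine-readable file is what lets an AI system discover what a dataset contains without a human reading the PDF it came from.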
The ‘data boarding pass’ – a physical (or digital) checklist that certifies data as AI‑ready, enabling B2B players, policymakers, and researchers to onboard and use the data instantly.
Introduces an innovative metaphor and operational tool for certifying and sharing AI‑ready data, bridging governance and usability.
Provided a tangible product concept that tied together earlier discussions on standards, benchmarks, and federated access, and gave the audience a concrete takeaway.
Speaker: Shalini Kapoor
Our GIVE framework – Guaranteed trust, Incentive, Value, Exchangeability – defines the economics of data sharing: why data owners should contribute and how value can be monetized while ensuring exchangeability.
Addresses the often‑overlooked business model aspect, linking technical readiness to sustainable incentives and market mechanisms.
Answered the audience’s question on business models, linked back to earlier points about trust and incentives, and rounded out the discussion by connecting technical, governance, and economic layers.
Speaker: Shalini Kapoor
Overall Assessment

The discussion evolved from a broad problem statement about fragmented data silos to a multi‑layered roadmap for AI‑ready data. Key turning points were triggered by comments that exposed foundational gaps (Rohit’s call for a shared definition of AI readiness), proposed concrete architectures (Prem’s knowledge‑graph + LLM model), highlighted practical challenges (Ashish’s journey metaphor and glossary solution), and demanded accountability (Rohit’s benchmark concern). These insights prompted participants to converge on a common language—metadata, catalogs, federated governance—and to envision operational tools such as the data boarding pass and the GIVE economic framework. Collectively, the highlighted comments steered the panel from abstract concerns to actionable strategies, balancing technical possibilities with governance, trust, and sustainability.

Follow-up Questions
Is there a uniform definition or agreed‑upon framework for “AI‑readiness” of data?
Rohit highlighted uncertainty about whether the ecosystem has a common definition of AI readiness, indicating the need to establish a shared standard.
Speaker: Rohit Bardawaj
How can a shared AI‑readiness framework (core and aspirational components) be created and adopted across institutions?
He proposed developing a collaborative framework to define AI‑ready data, suggesting a coordinated effort among stakeholders.
Speaker: Rohit Bardawaj
How can contextualization and domain‑specific glossaries be integrated into Google Data Commons to improve AI responses?
Prem was asked to explain adding domain glossaries to Data Commons, pointing to the need for methods to embed contextual knowledge.
Speaker: Prem Ramaswami
How can alternative or secondary data (beyond administrative sources) be incorporated into the AI‑ready data framework, and what kind of data economy could emerge?
Shalini queried the feasibility of extending the framework to non‑administrative data and its economic implications.
Speaker: Shalini Kapoor (to Rohit Bardawaj)
What sustainable business models (public funding, commercial licensing, incentives) can support the maintenance and growth of high‑quality data platforms?
The audience asked about financing mechanisms for data platforms, prompting discussion on public vs. commercial models.
Speaker: Audience member (addressed to Rohit and Shalini)
How can AI‑ready data be used to detect and resolve data gaps or disconnections in infrastructure projects (e.g., road construction, tender processes)?
The participant raised a practical problem of project delays due to data disconnects, seeking solutions via AI‑ready data.
Speaker: Audience member (addressed to Shalini)
Who should be accountable for data quality and governance in solution pipelines that combine multiple data sources?
Ashish identified accountability for data as a key challenge, indicating a need for clear responsibility mechanisms.
Speaker: Ashish Srivastava
Can a benchmark be created to measure answer stability across different LLMs and repeated queries?
She mentioned ongoing work on a benchmark to ensure consistent answers, highlighting a research gap in evaluation metrics.
Speaker: Shalini Kapoor
How can standards for AI‑ready data keep pace with the rapidly evolving AI landscape?
Prem noted that agreements made today may be obsolete in six months, underscoring the need for continual research and updates.
Speaker: Prem Ramaswami
What methods can combine knowledge graphs with LLMs to fill factual gaps and improve answer accuracy?
He discussed using knowledge graphs as factual backbones for LLMs, indicating a research direction for hybrid systems.
Speaker: Prem Ramaswami
How can reusable policy artifacts (DPIs/DPGs) be designed to enforce data governance automatically at API and policy‑engine levels?
Ashish highlighted the need for standardized, enforceable data policies to streamline compliance.
Speaker: Ashish Srivastava
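Automatic enforcement of a policy artifact at the API level can be sketched as a declarative policy evaluated before any data request proceeds. This is a toy illustration of the principle, not an actual DPI/DPG specification; the policy fields and the `enforce` function are assumptions for the example.

```python
# Hypothetical machine-enforceable policy artifact: permitted purposes and
# blocked personally identifiable fields are declared as data, so a policy
# engine (not a human reviewer) can gate every API request against them.
POLICY = {
    "dataset": "health-survey-2024",
    "allowed_purposes": {"research", "policy-analysis"},
    "pii_fields_blocked": {"name", "phone"},
}

def enforce(policy, request):
    # Deny unless the stated purpose is permitted and no blocked field is requested.
    if request["purpose"] not in policy["allowed_purposes"]:
        return False, "purpose not permitted"
    blocked = policy["pii_fields_blocked"] & set(request["fields"])
    if blocked:
        return False, f"blocked fields requested: {sorted(blocked)}"
    return True, "allowed"

print(enforce(POLICY, {"purpose": "research", "fields": ["age", "district"]}))
```

Because the policy is a reusable artifact rather than prose, the same rules can be attached to many datasets and evaluated identically at every access point — the automation the question asks about.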
What governance model should oversee a federated national data ecosystem, and who should act as the data steward?
He suggested a federated model with a designated steward (e.g., NSO) to orchestrate data sharing and governance.
Speaker: Rohit Bardawaj
What standards for metadata, context files, and machine‑readable catalogs (e.g., JSON) are needed to make data AI‑ready?
Rohit emphasized the importance of cataloging, metadata, and context files in machine‑readable formats for AI consumption.
Speaker: Rohit Bardawaj
How can the verification and trustworthiness of publicly declared survey data be ensured for AI applications?
He pointed out that many public datasets are unverified, raising the need for mechanisms to validate such data.
Speaker: Ashish Srivastava

Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.