AI Without the Cost: Rethinking Intelligence for a Constrained World
20 Feb 2026 10:00h - 11:00h
Summary
The panel opened by highlighting that rapid AI adoption has led to a scramble for GPU-based infrastructure, often without anyone asking whether applications are built on optimized architectures [3-7]. Bernie argued that reducing computational complexity would allow AI workloads to run on CPUs, edge devices, or even mobile hardware, and that this optimization step is being skipped in current AI deployments even though it is standard in large-scale software development [12-17]. He noted that decades of optimization expertise from Oracle, which his firm, the STEM Practice Company, draws on as an Oracle partner, provide mathematical methods to lower algorithmic complexity and infrastructure cost [30-31].
Anshumali introduced dynamic sparsity, a technique that selects only the necessary parameters per input rather than performing full matrix multiplication, thereby cutting compute while preserving scaling laws [102-107]. He warned that model parameter growth outpaces GPU memory and compute advances, creating a “memory wall” that will make future large models slower unless new algorithms are adopted [95-98]. To break this plateau, he described a new attention formulation that, when run on CPUs, outperforms GPU-based flash attention for very large context windows, because the algorithm reduces the quadratic cost of attention [152-158].
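The quadratic-cost argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes standard dense self-attention, where every token is scored against every other token, and an idealised perfectly linear speed-up from adding devices. The function names and the 1,000-token baseline are hypothetical, not figures from the panel.

```python
def attention_pair_count(context_tokens: int) -> int:
    """Query-key pairs scored by dense self-attention: grows as n squared."""
    return context_tokens * context_tokens


def relative_latency(context_tokens: int, num_devices: int,
                     base_tokens: int = 1_000) -> float:
    """Latency relative to one device at `base_tokens`.

    Assumes (optimistically) that work divides perfectly across devices.
    """
    work = attention_pair_count(context_tokens) / attention_pair_count(base_tokens)
    return work / num_devices


if __name__ == "__main__":
    for scale in (1, 10, 100):
        tokens = 1_000 * scale
        # Hardware is scaled linearly with context length (scale devices),
        # but the attention work grows with the square of the context length.
        print(f"{tokens:>7} tokens on {scale:>3} devices: "
              f"{relative_latency(tokens, scale):.0f}x baseline latency")
```

Even with ideal scaling, growing the context 100-fold while adding 100 times the hardware still leaves latency 100 times worse under these assumptions, which is why the speaker argues for changing the attention algorithm rather than adding GPUs.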
Kenny reported that their AI-MSET technology can lower compute costs by three orders of magnitude and provide early-warning prognostics for CPUs and GPUs, preventing costly downtime in data centers [178-184][188-194]. He demonstrated a specific use case where MSET reduced anomaly-detection expenses by a factor of 2,500, illustrating the potential for massive energy and cost savings [199-203]. Kevin emphasized that conventional probabilistic LLMs suffer from hallucinations, so deterministic AI architectures that guarantee repeatable outputs are needed for safety-critical domains [390-403]. Ayush explained that generic ChatGPT-style solutions lack enterprise data context and cannot meet the reliability required for decision-making, prompting the need for domain-specific models and governance frameworks [362-369][371-378].
The panel agreed that robust governance must address data sovereignty, GDPR/DPDP-I compliance, and false-alarm mitigation, especially when scaling sensor-driven AI systems [413-418][429-435]. Bernie linked these technical challenges to environmental sustainability, arguing that avoiding massive GPU clusters reduces power, water, and heat footprints, and that more efficient algorithms constitute a responsible AI path [75-78][305-314]. He also mentioned the company’s upcoming quantum-enablement center as a longer-term avenue for energy-efficient computation, while acknowledging that quantum hardware remains a distant solution [635-637]. The discussion concluded that advancing AI responsibly will require integrating mathematical optimization, new algorithmic designs, deterministic models, and sustainable hardware to meet both enterprise needs and societal constraints [12-17][152-158][390-403][75-78].
Keypoints
Major discussion points
– Infrastructure cost and the need for optimization – The panel opened by stressing that AI development is dominated by expensive GPU-based clusters and that many projects skip the usual software-level optimization steps, leading to wasteful power consumption and environmental impact. ([3-8], [12-17], [20-22], [75-78])
– Algorithmic and architectural innovations to reduce compute – Several speakers highlighted research on dynamic sparsity, mixture-of-experts, and especially new attention mathematics that break the quadratic scaling of long-context windows, allowing CPUs to outperform GPUs for very large contexts. ([100-108], [112-119], [124-133], [152-158], [162-169], [170-176])
– Energy-saving prognostics (MSET/AI-MSET) and massive compute reductions – Kenny described the AI-MSET suite that predicts hardware failures weeks in advance and cuts compute cost by orders of magnitude (up to 2,500× in a case study), while also reducing false-alarm cascades in sensor-rich systems. ([178-184], [188-194], [199-203], [429-441])
– Governance, reliability and deterministic AI – Kevin and Ayush explained why probabilistic LLMs are unsuitable for regulated domains, proposing deterministic AI architectures, strict auditability, and enterprise-level data-governance (GDPR, DPDP-I) as essential safeguards. ([390-401], [402-404], [362-381], [413-421])
– Societal and educational implications – The conversation turned to how AI tools affect students and professionals, the risk of over-reliance on “hallucinating” models, and the need for curricula and policy that balance convenience with critical thinking and reliability. ([623-627], [625-632], [629-632], [560-564])
Overall purpose / goal
The panel was convened to educate the audience about the hidden costs of the current GPU-centric AI boom, showcase mature mathematical and systems-level techniques that can dramatically lower those costs, and discuss how such efficiencies intersect with sustainability, governance, and real-world deployment challenges. The speakers repeatedly urged participants to adopt these alternative methods before the industry’s resource consumption becomes unsustainable.
Overall tone and its evolution
– Opening (0:00-10:48) – Technical, urgent, and a bit critical: “we are just buying more GPUs… we are burning the planet.”
– Middle (10:48-45:12) – Shifts to optimistic and solution-focused, with detailed explanations of novel algorithms, energy-saving case studies, and concrete governance frameworks.
– Later (45:12-84:48) – Becomes collaborative and conversational, featuring jokes, personal anecdotes, and a promotional “let’s work together” vibe while still emphasizing responsibility.
– Closing (84:48-end) – Returns to a call-to-action tone, urging the audience to adopt sustainable practices, consider policy implications, and visit the company’s booth.
Overall, the discussion moved from a problem-identification stance to a hopeful, solution-driven dialogue, punctuated by moments of humor and promotional outreach.
Speakers
– Ayush Gupta – Representative from Genloop; focuses on agentic data analysis platforms, enterprise AI integration, and cost-effective inference solutions.
– Kenny Gross – Senior Distinguished Scientist at Oracle; master machine learning technologist whose patent count is approaching 365, roughly one for every day of the year [S2][S3].
– Kevin Zane – Speaker on AI sustainability; discusses energy-efficient AI and environmental impact.
– Participant – Unnamed audience members who asked questions during the panel.
– Bernie Alen – Founder and leader of STEM Practice Company; former head of advanced technologies market development at Oracle, now runs an Oracle-partner consultancy [S8].
– Anshumali Shrivastava – Professor at Rice University; member of the Super Intelligence team for MEDA, expert in dynamic sparsity, context windows, and efficient attention mechanisms [S9].
– Abhideep Rastogi – Representative from Tata Group (U.S.-based operations); works on AI-driven workflow automation and enterprise transformation. [S10]
Additional speakers:
– Avi – Mentioned as a participant to take a question; no further details provided.
Opening remarks & problem framing (Bernie Alen)
Bernie Alen opened by warning that the rapid, almost indiscriminate acquisition of GPU-based infrastructure is inflating AI-related costs and bypassing optimisation steps that are standard in large-scale software development. He argued that most AI projects are built on “extensive infrastructure” without asking whether the applications are “optimised” and that the race to hoard GPUs is driven by fear of being left behind [3-8][12-17][20-22]. He linked this wasteful practice to a broader environmental threat, noting that the high-heat, high-failure-rate GPU clusters demand excessive power, water and cooling, thereby “hurting the planet” [75-78][310-312].
Alen positioned his company, the STEM Practice Company, as an Oracle partner that inherits decades of optimisation expertise from Oracle’s work with the world’s largest customers. He explained that Oracle has accumulated “a collection of intellectual property… to reduce complexity of algorithms, reduce computation and therefore create better infrastructure architectures” [23-31][30]. This heritage underpins the panel’s focus on mature mathematical methods that can lower AI infrastructure costs.
Technical solutions
Dynamic sparsity & new attention math (Anshumali Shrivastava)
Anshumali Shrivastava presented data showing that large-language-model (LLM) parameter counts are growing far faster than GPU memory and compute capacity. Both trends are plotted on a logarithmic scale, so both are roughly exponential, but hardware grows at a much lower rate, creating a “memory wall” that will make future models slower and less accessible [95-98]. He traced the evolution from full-matrix computation to sparse computation, distinguishing static sparsity from “dynamic sparsity” – a technique that retains the full parameter set but selects, for each input, only the subset of computations that are actually needed, thereby respecting scaling laws while reducing work [100-108]. Shrivastava described mixture-of-experts, a block-sparse middle ground, as a “bandage” that still relies on GPU kernels built for dense matrix multiplication [112-119]. He identified the next competitive frontier as the context window, arguing that larger windows are essential for complex, common-sense reasoning but have plateaued at around one million tokens [124-133]. To break this plateau he introduced a new attention formulation that reduces the quadratic cost of attention; in his benchmark, this new math running on a CPU overtakes GPU-based FlashAttention once the context exceeds roughly 131,000 tokens [152-158][162-169][170-176].
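The idea of dynamic sparsity described above (keep every parameter, but compute only the neurons an input needs) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the speaker’s implementation: the layer shape, the ReLU activation, and the `select_top_k` selector are all hypothetical. In particular, the selector here computes exact scores, which costs as much as the dense layer; a practical system would need a cheaper, approximate way to choose the active set.

```python
import numpy as np


def dense_layer(x, W, b):
    """Baseline: compute every output neuron (full matrix-vector product)."""
    return np.maximum(W @ x + b, 0.0)


def dynamically_sparse_layer(x, W, b, active):
    """Compute only the neurons listed in `active`; the rest stay at zero.

    All parameters in W are retained; which rows are used changes per input.
    """
    out = np.zeros(W.shape[0])
    out[active] = np.maximum(W[active] @ x + b[active], 0.0)
    return out


def select_top_k(x, W, b, k):
    """Hypothetical selector: indices of the k largest pre-activations.

    NOTE: exact scoring is as expensive as the dense layer. It is used here
    only to show the selection step, not as an efficient method.
    """
    scores = W @ x + b
    return np.argpartition(scores, -k)[-k:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 16))   # 64 neurons, 16 input features
    b = rng.standard_normal(64)
    x = rng.standard_normal(16)

    active = select_top_k(x, W, b, k=8)
    sparse_out = dynamically_sparse_layer(x, W, b, active)
    print("neurons computed:", len(active), "of", W.shape[0])
    print("non-zero outputs:", np.count_nonzero(sparse_out))
```

A different input `x` would generally yield a different `active` set, which is what separates dynamic sparsity from static pruning, where the discarded weights are fixed once and removed for good.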
AI MSET for sensor analytics (Kenny Gross)
Kenny Gross described the AI MSET (Multivariate State Estimation Technique) suite, which detects the mechanisms that cause hardware to fail “days and often weeks in advance of failure” and therefore avoids costly downtime in data-centre environments. He attributed the energy savings to compute costs that are three orders of magnitude lower [178-184][188-194]. A concrete use-case demonstrated a 2,500-fold reduction in compute cost for anomaly detection, illustrating the efficiency gains possible without large GPU clusters [199-203]. Gross also highlighted that MSET’s multivariate approach dramatically lowers false-alarm rates, a critical advantage when thousands of sensors generate noisy data [429-441]. He added that the work has been documented in a few dozen publications and presented at four NVIDIA GTC conferences [??-??].
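To see why false alarms become critical at the scale of thousands of sensors, it helps to work through the arithmetic. The sketch below is not MSET; it only computes the chance of at least one false alarm when each sensor is thresholded independently. The independence assumption, the function name, and the example rates are all hypothetical and chosen purely for illustration.

```python
def prob_any_false_alarm(per_sensor_rate: float, num_sensors: int) -> float:
    """Probability that at least one of `num_sensors` raises a false alarm
    in a given observation window, assuming independent sensors that each
    false-alarm with probability `per_sensor_rate`."""
    return 1.0 - (1.0 - per_sensor_rate) ** num_sensors


if __name__ == "__main__":
    # Hypothetical per-sensor false-alarm probability of 0.1% per window.
    for n in (1, 100, 1_000, 10_000):
        print(f"{n:>6} sensors -> P(any false alarm) = "
              f"{prob_any_false_alarm(0.001, n):.3f}")
```

Under these assumptions a per-sensor rate that looks negligible in isolation produces a false alarm in most observation windows once there are a thousand sensors, which is the motivation for techniques that keep the system-level false-alarm probability low.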
Deterministic AI & auditability (Kevin Zane)
Kevin Zane argued that probabilistic LLMs are unsuitable for safety-critical domains because they produce non-deterministic outputs and hallucinations. He advocated “deterministic AI” architectures that guarantee the same output for the same input, thereby enabling auditability and eliminating hallucinations [390-403][404-405][??-??]. Zane stressed that false alarms in human-in-the-loop systems can cause cognitive overload and catastrophic accidents, underscoring the need for provably low false-alarm probabilities [429-435][440-441].
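The property at stake here, the same input always producing the same output, can be illustrated by contrasting two ways of picking a result from a model’s scores. This is a toy sketch, not the deterministic architecture the speaker has in mind (the summary does not describe it); the logits and function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()


def greedy_choice(logits):
    """Deterministic: always return the highest-scoring option."""
    return int(np.argmax(logits))


def sampled_choice(logits, rng):
    """Probabilistic: draw an option at random according to softmax weights."""
    return int(rng.choice(len(logits), p=softmax(logits)))


if __name__ == "__main__":
    logits = np.array([1.0, 3.0, 2.0])  # hypothetical scores for three options
    rng = np.random.default_rng(0)

    print("greedy :", [greedy_choice(logits) for _ in range(5)])
    print("sampled:", [sampled_choice(logits, rng) for _ in range(5)])
```

Repeatability of this kind is what makes an output auditable, since a reviewer can re-run the same input and obtain the same result. On its own it says nothing about whether the chosen output is correct.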
Enterprise-focused LLM deployment (Ayush Gupta)
Ayush Gupta explained why generic ChatGPT-style models cannot meet enterprise requirements. He pointed out that such models lack access to proprietary data and cannot capture the nuanced “context” of a specific business, making them unreliable for decision-making [362-369][371-378]. Gupta quantified the cost driver as GPU-heavy inference and argued that “cheaper inference” is achievable by hosting domain-specific models in-house, which can deliver high-quality insights at a fraction of a dollar per conversation [276-284][285-286].
Adoption roadmap & governance (Abhideep Rastogi)
Abhideep Rastogi outlined a structured, multi-stage roadmap for AI adoption. The process begins with defining the business aim, proceeds through data quality assessment, architecture selection (CPU vs GPU, on-prem vs hyperscaler), pilot execution, governance implementation, and finally production-scale platformisation [324-345][346-349]. This framework embeds compliance checks for GDPR, India’s DPDP-I and forthcoming AI-Act regulations, ensuring that AI projects are legally and ethically sound [413-421].
Integration & ecosystem questions
Plug-and-play / ecosystem integration
When audience members asked whether MSET and the other techniques could be integrated with existing LLM-based pipelines, Retrieval-Augmented Generation (RAG) services, and Model Context Protocol (MCP) offerings, the panel responded that these methods sit at a foundational layer and can be combined with current services without major re-architecting [??-??].
Policy-maker misunderstanding & EU charger analogy
Abhideep highlighted a common policy-maker misunderstanding by citing the EU-mandated USB-C charger requirement, explaining that regulation often follows industry pressure rather than technical feasibility [??-??].
AGI vs. quantum computing
A participant asked whether artificial general intelligence (AGI) will only be possible with quantum computers. Bernie answered that the upcoming quantum-enablement centre aims to demonstrate that quantum processors could eventually need only a fraction of the energy required for GPU-based simulation, while acknowledging that quantum hardware remains a distant solution [??-??][635-638].
Broader implications
Sustainability
By moving workloads from GPU clusters to CPUs, edge devices or even mobile phones, organisations can dramatically cut power, water and cooling demands [75-78][310-312]. Kenny’s MSET example reinforced this point, showing that algorithmic efficiency can replace “expensive infrastructure” while also improving reliability [178-184][199-203].
Mathematics research relevance
A student asked whether AI advances render mathematics research obsolete. Anshumali responded that a strong mathematics background remains essential for understanding LLM capabilities and for formal reasoning about AI systems [??-??].
Legal-domain hallucinations
A participant from the legal profession asked how reliable AI is for citations and case law. Kevin and Ayush explained that while 100% accuracy is unrealistic, deterministic and auditable pipelines can flag low-confidence outputs for expert human review, mitigating the risk of hallucinated legal references [??-??][390-403][404-405].
Education & policy on AI use
The panel discussed the need for curriculum changes that emphasise problem-solving and creativity rather than rote reliance on AI tools. Concerns about academic cheating were raised, and the speakers agreed that AI should augment learning, not replace it, requiring updated pedagogical policies [??-??][625-632][606-618].
Closing & next steps
In closing, the speakers reiterated that AI development must be framed within a “constrained world” where cost, energy and governance cannot be ignored. They called on the audience to adopt the presented optimisation techniques, engage with the STEM Practice Company for pilot projects, and consider the broader policy landscape when deploying AI [456-460][462-470]. The panel expressed differing perspectives on how quickly deterministic, hallucination-free AI can be realised and on the relative emphasis between near-term software-centric optimisations versus longer-term quantum approaches [390-404][152-158][635-638].
Can you hear me better? Is this better? Okay. So, infrastructure cost, it’s a very important topic because everybody who is trying to create something in AI, we all know that we are running into having to use extensive infrastructure, right? And mainly it is a GPU-based infrastructure architecture. And in the last two, three years, I think we are not stopping to ask the questions that we would normally ask. Are we creating these applications in optimized infrastructure? We are just running around getting as many GPUs as possible because we’re all afraid that the other guy would get it and then we’ll be left out, right? So, I think it’s a very important topic.
So, there has been an extremely rapid adoption of AI, and everybody wants to have an AI answer for everything. And so we are not asking the questions that we would normally ask in any project of this scale, right? So we’re going to take a look at what are the optimization methods and good mathematics that have existed for a long time that we should bring in to optimizing and reducing the computation needs that these models create, that these AI applications create. And if you reduce the complexity and if you reduce the computation, then you don’t need to run a lot of these things on expensive, high-heat-generating, high-failure-rate, limited-supply, clustered GPUs. You can run this on CPUs. You can run this on clustered CPUs.
You can run it on edge computers. You can even run it on mobile phones and laptops. There is a software optimization step that everybody is skipping, that we would normally not skip in software development. For any large-scale software development, a heavy amount of infrastructure optimization goes on. But we are not doing that in deploying these AI models. So we first want to make sure that there is enough understanding of the mechanisms and the methods that are available. And a lot of this is derived from mathematics that has existed forever. So we’re going to talk about that. I’ve got a great panel over here. By the way, just to introduce my company, we are the STEM Practice Company.
We are an Oracle Corporation partner company. I think most people know about the Oracle Corporation, right? I don’t need to introduce the Oracle Corporation. If you think about Oracle, they have had to create solutions and create software and create products for very large customers. They serve the largest customers on the planet. So they’ve always had to worry about optimization, performance improvement and all that, because without that, infrastructure cost would just be so high. So over decades there has been a collection of intellectual property, collection of ideas, collection of methods to reduce complexity of algorithms, reduce computation and therefore create better infrastructure architectures, right? So STEM Practice Company is an independent company.
We run as an Oracle partner company, but the origins of the STEM Practice Company are within the Oracle Corporation. I led advanced technologies market development for Oracle, and then we separated and launched as a separate company two years ago. Now we operate as an Oracle partner company. Let me introduce the team here. This is a slide that my lawyer says I should show. So just because I paid the lawyer a lot of money to make this one slide up, I’m going to show this slide. Right? Nobody knows where we are going with AI, to be honest. Nobody knows where we are going with quantum. We’re all doing the best to predict what may come, but with any prediction, use your own logic.
That’s what my lawyer wants me to say. So I’ve said it. Okay, let’s go to the next one. So this is my panel, and you may not be able to see their names on the screen. So let me start with the gentleman from the Tata Group. We are a U.S.-based company. We launched two years ago, and we just started working with our India operations, India opportunities, and we had the great fortune to start with a Tata company. And I think they are quite happy with what we have shown because some people say, hey, if you’re not using GPUs, not using expensive infrastructure, is there a compromise? Am I introducing more latency? Am I creating less confident output?
None of that. In fact, all of that gets better. And we were able to demonstrate that with the opportunity that we got working with Tata: we were able to show that we are getting 100% accuracy and we have not used any GPUs at all in the infrastructure that we have proposed. Right? Okay. So that is Mr. Abhideep at the back, with Tata. Say hi, wave. Okay. And the gentleman next to him is a part of the STEM Practice Company. He is from Oracle. I did steal him from Oracle, because it was essential to at least steal some people before Oracle gets pissed off at me. So Kenny Gross is the senior scientist at Oracle, distinguished scientist at Oracle.
He has a patent for every day of the year. So his patent count is approaching 365. So that’s Kenny Gross, a master machine learning technologist. And next to him is a professor from Rice University, Anshu, also serving on the super intelligence team for MEDA. And Anshu is very passionate, as a professor, right? Professors are usually passionate. My father is a professor, right? So that’s why he lives here and I live in the U.S. I live that far away from him, okay? Just because that’s the way I can deal with this passion. But he is very passionate about, hey, all of these methods, these methods to do things better, have existed, so let’s make sure that we are bringing those methods, creating awareness for those methods. And he’s going to talk about how he sees what the challenges are going to be and how we already have methods to go address the challenges that are coming up. By the way, this panel is going to be very interesting, so all of you can start texting two or three of your friends to start showing up here so we can spread the word more. And next to him is somebody you all may already know. It’s one of the top successful companies that is working with the foundations of AI that is shaping up in India. It’s Ayush from Genloop, and he will talk about it in very much of an Indian context, because he has a front row seat to everything that is going on. And then the last person on the panel is the one who I’m most proud of, because he’s my nephew, and he is in IIT.
I went to BITS Pilani, so that part I’m not proud of, that he goes to IIT and doesn’t go to BITS, right? By choice, right? So, he is in IIT Madras, and he is working closely with us and learning deeply how to build some of these very complex AI methods up front, right? Okay, so we are a small enough team here, so always feel free to interrupt, raise your hand, come up, ask questions. The goal is to educate because I think we are going very fast, and we are spending a lot, and we are creating problems, like we need more power generation, and we need more power generation rapidly, and because of that, we are causing harm to the planet.
So it is good. You know, all this mathematics has existed for a long time, but it’s never been productized because there’s never been a market. But now there’s a phenomenal market for this, right? Because mathematicians are poor people, right? We have a paper and a pencil most of the time, right? But now we can productize it and we can bring these solutions to the market before we end up burning the planet. Right? Okay. So let’s go to… So these are what people are going to talk about. Professor Anshumali is going to talk about the problems that are coming up and how we have the solutions, demonstrated solutions, benchmarked solutions, to address the problems that are coming up.
Dr. Kenny Gross is going to talk about doing a large amount of real-time AI and stream-based AI without using any neural networks even. Not just without using GPUs, but without using neural networks, and getting a very high level of accuracy, a very low rate of false warnings, at a tiny fraction of the cost that everybody else is spending. Okay? And then we’ll have questions for the panel and then we’ll have questions for the audience. Right? But we have no problems in making this collaborative. So sometimes a question can’t wait till the end. So just raise your hand, ask a question, and we’ll talk about it. Okay. Now I’m going to turn it over to Professor Anshu to come here and talk about how we are already ready to address
the challenges that are coming up. Right?
Thank you very much. Can you guys hear this? Okay. So I’m pretty sure you must all have heard that we need AI without the cost. The cost is too much, right? There are never enough GPUs. How many of you have heard about solutions? And how many of you have heard, yeah, this is an idea that will definitely work, or at least there is some merit to these ideas? I think we are going to go into that. So I think the first part, that we need AI without cost, is kind of obvious. I’m not going to rant about it, though I will talk about something that motivates why the problem that you are going to see is just going to get worse. So I don’t know if you can see the plots here, but on the x-axis is the year and on the y-axis is the parameter count of the LLMs. Now you see kind of two interpolated straight lines. The green one is the amount of memory available in the GPUs, right? H100s, A100s, so on and so forth. And the red one is the model parameter count, the demand, right? That’s the models like GPT-3, GShard, Switch Transformer, Megatron, etc. What do we see here? The rate of growth of hardware, and by the way, it’s on a logarithmic scale, so it’s exponential, is nowhere close to the rate of growth of demand. The other plot that you cannot see is kind of similar, but it’s in compute, so it’s in teraflops or petaflops: what GPUs can offer and what we need to reach a certain latency. This was a famous paper from Berkeley that talks about the AI memory wall. And what you should expect is, if you are hoping that your latency will become better with algebraic LLM, that’s not happening unless there is some breakthrough. Okay? So models will get bigger, but even the GPU growth will not be able to keep up with them, which means models will feel slower. The better models will feel slower and inaccessible, unachievable, right?
I mean, there are many models, I’m pretty sure, that you cannot even run on whatever infrastructure you have. This is going to get worse. That’s what this plot kind of says. So clearly, there is a need for what we are talking about here, right? So again, a little bit on the past work. One idea that is very popular, and basically I’ve been working on it since 2016, and it’s kind of now catching up as mainstream: why do we do full computations? Let’s do sparse computation. And it’s not static sparsity, it’s dynamic sparsity. What is dynamic sparsity? Well, I need all the parameters. So I’m not throwing anything away, I’m not going against scaling laws.
But I will pick which ones I need based on my input, dynamically. And that is called dynamic sparsity, right? So I’ve shown you two cartoons here. The traditional model is you do all the computation, which is what GPUs were built for, right? And the argument now is, well, you don’t do all the computation. You only do what is needed. But that is not quite what GPUs are built for. But there is a sweet spot in between. You can do something like block sparsity and get around things, which is what mixture of experts is, right? So mixture of experts is now the de facto way of training large language models. So one idea is obviously there.
But remember, the fundamental kernels of GPUs were always built for full matrix multiplication. And mixture of experts was kind of a bandage that seemed to work. But obviously, we need a lot more, as we have seen. So let’s take a pause here. We all have seen the evolution of models, right? Getting a foundation model, large parameter models with large capability. Where is this all going? What is the next race? And I want to argue here that the next race is the context window. Why? What is a context window? Is everybody familiar with what the context window of an LLM is? It’s kind of a working memory, right? So let’s say I want to solve a simple problem like 2 plus 2 equals 4. That only requires a very simple context. But let’s say I want to solve an Olympiad problem. So you are asking me to prove a theorem, and I generate 40 intermediate theorems. I need to have all the theorems in my context to go to the next theorem. Otherwise, if I miss any of the theorems, if one goes out of context, I cannot prove things. So more context window means I can process more information, correlate across it and make decisions. So complex workflows will start to happen when the context window grows, and that is what we have seen with GPTs, right?
GPT-3 came up with a small context window. As the context window is growing, now we know Claude Code kind of works because it has, like, what, a 200k context window or something, right? And even then, I don’t know how many of you have experienced that you have to compact the context because you run out of context window, right? What this plot shows: on the x-axis, I don’t know if you can see it, is the year, and on the y-axis is the context window. What do we see here? Almost a flat plateau after a while. And by the way, that’s also experimental. The 10 million context window is experimental. The closest is 1 million. That is what you can achieve and play with.
But it has plateaued. People are not talking about 100 million context windows and more. And it is very clear to people that a more complicated task means a more complicated context window. And we believe this is what the next race is. At least I am very much bullish that this is what the next race is. You want to do complex automation, very complex automation, right? We talk about building agentic workflows and all that. But I believe we are underestimating how much complex automation we want to do. And we believe that we are underestimating how complex common sense is, right? Common sense workflows require a lot of reasoning, and it will not happen unless we have large context windows.
But large context windows are plateauing, and we are talking about some of the frontier models. So let me tell you what the current problem is. The mindset is, okay, the kernel remains the same, which is full matrix multiplication. Let’s apply bandages like mixture of experts and whatever, stretch as much as we can and see where we go. That’s strategy number one. That’s probably one strategy that we know of that seems to work, but it has plateaued. We’ve seen in the previous plot, it has plateaued. What I am bullish on is, we have to rethink to break that plateau. Okay. And again, I’m not going to go very technical. This is an upcoming paper in ICLR. But I want to argue there is a new math, a new way of doing attention.
Again, I’m not going to start uttering words like sharpened softmax and exponentiated and all that. You can read the paper. It’s coming in ICLR this year; it’s going to be presented in Brazil this summer. But what we have shown is that if you change the math of attention, then there is something which gives you the same capability but at a different cost. So it’s changing the math, rethinking the math. Like dynamic sparsity, right, it’s some sort of a sketched way of estimating things. What is interesting is we have experimented with this. So if you see this plot, on the x-axis is the context window; the y-axis is the latency, the time to first token or tokens per second. The two red plots are the best attention mechanisms, FlashAttention-2 and FlashAttention-3, on the best possible hardware, GH200, and the green one is actually the new math on a CPU. Now what is interesting is, if the context window is below 131,000, GPUs are obviously faster, which makes sense. But as I go beyond that, the CPUs dominate.
And actually, it’s not the CPU. It’s the algorithm. And the reason is context windows scale quadratically in attention. So you can throw as much hardware as you want, but you cannot beat quadratic complexity. Right? You are throwing linear number of GPUs to tackle something that goes quadratically. So something goes like 10, 10 square, 100, 100 square, and you are just doubling things. That’s not going to work. That is what this kind of plot shows. It says something fundamental. So what we are trying, and again, this is what I argue. I’m not going to bore you with the math. But what we are trying to argue is the hope is, and remember the title of the talk, how. The how part is the rethink.
We have to rethink beyond how attention is done. Because in the current race, if one person has 1,000 GPUs and you have 10,000 GPUs, you are 10x ahead of that person. But that race is plateauing because of the quadratic complexity. So yes, you will always be ahead because you have more GPUs, but not very far ahead. But if we change the math, then we can actually break that plateau, and I believe we can unlock capabilities of the next level. We will see the automation that we hope and expect is possible. Again, I will say: parameter count and benchmark hacking, we have seen enough of it. We want to now see complex tasks happening. And it is my belief, again, I am an academic, so one of the things as an academic is that you get to ask hard questions and you can think about them for a very long time.
So for me, the next race is: can I break the barrier of how complex a task we can solve with LLMs using this context window? And I believe if we can make progress there, that’s very tangible, real progress. So can we go to 100 million context faster than others? I think we can. And with that I would stop my
So the energy savings come from the three orders of magnitude lower compute costs. We’ve done four presentations at NVIDIA GTC conferences demonstrating, with real data, the reduction in compute costs. The other aspect that I wanted to mention in terms of data centers is prognostics for avoiding downtime in servers and chips, CPUs and GPUs. We developed the new AI MSET and long ago published a few dozen publications on it. MSET 3 is capable of detecting all mechanisms that cause CPUs and GPUs in data centers to fail, days and often weeks in advance of failure. This avoids downtime. Now, in data centers five years ago, downtime wasn’t a big deal, because if you’re just doing web serving applications or even database applications, there’s a lot of horizontal redundancy.
With the new AI workloads, though, when a company is running a five-day training run with their LLM, system board failures are very costly. One spinoff of MSET for data center applications is called electronic prognostics, where we’re able to detect all mechanisms that lead to failures of chips and system boards in the data centers. And the final point that I wanted to make with that bottom bullet there is “I got data.” What we always tell other industries (and MSET has been used in locomotives, wind farms, all aspects of utilities, and all defense aspects: land, air, sea, and space) is this: whatever system you’re using now, if you have data, historian data, we welcome doing a blind bake-off with your own data.
And whatever technique you’re using now, a third-party commercial technique or a homegrown technique, we’ll be happy to demonstrate with your own data in a bake-off where the winning criteria are lowest compute cost, earliest detection of incipient anomalies in the assets, and the lowest false alarm and missed alarm probabilities. With conventional approaches, it’s the false alarms that cause a lot of losses from unnecessarily shutting down revenue-generating assets that are not broken. And missed alarms can be catastrophic. In life-critical industries, they can be extra catastrophic. So that’s an overview of our AI MSET. I’ll turn it over now.
Okay. Thank you, Kenny. So in one of the use cases where we used the MSET method for anomaly detection, the cost of running the use case was 1 over 2,500. So it’s not just a 10x reduction or a 20x reduction. You’re talking about a reduction of 2,500 times, right? So that’s the power of these kinds of protocols. And so certainly, before you guys start implementing, whatever AI method you’re using, whatever solution you’re going after, educate yourself on these kinds of methods that exist. Right. Feel free to reach out to us. Not everything needs to go through a massive GPU cluster to be solved. Right. OK. So we’re going to go to the panel now and ask the panel some questions.
So I’m going to start with all the panelists over here. Right. And we can first talk about this: things have never been this crazy. Right. I mean, in the last two years, two and a half years, the world has kind of gone mad in some ways, because everybody is chasing this and everybody feels a sense of great urgency to chase this. Right. So how do you all see this? I mean, I think there are a lot of challenges in AI. Maybe we’ll start with you, Abby, and then we’ll come down closer to me.
Sure. So what I’ve observed is that in the recent past, if I take the example of the last two to three years, we started with GenAI chatbots. That was a very big thing at that point. Now I can see the trend that everything is moving from GenAI chatbots towards, I would say, workflow automation, where agentic AI and agents are running at an executive level as well as in enterprise tools, executing the proper workflow which was supposed to be handled by a person or by particular code, something like that. So it’s automation that I can see in the current organization. Even when I talk to other clients, they are also looking forward to these kinds of things.
That’s my understanding on this.
Kenny? Kenny, you want to comment on the same thing? What are the challenges that you’re seeing and how we are doing things now and how fast we are going? What is your prediction for what’s coming our way?
One of the early challenges with MSET pattern recognition was getting the sensor signals out of the asset to a central location. That challenge has been solved now for most industries, and certainly for the data center industry. In the early days, when the two biggest locomotive manufacturers in the United States licensed MSET, they had to bring a computer on the train to monitor the signals, because there were no good techniques for offloading the signals from a locomotive. Now there are good wireless networks for bringing the sensor signals back. And back to the data center: at Sun Microsystems we developed computer system telemetry that picks up all the signals from all sensors and processes inside servers.
Voltages, temperatures, currents, fan speeds, and in many cases vibrations are in the servers also. Thousands of variables. We’ve made a very lightweight harness that doesn’t interfere with the customer’s compute capacity at all. It runs on the system processor and brings the telemetry out. So that challenge has been solved. And now with the latest GPU servers, there is a commercial system, Prometheus. And on December 15th, NVIDIA released freeware telemetry for all their servers and clusters. So that challenge has been solved. And we at STEM can show you how to stream the signals from any asset (airplane engines, autonomous vehicles, any asset) into a compute box that is lightweight, on CPUs, not GPUs, and that gives real-time prognostics with early warning of incipient problems, not a high-low threshold.
That’s what they use now. By the time something hits a high threshold, something is already severely wrong, or the system crashed before it ever got to the threshold. We are able to detect the onset of anomalies below the noise floor. They’re in chaotic noise, and MSET is able to detect the onset of those. So that would be the challenge. If somebody doesn’t have sensors in their assets, they’re going to have to wait until next year’s model and put sensors in. But most assets now have lots of sensors but do not have a good technique to consume that data and give prognostics without having to train somebody to get a master’s degree. It works out of the box.
We hook up the sensor signals to MSET and get early warning annunciation of anomalies. And the energy savings are very significant, because the control algorithms now have highly reliable signals going into them. MSET’s the only technique that can disambiguate between sensor problems and problems in the assets. And so the control algorithms are using fully validated signals, and it’s a much more efficient operation. And if anything starts to go wrong in the assets, you get an early warning of that.
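The contrast drawn above, between a fixed high-low threshold and detecting a small drift while it is still inside the noise, can be sketched in Python. This is not MSET, whose internals are not described in this session. It swaps in a generic textbook technique, a one-sided CUSUM (cumulative sum) detector, and compares it with a fixed limit on a synthetic signal. The alternating ±1 "noise", the drift slope, and every parameter value are assumptions chosen so that the example is reproducible.

```python
from typing import List, Optional


def make_signal(n: int = 400, drift_start: int = 100, slope: float = 0.01) -> List[float]:
    """Synthetic sensor: alternating +/-1 'noise' plus a slow drift (all assumed)."""
    signal = []
    for t in range(n):
        noise = 1.0 if t % 2 == 0 else -1.0
        drift = slope * (t - drift_start) if t >= drift_start else 0.0
        signal.append(drift + noise)
    return signal


def first_threshold_alarm(signal: List[float], limit: float = 3.0) -> Optional[int]:
    """Conventional high limit: alarm when a single reading exceeds the limit."""
    for t, x in enumerate(signal):
        if x > limit:
            return t
    return None


def first_cusum_alarm(signal: List[float], allowance: float = 0.5,
                      decision: float = 5.0) -> Optional[int]:
    """One-sided CUSUM: accumulate small positive deviations from a zero baseline."""
    s = 0.0
    for t, x in enumerate(signal):
        s = max(0.0, s + x - allowance)
        if s > decision:
            return t
    return None


if __name__ == "__main__":
    sig = make_signal()
    print("threshold alarm at step:", first_threshold_alarm(sig))
    print("CUSUM alarm at step:", first_cusum_alarm(sig))
```

In this toy setup the cumulative detector raises its alarm while the drift is still smaller than the noise amplitude, whereas the fixed limit fires only after the drift has grown to about twice the noise amplitude.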
Thank you, Kenny. Anshu, what’s your take?
So, I mean, again, as I already said, I’m very bullish on long context. Let me give you an example, right? By this time, we all know chess is easy, math is easy, programming is easy, right? I think common sense is very hard. I think common sense is very hard in short context for humans, easy in long context. So if I keep talking with you over a period of time, no matter how much I think or not, I’ll figure out that you’re bored. It will take me some time, but I’ll figure it out. So you need long context to figure that out. And I think machines are right now gaining context, but they are gaining it at quadratic cost, which is what I talked about.
So I believe right now the biggest complaint in enterprises is that agents do not have common sense. They hallucinate. They are not 99% agents. They are like 50, 60% agents. To go from 50, 60 to 99, you need that: you are constantly working with a human, and over a period of time you figure out, “Damn, this guy needs this.” And that will happen when we have really long context. So I will just double down on what I just said. I think the next thing is efficiency and long context.
Very good. And what do you think? You are having a front row seat with everything that’s going on around here. So you have not only the large context, you have the relevant context too. So tell me.
First of all, thanks for the question, and a very good evening to all who have joined. So we are in the space of unifying the entire data universe of an enterprise and providing an agentic data analysis platform. What that means is that a normal business user, who so far was used to just static dashboards, can come onto a system and have conversations, get proactive insights, and do better decision making, faster. So the most exciting part in that context for us is how proactive decision making and the right quality of insights can help improve an enterprise’s top line, bottom line, efficiencies, etc. For instance, we have increasingly seen that the need for the big data warehouses and ETL pipelines that so far had to be maintained will go down in future. So far, everything had to come to a single-source-of-truth table from which human analysts could actually query and get insights or power these Power BI dashboards.
But now with agentic analysis, when agents can connect with different data sources and different modalities, not just tables or PDFs but also images, presentations, documents, etc., you might not need to create multiple replicas, copies, and versions of the data set: the bronze tables, silver tables, gold tables, etc. You might just want to connect to those native systems of record directly and get the insights required. We have seen that happening with a lot of our enterprise customers: they are able to see value when the agentic analysis is able to give their business users very good insights. So that is the most exciting part for me: how can we have data analysis give ROI to an enterprise? And the challenge for that is exactly quality and reliability. So how do you make sure those insights are of quality? They are not just “Hey, the sales are down,” but more about why the sales are down and what next steps you can take to fix them. If you have not been able to achieve your incentives or your targets in your store, what is going wrong? What are the other stores doing that you could learn from and then do better? And the other is the reliability of insights.
It’s not just getting it right 1 out of 10 times. It’s getting it right 10 out of 10 times, even with questions that are less known or unseen, and unlocking value. And lastly, I touched on the ROI point, and that is where there is synergy with what STEM is doing. In the US it’s still fine to charge roughly a dollar for one kind of insight. If I do the rough math, that still comes out to be a decent enough ROI when you are paying $125,000 to a data analyst for the same insights, in case you have to hire one. But in India the cost has to come down even further. It has to be probably 1 rupee per conversation to actually unlock the same quality of insights.
And the major cost driver is the GPU. How do you have cheaper inference? That is where I’m excited about what you guys are doing at STEM. We are hosting our own models many a time. We are also one of the companies training SLMs to power this use case. So the exciting thing for us is: can we have an alternate architecture that scales and gives us a very cheap cost of inference, so that we can offer the same technology at a much larger scale?
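The rough ROI arithmetic mentioned above can be written out as a small Python sketch. The $125,000 salary, the roughly one dollar per insight, and the one rupee target come from the remarks; the exchange rate is a placeholder parameter that the caller must supply, and both function names are hypothetical.

```python
def insights_to_break_even(analyst_salary_usd: float, price_per_insight_usd: float) -> float:
    """Number of paid insights whose total price equals one analyst's salary."""
    return analyst_salary_usd / price_per_insight_usd


def required_cost_reduction(usd_price: float, inr_price: float, inr_per_usd: float) -> float:
    """Factor by which the per-insight price must fall to hit the rupee target.

    inr_per_usd is an assumed exchange rate supplied by the caller.
    """
    return usd_price * inr_per_usd / inr_price


if __name__ == "__main__":
    print(insights_to_break_even(125_000, 1.0))
    # 80.0 below is a placeholder exchange rate, not a quoted figure.
    print(required_cost_reduction(1.0, 1.0, inr_per_usd=80.0))
```

Whatever rate is plugged in, the second function makes the point explicit: the rupee target implies a price reduction equal to the exchange rate itself, which is why inference cost dominates the discussion.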
Very good. And before I go to you, I want to say that that’s why I think a lot of these solutions can be perfected in India. Because India is going to throw the toughest problems at us. We’ve got to solve these at a massive scale. India has more people than anywhere else. Everybody knows that. But India also has more mobile devices than India has people. And you talk about sensors: tens to hundreds of thousands of sensors, all coming in from a very large population, and you are telling me I need to give it to you for a rupee. Right? So India is going to throw the toughest problems at us, and as we…
I saw this somewhere: it’s built in India, but for the world. So I think if we solve them, then we have wonderful solutions for everywhere else. Right? Okay. So what excites you? And you just got into IIT Madras and you are doing well. Thank you for that. Even though you didn’t go to BITS like I asked you to. So go ahead and talk about what is exciting to you and where you see the challenges.
I think I’ll start with the challenges on this one. The challenge I’m going to talk about is the sustainability of AI, because that’s something that’s grown increasingly relevant as of late. As Anshu here said, we’re rapidly approaching a hard limit on how scalable GPU-based infrastructure is. And with the very large impact on the environment, on water and power, and the amount that is required to fuel these GPU server stacks, I’m excited mostly for what STEM is doing: STEM’s ability to use better algorithms to increase AI’s efficiency and speed without having to take up massive amounts of power and water and damage the planet in the process.
Very good. We are taking that water and that power and the planet from you. That’s the key point, right? Not mean we are all been around, but that’s the thing. We are taking, and this is very important, right? We are using this expensive infrastructure, infrastructure that creates other high costs: I need more power, and I will generate more heat, and therefore I need to be cooled down, and the cooling needs more power, and we need more power plants, and everything will break down, because none of these are very reliable systems. We need to be very careful about what we are doing to the planet in doing this so fast and believing that this is the only method that is out there.
So it’s a very high responsibility and a high burden on everyone to understand these other methods that exist. These are good mathematics so that the software can reduce the hardware requirements. That’s the sustainable method that’s out there. That’s the responsible method that’s out there. Okay. Let’s go to the next question. So it’s about process. So maybe we’ll start. I’m going to start with you there, Abby, from Tata, right? So once we know what to do, how do you take an organization through that change of going from manual processes to automation and automation of decision making, right? Which is what autonomous nature comes in and artificial intelligence comes in. And we got to address what it means for people who were so scared about job loss and everything else, right?
So talk about the process.
So in our organization, we follow multiple stages. If any use case comes to us, where someone says they want to perform certain tasks through AI, that’s a very broad term, right? So we start with stage zero: what’s your aim in using AI? Is it cost reduction? Is it revenue? Or is it something that you want for customer experience? Once we finalize that, we come to stage one, where we map AI to the opportunity you have chosen. Say, okay, I’m interested in revenue generation; then it will attach to the finance department, and we look at how a finance application will be useful for that.
So that’s where all the stages come into the picture. Once you finalize stage one, the next stage is: what about your data? That is the critical part of the journey and the transformation. Where is your data? Does your data have quality? Is there existing data lineage? What are the sources of the data? Is it legacy data? Is it already clean, or does it need to be transformed into clean data? So that is a big part of the picture, where the data comes in. Once you have the data and all your alignment is done, the next stage is your architecture strategy. That is a big umbrella. Under architecture, you first have to finalize your deployment strategy. Are you looking for a GPU or a CPU? What type of deployment are you planning: on premises or a hyperscaler? Once you finalize the deployment, you come to the model. Are you looking for an SLM or an LLM, or what other things need to be done? Then, where are you going to host the model? Once your architecture is finalized, compute also comes into the picture: what compute strategy are you looking for? Are you going to run on virtual CPUs, or is it something that you can run on your local system as well? So it depends from use case to use case, right?
Once you have done that, what we prefer is to have a pilot execution, where we get to know what the accuracy is, what ROI can be estimated, and how, using this particular use case, I’m going to achieve a particular target. Once this is done, governance comes into the picture as the next stage, where you will have some guardrails and your policies: whether there are any GDPR compliance requirements, or maybe HIPAA where healthcare is concerned, right? Once your governance is finalized, you move into platformization, from a POC to a productionized, enterprise-level deployment where you will have everything sorted.
So you have all the details of what you are going to do, and you will be ready to go live with the AI transformation. So these are the stages which we usually follow. But the next stage, which we follow internally, is for your employees: how they are going to learn what we did. That is more important, because in future this will keep coming up as a new use case or something. So they have the background and a better alignment to that. So this is how we usually handle the transformation in our organization.
Very good. I’m going to take that and segue into what I wanted to ask you, Ayush. He talked about governance, right? And I don’t think we have completely cracked the code. We don’t have a code on that, right? Because one of the questions we wanted to ask you was, given what you’re doing, about the general thinking out there. At one point, AI was synonymous with ChatGPT. Outside of technical circles, if you talk to a doctor or a lawyer, they say, “Oh, I’m using AI.” “What are you using?” “I’m using ChatGPT,” right? And especially those two professions I mentioned are very concerned about governance, right? So if you can talk more about that aspect, because when you take the models to the end user, why is it not all just ChatGPT?
And is it governable if you have these big, large, open source models and whatever you’re building on top of this, at the end of the day, that intelligence that’s been created, is it governable?
So ChatGPT definitely has been very instrumental in democratizing AI and has become a symbol of what AI means in the new world. So credit to them for that. But for an enterprise, ChatGPT definitely does not solve the majority of the problems. It could be good for some lazy tasks like email writing or some personal planning, etc. But in an enterprise it is about taking real decisions or doing some actions. Say I’m in the customer success team and I want to create a presentation for my customer around their usage in the last month, the issues that they had, and how much time we took to solve them.
This is something that cannot be done on ChatGPT, for two reasons. One, it does not know your enterprise data. You cannot connect all your know-how to systems like OpenAI, because somewhere or other, OpenAI and Anthropic are tracking what kind of activity is happening on top of their APIs and then planning what their next expansion as an application would be. We’ve seen that with the Cursor use case, the coding use case, transitioning into Codex from OpenAI and Claude Code from Anthropic. So you never connect your enterprise data, because of compliance and privacy, tying back to the data governance aspect of it. Second is the context.
Now what separates a steel company in Texas, in the US, from a steel company in another region of the US, or any one company from another even in the same vertical, is the context: how is the culture of doing business, what are the KPIs, how are the processes set up, what actions do they take when doing an RCA, what are the decision-making activities. That is basically the core of the business. That core of the business is not known to systems like ChatGPT or Claude, for two reasons again. One, they don’t know that process. The data is not explained. They don’t know how to do it.
Second, they are very general APIs, stateless APIs that will never be able to understand those nuances without learning. So those are the things that become the reality of enterprises, and ChatGPT-like systems are not solving the real enterprise problem because of the context and the understanding of the business itself.
Very good. So, leading from that, Kevin, what I would ask you is this. What he said was that a large enterprise context needs to be understood by open source models, and there’s a responsible way to do that, right? You cannot just release all enterprise information to the public. But he also said that things like root cause analysis need to be done, which leads to deterministic AI, right? So if you can talk about deterministic AI and also about the sovereignty aspect that he talked about, which is that we may be using public domain models where it makes sense, but we need to do it in such a way that the data is completely sovereign.
Go ahead. Talk about it.
See, deterministic AI is a solution to a very specific problem with most modern large language models, which is that they’re quintessentially probabilistic. You can give ChatGPT a prompt twice and you will get a different result. ChatGPT also has the capability to just make stuff up. It is not bound to fact. It is not bound to a stringent set of rules. And that’s great when you want to generate a picture of a cat on the Eiffel Tower or write a Shakespearean ballad. But if you need to apply it in a production context, say cybersecurity or the medical field, then hallucinations and false data are not something that you can afford. And that’s the very specific problem that we use deterministic AI to solve, right?
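The difference between a probabilistic and a repeatable output can be shown with a toy next-token choice in Python. This is only an illustration of the point about getting different results for the same prompt; it is not the deterministic architecture discussed here, and the token names, scores, and function names are all made up for the example.

```python
import math
import random
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens: List[str], logits: List[float], rng: random.Random) -> str:
    """Probabilistic choice: the same input can yield different outputs."""
    return rng.choices(tokens, weights=softmax(logits), k=1)[0]


def greedy_token(tokens: List[str], logits: List[float]) -> str:
    """Deterministic choice: always the highest-scoring token."""
    best = max(range(len(tokens)), key=lambda i: logits[i])
    return tokens[best]


if __name__ == "__main__":
    tokens = ["approve", "deny", "escalate"]  # hypothetical outputs
    logits = [2.0, 1.5, 0.5]                  # hypothetical model scores
    rng = random.Random()
    print("sampled:", [sample_token(tokens, logits, rng) for _ in range(5)])
    print("greedy: ", [greedy_token(tokens, logits) for _ in range(5)])
```

Greedy selection returns the same token on every call for the same scores, which is the property that makes a result auditable; sampling trades that repeatability for variety.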
At its core, it’s an architectural response to this problem. We don’t eliminate machine learning entirely; we just bind it within a very set system and a set of rules, right? The objective isn’t open-ended generation but controlled and auditable execution. So generally I would say there are a few very core principles to this sort of approach, right? Your system has to be predictable, as in your responses must give the same output for the same input. Right. Because that directly leads to auditability.
Which is a very difficult thing to do. Maintaining intelligence … once before. We are all playing around with creating intelligence, but truly it’s been done once before. Whatever faith you all believe in, it’s been done once before. And what were we all told? You know, you have your free will to go… You know, so once you have created intelligence, putting it in a box is a very, very difficult thing to do. Right? But then if you cannot put it in a box, how can you have a governance function? At some point it’s going to say something that’s going to embarrass your customer. How can you have a governance function? Some thoughts?
So there are a couple of rules we need to apply. That’s what I can think of at this point. There are rules like GDPR, and the DPDP Act is coming into the picture for India specifically, and we follow those rules: if it is compatible, we apply them. If not, then we may have to think about it from the policy … on the other side of the company, right? Whether it will be applicable or not. There are a couple of scenarios where your PII data, to be very frank, doesn’t matter much in India. But if you’re operating in the US or somewhere else, it does matter. So we have to take care of those scenarios when we are implementing.
So at our organization, we have to make sure that we are following all the compatible policies and that all the guardrails are in place. So that’s my understanding.
Kenny, any thoughts on governance? I know that you deal with sensor data, which comes from measured things, not made up stuff for the most part, unless the sensor itself is showing some biology, right? When the sensor misbehaves, what do we call it? We call it sensor biology. You see how we blame the human race for that, but anyway. So from that point of view of governance, you live in a, I think, less complex space than people who are making user content, right? But what are your thoughts on governance?
One of the biggest challenges for governance is in applications where there is human-in-the-loop supervisory control of complex processes and systems. This challenge, which has turned into this year’s biggest challenge for defense AI, is called situational awareness. With situational awareness, you can have a highly trained human operating a ship or an airplane, and if there are false alarms in the process, that’s a problem. We talked about ChatGPT and the hallucinations and so forth. In physical systems, it’s false alarms on sensors. And I keep going back to the false alarm rate because of the number of sensors: if you go from six sensors to 600 sensors to 50,000 sensors, the probability of false alarms multiplies up with that.
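How false alarms compound with sensor count can be made concrete with a short Python calculation. It assumes each sensor raises a false alarm independently with the same probability p per monitoring interval, in which case the chance of at least one false alarm among n sensors is 1 - (1 - p)^n. The independence assumption and the value p = 0.001 are illustrative choices, not figures from the discussion.

```python
def prob_any_false_alarm(p: float, n: int) -> float:
    """Chance that at least one of n independent sensors raises a false alarm."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # assumed per-sensor false alarm probability per interval
    for n in (6, 600, 50_000):
        print(f"{n:>6} sensors: {prob_any_false_alarm(p, n):.4f}")
```

With these assumed numbers, six sensors give a combined false alarm probability of roughly 0.6%, 600 sensors give roughly 45%, and 50,000 sensors make at least one false alarm a near certainty in every interval.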
And so a pilot of an airplane has been highly trained for every situation. And when they test pilots in the big simulators to give them their pilot’s license, they throw in a second problem while the pilot is dealing with the first problem. The challenge from false alarms is that you can have the most highly trained human, and if red lights are going off at different places from false alarms, the human gets to the point of cognitive overload and makes stupid mistakes. And this is long before any hallucinations out of AI. Just one example of that, and I’m not talking out of school or giving away secret information: the US Navy in the last five years has had three spectacular accidents in broad daylight, with the latest instrumentation on the ships, where they would run into a big oil barge or a fishing vessel. Hundred-million-dollar accidents, and some of them resulted in loss of lives. Well, the human bridge watchers, as they’re called, are in a sophisticated control room. If you imagine the cockpit of a 777 and multiply that by a hundred, you have highly trained humans watching all these signals, and if too many things are happening and too many false alarms are happening, the human gets mixed up and gets to cognitive overload.
We’ve published half a dozen papers in international cognitive science conferences around the world and demonstrated how MSET is able to eliminate that problem when monitoring complex processes where a human has to make decisions. And the one technical point I’ll make, and this is in a lot of our journal articles: MSET has the lowest mathematically possible false alarm and missed alarm probability for
So Anshu, we’ve collected all the requirements in this conversation, right? Now we’re going to give them all to you to actually solve, right? Because we’ve said that there’s a step-by-step process to doing all of this, that there is a sensor explosion, that there are sovereignty and RCA-type requirements, and that there is a user content and data explosion, all of this stuff. Finally, when we map it all, where is the compute to go do all this? Because a lot of these algorithms are complex algorithms, right? So current methods are not taking us there, right?
So I’ll just add one thing here. Look, if you look at the progression of AI, it all still relies on one of the most powerful methods known to humankind: trial and error. Right? How do I know that prompt engineering works? If anybody has worked on prompt engineering, you keep trying, and at some point you suddenly see it solves 80% of the problem. That’s a good prompt, and then you hill climb from there. Right now in AI we are dealing with a new entity, a new species. We are trying to co-live with them and we don’t understand them. It’s not very different from my brain, right? Sometimes it works; maybe on Tuesday it doesn’t work because of whatever my schedule is. But I have learned over a period of time to live with it. I think we are asking some very important questions about governance, guardrails, all of that. We will, I think, solve a lot of them with trial and error. But the most important thing is that trial and error should be regretless. If to do a million tries I’m burning hundreds of millions of dollars, I would be careful. So I will still say the biggest hurdle to advancement is the ease with which I can trial-and-error and experiment, and that ease depends directly on how much energy we are burning and how much money we are paying. Imagine if compute was free. Imagine I give you the best model and as many queries as you want. And now imagine the hardest problem you are facing: governance, accuracy. I am pretty sure that if you sit down and hill climb, make 10 agents, let them talk with each other, cloud bot, figure out some strategies, and you go to dinner, maybe sleep overnight, and these guys keep talking, all of them on the most expensive model running at the highest possible latency, I think you will make remarkable progress. But you won’t be allowed to do that. And that is why I will come back again, and this is why this panel is very important to me, because everything at the end of the day boils down to efficiency. It’s like raising the tide, because it raises all these boats. All interesting problems will be solved if you are allowed enough trial and error. That’s my belief.
And that’s the thing. The title of the panel is about a constrained world, right? We can’t all just mint money. I’ve tried that. It doesn’t work. So it’s a constrained world; how do we solve this problem? This is probably the largest conference ever. This is not a conference. This is the AI Olympics. People are talking about something like 700,000 people. This is the kind of scale we need to solve for. Think about it in the AI space: every day, it’s going to be this busy, this heavy, this crowded, with this much data, et cetera. We can’t be throwing expensive infrastructure at the problem all the time.
We’ve got to get better. We’ve got to understand that all of these other methods exist, implement those methods, and have sustainable AI. Right? So, questions for the panel? There’s enough time for all of you to ask at least one question. How about that? There’s a lot of time, so ask a question, or if you just have an opinion or an input, that is fine. Go for it. Who wants to go first?
There is this trend of, you know, “AI will solve everything.” It’s coming into the picture. You talked about hallucination, and in a lot of engineering, whether it is automotive, ship, aircraft, or naval, the solution is not always probabilistic. It is also binary: sensors give zero or one, so you need to decide. So in applying a lot of this in the real engineering world, we have to be deterministic to be safe. You said MSET could solve all these problems, so if you could demystify MSET for me, that would be great.
Oh yes. The best way to demystify how MSET works is to start with the conventional approach for monitoring signals from an asset, let’s say an automobile or a locomotive, which is to put high-low limits on each variable. If the engine gets too hot, a red light comes on the dashboard. If the fan gets a bad bearing and doesn’t spin fast enough, that will cause a problem: the coolant can get too hot, pressures get too high, RPMs get too low. This has been the conventional approach for decades: high-low limits as thresholds. The problem that will never go away with putting high-low limits on individual signals, which is called univariate monitoring, is that when you’re monitoring noisy physics processes and you want an earlier warning about a small developing problem, you tighten the thresholds.
But then spurious data values will trip the thresholds, and you’re shutting down a locomotive in the middle of Kansas. It’s got a bunch of cattle on the back, and they send the repair people: “Oh, there wasn’t anything wrong with it. It’s a false alarm.” And in manufacturing industries, it’s very expensive to shut down the assets, only to hear, “Oh, sorry, it was just a false alarm.” And people who take their car in on a Saturday because of the red light are told, “There wasn’t really anything wrong. That’s the good news. You should be happy. It was just a false alarm.” So to avoid the false alarms, they raise the thresholds. Now the system can be severely degraded before you get any alarm, and it’s in no way predictive.
So let me say it this way: high-low thresholds are reactive. MSET works fundamentally differently. It learns the patterns of correlation between and among all the signals. Some signals go up and down in unison. Some go up when others go down. It learns those patterns, and it detects an anomaly in the pattern days, and often weeks, before you’ll ever get near a threshold. So that’s the fundamental difference.
Do you play music? What do you play? Can you hear me? Okay. Do you play music? And you play chords? So it’s simple, right? When you’re playing chords, and let’s say you’re like me and you do a bad job at it, even an untrained musician can tell that I’m doing a bad job. Why? Those notes need to go together. With independent notes, maybe you cannot figure it out. But if you’re playing chords, anybody can say, “That guy sucks,” right? It’s the same here. Looking at the variations of a single variable can only take you so far. But with the multivariate part of MSET, where it looks at a number of sensors jointly, you can figure out which things are starting to go bad.
Misery loves company. Have you heard that? Similarly, anomalies don’t like to be alone. They’re always hiding amongst other anomalies, right?
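The contrast described above, between per-signal high-low limits and learning how signals move together, can be illustrated with a minimal sketch. This is not MSET itself, whose internals are not described in this session; it swaps in an ordinary least-squares estimator as a stand-in. The signal names (`load`, `temp`), the limits, the synthetic data, and the `k` multiplier are all hypothetical.

```python
import numpy as np

def fit_estimator(train):
    """Fit least squares predicting the last column from the other columns."""
    X = np.column_stack([train[:, :-1], np.ones(len(train))])
    y = train[:, -1]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma = float(np.std(y - X @ coef))  # spread of residuals on healthy data
    return coef, sigma

def residual_alarm(sample, coef, sigma, k=4.0):
    """Alarm when the last signal departs from what the other signals predict."""
    predicted = np.append(sample[:-1], 1.0) @ coef
    return bool(abs(sample[-1] - predicted) > k * sigma)

def threshold_alarm(sample, low, high):
    """Conventional univariate check: any signal outside its fixed limits."""
    return bool(np.any(sample < low) or np.any(sample > high))

# Synthetic healthy history (assumed): coolant temperature tracks engine load.
rng = np.random.default_rng(0)
load = rng.uniform(20.0, 80.0, 500)
temp = 40.0 + 0.5 * load + rng.normal(0.0, 0.5, 500)
coef, sigma = fit_estimator(np.column_stack([load, temp]))

low, high = np.array([0.0, 30.0]), np.array([100.0, 95.0])
healthy = np.array([50.0, 65.0])   # temperature matches the load
degraded = np.array([50.0, 70.0])  # 5 degrees too hot for this load, yet within limits

print(threshold_alarm(degraded, low, high))   # fixed limits on each signal
print(residual_alarm(degraded, coef, sigma))  # check against the learned relationship
print(residual_alarm(healthy, coef, sigma))
```

In this toy setup the degraded reading sits comfortably inside both fixed limits, so only the correlation-based check has anything to react to. A real deployment would involve many more signals and a more careful estimator.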
Okay. Yeah. So I came in by accident but was really interested to hear what’s being discussed, especially MSET and the power of reducing compute and moving it onto the CPU. It really was music to my ears. I have an extended question. What happens with the current ecosystem, where there is plug-and-play and interoperability across the entire data engineering, RAG, and MCP stack? Is it possible to plug and play this thing? As I understand it, MSET sits at a foundational and fundamental layer. So how does it merge with the current set of LLMs and services?
So we have to look at the problems we are trying to solve, okay, and how we build the correct architecture for them. The quick answer to that question is: absolutely. Through sensor augmentation, we are augmenting a big field with sensors, and we’ll certainly bring that data in and run it through techniques like the multivariate technique to come up with anomaly detection, predictive maintenance, et cetera. But after that, if you want a control system that goes and deploys the decision-making, there will be other MCP-based solutions that we develop. So it does all integrate. That’s the reason we have to look closely at the problem and make sure that we are not starting every problem with a downloaded large language model.
I just have a follow-up. So is there any open ecosystem where we can go and see it and plug it into our current infrastructure for GenAI services?
Yes, because remember, STEM Practice Company is an Oracle Corporation partner. So there is a lot that we do in the open-source community and in open integration. We can certainly spend time with you to educate you on all of that. Any other questions?
Hello everyone. So what is the most critical risk that policymakers, businesses, and users are currently understanding about AI?
Avi, you want to take that, and then Anshu, you can go.
So first of all, it depends from use case to use case and, at this point, from country to country. Like if you think about ESG. The EU AI Act is one of the first such acts released anywhere in the world. Similarly, from a data point of view, the DPDP Act is coming into India. We have been following all the policies that are being implemented, and we are also thinking ahead: an AI act will soon come into the picture in India as well as in other countries, so what do we need to do to make sure it is followed properly? That’s what we follow as a process.
I mean, if I am correct, the question is: what is the misunderstanding that policymakers have about AI? Sorry.
No sir, it’s not a misunderstanding by the policymakers. Actually, the DPDP Act dates from 2023, but after a long time it has still not been enforced and implemented. Some IT laws do exist in our country, I know, but they are not sufficient for present and future cyber-related crimes, and especially for the AI loopholes, because we can’t get adequate laws from the IT laws alone, especially the old IT laws. The DPDP Act, okay, it’s enacted, but what is the enforcement date, and when does it come?
So the process, in my understanding, is that it needs to be enforced by the government, and the government, industry, and all the private entities need to come together to make sure that it is enforced. I will give you a very simple example. If you think about iPhone chargers, they had a separate cable, what we call the Lightning cable. But because of EU policymakers, there is a mandate that it needs to be a Type-C charger, right? So these are the forces coming from the higher side, and that process needs to be followed. But it does matter when an organization starts implementing those first; when the government releases something, it can definitely be followed.
Hello, good afternoon to all of you. Sir, I am a master’s student in mathematics, and I want to do research in mathematics. As I have seen, there are advancements in AI, and math is also integrated into machine learning. I worked on a project on cancer detection techniques, where about 70% use AI, like neural networks. So in which direction is research going? Is research in mathematics still worth it?
By the way, I am a math major. So I think that even though AI can solve math, our understanding of math is very important to understand AI to some extent, right? The closest we have come to understanding AI is with formal reasoning, so math is always a good background. We are doing research on the fundamental understanding of what the capabilities of LLMs are, and reasoning about LLMs with your formal background is very good research.
Greetings to all the panelists here. I apologize, I wasn’t here before, so I couldn’t hear the conversation. But as I follow the questions here, I have one question related to hallucinations. As a person with a legal background, I often get wrong citations, and the case law is often wrong. So how far can we rely on AI currently? I know that hallucination will evolve and the problem will be resolved eventually, but on the current timeline, how much can we rely on AI systems? And is it possible that in future AI will not hallucinate, that there will ever be a hallucination-free AI? I hope I have made my question understood.
Okay, yeah, before we go there, I just wanted Kevin to get that slide up. So this is a healthcare use case that we had the good fortune of working on with Tata. We released it a few months ago, and here we have 100% accuracy. It’s not the future. It’s now. We just do it differently, right? Non-hallucinating methods are completely possible. With that, I’m going to let Ayush and Anshu address that topic as subject experts, but it’s not the future. Demand it first, because you are in a profession where, if you come up with some nonsense, the judge is going to throw you out of the room, right? So you don’t have that luxury. Or for a doctor, it can end up much worse. Right? So demand that first, but the solutions are here today.
So thanks for the question, first of all. You know, to err is human; to err more is AI. So errors can always be there. Now, what is the scope of the errors, and how do we reduce them? First, the system should have a proper understanding of things: it should know your context. Then the thinking process should be auditable. The sub-steps that were taken should be auditable, so that as a responsible user I can always see the reasoning by which it got to that answer; maybe it made a mistake in one of its thinking steps. Then accuracy: it’s very difficult to have a probabilistic system be 100% accurate, but it can still be 100% reliable. Maybe it is 95% accurate, but for the 5% of the time it is wrong, we are able to tell, 100% of the time, that this is probably wrong and you need to double-check, or you need to have an expert involved in auditing this answer. So 100% reliability is definitely achievable. We just need the right processes, thinking, and validations in place to make sure we can really trust the answer, because it is really critical when you take actions on it.
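One way to picture “95% accurate but 100% reliable” is a system that attaches a needs-review flag to any answer it cannot stand behind. The sketch below is an assumption, not the speaker’s method: it uses agreement among several independently sampled answers as the warning signal, and the function name and the 0.8 cutoff are hypothetical. Agreement is only a proxy and does not by itself guarantee that every wrong answer is caught.

```python
from collections import Counter

def answer_with_flag(samples, min_agreement=0.8):
    """Return (majority_answer, needs_review) for a list of sampled answers.

    needs_review is True when the share of samples backing the majority
    answer falls below min_agreement (a hypothetical cutoff).
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    answer, votes = Counter(samples).most_common(1)[0]
    agreement = votes / len(samples)
    return answer, agreement < min_agreement

# All samples agree: the answer is passed through without a flag.
print(answer_with_flag(["Section 12", "Section 12", "Section 12"]))

# Samples disagree: the majority answer is returned but routed to an expert.
print(answer_with_flag(["Section 12", "Section 14", "Section 12", "Section 9", "Section 12"]))
```

The design choice is to separate the answer from the decision to trust it, so the flagged cases can go to the expert audit step described above.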
So, Anshu, could you address some of the fundamentals of why these hallucinations happen and why domain-specific training avoids them?
So, let’s think about hallucination. Prior systems were non-hallucinating systems, and they were like search. By the way, humans hallucinate. If I ask two people to describe the exact same incident, making them sit in different rooms, they will give different explanations, right? So the human mind is fundamentally a hallucinating mind. In fact, LLMs became LLMs because we focused on prompt completion. Prompt completion comes from psychology: our mind has a tendency to fill in. That is how you arrive at prompt completion and go beyond search. So search is non-hallucinating, and LLMs have to be hallucinating, because they have to be intelligent and smart. Again, this goes back to what Bernie was saying: biology.
Right? So, if you are like humans, you are like humans. And that also leads to the answer: how do we increase the reliability of humans? Well, you train them, right? And you rely not just on one, but on a committee of multiple experts. And then you have debates and discussions. You have multiple LLMs that debate with each other, right? These are the standard ways. In fact, you can also show it mathematically. We have a student here who is in mathematics. You can mathematically show that if I have a way to reduce the probability of something by delta, then I can run that process in a cycle, keep reducing the probability of hallucination, and reach a near-perfect, hallucination-free result.
But again, coming back to it, you have to run a lot of LLMs. That’s a lot of cost. The barrier is again the cost. Sorry.
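The argument about repeatedly reducing the hallucination probability, and what it costs, can be made concrete under a simplifying assumption that the speaker did not spell out: each verification round independently removes a fraction `delta` of the remaining error probability, so after `k` rounds the residual is `p0 * (1 - delta) ** k`. The function names and the numbers below are illustrative only.

```python
import math

def residual_probability(p0, delta, rounds):
    """Error probability left after `rounds` independent reduction rounds."""
    return p0 * (1.0 - delta) ** rounds

def rounds_needed(p0, delta, target):
    """Smallest number of rounds that brings the error down to `target`."""
    if not (0.0 < delta < 1.0 and 0.0 < target < p0 <= 1.0):
        raise ValueError("require 0 < delta < 1 and 0 < target < p0 <= 1")
    return math.ceil(math.log(target / p0) / math.log(1.0 - delta))

p0, delta, target = 0.05, 0.5, 0.001  # hypothetical starting error and goal
k = rounds_needed(p0, delta, target)
print(k, residual_probability(p0, delta, k))
```

Under this assumption the error shrinks geometrically while each extra round is another pass of model calls, so cost grows roughly linearly with the number of rounds. That is the trade-off behind the remark that the barrier is cost.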
Wonderful. Any other question? That side of the room. Okay. Any side of the room?
Hi everyone. I am working in an IT company. Should I be louder? Hello everyone, am I audible now? Clear and loud? That’s the only tone I have; I don’t know how to be louder. Okay, is it fine now? So my question is this. Yes, AI, like any technology, helps the majority as it grows. AI is solving problems, and in future it is going to solve a lot more, giving us industrial solutions, speeding up the software solutions that we are currently working on, and helping us in a variety of areas. My question is about students who are in school. We have ChatGPT, Gemini, all the AI tools there.
And so it’s very easy for students who are in school. They can do their assignments in a minute or in a few seconds. So how is it helping the students? I don’t know whether it’s a correct question or not, but are there any steps being taken by the government, or by the great leaders of our country and all over the world? Are there any obligations we can apply so that our students do not use such tools? Because it’s free, of course, and it’s available over the Internet. They can do their assignments in a minute.
So, I don’t know. As for me, I think it’s basically helping college students, the industry, and employees. But how is it helping school students? I don’t know whether the schools are making any curriculum changes in their syllabus or not. So, yeah, any thoughts on that?
So, did you guys get the question? Because it’s a profound and important and deep question. Are we screwing up the children is what she’s asking, right? By allowing them to quickly come up with anything. So, love to hear your take. Why don’t you go first?
tasks from AI, otherwise, you know, it is the same kind of journey we’ve had with calculators. Everyone knew how to multiply, divide, and work with many numbers until they started using calculators, and now even for simple additions you go to the calculator. So first, it’s on us personally how much we start delegating to AI and lose touch with it. Second, the educators, the pedagogies that form around the use of AI for education, and the careers that start forming around it will themselves metamorphose into what AI means in the education space.
I mean, this is a question that every university is asking, and I think it’s, as you said, a profound question. I think the partial answer has already been given by Ayush, right? There are certain skill sets. If I want you to know addition and subtraction, you should not use calculators. But once you have gotten a basic feeling for that, the problem is not about using the calculator; the problem is what you can do with that calculator. So problem solving never goes away. You see what I am saying? Imagine AI makes everybody 10x better. Then 10x better is the average, and we will now aspire to something more. So whatever is average is what AI can do, and going beyond that will require ingenuity and creativity. So I agree that the education system needs to transform, and we are also learning as we go how to transform it. But the goal will always be: can we solve problems that we cannot solve otherwise? That will require us to always think outside the box. I think it’s still an early stage, but a lot of people are thinking about it and talking about it, and as I said, it will start getting
That’s a very profound question. I have an 8-year-old, so I worry about that every day, right? But I hope I’m doing the right thing by letting him play with whatever AI he wants to play with. But we are 10 minutes over time, I’m told, so I need to apologize to the next session. But go ahead, you had a question. Will you make that the last question?
Good evening to everyone. My question is related to AGI. We are using AI right now, so, thinking of the next step, I was wondering: is there any relation between AGI and quantum computers? Will AGI only be possible after quantum computers, or is it possible with current processors?
That’s a wonderful question, but it’s a two-hour topic, and I don’t know why you waited until the last minute to ask it. We are launching our first quantum enablement center, as STEM Practice Company, at two sites in Chattanooga, Tennessee, and we hope that we can start launching quantum computers in India as well, because we are thinking through specific problems that we can solve with quantum computers today. But that is such a big topic. Thinking about reducing computation needs and reducing cost is the way we are going, and quantum computers use only a fraction of the energy of a comparable GPU-based machine that tries to simulate quantum processing, right?
So yes, there’s a lot of energy advantage to going that route, but that’s a very deep and very profound topic. Thanks for bringing that up, because we didn’t bring quantum up, but that’s a broad topic. With that, we have to close, because apparently we are stealing time from the next panel now, which is a difficult thing to do. I see a hand; I’ll just talk to you one-on-one outside. But thank you, everybody, for coming, and I thank the panel. I think you guys got it. And by the way, we have a stall, a booth, whatever we call it: it’s Hall 6, Stall 100. Easy to remember: 6, 100. Please come there, get our material, and get connected, and we can keep the conversation going. Thank you very much. Thank you.
“GPU‑based infrastructure creates expensive, high‑heat, high‑failure‑rate systems with limited supply”
The knowledge base explicitly describes GPU-based infrastructure as expensive, generating high heat, having a high failure rate and limited supply [S12].
“The wasteful practice of hoarding GPUs is linked to a broader environmental threat, as high‑heat, high‑failure‑rate GPU clusters demand excessive power, water and cooling, hurting the planet”
Additional sources highlight the environmental impact of large AI compute: extensive electricity use and cooling requirements [S14] and the heavy water and energy consumption of data centres [S95].
“Both speakers critique current GPU‑centric approaches, with Bernie advocating moving away from this model”
The knowledge base notes that both speakers criticize GPU-centric AI development and that Bernie Alen advocates moving away from it [S1].
The discussion reveals strong convergence on three pillars: (1) the necessity of algorithmic/software optimisation to curb GPU‑centric compute costs; (2) the urgent environmental sustainability challenge posed by current AI hardware scaling; (3) the requirement for robust governance, auditability and data‑sovereignty frameworks. Additional consensus appears around the transformative potential of low‑compute technologies such as MSET and the educational implications of AI adoption.
High consensus on efficiency, sustainability and governance, indicating that participants broadly agree these are the critical levers for responsible AI development. The agreement provides a solid foundation for policy recommendations that prioritise algorithmic innovation, green AI practices, and strong regulatory safeguards.
The panel converged on the urgency of reducing AI compute costs and environmental impact, but diverged on how to achieve reliable, non‑hallucinating outcomes and on the preferred technological roadmap—quantum hardware versus algorithmic/software innovations. Governance priorities also differed, with some emphasizing data sovereignty and others deterministic auditability.
Moderate – while there is broad consensus on the problem (excessive GPU‑centric compute, sustainability, need for governance), the differing technical visions and contrasting views on hallucination mitigation create notable tension that could hinder coordinated policy or industry action.
The discussion was driven by a series of pivotal comments that moved the conversation from a generic concern about AI cost to a deep technical and ethical analysis of scalability, reliability, and sustainability. Bernie Alen’s opening remark framed the problem, while Anshumali’s data‑driven exposition of hardware limits and the context‑window bottleneck introduced a clear technical challenge. Kenny Gross’s 2,500× compute‑reduction example and Kevin Zane’s deterministic AI principle offered concrete, solution‑oriented counterpoints, prompting the panel to explore practical implementations (MSET, agentic analytics) and governance implications. Subsequent remarks on hallucinations, energy impact, and enterprise value reinforced the central theme: without innovative algorithmic and architectural changes, the AI boom is unsustainable. These key insights shaped the flow, steering the dialogue toward actionable research directions and responsible deployment strategies.
Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.