Building Trustworthy AI: Foundations and Practical Pathways

Executive Summary

This talk explores the emergence of general-purpose AI software and its profound societal implications, arguing that while AI capabilities enable unprecedented productivity gains, they simultaneously destabilize existing economic models and create novel, context-specific risks. The speakers introduce Astra, a comprehensive AI safety risk database grounded in the Indian context, demonstrating that global AI safety frameworks fail to account for region-specific challenges like linguistic diversity, connectivity constraints, and scale considerations unique to India.

Key Takeaways

  1. Generative AI is a platform shift with systemic consequences: It's not merely incremental progress but a fundamental change in how software works, with ripple effects across entire industries (content, design, web infrastructure). Policymakers must recognize this as infrastructure-level disruption, not just a new tool.

  2. Trustworthiness requires context-specific risk frameworks: One-size-fits-all international AI safety standards will fail. India (and other non-Western markets) must develop localized risk taxonomies accounting for linguistic diversity, connectivity, regulatory context, and economic realities.

  3. The alignment problem is fundamentally about language: The ease of natural language interfaces obscures the underlying problem: human language is ambiguous, and AI systems will optimize for literal compliance rather than intent. This is a design problem, not just a training problem.

  4. Infrastructure destruction is a real near-term harm: The economic model supporting much of the internet and open-source software is collapsing now (not hypothetically). This creates a second-order crisis: future AI systems lose the very data sources on which they are trained.

  5. Measurement and accountability gaps exist: Frontier risks and long-term harms lack established quantification methods. Until we can measure the probability and severity of risks like job displacement or cognitive decline, mitigation strategies will remain reactive and ad hoc.
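
As a hedged illustration of the measurement gap in point 5 (the talk itself does not prescribe a formula), conventional risk assessment scores a risk as probability times severity; the sketch below shows why that breaks down when the probability cannot yet be measured.

    def expected_harm(probability: float, severity: float) -> float:
        # Classic risk score: likelihood of the harm times its magnitude.
        return probability * severity

    # A measurable social risk can be scored and compared:
    print(expected_harm(probability=0.12, severity=3.0))  # 0.36

    # For a frontier risk (e.g., job displacement), the probability itself
    # is unknown, so any downstream prioritization inherits the uncertainty:
    print(expected_harm(probability=float("nan"), severity=9.0))  # nan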

Key Topics Covered

  • General-purpose computing paradigm: The historical evolution from task-specific machines to unified hardware and software platforms
  • Economic disruption: Collapse of web design, content creation, and advertising-dependent internet business models due to generative AI
  • The alignment problem: Challenges in ensuring AI systems execute user intent rather than technically correct but harmful interpretations
  • AI safety taxonomy: Comprehensive framework for identifying, categorizing, and contextualizing AI risks
  • Indian-specific AI challenges: Linguistic diversity, connectivity issues, scale, and sector-specific deployment contexts (education, financial lending, agriculture)
  • Risk lifecycle and mitigation: How risks manifest across development, deployment, and usage phases
  • Infrastructure and exclusion risks: How connectivity gaps and technical constraints create unique failure modes in developing markets
  • Frontier vs. social risks: The distinction between quantifiable and emergent, difficult-to-measure risks

Key Points & Insights

  1. General software as a paradigm shift: The speaker argues that moving from general-purpose hardware (achieved in the 1940s-2000s) to general-purpose software (current AI era) will be as revolutionary as the original computing revolution, but the transition risks are underexplored.

  2. Band-aid solutions are insufficient: Current AI vendors (OpenAI, Google, DeepSeek) are applying tactical fixes to specific failures rather than addressing underlying architectural problems. This "mugging up answers" approach doesn't constitute genuine learning or trustworthiness.

  3. Internet infrastructure collapse via displacement: Because users now query LLMs directly, the click-through rate for websites previously ranked in Google's top results has dropped from 1-in-6 to 1-in-500 in the past year, roughly an eighty-fold decline. This destroys the economic model supporting vast portions of the web and the open-source infrastructure that trained these systems.

  4. Natural language ambiguity as a foundational problem: Programming languages exist precisely because human language is ambiguous and context-dependent (e.g., "The teacher told the student he was going to the fair": who is going?). Using natural language as the primary AI interface reintroduces this ambiguity at scale, enabling misalignment (see the sketch following this list).

  5. Risk definition requires contextual grounding: Risk cannot be universally defined—it must account for deployment context. The same AI system poses different risks in an educational setting vs. a financial lending context vs. a rural agricultural setting with poor connectivity.

  6. India's unique technological vulnerabilities: Large-scale systems (Aadhaar, UPI, EVMs) and linguistic/connectivity diversity create distinct failure modes that Western-centric AI safety frameworks do not anticipate. Astra addresses this "contextual blindness" in international repositories.

  7. Social vs. frontier risks require different strategies: Social risks (linguistic bias, toxicity, hallucination) are observable and quantifiable; frontier risks (job displacement, power-seeking behavior, cognitive decline) are speculative but potentially catastrophic and difficult to measure, requiring distinct mitigation approaches.

  8. Mitigation creates new tradeoffs: Strong safety measures often reduce system utility for end users, creating a contextual optimization problem. A mitigation that's too restrictive may cause users to bypass safety systems or abandon the technology.

  9. Risk manifestation stages matter: Identical risks have different causes and solutions depending on whether they occur in development (biased training data), deployment (language mismatch), or usage (user manipulation). Responsibility assignment varies accordingly.

  10. Empirical grounding is missing: Current AI safety work lacks rigorous data on actual risk probabilities in real-world contexts, particularly outside Western markets. Astra is a first step, but expansion to agriculture, healthcare, and other sectors is critical.
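
On point 4, a minimal sketch of the contrast (the names and function are our own, not from the talk): the English sentence leaves "he" unresolved, while the equivalent code cannot even be written without naming the fair-goer.

    from dataclasses import dataclass

    @dataclass
    class Person:
        name: str

    def tell(speaker: Person, listener: Person, fair_goer: Person) -> str:
        # Unlike "he" in the English sentence, fair_goer must be bound
        # explicitly; the ambiguity cannot be expressed in code.
        return (f"{speaker.name} told {listener.name} that "
                f"{fair_goer.name} was going to the fair.")

    teacher, student = Person("the teacher"), Person("the student")
    print(tell(teacher, student, fair_goer=teacher))  # reading 1: the teacher goes
    print(tell(teacher, student, fair_goer=student))  # reading 2: the student goes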


Notable Quotes or Statements

"We can't run life on band-aids. Band-aids is what? Band-aids is students mugging up one answer before the exam so they get the marks for it. That's not real learning by definition."

On the insufficiency of tactical fixes in AI systems.

"Economics is the study of the allocation of resources under conditions of scarcity. What if it's not scarce? There's no economics of air."

On how generative AI disrupts economic models built on scarcity.

"The teacher told the student that he was going to the fair. Who's going to the fair? The teacher or the student?"

On the inherent ambiguity of natural language, and why programming languages exist.

"One formula fits all kind of a narrative does not work in AI safety."

On the necessity of context-specific risk frameworks over universal standards.

"Risk is not just one term... you also have to look at what is the intent behind it."

On the multidimensional nature of AI risk assessment.

"When we are deploying it in context, it's our job to take into consideration that our connectivity might be poor."

On the responsibility of deployers in India to account for infrastructure constraints.


Speakers & Organizations Mentioned

  • Debayan Gupta (speaker on general-purpose AI and alignment)
  • Alok (speaker on risk taxonomy and Astra framework)
  • Anan (contributor and technical speaker on Astra methodology)
  • Ashoka University (home institution for risk research team)
  • Astep Foundation (partner organization for Astra database)
  • OpenAI (ChatGPT, example of general software)
  • Google (Gemini, recipient of researcher feedback on failures)
  • DeepSeek (mentioned as competing LLM provider)
  • Air Canada (real-world example of AI safety failure with human cost)
  • Tailwind (CSS framework impacted by AI training and infrastructure displacement)

Technical Concepts & Resources

AI Systems & Models Referenced

  • ChatGPT (OpenAI) — subject of failure examples and correction cycles
  • Gemini (Google) — shows evolution in handling specific error types
  • DeepSeek — mentioned as alternative LLM provider
  • Large Language Models (LLMs) — general category of systems discussed

Frameworks & Databases

  • Astra (AI Safety Trust and Risk Assessments) — newly launched, India-specific AI safety risk database
    • 7-step process for risk identification and assessment
    • Contextual focus on India (linguistic diversity, connectivity, scale, regulation)
    • Covers development, deployment, and usage phases
    • 37 different risk categories identified
    • Current sectors: Education, Financial Lending
    • Planned expansion: Agriculture, healthcare, other domains
    • Available via preprint (arXiv)

Risk Typology

  • Social risks: Observable, quantifiable (e.g., linguistic bias, toxicity, hallucination, infrastructure exclusion)
  • Frontier risks: Speculative, difficult to measure (e.g., job displacement, power-seeking, cognitive decline from overreliance)
  • Causal taxonomy: Distinguishes risk by manifestation stage (development, deployment, usage) and intent (intentional vs. unintentional)
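
A minimal sketch of how this typology might be encoded as a data structure; the field names and example entries are our own illustration, not Astra's published schema.

    from dataclasses import dataclass
    from enum import Enum

    class RiskType(Enum):
        SOCIAL = "social"      # observable, quantifiable
        FRONTIER = "frontier"  # speculative, hard to measure

    class Stage(Enum):
        DEVELOPMENT = "development"  # e.g., biased training data
        DEPLOYMENT = "deployment"    # e.g., language mismatch
        USAGE = "usage"              # e.g., user manipulation

    class Intent(Enum):
        INTENTIONAL = "intentional"
        UNINTENTIONAL = "unintentional"

    @dataclass
    class Risk:
        name: str
        risk_type: RiskType
        stage: Stage
        intent: Intent
        context: str  # deployment context the risk is grounded in

    # Illustrative entries (not drawn from the Astra database itself):
    examples = [
        Risk("linguistic bias", RiskType.SOCIAL, Stage.DEVELOPMENT,
             Intent.UNINTENTIONAL, "education, non-English-medium classrooms"),
        Risk("job displacement", RiskType.FRONTIER, Stage.USAGE,
             Intent.UNINTENTIONAL, "economy-wide, long horizon"),
    ]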

Referenced Systems & Technologies

  • Aadhaar (Indian biometric identification system) — example of India's large-scale technology deployment
  • UPI (Unified Payments Interface) — example of India's large-scale technology deployment
  • EVMs (Electronic Voting Machines) — example of India's large-scale technology deployment
  • Tailwind CSS — example of open-source library negatively impacted by LLM training and displacement

Key Concepts

  • Alignment: Ensuring AI systems execute user intent rather than literal interpretations of ambiguous requests
  • Contextual blindness: International AI safety frameworks' failure to account for region-specific technological and social challenges
  • Infrastructure exclusion: Failure mode when systems cannot function in resource-constrained contexts (e.g., low connectivity); sketched in code after this list
  • Power-seeking behavior: AI system optimization beyond intended parameters (e.g., an algorithmic trading system autonomously executing harmful transactions)
  • Linguistic diversity: Challenge specific to India, with 22 constitutionally scheduled languages and hundreds more in everyday use; current LLMs are often trained primarily on English
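
To make infrastructure exclusion concrete, a hypothetical deployment-side sketch (the endpoint and fallback function are invented for illustration): a client that assumes always-on connectivity simply fails for low-connectivity users, while a deployer-added timeout and local fallback keeps the service usable, in the spirit of the quote above about accounting for poor connectivity.

    import urllib.error
    import urllib.request

    API_URL = "https://example.com/llm-advice"  # hypothetical hosted-model endpoint

    def local_fallback(query: str) -> str:
        # Degraded but functional offline answer, e.g., from a cached FAQ.
        return f"[offline] Cached guidance for: {query}"

    def get_advice(query: str, timeout_s: float = 3.0) -> str:
        # Try the hosted model, but never strand a low-connectivity user.
        try:
            req = urllib.request.Request(API_URL, data=query.encode("utf-8"))
            with urllib.request.urlopen(req, timeout=timeout_s) as resp:
                return resp.read().decode("utf-8")
        except (urllib.error.URLError, TimeoutError):
            # Poor or absent connectivity: fall back rather than fail,
            # so the system does not silently exclude its users.
            return local_fallback(query)

    print(get_advice("When should I sow wheat this season?"))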

Methodological Notes

The Astra framework employs:

  • Bottom-up research into Indian-specific risk manifestations (rather than top-down adaptation of Western frameworks)
  • Empirical grounding through real-world use cases in education and financial lending
  • Lifecycle analysis tracking risks across development → deployment → usage phases
  • Stakeholder accountability mapping (distinguishing AI system responsibility from user/developer responsibility)
  • Ongoing expansion model (currently 2 sectors, planned expansion to agriculture and beyond)

Limitations & Open Questions

The speakers acknowledge:

  • Astra is not exhaustive and is a "formative step" requiring continuous refinement
  • Frontier risks lack established quantification methods — measurement strategies are not yet standardized
  • Mitigation effectiveness is context-dependent and often creates new tradeoffs with utility
  • Empirical data on actual risk probabilities in Indian contexts is sparse — much current knowledge is translated from Western examples
  • Expansion to additional sectors will reveal new risk categories currently not captured