Beyond Ethics: Operationalizing Responsible AI for Global Impact

Executive Summary

This panel discussion at an AI summit brings together industry leaders, security experts, and academic researchers to address the critical gap between AI ethics principles and operational deployment. The core argument is that responsible AI requires integrating technical safeguards (data security, model protection), contextual risk assessment (geography-specific testing), and organizational processes (governance frameworks, human oversight) rather than treating ethics as a standalone concern. Participants emphasize that India must develop its own risk mapping and testing standards rather than blindly adopting frameworks from the West.

Key Takeaways

  1. "Trust the builder, not just the encryption": Confidential computing protects data and model IP in transit, but trustworthiness of an AI system depends on the competence and ethics of whoever built it. No technology eliminates that accountability.

  2. Right-size your AI model: Most enterprise and public sector use cases don't require ChatGPT-scale models. Smaller, locally trained models tailored to specific problems offer better explainability, controllability, and fit for Indian contexts than generic foundation models.

  3. India must test for India: Western AI governance frameworks (EU AI Act, US benchmarks, OpenAI evaluations) miss Indian-specific risks—language nuances, traffic customs, lending biases tied to financial inclusion, agricultural contexts. Develop sector-specific risk mapping and testing locally.

  4. Operationalizing responsibility is a process, not a product: Responsible AI requires continuous incident reporting, risk re-assessment, multi-functional governance (not just technical teams), and a willingness not to deploy AI at all in certain contexts (e.g., roadside assistance, where human empathy matters).

  5. Testing should precede deployment, not follow it: Benchmark testing against ethics, security, performance, and regulation before production use. Without shared testing infrastructure (Trusted AI Commons), most organizations—especially smaller ones—will struggle to do this adequately.

Key Topics Covered

  • Data Security & Sovereignty: Localization requirements, confidential computing, encryption, and proprietary data protection
  • Model Security & Trustworthiness: Distinguishing between protecting model IP versus ensuring models aren't biased or malicious
  • Build vs. Buy Decision: In-house development of predictive AI versus adoption of external foundation/generative AI models
  • Small Language Models (SLMs) vs. Large Language Models (LLMs): Advantages of right-sized, context-specific models for edge deployment
  • Contextual Risk Assessment: Geographic, sectoral, and use-case-specific risk mapping (e.g., lending bias definitions differ between US and India)
  • Testing & Certification Standards: Absence of India-specific AI quality benchmarks and the need for trusted testing frameworks
  • Human-in-the-Loop Oversight: Supervision mechanisms rather than full automation in high-stakes domains
  • Bias & Systemic Fairness: Distinguishing between intentional demographic targeting (legal in India for financial inclusion) vs. unintended systematic bias
  • Organizational Governance: Multi-functional oversight (legal, compliance, risk teams) and champion/challenger model testing
  • Foundational Model Dependencies: Challenges of relying on externally-built large models inadequately tested for Indian languages and contexts

Key Points & Insights

  1. Responsible AI ≠ Ethics Alone: Operationalization requires three integrated components: technical security (encryption, confidential computing), contextual risk assessment (sector and geography-specific), and organizational processes (governance, testing, incident reporting). Ethics principles alone don't prevent deployment failures.

  2. Confidential Computing Protects Model IP, Not Model Quality: The technology prevents model theft and keeps data private while in use, but cannot guarantee a model is unbiased, well trained, or appropriate for its context. Trust in the model builder remains essential, and it is not a technical property.

  3. Context-Specific Risk Definitions: What constitutes "fair" lending differs between India (demographic factors for financial inclusion) and the US (illegal racial discrimination). Similarly, traffic rules and autonomous driving expectations vary globally. One-size-fits-all risk frameworks fail.

  4. Small Language Models Over Foundation Models for Most Use Cases: For well-defined enterprise applications (manufacturing, logistics, service centers), smaller, domain-trained models outperform large foundation models because they:

    • Require less compute and data
    • Are easier to explain and validate
    • Allow output monitoring and filtering
    • Reduce reliance on models tested only in English-speaking contexts

  5. India Lacks Adequate Testing Benchmarks for AI: Existing benchmarks are English-centric and Western-focused. Safety evaluations in Indian languages detect fewer risks than English equivalents. A "Trusted AI Commons" with India-specific benchmarks is essential but missing.

  6. Testing Happens Far Less Than Assumed: High-profile deployments (the Gap chatbot) revealed inadequate testing despite enterprise scale. Many organizations test AI only after deployment, not before; a minimal pre-deployment evaluation harness is sketched after this list. Common barriers: lack of Indian-language test data, unclear what to test for, insufficient internal capacity.

  7. Predictive AI Has Mature Governance; Generative AI Doesn't: Finance and manufacturing industries have tested predictive models for 10–15 years with established quality processes. GenAI lacks equivalent organizational maturity, making in-house development or heavy vetting of external models more critical.

  8. Human-in-the-Loop ≠ Constraining Model Architecture: Supervision doesn't restrict transformer models or limit training approaches; it reflects the current inability to guarantee AI behavior on novel scenarios (a minimal supervision gate is sketched after this list). As the technology matures, this requirement may evolve.

  9. Incident Reporting & Redressal Mechanisms Are Missing: Beyond risk identification, operationalization requires documented processes for reporting AI failures, determining whether incidents are one-off or systematic, and deciding whether fixes are technical, legal, or policy-based (a minimal incident-record schema is sketched after this list).

  10. Build Internal Capacity or Partner for Testing: Organizations without in-house testing capacity can either develop it internally (requiring time/investment) or work with external partners—but outsourcing testing of proprietary, sensitive AI still carries risks.
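
A minimal sketch of what "test before deployment" (points 5 and 6) could look like in practice. Everything here is illustrative rather than anything the panel specified: the file name `eval_cases.jsonl`, the substring scoring, and the `model_answer` placeholder are assumptions. The point is gating deployment on the weakest per-language, per-category slice instead of an overall average:

```python
# Minimal pre-deployment evaluation harness (illustrative sketch).
# Assumes an eval file of JSON lines with "prompt", "expected", "language",
# and "category" fields covering ethics, safety, and performance cases,
# including Indian-language prompts -- the data the panel notes is missing.
import json
from collections import defaultdict

def model_answer(prompt: str) -> str:
    """Placeholder for the system under test (API call, local model, etc.)."""
    raise NotImplementedError

def run_eval(path: str, threshold: float = 0.95) -> bool:
    results = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            case = json.loads(line)
            # Crude substring scoring; real harnesses use graded rubrics.
            passed = case["expected"].lower() in model_answer(case["prompt"]).lower()
            # Bucket by (language, category) so weak slices (e.g., Hindi
            # safety cases) stay visible instead of being averaged away.
            results[(case["language"], case["category"])].append(passed)
    worst_slice = min(sum(v) / len(v) for v in results.values())
    return worst_slice >= threshold  # gate deployment on the weakest slice

# Illustrative usage:
# if run_eval("eval_cases.jsonl"):
#     deploy()  # promote only after the pre-deployment gate passes
```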
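
Point 8's supervision pattern can be made concrete with a hedged sketch, assuming a calibrated confidence score and an external novelty check; the threshold is arbitrary:

```python
# Human-in-the-loop gate (illustrative sketch): supervision wraps around the
# model rather than constraining its architecture. The 0.8 threshold and the
# novelty flag are assumptions; real systems would calibrate both.
from dataclasses import dataclass

@dataclass
class Decision:
    output: str
    confidence: float  # calibrated score in [0, 1]

def resolve(decision: Decision, is_novel_input: bool) -> str:
    # Escalate when the model is unsure or the input looks unlike anything
    # seen in training; this encodes "we cannot yet guarantee behavior on
    # novel scenarios", not a restriction on transformers or training.
    if decision.confidence < 0.8 or is_novel_input:
        return "ESCALATE_TO_HUMAN"
    return decision.output
```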
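
For point 9, a minimal incident-record schema (all field and type names are assumptions) that captures the triage distinctions the panel calls for: one-off versus systematic incidents, and technical versus legal versus policy fixes:

```python
# Minimal AI incident record (illustrative; all names are assumptions).
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

class Scope(Enum):
    ONE_OFF = "one-off"        # isolated failure
    SYSTEMATIC = "systematic"  # recurring pattern needing a broader remedy

class FixType(Enum):
    TECHNICAL = "technical"    # retrain, patch, add filters
    LEGAL = "legal"            # contractual or liability response
    POLICY = "policy"          # change where/how the system is used

@dataclass
class Incident:
    system: str
    description: str
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    scope: Optional[Scope] = None   # set during triage
    fix: Optional[FixType] = None   # set once a remedy is chosen
```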


Notable Quotes or Statements

"We have to hold AI to a higher standard than humans." — Professor Balaraman
Justification: Automated systems receive inherent trust that humans don't; institutionalizing bias in AI models is different from individual human bias.

"You are now giving official institutional sanction to the bias." — Professor Balaraman
On the responsibility to address systematic bias once an AI model is deployed and distributed widely.

"Where not to use AI is also very important." — Mohit Kapoor (Mahindra Group CTO)
Example: Using a human instead of AI for roadside assistance to distressed drivers because empathy matters.

"It's not just a model, it's a system of models." — Anand Kashib (Forotanics CEO)
Real-world AI applications combine multiple models (speech-to-text, SLM translation, text-to-speech) running together; security and testing must account for the full pipeline.

"Confidential computing protects the privacy of the model. But if the model itself is malicious, it has been poorly created, it has bias. That's the point." — Professor Balaraman
Clarifying the limits of confidential computing technology.

"We probably will live in a world where there'll be hundreds, thousands, maybe billions of models." — Anand Kashib
Acknowledging the coming proliferation of specialized, smaller models rather than centralized mega-models.


Speakers & Organizations Mentioned

| Speaker | Title / Role | Organization |
| --- | --- | --- |
| Mohit Kapoor | Group CTO | Mahindra Group |
| Anand Kashyap | Co-founder & CEO | Fortanix (confidential computing/cybersecurity) |
| Prof. Balaraman Ravindran | Scientific appointee to a UN committee; Head, Centre for Responsible AI | Wadhwani School of Data Science and AI, IIT Madras |
| Tarunima | Moderator (stepped in last-minute) & panelist; runs Tattle (contextual risks in Indian languages) | Tattle |
| Richard Sutton | AI researcher (referenced, not present); author of "The Bitter Lesson" | University of Alberta |
| Yann LeCun | AI researcher (referenced, seen at summit) | |

Other Organizations/Initiatives Referenced:

  • Mahindra Group (automobiles, finance, tractors, real estate, logistics)
  • Nvidia (confidential computing partnership)
  • OpenAI, Anthropic, Google (foundation model builders)
  • ISO (standards development)
  • EU (EU AI Act)
  • DPDP (Digital Personal Data Protection Act, India)
  • Trusted AI Commons (initiative being developed at summit)
  • Use Case Commons (being set up)

Technical Concepts & Resources

Technologies & Methods

  • Confidential Computing: Hardware-based trusted execution environments (enclaves) on CPU/GPU that protect data and models at runtime, even from administrators
  • Model Encryption & Key Management: Encrypting model weights and releasing decryption keys only when the runtime environment proves itself secure via measurement (hash matching); see the key-release sketch after this list
  • Model Distillation: Transferring knowledge from large models to smaller ones (caveat: its impact on data distribution is not fully understood)
  • Measurement: Cryptographic hash of the model binary; ensures the runtime matches build-time security guarantees
  • RAG (Retrieval-Augmented Generation): Grounding generative models with curated data
  • Edge Deployment: Running models locally on a device (car, phone, farm equipment) without cloud connectivity
  • Tokenization & Containerization: Protecting customer data through abstraction before use with GenAI
  • Output Monitoring & Filtering: Runtime controls to detect and block out-of-distribution queries for small models; see the output-filter sketch after this list
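
A minimal sketch of the measurement-gated key release described above. Real confidential-computing stacks verify a hardware-signed attestation quote from the enclave rather than a plain file hash, and the `KeyBroker` class here is an illustrative stand-in for a key-management service:

```python
# Measurement-gated key release (simplified sketch of the flow above).
import hashlib

from cryptography.fernet import Fernet  # pip install cryptography

def measure(binary_path: str) -> str:
    """Compute the 'measurement' (hash) of a model binary."""
    with open(binary_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

class KeyBroker:
    """Illustrative key broker: releases the model-decryption key only
    when the runtime's measurement matches the build-time one."""

    def __init__(self, expected_measurement: str, key: bytes):
        self._expected = expected_measurement
        self._key = key

    def release_key(self, reported_measurement: str) -> bytes:
        if reported_measurement != self._expected:
            raise PermissionError("runtime does not match build-time measurement")
        return self._key

# Build time (illustrative): encrypt weights and record the measurement.
# key = Fernet.generate_key()
# encrypted = Fernet(key).encrypt(open("model.bin", "rb").read())
# broker = KeyBroker(expected_measurement=measure("model.bin"), key=key)
# Runtime: the environment re-measures itself; the key is released on a match.
```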
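
And a sketch of the output monitoring and filtering bullet for a small domain model. The keyword heuristic and `ALLOWED_TOPICS` set are deliberate simplifications; a production out-of-distribution check would typically use embedding distance or a trained classifier:

```python
# Runtime output filter around a small, domain-specific model (sketch).
ALLOWED_TOPICS = {"maintenance", "error code", "service", "warranty"}  # assumed domain

def in_domain(text: str) -> bool:
    return any(topic in text.lower() for topic in ALLOWED_TOPICS)

def guarded_generate(model_fn, query: str) -> str:
    if not in_domain(query):            # block out-of-distribution queries
        return "Out of scope for this assistant."
    answer = model_fn(query)
    if not in_domain(answer):           # monitor outputs, not just inputs
        return "Unable to answer reliably; escalating to a human agent."
    return answer
```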

Testing & Governance Frameworks

  • ISO 27001 (data security certification)
  • IEC 62304 (software lifecycle for medical devices; referenced for an AI analogy)
  • EU AI Act (stringent verification requirements; being walked back due to ecosystem harm)
  • LITD 30 (Bureau of Indian Standards sectional committee on AI, mirroring ISO/IEC JTC 1/SC 42): Developing AI standards globally with Indian participation
  • Incident Reporting Mechanisms: Standardized processes for documenting AI failures
  • Risk Mapping: Sector-by-sector assessment of potential AI risks (recommended by India's AI governance guidelines)
  • Champion/Challenger Model: Internal testing framework where proposed AI must pass scrutiny from advocates and skeptics before deployment; a minimal promotion gate is sketched below
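
One illustrative reading of the champion/challenger gate in code: a challenger model is promoted only if it matches or beats the incumbent on every tracked metric. Metric names, values, and the margin below are assumptions, and the human scrutiny the panel describes would sit around such a gate, not be replaced by it:

```python
# Champion/challenger promotion gate (illustrative sketch).
def promote(champion: dict, challenger: dict, min_margin: float = 0.01) -> bool:
    # Higher-is-better metrics only, for simplicity; the "skeptics" in the
    # champion/challenger process effectively hold a veto on any regression.
    return all(challenger[m] >= champion[m] - min_margin for m in champion)

current = {"accuracy": 0.91, "fairness": 0.88}
proposed = {"accuracy": 0.93, "fairness": 0.90}
assert promote(current, proposed)  # challenger wins on both metrics
```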

Data & Benchmarks Gaps

  • No India-specific AI safety benchmarks for Indian languages (Hindi, Tamil, Telugu, etc.)
  • Existing benchmarks focus on English and Western cultural contexts
  • Lack of curated Indian-language datasets for training smaller models
  • Missing test data for Indian-specific use cases (agricultural advisory, rural lending, traffic patterns)

Use Cases Referenced

  • Autonomous & Assisted Driving: Detecting obstacles (cattle, pedestrians) at the edge; battery/thermal management in EVs
  • Agricultural Advisory: Satellite imagery, soil data, crop recommendations for small-holding Indian farmers
  • Rural Finance: Credit scoring for those new to formal banking; assessing ability-to-pay and intent-to-pay
  • Manufacturing: Predictive maintenance (detecting error codes), defect detection on shop floors, service center efficiency
  • Medical Devices (analogy): Software lifecycle lessons apply to safety-critical AI

Papers/Essays Referenced

  • Richard Sutton's "The Bitter Lesson" (the original essay) and recent follow-up essays (more nuanced on constraints)
  • Work by Todd (referenced but not detailed) on the impact of automated-system failures on affected populations
  • Yann LeCun's recent essays on future model architectures and human-AI interaction

Additional Context & Caveats

  • Transcript Quality: Contains some audio artifacts (repeated phrases, incomplete sentences) that may affect precise attribution, particularly around moderator transitions
  • Jargon Density: Discussion assumes familiarity with ML terminology; some technical explanations (e.g., confidential computing architecture) are dense but present
  • India-Specific Focus: Panel emphasizes India's unique regulatory environment (DPDP Act), linguistic diversity, agricultural economy, and financial inclusion goals as reasons why Western AI governance doesn't directly apply
  • No Policy Recommendations Finalized: Discussion identifies gaps and challenges but avoids prescriptive solutions, instead framing operationalization as an ongoing, sector-specific process