Language as Digital Infrastructure: Enabling Inclusive AI Across Communities

Executive Summary

This panel discussion explores the multi-layered architecture required to deploy voice AI at production scale in India, covering infrastructure, orchestration, policy, and adoption challenges. The speakers emphasize that voice AI success in India requires addressing not just model quality, but also telecom infrastructure, regulatory compliance, multilingual support, and user trust—creating a holistic ecosystem rather than isolated technological solutions.

Key Takeaways

Voice AI in India requires thinking beyond the model. Infrastructure, telecom integration, edge computing, policy compliance, and user trust design are as important as LLM quality. Success is an ecosystem play.
Declare that you're AI—don't try to fake humanity. Users care more about problem-solving consistency and transparency than perfect human mimicry. Building trust through trustworthy behavior (solving problems, being responsive, being transparent) is the unlock for adoption at scale.
Data responsibility is not a compliance checkbox; it's a competitive advantage. Organizations that adopt proactive duty-of-care frameworks (auditability, deletion pipelines, purpose limitation) will build user trust faster and face fewer regulatory surprises than those treating policy as a burden.
Multilingual India demands platform-level orchestration, not model parity. No single off-the-shelf model handles code-switching, dialect sensitivity, and cost optimization simultaneously. Purpose-built orchestration platforms that route requests intelligently across multiple models are the missing layer.
Success metrics must tie to business impact and user retention. Leading indicators (e.g., CES, repeat call rates, problem-solving consistency, latency) are easier to track than long-term ROI. Start with high-volume, low-stakes use cases; measure obsessively; iterate continuously.

Key Topics Covered

Infrastructure & Telephony Layer: GPU compute sizing, edge data centers, cloud scalability, PSTN vs. data network integration, redundancy requirements
Orchestration & Multi-Model Coordination: Language-aware routing, code-switching, multilingual prompt engineering, human-in-the-loop workflows
Policy, Governance & Data Protection: DPDP compliance, consent frameworks, voice as biometric data, sectoral regulations (BFSI/RBI), liability attribution
Adoption Barriers & User Trust: Thinking vs. typing gap, conversational design, behavioral retention, ROI measurement, observability/feedback loops
Multilingual Challenges in India: Code-switching, dialect diversity, language bias blind spots, cost sensitivity per minute
Edge Computing & Sovereignty: Sovereign cloud requirements, data residency, compliance with national data protection laws
Failure Modes & Testing: Detailed logging/tracing, regression testing, impact of prompt changes, graceful degradation

Key Points & Insights

Voice AI isn't just a model problem; it's an infrastructure ecosystem problem. Sunil Gupta emphasizes that telecom network quality, latency, concurrency handling, and GPU compute sizing are as critical as LLM performance. At scale, a single network failure can cascade to millions of users.
Telecom network architecture in India requires dual-path solutions. PSTN networks (for feature phones and landlines) and IP/data networks (for smartphones) operate under different constraints. Voice AI must support both seamlessly, with intelligent call routing via voice gateways.
Edge data centers are no longer theoretical—they're essential for real-time voice AI. Latency sensitivity in conversational systems makes central data centers impractical; processing must happen at the edge to deliver conversation-like responsiveness within milliseconds.
Multilingual orchestration is fundamentally different from translation. Code-switching (mixing languages within a single sentence), language-specific weights in speech patterns, and context-aware model selection require platform-level intelligence, not just tokenization of translated text.
Consent is the beginning, not the end, of data responsibility. Deepika Malletti argues that voice is biometric data; consent alone doesn't satisfy duty of care. Deletion pipelines, purpose limitation, collection minimization, and auditability are non-negotiable.
Policy-first mindset > compliance-checkbox mentality. The framing should shift from "I must take consent because required" to "I must create a safe experience because I value my user." This mindset change unlocks more meaningful and sustainable compliance.
Trust barriers in voice AI adoption are multifaceted. Users have unclear expectations about bot capabilities, distrust unnatural-sounding speech, and harbor anxiety about talking to systems. Building trust requires humane conversation design, empathy, and transparent problem-solving.
Cost per minute is a critical metric for Indian enterprises. Unlike Western enterprises, Indian businesses operate with extreme cost sensitivity (e.g., the difference between 3 rupees/min and 2.8 rupees/min is significant). Model efficiency, token optimization, and cheaper model alternatives are strategic priorities.
Observability and continuous feedback loops drive retention. Voice AI products fail silently without deep tracing across all system layers (speech-to-text, LLM, text-to-speech, telephony). Identifying leading indicators (e.g., repeat call rates) and discovering problems at scale both require sophisticated monitoring.
The next 12–18 months are critical. DPDP enforcement begins May 2025, sectoral regulations (BFSI) are emerging, and model diversity is accelerating. Organizations must prepare now—not after problems surface.

Notable Quotes or Statements

Sunil Gupta: "Once you move from prototype to scale, the stack starts to matter in ways you didn't anticipate. Latency, multilingual support, infra, evals, governance frameworks—all of that keeps what's being deployed."
Sunil Gupta: "The telecom part plays as much a bigger role as the model part... voice is so much latency-sensitive... you may not have time for your call to go from Guwahati to a data center in Bombay and back."
Maitreya Wagh: "We don't believe there will be one model to rule them all situation ever in the future... each call needs to run on different models for all these layers."
Deepika Malletti: "Voice is biometric. It's very different from any other form of data... consent is only the beginning. It is not by any means the end of the responsibility."
Deepika Malletti: "If we were to approach this whole problem not as 'I need to take consent because I'm required to take consent' but 'I need to create a safe experience for my user because that's how I'm creating trust and valuing my user,' we may actually do a lot of things much more meaningfully."
Subhash Mukharji: "The thinking versus the typing gap... you're thinking in an Indic language in your mother tongue and you're trying to type using an alien English keyboard in Latin... voice is important to provide that freedom of expression."
Subhash Mukharji: "Retention is defined when what you expect the app to solve, it solves that. When that happens, retention happens... it really boils down to consistency in problem solving."
Maitreya Wagh: "Once you go beyond 'why is this voice AI calling me,' people are happy to have the conversation... people are now understanding that yes, they can have a nice conversation with an AI if it is solving the problem for them."

Speakers & Organizations Mentioned

Speaker	Role / Organization	Focus Area
Sunil Gupta	Co-founder & CEO, Yotta	Infrastructure, sovereign cloud, GPU compute, telecom integration
Maitreya Wagh	Founder, Bolna	Voice AI orchestration, multilingual platforms, enterprise workflows
Deepika Malletti	Chief of Policy & Partnership, AXA Foundation	Policy, governance, DPDP compliance, data protection, sectoral frameworks
Subhash Mukharji (Shiddhut)	Head of Demand Engineering, Meesho	Adoption barriers, user trust, behavioral retention, observability, ROI measurement

Institutions/Frameworks Mentioned:

Digital Personal Data Protection (DPDP) Regime – enforcement May 2025
AI Governance Framework – released, non-binding guidance
BFSI/RBI Sectoral Guidelines – banking & financial services regulations
PSTN (Public Switched Telephone Network) – legacy telecom infrastructure

Technical Concepts & Resources

Infrastructure & Systems Architecture

PSTN (Public Switched Telephone Network) – legacy phone system for feature phones and landlines
Data networks – IP-based networks carrying voice as packets (WhatsApp, Signal, Telegram calls)
Voice gateway – identifies caller origin (PSTN vs. data network) and routes appropriately
GPU compute sizing – pre-calculation of concurrent capacity needed for production scale
Cloud autoscaling – dynamic capacity adjustment (burst on demand, scale down at off-peak)
Active-active redundancy – all system components duplicated; failure of one doesn't break service
Edge data centers – distributed compute at geographic edges to minimize latency in real-time conversations
Sovereign cloud – cloud infrastructure physically and legally within national boundaries; compliant with local data residency laws
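
The GPU compute sizing item above can be illustrated with a back-of-the-envelope calculation. This is a minimal sketch, not the panel's method: it uses Little's law (average calls in progress = arrival rate x average call duration), and the traffic figures, the `streams_per_gpu` capacity, and the headroom factor are all hypothetical placeholders that a real deployment would have to measure.

```python
import math


def concurrent_calls(calls_per_hour: float, avg_call_seconds: float) -> float:
    """Little's law: average calls in progress = arrival rate x average duration."""
    return calls_per_hour / 3600.0 * avg_call_seconds


def gpus_needed(peak_concurrent: float, streams_per_gpu: int, headroom: float = 0.25) -> int:
    """GPUs for one site, padded with burst headroom (hypothetical default of 25%)."""
    if streams_per_gpu <= 0:
        raise ValueError("streams_per_gpu must be positive")
    return math.ceil(peak_concurrent * (1 + headroom) / streams_per_gpu)


# Hypothetical peak hour: 36,000 calls averaging 2 minutes each.
peak = concurrent_calls(calls_per_hour=36_000, avg_call_seconds=120)  # 1200.0
per_site = gpus_needed(peak, streams_per_gpu=40)                      # 38

# Active-active redundancy: each of two sites must carry the full load alone.
total_gpus = 2 * per_site                                             # 76
print(peak, per_site, total_gpus)
```

The doubling in the last step reflects the active-active requirement: if either site fails, the surviving site still has to serve every concurrent call.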

Multilingual & NLP Processing

Code-switching – seamless mixing of multiple languages in a single utterance (Hindi + English, e.g., "tinchar patch")
Speech-to-text (STT) → LLM → Text-to-speech (TTS) – full pipeline
Language detection – identifying speaker's preferred language from first few sentences
Language-specific weights – understanding how speakers weight multiple languages (e.g., numbers in English even in Hindi conversation)
Dialect diversity – 700+ active dialects in India; differences between Hindi in Bihar vs. Rajasthan, Tamil in Madurai vs. Madras
Language bias blind spots – systems trained on one language variant may systematically deny loans/services to speakers of other variants
Dynamic model routing – orchestration layer selects optimal STT, LLM, and TTS model per call based on language, cost, quality trade-offs
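
Dynamic model routing, as described above, can be sketched as a filter-then-minimize step per pipeline stage. This is an illustrative sketch only: the `ModelOption` fields, the model names, the prices, and the quality scores are hypothetical and do not describe any vendor or any platform's actual router.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str              # hypothetical model identifier
    stage: str             # "stt", "llm", or "tts"
    languages: frozenset   # language codes the model supports
    cost_per_min: float    # rupees per minute (made-up figures)
    quality: float         # 0..1 score from internal evals (assumed to exist)
    code_switching: bool   # whether the model copes with mixed-language speech


def pick_model(catalog: Sequence[ModelOption], stage: str, language: str,
               code_switched: bool, min_quality: float = 0.8) -> Optional[ModelOption]:
    """Return the cheapest model that meets the language and quality constraints."""
    candidates = [
        m for m in catalog
        if m.stage == stage
        and language in m.languages
        and m.quality >= min_quality
        and (m.code_switching or not code_switched)
    ]
    # None tells the caller to fall back (for example, escalate to a human agent).
    return min(candidates, key=lambda m: m.cost_per_min) if candidates else None


CATALOG = [
    ModelOption("stt-premium", "stt", frozenset({"hi", "en", "ta"}), 0.60, 0.92, True),
    ModelOption("stt-budget", "stt", frozenset({"hi", "en"}), 0.25, 0.85, False),
]

print(pick_model(CATALOG, "stt", "hi", code_switched=False).name)  # stt-budget
print(pick_model(CATALOG, "stt", "hi", code_switched=True).name)   # stt-premium
```

The same function would be called once each for STT, LLM, and TTS at call start, after language detection has produced `language` and `code_switched`.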

Models & Inference

ElevenLabs – high-quality TTS (modern English)
Cartesia – TTS alternative
Serv – cheaper TTS option
Token optimization – minimizing input/output tokens to reduce cost-per-call (critical in cost-sensitive Indian market)
Model diversity – newer, smaller indie models from universities and labs competing on specific metrics (cheapest, fastest, best-for-specific-domain)
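
To make the link between token optimization and cost per minute concrete, the arithmetic can be sketched as follows. All prices, token counts, and turn rates here are invented for illustration; they are assumptions, not figures quoted by the panel or by any provider.

```python
def llm_cost_per_min(turns_per_min: float, prompt_tokens: int, output_tokens: int,
                     in_price_per_1k: float, out_price_per_1k: float) -> float:
    """LLM cost per minute of conversation, in rupees."""
    per_turn = (prompt_tokens / 1000 * in_price_per_1k
                + output_tokens / 1000 * out_price_per_1k)
    return turns_per_min * per_turn


def call_cost_per_min(stt: float, llm: float, tts: float, telephony: float) -> float:
    """Total cost per minute is the sum of the per-layer costs."""
    return stt + llm + tts + telephony


# Hypothetical prices: 0.05 rupees per 1k input tokens, 0.20 per 1k output tokens.
before = llm_cost_per_min(6, prompt_tokens=2000, output_tokens=100,
                          in_price_per_1k=0.05, out_price_per_1k=0.20)
after = llm_cost_per_min(6, prompt_tokens=1000, output_tokens=100,
                         in_price_per_1k=0.05, out_price_per_1k=0.20)

# Halving the prompt cuts the LLM share from about 0.72 to about 0.42 rupees/min.
print(round(before, 2), round(after, 2))
print(round(call_cost_per_min(stt=0.5, llm=after, tts=0.8, telephony=0.6), 2))
```

Because the prompt is resent on every turn, input tokens usually dominate the LLM share, which is why trimming the system prompt tends to matter more than shortening replies.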

Operational & Testing

Detailed log tracing – capturing decision path at each layer (STT, LLM, TTS, telephony) to diagnose failures
Regression testing – running 1,000–100,000+ test calls internally before production to catch unintended consequences of prompt/config changes
Observability platforms – tools enabling real-time monitoring and continuous problem discovery
Leading indicators – proxy metrics easier to measure than long-term ROI (e.g., CES, repeat call rate, problem-solving consistency, latency)
A/B testing – comparing voice AI enabled vs. disabled (or multiple approaches) to measure business lift (conversion, customer effort score, etc.)
Human-in-the-loop escalation – intelligent transfer to human agents when AI detects task failure, without explicit user request
Deepfake liability gap – risk of fraud/impersonation via synthetic voice; liability chains unclear
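
The detailed log tracing item above can be sketched as one timed span per pipeline layer. This is a minimal illustration using only the Python standard library; the `CallTrace` class and the layer names are hypothetical, and a production system would more likely use a dedicated tracing tool.

```python
import time
from contextlib import contextmanager


class CallTrace:
    """Collects one timed span per pipeline layer for a single call."""

    def __init__(self, call_id: str):
        self.call_id = call_id
        self.spans = []

    @contextmanager
    def span(self, layer: str):
        start = time.perf_counter()
        error = None
        try:
            yield
        except Exception as exc:
            error = repr(exc)  # record the failure, then let it propagate
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.spans.append({"layer": layer, "ms": elapsed_ms, "error": error})

    def failed_layers(self):
        return [s["layer"] for s in self.spans if s["error"] is not None]

    def slowest_layer(self):
        return max(self.spans, key=lambda s: s["ms"])["layer"]


trace = CallTrace("call-001")
with trace.span("stt"):
    pass  # a speech-to-text call would go here
try:
    with trace.span("llm"):
        raise TimeoutError("model did not respond")  # simulated failure
except TimeoutError:
    pass  # a real system might escalate to a human agent here

print(trace.failed_layers())  # ['llm']
```

Recording the span in `finally` means a failed layer still appears in the trace, so a silent failure in one layer cannot hide behind a missing log line.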

Policy, Governance & Compliance

DPDP (Digital Personal Data Protection) Act – consent, biometric data handling, purpose limitation, deletion rights; full enforcement May 2025
AI Governance Framework – India's voluntary (non-binding) guidance for responsible AI; sectoral frameworks (BFSI, healthcare, etc.) expected to emerge
Graded liability chain – distributed responsibility across multiple parties (cloud provider, model provider, platform orchestrator, enterprise deploying); emphasis on auditability and duty-of-care demonstration rather than punishment
Data flow mapping – documenting where voice/biometric data flows, who has access, how long retained
Deletion pipeline – process for purging biometric data post-transaction; as critical as collection pipeline
Purpose limitation, collection minimization, time limitation – principles for responsible voice data handling
Localization – storing/processing data locally (sovereign cloud) to give users maximum control and minimize enterprise risk
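
A deletion pipeline that enforces purpose limitation and time limitation, with an audit trail, might look roughly like the sketch below. The retention windows and purpose names are hypothetical placeholders, not values taken from the DPDP Act or from the panel, and the in-memory list stands in for whatever storage a real system uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class VoiceRecord:
    record_id: str
    purpose: str           # the declared purpose at collection time
    collected_at: datetime


# Hypothetical retention windows per declared purpose (not legal guidance).
RETENTION = {
    "support_call": timedelta(days=30),
    "kyc_verification": timedelta(days=7),
}


def purge_expired(records, now, retention=RETENTION):
    """Split records into (kept, audit_log); expired or purpose-less data is purged."""
    kept, audit_log = [], []
    for rec in records:
        limit = retention.get(rec.purpose)
        if limit is None:
            reason = "no declared purpose"         # purpose limitation
        elif now - rec.collected_at > limit:
            reason = "retention window exceeded"   # time limitation
        else:
            kept.append(rec)
            continue
        audit_log.append({"record_id": rec.record_id, "purpose": rec.purpose,
                          "deleted_at": now.isoformat(), "reason": reason})
    return kept, audit_log


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
collected = datetime(2025, 1, 20, tzinfo=timezone.utc)
records = [
    VoiceRecord("r1", "support_call", collected),      # 11 days old, kept
    VoiceRecord("r2", "kyc_verification", collected),  # 11 days old, purged
    VoiceRecord("r3", "marketing", collected),         # undeclared purpose, purged
]
kept, audit_log = purge_expired(records, now)
print([r.record_id for r in kept], [a["record_id"] for a in audit_log])
```

The audit log keeps only identifiers and reasons, not the voice data itself, so the deletion can be demonstrated later without retaining what was deleted.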

User Experience & Adoption

Thinking vs. typing gap – friction users face typing in Indic languages on Latin keyboards; voice removes this friction
Trust barriers – unclear expectations, unnatural speech, scripted/repetitive conversations, anxiety about bot competence
Conversational design – humane, empathetic, non-scripted interaction structure (clarify before acting, confirm understanding, probe appropriately, escalate gracefully)
Call drop patterns – data showing a distinct drop within the first 10 seconds (users hang up on realizing it's AI), a low drop rate from 10–30 seconds (after accepting it's AI), and a gradual increase thereafter
Behavioral retention – user continues using voice AI when it consistently solves their problem; consistency > naturalness
Feed UX friction – voice avoids screen real estate conflict in mobile apps; allows simultaneous browsing + voice interaction (unlike chat UI)
High-volume, low-stakes use cases – recommended starting point before tackling sensitive/complex problems
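
The call drop pattern described above can be tracked by bucketing call durations at the 10-second and 30-second boundaries. This is a small sketch with made-up durations; only the bucket boundaries come from the discussion.

```python
from collections import Counter

BUCKETS = ("0-10s", "10-30s", "30s+")


def drop_bucket(duration_s: float) -> str:
    """Assign a call to the window in which the user hung up."""
    if duration_s < 10:
        return "0-10s"    # likely hung up on realizing it's AI
    if duration_s < 30:
        return "10-30s"   # accepted the AI, then dropped early
    return "30s+"


def drop_distribution(durations):
    """Share of calls ending in each bucket."""
    counts = Counter(drop_bucket(d) for d in durations)
    total = len(durations)
    return {b: counts[b] / total for b in BUCKETS}


# Hypothetical call durations in seconds.
sample = [3, 5, 8, 12, 45, 60, 90, 120, 8, 4]
print(drop_distribution(sample))  # {'0-10s': 0.5, '10-30s': 0.1, '30s+': 0.4}
```

Watching the first bucket over time gives a simple leading indicator of whether the opening of the conversation is earning enough trust for users to stay on the line.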

Summary Table: Layer-by-Layer Voice AI Stack

Layer	Key Owner(s)	Key Challenges	Key Unlocks
Infrastructure & Telecom	Cloud operators, telecom providers, Yotta	Dual-network support (PSTN + IP), latency, concurrency, redundancy, cost optimization	Sovereign cloud, edge data centers, auto-scaling contracts
Orchestration	Bolna (and similar platforms)	Multilingual routing, code-switching, model diversity, cost-per-call efficiency, human escalation	Multi-model selection framework, intelligent code-switch detection, purpose-built India orchestration
Policy & Governance	Regulators, enterprises, AXA Foundation	DPDP compliance, sectoral frameworks, biometric data liability, consent > checkbox mindset	Auditability, deletion pipelines, graded liability chains, duty-of-care demonstration
Adoption & User Experience	Enterprises, Meesho (and adopters)	Trust barriers, conversational design, behavioral retention, ROI measurement, observability	Problem-solving consistency, transparent AI positioning, leading indicators, feedback loops

Recommended Next Steps for Stakeholders

For Infrastructure Providers:

Invest in edge data center rollout across India's geography
Negotiate auto-scaling SLAs with cloud operators
Build sovereign cloud offerings compliant with DPDP

For Orchestration/Platform Builders:

Implement multi-model selection logic with language detection at call start
Build deletion pipelines and data lineage tracking
Develop regression testing suites for prompt/config changes
Create detailed observability dashboards capturing all pipeline layers

For Enterprises/Adopters:

Start with high-volume, low-stakes use cases (customer support, recruitment screening, etc.)
Define leading indicators early (CES, repeat call rate, problem-solving consistency)
Implement detailed logging and continuous feedback loop infrastructure
Design conversational workflows around user clarification, confirmation, and graceful escalation

For Policymakers & Regulators:

Finalize sectoral guidance (BFSI, healthcare, education) before May 2025 DPDP enforcement
Clarify liability allocation in multi-party voice AI ecosystems
Encourage proactive duty-of-care culture over retroactive punishment
Support sovereign cloud infrastructure development

Critical Timeline

May 2025 – DPDP Act fully enforced; all voice data handling must comply
Next 12–18 months – Sectoral regulations emerge; edge data center deployments accelerate; model diversity increases; adoption at scale becomes feasible
Beyond 2025 – Winner-takes-most consolidation likely among orchestration platforms; voice AI becomes primary channel for citizen-government, customer-business, and peer-to-peer interaction in India