India AI Impact Buildathon 2026 | AI for Social Good & Cyber Safety | India AI Impact Summit 2026

Executive Summary

This summit session featured six finalist teams pitching AI-driven solutions to detect AI-generated voice calls and prevent voice-based scams in India. Presenters reported that 47% of Indian adults have been hit by AI scams, with ₹805 crores lost in UPI fraud alone and a recovery rate of only 6%. Solutions ranged from REST API-based detection systems to lightweight edge-computing models (1.8MB), and the discussion emphasized localized, scalable, India-first approaches built on domestic AI models rather than exclusive reliance on Western tools such as Gemini and OpenAI's models.

Key Takeaways

India Needs Domestic AI Models, Not Just Wrappers Around Gemini/OpenAI: Multiple judges stressed that relying on American credits and APIs creates scalability bottlenecks and vendor dependency. Building CPU-native Indian LLMs is essential for a 1.4B-person market.
Edge Computing & Offline-First Design Are Non-Negotiable: With 89–90% of Indians on 2G/3G, solutions must work locally on 2MB models without cloud dependency. This shifts the entire architecture paradigm from REST API to distributed/federated systems.
Voice Detection Alone Won't Solve Scams; a Multi-Layered Approach Is Required: Caller ID spoofing, SIM hacking, and cross-border fraud call for telecom regulation alongside audio detection. Teams must collaborate rather than build siloed solutions.
Cross-Team Collaboration > Individual Pitches: Judges repeatedly urged finalists to merge complementary strengths (e.g., Kartav's analysis + Walker Penguins' speed + Analytics with Anand's splicing detection) into a unified platform. The problem is too complex for solo solutions.
Real-Time vs. Post-Call Analysis Is a False Choice: Banks can flag high-risk calls (permission requests, data access) even at 3–5 second latency if accuracy is 95%+. The right approach depends on the use case (fraud prevention vs. forensic investigation).

Key Topics Covered

AI Voice Detection Technologies: Multiple approaches to distinguish human from AI-generated speech, including audio spectrum analysis, CNN models, and feature extraction
Indian Language Support: Solutions tested across 8–10+ Indian regional languages (Hindi, Telugu, Kannada, Chhattisgarhi, Tulu, Malayalam, etc.)
Scam Prevention & Cybersecurity: Addressing voice spoofing, impersonation, and caller ID hacking as vectors for financial fraud
Edge Computing & Offline Processing: Emphasis on lightweight models deployable on mobile devices without GPU/TPU dependency
Technical Scalability: Challenges around REST API limitations, latency vs. accuracy trade-offs, and capacity planning for 500M+ users
Privacy & Data Ethics: Storage mechanisms, consent for voice data, encryption standards, and federated learning approaches
Integration with Banking & Telecom: Real-world deployment challenges within existing IVR systems, CRM platforms, and payment infrastructure
Policy & Regulation: Legal constraints (AIEL approval for call recording), caller ID authentication, and cross-border fraud sourcing
India-First Development: Recurring emphasis on building domestic AI models (Bhashini, other Indian LLMs) instead of depending on American platforms

Key Points & Insights

Scale & Impact of Voice Scams: 47% of Indian adults hit by AI scams; ₹805 crores lost in UPI fraud alone (as of November 2024), with only 6% recovery rate—demonstrating urgent need for detection solutions.
Multiple Viable Detection Approaches:
- Kartav: REST API system using OpenAI Whisper + Gemini for analysis; detects emotional inconsistencies and naturalness artifacts
- Walker Penguins: Lightweight 1.8–2MB CNN model trained on LFCC (Linear Frequency Cepstral Coefficients) audio spectrograms; 98% accuracy on test sets, 5ms latency
- Analytics with Anand: Forensic splicing detection distinguishing between human and AI segments within same audio; handles partially AI-manipulated content
- Sentinel Mavericks: Multi-layer system combining audio analysis with fraud keyword mapping and repetitive word detection
Edge Computing Is Critical for India: Solutions must work on 2G/3G bandwidth and lower-end devices; GPU/TPU dependency is impractical for 500M+ Indian users. Judges emphasized need for CPU-only models.
Chunking & Segmentation Challenges: Detecting where to split audio for analysis is complex—pauses don't always align with speech boundaries; handling continuous speech without unintended splits remains unsolved at scale.
Regional Language Robustness: AI artifacts are largely language-independent (repetition, unnatural frequency characteristics); testing across 10+ Indian dialects shows models generalize beyond English-centric training.
Accuracy vs. Speed Trade-off:
- For banking/finance: 3–5 second latency acceptable if 95%+ accuracy
- For real-time call screening: Need sub-1 second detection (not yet achieved)
- Live recording less stable (~75–80% accuracy) than batch upload (~90%+ accuracy)
Caller ID Spoofing Is Root Cause: Multiple judges noted scams often originate from hacked caller IDs (mimicking family/authority figures). Voice detection alone insufficient; requires telecom-level intervention (country code verification, SIM authentication).
Indian Model Adoption Is Slow: Teams cite financial constraints preventing use of Indian LLMs (Bhashini); reliance on free Gemini credits creates vendor lock-in. Government support/Indian stack commitment needed.
Data Privacy Concerns:
- Base64 encoding insufficient (not encryption)
- Federated learning proposed to avoid centralizing voice data
- Consent mechanisms for voice training datasets unclear
Integration Complexity: Real-world deployment requires API integration with existing banking IVRs, CRM systems, and telecom call recording pipelines—not trivial; post-call analysis more feasible than real-time intervention.

Notable Quotes or Statements

On India-First Development

"If the problem originated in India, the solution should too, and that is our duty." — Anurag Manik (Kartav)

"I believe in making in India, but let's transform from India also." — Anurag Manik

"Drop Gemini, man. Figure out a model from India... When will you end up using these American models?" — Jury Member (on vendor dependency)

"Let us create a model which runs on CPU itself so that we can get rid of this biggest problem... Why from India don't we create a large learning model using CPUs itself?" — Jury Member (PhD researcher at IIT Bombay)

On Scam Scale & Urgency

"47% of Indian adults are hit by AI scams, resulting in more than ₹805 crores lost in just UPI frauds till November last year, with only a 6% recovery rate." — Analytics with Anand team

On Root Causes vs. Symptoms

"The root cause is scams happening from outside the country... Why are you not identifying the telecom ID itself, the country ID, and give a notification?" — Jury Member

"The hacker will hack the caller ID itself... The problem is the caller ID itself is a problem." — Jury Member

On Collaboration Over Competition

"I think you know you all... make one team and work on the solution. The best brains need to come together to solve the problem." — Jury Member

"Our country is too big. We can have multiple companies. Nothing to worry. Our country is a huge country." — Government of India Ministry of Education representative

On Long-Term Thinking

"You are thinking for today... I am envisioning it for next two years... Whatever solution you give today will have no value after 3 months... Are you envisioning that after five years this thing will work?" — Jury Member

"Please don't create anything for today. You need to create for next 10 to 15 years." — Jury Member (to Sentinel Mavericks)

On Chunking Challenge

"When you hear Modi G... he will wait, mission doesn't know where to stop... Half of the chunk maybe one word... So chunking is one of the biggest challenges in audio." — Jury Member

Speakers & Organizations Mentioned

Finalist Teams & Key Members

Kartav — Anurag Manik (solo developer, used ChatGPT/Perplexity/Claude/Gemini for ideation, built REST API in 40–50 hours)
Walker Penguins — S Krishnan (team of 3; presenter built CNN model alone, teammates contributed data; name inspired by penguin meme)
Analytics with Anand — Shubhham (lead GenAI engineer), Subrachi (Python developer), founder/CEO not named in final round; 34,000+ YouTube subscribers; EdTech startup (3 years old)
Sentinel Mavericks — Team of students/young professionals (specific names not fully captured in transcript)

Government & Institutions

Government of India, Ministry of Education: Representative who spoke about supporting all startups and students
IIT Bombay: Institution where a jury member conducts PhD research on CPU-only AI models
AIEL (glossed in the transcript as the telecom regulatory authority; India's telecom regulator is TRAI): Referenced regarding call recording approval requirements in India
ASVspoof Challenge: International audio spoofing detection benchmark used by Walker Penguins

Third-Party Services & Platforms

OpenAI: Whisper (transcription), GPT (not directly used by finalists)
Google: Gemini 2.5 Pro / Gemini 2.5 Flash (used by Kartav; free credits mentioned)
Anthropic: Claude (mentioned by Kartav for ideation)
ElevenLabs: Leading voice generation/cloning service; tested by multiple teams for AI sample generation
Perplexity AI: Used by Kartav for ideation/research
Truecaller: Existing caller ID fraud detection app (gray area in India regarding call recording legality)
Replit: Platform used by Kartav for development/coding
Nvidia: Recent audio innovations mentioned by jury for potential chunking solutions

Indian AI/ML Models Referenced

Bhashini: Indian government language AI platform, cited in the session as an Indian alternative to Gemini (financial constraints prevented adoption by teams)

Policy/Legal References

AIEL Approval — Required in India for live call recording and analysis
UPI Fraud Statistics — ₹805 crores lost (as of November 2024); basis for problem statement

Technical Concepts & Resources

Audio Processing & Feature Extraction

LFCC (Linear Frequency Cepstral Coefficients): Used by Walker Penguins to convert MP3 → audio spectrum image for CNN analysis
Spectral Analysis: Analyzing frequency characteristics, breathing patterns, background noise to identify AI vs. human
Chunking/Segmentation: Dividing audio into smaller pieces; challenge of pause detection vs. time-based splits
Speaker Diarization: Vector database approach to identify individual speakers among multiple voices (mentioned by Analytics with Anand)
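
As a rough illustration of the LFCC step listed above, the following NumPy sketch frames a waveform, takes the power spectrum of each frame, applies triangular filters spaced linearly in frequency, and decorrelates the log energies with a DCT. The function name lfcc and every parameter value (16 kHz audio, 25 ms frames, 20 filters) are assumptions chosen for the example; the session did not describe Walker Penguins' actual pipeline at this level of detail.

```python
import numpy as np

def lfcc(signal, frame_len=400, hop=160, n_fft=512, n_filters=20, n_coeffs=20):
    """Return an (n_frames, n_coeffs) array of linear-frequency cepstral coefficients."""
    # Slice the waveform into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2

    # Triangular filters with linearly spaced centres
    # (this linear spacing is what separates LFCC from mel-spaced MFCC).
    n_bins = n_fft // 2 + 1
    edges = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)[None, :]
    left, centre, right = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (bins - left) / (centre - left)
    falling = (right - bins) / (right - centre)
    fbank = np.clip(np.minimum(rising, falling), 0, None)

    # Log filterbank energies, then an unnormalised DCT-II across filters.
    log_energy = np.log(power @ fbank.T + 1e-10)
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_energy @ dct.T
```

The resulting two-dimensional array can be treated as a single-channel image, which is the form a small CNN classifier would consume.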

Machine Learning Models & Architectures

CNN (Convolutional Neural Networks): Walker Penguins' 2MB lightweight model for binary classification
wav2vec: Pre-trained model used by Sentinel Mavericks (fine-tuned on 5 Indian languages)
Whisper (OpenAI): Speech-to-text transcription used by Kartav
Gemini 2.5 Pro / Gemini 2.5 Flash: Large language models used by Kartav for analysis fallback
ElevenLabs TTS: Leading text-to-speech generator for synthetic voice training data
Federated Learning: Proposed approach to train models on device without centralizing voice data
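
The federated learning idea can be sketched with its core aggregation step: each device trains locally and sends only model weights, which a server combines as a sample-weighted average (the FedAvg rule). This is a minimal sketch under stated assumptions, not a design presented at the session; the function name, the flat weight lists, and the example numbers are all hypothetical.

```python
def federated_average(client_updates):
    """Sample-weighted average of client weight vectors (FedAvg aggregation).

    client_updates: list of (weights, n_samples) pairs, where weights is a
    list of floats produced by local training on one device. Raw audio
    never appears here; only the weights leave the device.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(w[i] * n for w, n in client_updates) / total
        for i in range(dim)
    ]

# Three hypothetical phones report locally trained weights and sample counts.
updates = [([0.2, 1.0], 100), ([0.4, 0.0], 300), ([0.0, 2.0], 100)]
global_weights = federated_average(updates)
```

Devices with more local samples pull the global weights further toward their own update, which is why the second phone dominates the first coordinate in this example.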

Datasets & Training

ASVspoof Dataset: International audio spoofing/deepfake detection competition dataset
1 Lakh (100K+) TTS Samples: Generated locally by Analytics team for initial training
10+ TTS Models: Detectors trained on samples drawn from 10+ generators, including ElevenLabs, custom generation, and regional-language data
5 Indian Languages: Minimum target for models (Hindi, English, Malayalam, Telugu, Kannada noted)
10+ Regional Dialects: Tested for robustness (Chhattisgarhi and Tulu mentioned)

Performance Metrics & Benchmarks

98% Accuracy (Walker Penguins, upload option; 90% on live recording)
5ms Latency (Walker Penguins, 1.8–2MB model)
3–5 Second Latency (Analytics with Anand, 50-second audio files)
~90% Accuracy (Kartav, on ElevenLabs samples; 18/20 detected as AI)
75–80% Accuracy (Kartav, live recording option)
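
The 18/20 figure above is a useful reminder that small test sets give wide error bars. A minimal sketch using the standard Wilson score interval follows; the choice of this interval and of the 95% level (z = 1.96) are assumptions made here for illustration and were not part of the session.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half

# 18 of 20 synthetic samples flagged as AI.
low, high = wilson_interval(18, 20)
```

For 18 of 20 the interval runs from roughly 70% to 97%, so a headline 90% on twenty samples is compatible with a considerably weaker or stronger detector.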

Privacy & Security Mechanisms

Base64 Encoding: Mentioned by Sentinel Mavericks (insufficient for privacy; encryption recommended)
Federated Learning: Proposed by jury member for local model updates without data centralization
Vector Databases: Encrypted speaker embeddings for diarization
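
The point that Base64 is an encoding and not encryption can be checked in a few lines: decoding needs no key, so anyone who intercepts the payload recovers the original bytes. The byte string below is a made-up stand-in for audio data.

```python
import base64

# Hypothetical stand-in for raw audio bytes captured from a call.
voice_bytes = b"RIFF-placeholder-WAVE-audio-bytes"

# Base64 only changes the representation so that bytes survive text transports.
encoded = base64.b64encode(voice_bytes)

# No secret is required to reverse it.
decoded = base64.b64decode(encoded)
```

Protecting voice data in transit or at rest would instead call for a keyed cipher, with Base64 applied afterwards only if a text-safe representation is needed.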

Infrastructure & Deployment

REST API: Kartav's architecture (criticized for over/under-fetching issues and 2G/3G incompatibility)
Edge Deployment: All solutions targeting local/on-device processing to avoid cloud dependency
CPU-Only Targets: Judges emphasized avoiding GPU/TPU dependency for Indian scale
Replit: Development platform used by Kartav
IVR Integration: Target deployment point for banking solutions (real-time or post-call analysis)
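
To make the 1.8–2MB model size concrete for edge deployment, a back-of-the-envelope calculation shows how many parameters fit in that budget at common numeric precisions. The 2 MiB budget and the precision table are assumptions for illustration; file-format overhead is ignored, and the session did not state which precision Walker Penguins used.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def max_parameters(model_bytes, dtype):
    """Upper bound on parameter count for a given storage budget."""
    return model_bytes // BYTES_PER_PARAM[dtype]

two_mib = 2 * 1024 * 1024
budget = {dtype: max_parameters(two_mib, dtype) for dtype in BYTES_PER_PARAM}
```

At 32-bit precision the budget is about half a million parameters, which is small enough to load and run on a CPU-only phone.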

Regulatory/Technical Standards Referenced

AIEL Call Recording Approval: Legal requirement in India for live analysis
Caller ID Spoofing Detection: Mentioned as root cause but not directly addressed by voice detection alone
SIM Authentication: Suggested as complementary to voice detection
Nvidia Recent Inventions: Jury member referenced recent extraordinary audio solutions from Nvidia (specifics not detailed in transcript)

Fallback & Robustness Strategies

Multiple TTS Model Coverage: Training on 10+ generators to handle new voice synthesis methods
Continuous Model Updates: Planned retraining as new voice generators emerge (ElevenLabs updates referenced)
Emotion/Intonation Analysis: Detecting monotonous/unnatural delivery patterns inconsistent with stated emotions
Keyword Mapping: Financial fraud terminology detection (used by Sentinel Mavericks)
Repetition Detection: Identifying repetitive patterns in generated speech
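
Keyword mapping and repetition detection both operate on a transcript and can be sketched with the standard library alone. The term list, function names, and the n-gram repetition measure below are hypothetical; Sentinel Mavericks' actual rules were not detailed in the session.

```python
import re
from collections import Counter

# Hypothetical fraud vocabulary; a real list would be curated per language.
FRAUD_TERMS = {"otp", "kyc", "upi pin", "account blocked", "refund"}

def keyword_hits(transcript):
    """Return the fraud terms that appear as whole words or phrases."""
    text = transcript.lower()
    return sorted(
        term for term in FRAUD_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    )

def repetition_ratio(transcript, n=3):
    """Share of word n-grams that repeat an n-gram seen earlier."""
    words = re.findall(r"\w+", transcript.lower())
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(grams)

call = "Your account blocked please share OTP please share OTP please share OTP now"
hits = keyword_hits(call)
ratio = repetition_ratio(call)
```

Both signals are cheap to compute on-device and could be combined with an audio-level score, in line with the multi-layer framing above.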

Open Problems & Challenges

Chunking Boundary Detection: Determining optimal split points without breaking semantic units
Similar Voice Discrimination: Distinguishing between two naturally similar voices
Robotic Human Voices: False positives when humans naturally sound mechanical
Continuous Learning: Keeping models current as synthesis technology evolves
Latency-Accuracy Trade-off: Real-time detection (< 1 second) vs. high accuracy (95%+)
Bandwidth Constraints: REST API inefficiency on 2G/3G; need for offline-first architecture
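
The chunking boundary problem listed above can be made concrete with a simple energy-based splitter that only cuts on pauses longer than a minimum duration. This is a sketch under assumed thresholds (20 ms frames, 300 ms minimum pause, a fixed energy cutoff); the function name and parameters are hypothetical and no team's method is being reproduced.

```python
def split_on_pauses(samples, sample_rate, frame_ms=20,
                    threshold=0.01, min_pause_ms=300):
    """Return (start, end) sample indices of chunks separated by long pauses."""
    frame = int(sample_rate * frame_ms / 1000)
    min_pause_frames = min_pause_ms // frame_ms
    chunks, start, silent_run = [], None, 0
    for i in range(0, len(samples) - frame + 1, frame):
        window = samples[i:i + frame]
        energy = sum(x * x for x in window) / frame
        if energy >= threshold:
            if start is None:
                start = i          # speech begins: open a chunk
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # Pause is long enough: close the chunk where the pause began.
                chunks.append((start, i + frame - silent_run * frame))
                start, silent_run = None, 0
    if start is not None:
        # Audio ended mid-chunk; any short trailing silence stays attached.
        chunks.append((start, len(samples)))
    return chunks
```

A short pause inside a phrase does not trigger a split, but a speaker who pauses for longer than the cutoff mid-sentence still would, which is the failure mode the jury described; a fixed threshold cannot tell a rhetorical pause from the end of an utterance.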

Additional Context

Event Context

India AI Impact Buildathon 2026: Competitive hackathon/pitch event with 6 finalists selected from a larger pool
Summit Theme: AI for Social Good & Cyber Safety
Timing: Presented as urgent national problem (scam crisis in India)
Judging Panel: Multiple jury members from academia (IIT Bombay), industry, and government

Recurring Themes in Jury Feedback

Domestic AI First: Use Indian models (Bhashini, future Indian LLMs) instead of American platforms
Edge/Offline Critical: CPU-based, 2G/3G-compatible solutions mandatory for India's scale
Collaborate, Don't Compete: Merge teams to avoid siloed solutions; problem too large for individual teams
Think Long-Term: Build for 10–15 years, not today; technology evolves rapidly
Root Cause Analysis: Address caller ID spoofing + telecom regulation, not just voice detection
Privacy & Consent: Encryption, federated learning, and explicit user consent required
Real-World Integration: Solutions must plug into existing banking/telecom infrastructure seamlessly
Capacity Planning: 500M+ concurrent users at 3–4 interactions per user per day implies roughly 1.5–2 billion requests daily; judges asked whether scalability had been stress-tested.
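
The capacity planning figures can be worked through directly. The sketch below treats the 500M figure as daily users and spreads requests evenly over 24 hours, both of which are simplifying assumptions; real traffic would peak well above the average.

```python
USERS = 500_000_000
SECONDS_PER_DAY = 86_400

def daily_requests(users, interactions_per_user):
    """Total requests per day for a given per-user interaction count."""
    return users * interactions_per_user

def average_rps(requests_per_day):
    """Mean requests per second if load were perfectly uniform."""
    return requests_per_day / SECONDS_PER_DAY

low = daily_requests(USERS, 3)
high = daily_requests(USERS, 4)
rps_low, rps_high = average_rps(low), average_rps(high)
```

That gives 1.5 to 2 billion requests per day, or an average of roughly 17,000 to 23,000 requests per second before accounting for peaks, which is the scale any server-side detection path would need to be stress-tested against.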

Implications & Recommendations (Synthesis)

For Startups/Builders:

Secure government/institutional support to access Indian AI stacks
Prioritize edge deployment and CPU-native models from day one
Partner