Democratizing AI for Social Good | India AI Impact Buildathon 2026 | India AI Impact Summit 2026

Executive Summary

The India AI Impact Buildathon 2026 was a competitive hackathon focused on developing AI solutions for social impact, featuring a three-stage evaluation process culminating in pitch presentations before a distinguished jury panel. The event showcased multiple AI-driven fraud detection systems, particularly focusing on detecting AI-generated voice calls—a critical problem affecting India's financial security and citizen safety, with an estimated ₹19,800 crore lost to cyber frauds and scam calls in 2025 alone.

Key Takeaways

AI voice fraud is now a critical infrastructure threat in India: With ₹19,800 crore lost in 2025 and 50%+ of scam calls using AI voices, this isn't a niche problem—it's a national security issue requiring immediate, scalable solutions.
Multi-modal detection (audio + spectrogram analysis) beats single-model approaches: Ensemble methods that combine different feature representations provide more robust detection than any single algorithm, particularly against rapidly evolving TTS systems.
Privacy-preserving, on-device processing is non-negotiable: Solutions must avoid uploading voice data to cloud servers. Deploying models locally on device TPUs/CPUs is technically feasible and essential for user trust and regulatory compliance.
Real-time fraud call blocking requires preemptive architecture changes, not just detection APIs: Current detection latencies (2–8 seconds) are too slow for active call termination. Solutions need infrastructure-level changes (like SIM binding) to prevent fraudulent calls at the network level, not just identify them post-hoc.
Continuous retraining and learning from false positives is the competitive moat: As attackers deploy new TTS models, solutions that can rapidly retrain on new acoustic patterns and integrate learnings from misclassifications will remain effective; static models will degrade within months.

Key Topics Covered

AI Voice Detection & Deepfake Prevention: Multiple teams developed systems to distinguish AI-generated voices from human voices
Fraud Prevention & Cybersecurity: Solutions addressing voice cloning, spoofing, and financial fraud through fraudulent calls
Buildathon Format & Evaluation Structure: Three-stage competition (technical testing, evaluation, pitch round) with evaluation criteria including problem statement, innovation, technical strength, applicability, and clarity
Prize Pool & Incentives: ₹4 lakh total prize pool (₹1 lakh first place, ₹60K second, ₹40K third per category)
Real-World Applicability & Deployment Challenges: Discussions on privacy, permissions, latency, edge deployment, and regulatory considerations
SIM Binding & Root-Cause Problem Solving: Feedback emphasizing fundamental infrastructure solutions vs. wrapper-based approaches
Continuous Model Retraining & Evolution: Strategies for adapting models to new TTS systems and emerging AI voice technologies

Key Points & Insights

Massive Scale of the Problem: An estimated ₹19,800 crore was lost to cyber fraud and scam calls in India in 2025, or about ₹55 crore per day. Approximately 20 million scam calls are made daily, with over 50% now using AI-generated voices. Only 31% of Indians can reliably identify AI-generated voices on calls.
Ensemble Model Approach: Team "DV Codes" developed an ensemble model called "Ear and Eye" that combines Wav2Vec (audio feature extraction) with spectrogram analysis (visual frequency-over-time patterns) to detect AI-generated audio with higher accuracy than single-model approaches.
Micro-Feature Detection: Team "Sigmoid" focused on 173 unique voice features that AI cannot replicate perfectly, including breathing patterns, tongue clicking, pitch consistency, and tone variations—capturing human micro-behaviors that generated voices struggle to replicate.
Latency vs. Accuracy Trade-off: Solutions require careful calibration. DV Codes prioritized accuracy over latency (reporting roughly 200 ms), while Sigmoid returned results in 2-3 seconds on uploaded audio but needed 5-6 seconds of sample audio for reliable detection.
Privacy & On-Device Processing Concerns: A critical jury question raised privacy risks of model training on user voice data. Solutions proposed edge deployment (using device TPUs/CPUs) to process voice locally without uploading data to cloud servers, maintaining privacy while enabling detection.
Deployment at Scale Challenges: Integration with existing infrastructure (banks, telecom providers, apps like TrueCaller) requires addressing permissions, user consent mechanisms, standardization, and regulatory compliance—particularly for real-time fraud call blocking during active conversations.
Continuous Retraining Imperative: New text-to-speech (TTS) systems and AI voice models emerge regularly, so solutions must continuously retrain on new acoustic patterns. Teams discussed using Gemini Pro and other code generation tools, but the jury emphasized learning from false positives to prevent accuracy from degrading.
Fundamental vs. Wrapper Solutions: Jury feedback highlighted the distinction between building at architectural roots (e.g., SIM binding technology) versus creating detection wrappers on top of existing infrastructure. The jury advocated for deeper, infrastructure-level solutions rather than relying solely on detection APIs.
Multi-Algorithm Voting Mechanism: Sigmoid's approach of using three independent algorithms with configurable weights—rather than a single detection model—provides better security and allows adaptive tuning as threat landscapes evolve.
Data & Training Dataset Composition: Effective models required diverse, multi-language training data. Teams utilized: Mozilla Common Voice dataset, Google TTS, Microsoft Edge TTS, Hindi-language datasets, and locally collected human voice samples (typically 2,500–5,000 examples per category).
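
The weighted voting idea attributed to Team Sigmoid, together with their practice of flagging low-confidence cases for human review, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the function name `weighted_vote`, the label strings, the 0.5 threshold, and the 0.15 review margin are all hypothetical values chosen for the example.

```python
from typing import NamedTuple, Sequence


class Verdict(NamedTuple):
    label: str    # "ai_generated", "human", or "needs_review" (hypothetical labels)
    score: float  # weighted probability that the audio is AI-generated


def weighted_vote(scores: Sequence[float], weights: Sequence[float],
                  threshold: float = 0.5, review_margin: float = 0.15) -> Verdict:
    """Combine per-detector AI probabilities into a single verdict."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average of the individual detector outputs.
    combined = sum(s * w for s, w in zip(scores, weights)) / total
    # Scores close to the threshold are routed to a human instead of
    # being forced into a hard classification.
    if abs(combined - threshold) < review_margin:
        return Verdict("needs_review", combined)
    label = "ai_generated" if combined >= threshold else "human"
    return Verdict(label, combined)
```

Keeping the weights as plain parameters is what makes the tuning adaptive: they can be adjusted as individual detectors become more or less reliable against new TTS systems, without retraining any single model.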

Notable Quotes or Statements

"The total prize pool is ₹4 lakh: First position ₹1 lakh, second ₹60,000, third ₹40,000 from each category (working professionals and students)." — Buildathon Organizer

"In 2025 alone, Indians lost nearly ₹19,800 crore to cyber frauds and scam calls—about ₹55 crore daily. Over 50% of the 20 million scam calls made every day in India now use AI-generated voices." — Team Sigmoid (Problem Statement)

"Accuracy over latency—false positives are not allowed. That's why we trade off latency for safety and accuracy." — Team DV Codes (on technical trade-offs)

"You can make a model run locally on the phone itself using TPUs. We can run real-time analysis on the phone so data is not transmitted anywhere, maintaining privacy." — Team DV Codes (on privacy solution)

"We can detect these micro variations in breath, tone, rhythm. In uncertain cases with lower confidence scores, the model flags for human review rather than making a hard classification." — Team Sigmoid (on handling edge cases)

"Most of these companies are all wrappers. You are training Indian voices on a foreign model for free. We need to go backward to the roots and find fundamental solutions like SIM binding, not forward with detection layers on top." — Jury Member (on architectural philosophy)

"You've got 40,000 people who applied. You are here among the top six. You must be proud. Don't worry about the end result—we are all your family." — Jury Member (to student team, providing encouragement)

"The amazing thing about your pitch was that you started with the right data which resonates with everybody—here's the problem, here's the impact fracturing the economy." — Jury Member (on effective problem framing)

Speakers & Organizations Mentioned

Jury Panel (Felicitated)

Mr. Dashish Mishra – Chief General Manager, Delhi Circle, State Bank of India (SBI)
Dr. Buddha Chandra Shakhar – Chief Coordinating Officer, AICTE, Ministry of Education, Government of India
Mr. Ankit Kakar – Senior Director Technical Services Engineering, MongoDB
Mr. Syramenal – Senior Vice President and Head of Strategic Initiatives, Cell Technologies
Mr. Nirj Valia (Easy Snippet) – Content creator, 3.3M+ followers (tech education)
Mr. Vasan Vijay Bhaskar – Chief Strategy Officer, GUEI (GUI organization)

Organizing Partners (Implied)

Google (mentioned as providing leadership for felicitation)
GUEI (Buildathon organizer)
India AI Impact Summit 2026 (host event)

Teams Presented

Team DV Codes (Working Professionals) – AI Voice Detection System ("Ear and Eye")
Team Sigmoid (Student Category) – AI Voice Detection API
Team Siblings (Working Professionals) – Fraud Prevention Solution (pitch partially transcribed)

Technical Concepts & Resources

AI Models & Architectures

Wav2Vec (by Facebook/Meta) – Transformer-based model for self-supervised learning and audio feature extraction from raw waveform data
Spectrogram Analysis – Visual representation of audio frequencies over time; used to detect artifacts and anomalies in AI-generated speech
Ensemble Learning – Combining multiple independent classifiers (three algorithms voting) with configurable weights for improved robustness
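
To make the Spectrogram Analysis entry above concrete, here is a dependency-free sketch of how a magnitude spectrogram is computed: the signal is cut into overlapping frames, each frame is windowed, and a Fourier transform gives the energy per frequency bin. The function name `spectrogram` and the frame sizes are assumptions for illustration. The naive DFT is used only for readability, and a real pipeline would most likely rely on an FFT library instead.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_len // 2."""
    if len(samples) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of one bin: O(frame_len) work per bin, for clarity only.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames
```

Each inner list is one column of the image a spectrogram-based classifier would look at. For a signal sampled at 8 kHz with 64-sample frames, each bin spans 125 Hz, so a pure 1 kHz tone peaks in bin 8.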

Datasets & Data Sources

Mozilla Common Voice – Multilingual open-source voice dataset
Google Text-to-Speech (TTS) – TTS system used for generating synthetic training data
Microsoft Edge TTS – Alternative TTS for model generalization
Hindi-Specific Datasets – Local datasets for regional language support (Hindi-language audio)
Custom Collected Data – 5,000 audio samples (2,500 human + 2,500 AI-generated) per training iteration

Technical Approaches

Hyperparameter Tuning – Used to determine optimal training dataset size (e.g., why 5,000 samples)
Transfer Learning – Leveraging pre-trained models (Wav2Vec) rather than training from scratch
Feature Engineering – Extracting 173 unique voice features including:
- Breathing patterns
- Pitch consistency and variation
- Tone transitions
- Rhythm irregularities
- Tongue clicking
- Background noise characteristics
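
As an illustration of one item in the list above, "pitch consistency and variation" could be measured by estimating the pitch of each short frame and then summarizing how much it moves. The sketch below uses a simple autocorrelation pitch estimate, which is an assumed technique and not something the summary attributes to Team Sigmoid. The function names, frame sizes, and the 80-400 Hz search range are likewise hypothetical.

```python
import statistics


def frame_pitch_hz(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate one frame's pitch from the strongest autocorrelation lag."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: similarity of the frame to a
        # copy of itself shifted by `lag` samples.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag is not None else None


def pitch_consistency(samples, sample_rate, frame_len=320, hop=160):
    """Return mean and standard deviation of per-frame pitch estimates."""
    pitches = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        pitch = frame_pitch_hz(samples[start:start + frame_len], sample_rate)
        if pitch is not None:
            pitches.append(pitch)
    if len(pitches) < 2:
        raise ValueError("need at least two frames with a pitch estimate")
    return {"mean_hz": statistics.fmean(pitches),
            "std_hz": statistics.pstdev(pitches)}
```

The standard deviation is the part that acts as a feature here: a value near zero means the pitch barely moves across frames. Whether a given range indicates synthetic speech is something a classifier would have to learn from labeled data.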

Tools & Technologies

Gemini Pro – Code generation tool used for rapid prototyping
TrueCaller API – Reference model for spam call detection at scale
Device TPUs/CPUs – On-device execution for privacy-preserving inference
JSON API Response Format – Current deployment model (text-based API returning classification + confidence + explanation)
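
The JSON response described above (classification, confidence, and explanation) might be assembled along these lines. The field names, the two label strings, and the helper name `build_response` are assumptions, since the summary does not give the actual schema.

```python
import json


def build_response(classification: str, confidence: float, explanation: str) -> str:
    """Serialize a detection result as a JSON string (hypothetical schema)."""
    if classification not in {"AI_GENERATED", "HUMAN"}:
        raise ValueError("classification must be 'AI_GENERATED' or 'HUMAN'")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    payload = {
        "classification": classification,
        "confidence": round(confidence, 4),
        "explanation": explanation,
    }
    # ensure_ascii=False keeps non-English explanations readable.
    return json.dumps(payload, ensure_ascii=False)
```

A client would parse the string with `json.loads` and could, for example, show the explanation to the user only when the confidence is high enough to act on.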

Evaluation Metrics (Jury Framework)

Accuracy – Primary metric; prioritized over latency for fraud detection
False Positive Rate – Critical concern; misclassifying human voices as AI is costly
Latency – 200ms for live analysis; 2–3 seconds for file upload (acceptable for post-hoc verification, not real-time blocking)
Confidence Scoring – Probabilistic outputs allowing human review for borderline cases
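
For reference, accuracy and false positive rate as used in this framework can be computed from labeled predictions as shown below. The sketch assumes the convention implied above, where "positive" means AI-generated, so a false positive is a human voice flagged as AI. The function name `detection_metrics` and the 0/1 label encoding are illustrative choices.

```python
def detection_metrics(y_true, y_pred):
    """Compute accuracy and false positive rate; 1 = AI-generated, 0 = human."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # human flagged as AI
    accuracy = (tp + tn) / len(pairs)
    # FPR is taken over actual human samples only.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "false_positive_rate": fpr}
```

Reporting both numbers matters because a detector can reach high accuracy on a balanced test set while still flagging an unacceptable share of genuine callers.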

Infrastructure & Deployment Concepts

SIM Binding – Jury-suggested fundamental technology: binding voice verification to SIM card identity to prevent spoofing at network level (not detection layer)
Edge Deployment – Running inference locally on device rather than cloud transmission
Preemptive Blocking Architecture – Proposed but not yet implemented; would require telecom-level integration

Document Metadata:

Event: India AI Impact Buildathon 2026 / India AI Impact Summit 2026
Transcript Source: YouTube (https://www.youtube.com/watch?v=wfY1gO_niqs)
Coverage: Stage 3 (Pitch Round) – Presentations from 2–3 teams with jury Q&A
Format: 10 minutes per team (2-min pitch + 8-min Q&A)
Time Period: 2025–2026