India AI Impact Buildathon 2026 | AI for Social Good & Cyber Safety | India AI Impact Summit 2026

Executive Summary

This summit featured six finalist teams pitching AI-driven solutions to detect AI-generated voice calls and prevent voice-based scams in India. The presentations cited figures that 47% of Indian adults have been hit by AI scams, with ₹805 crores lost in UPI fraud alone and a recovery rate of only 6%. Solutions ranged from REST API-based detection systems to lightweight edge-computing models (1.8MB), emphasizing the need for localized, scalable, India-first approaches using domestic AI models rather than relying exclusively on Western tools such as Google's Gemini and OpenAI's models.

Key Takeaways

  1. India Needs Domestic AI Models, Not Just Wrappers Around Gemini/OpenAI: Multiple judges stressed that relying on American credits and APIs creates scalability bottlenecks and vendor dependency. Building CPU-native Indian LLMs is essential for a 1.4B-person market.

  2. Edge Computing & Offline-First Design Are Non-Negotiable: With 89–90% of Indians on 2G/3G, solutions must work locally on 2MB models without cloud dependency. This shifts the entire architecture paradigm from REST API to distributed/federated systems.

  3. Voice Detection Alone Won't Solve Scams; a Multi-Layered Approach Is Required: Caller ID spoofing, SIM hacking, and cross-border fraud call for telecom regulation alongside audio detection. Teams must collaborate rather than build siloed solutions.

  4. Cross-Team Collaboration > Individual Pitches: Judges repeatedly urged finalists to merge complementary strengths (e.g., Kartav's analysis + Walker Penguins' speed + Analytics with Anand's splicing detection) into a unified platform. The problem is too complex for solo solutions.

  5. Real-Time vs. Post-Call Analysis Is a False Choice: Banks can flag high-risk calls (permission requests, data access) even at 3–5 second latency if accuracy is 95%+. Solution depends on use case (fraud prevention vs. forensic investigation).

Key Topics Covered

  • AI Voice Detection Technologies: Multiple approaches to distinguish human from AI-generated speech, including audio spectrum analysis, CNN models, and feature extraction
  • Indian Language Support: Solutions tested across 8–10+ Indian regional languages (Hindi, Telugu, Kannada, Chhattisgarhi, Tulu, Malayalam, etc.)
  • Scam Prevention & Cybersecurity: Addressing voice spoofing, impersonation, and caller ID hacking as vectors for financial fraud
  • Edge Computing & Offline Processing: Emphasis on lightweight models deployable on mobile devices without GPU/TPU dependency
  • Technical Scalability: Challenges around REST API limitations, latency vs. accuracy trade-offs, and capacity planning for 500M+ users
  • Privacy & Data Ethics: Storage mechanisms, consent for voice data, encryption standards, and federated learning approaches
  • Integration with Banking & Telecom: Real-world deployment challenges within existing IVR systems, CRM platforms, and payment infrastructure
  • Policy & Regulation: Legal constraints (AIEL approval for call recording), caller ID authentication, and cross-border fraud sourcing
  • India-First Development: Recurring emphasis on building domestic AI models (Bhashini, other Indian LLMs) instead of depending on American platforms

Key Points & Insights

  1. Scale & Impact of Voice Scams: 47% of Indian adults hit by AI scams; ₹805 crores lost in UPI fraud alone (as of November 2024), with only 6% recovery rate—demonstrating urgent need for detection solutions.

  2. Multiple Viable Detection Approaches:

    • Kartav: REST API system using OpenAI Whisper + Gemini for analysis; detects emotional inconsistencies and naturalness artifacts
    • Walker Penguins: Lightweight 1.8–2MB CNN model trained on LFCC (Linear Frequency Cepstral Coefficients) audio spectrograms; 98% accuracy on test sets, 5ms latency
    • Analytics with Anand: Forensic splicing detection distinguishing between human and AI segments within same audio; handles partially AI-manipulated content
    • Sentinel Mavericks: Multi-layer system combining audio analysis with fraud keyword mapping and repetitive word detection

  3. Edge Computing Is Critical for India: Solutions must work on 2G/3G bandwidth and lower-end devices; GPU/TPU dependency is impractical for 500M+ Indian users. Judges emphasized need for CPU-only models.

  4. Chunking & Segmentation Challenges: Detecting where to split audio for analysis is complex—pauses don't always align with speech boundaries; handling continuous speech without unintended splits remains unsolved at scale.

  5. Regional Language Robustness: AI artifacts are largely language-independent (repetition, unnatural frequency characteristics); testing across 10+ Indian dialects shows models generalize beyond English-centric training.

  6. Accuracy vs. Speed Trade-off:

    • For banking/finance: 3–5 second latency acceptable if 95%+ accuracy
    • For real-time call screening: Need sub-1 second detection (not yet achieved)
    • Live recording less stable (~75–80% accuracy) than batch upload (~90%+ accuracy)

  7. Caller ID Spoofing Is Root Cause: Multiple judges noted scams often originate from hacked caller IDs (mimicking family/authority figures). Voice detection alone insufficient; requires telecom-level intervention (country code verification, SIM authentication).

  8. Indian Model Adoption Is Slow: Teams cite financial constraints preventing use of Indian LLMs (Bhashini); reliance on free Gemini credits creates vendor lock-in. Government support/Indian stack commitment needed.

  9. Data Privacy Concerns:

    • Base64 encoding insufficient (not encryption)
    • Federated learning proposed to avoid centralizing voice data
    • Consent mechanisms for voice training datasets unclear

  10. Integration Complexity: Real-world deployment requires API integration with existing banking IVRs, CRM systems, and telecom call recording pipelines, which is not trivial; post-call analysis is more feasible than real-time intervention.


Notable Quotes or Statements

On India-First Development

"If the problem originated in India, the solution should too, and that is our duty." — Anurag Manik (Kartav)

"I believe in making in India, but let's transform from India also." — Anurag Manik

"Drop Gemini, man. Figure out a model from India... When will you end up using these American models?" — Jury Member (on vendor dependency)

"Let us create a model which runs on CPU itself so that we can get rid of this biggest problem... Why from India don't we create a large learning model using CPUs itself?" — Jury Member (PhD researcher at IIT Bombay)

On Scam Scale & Urgency

"47% of Indian adults are hit by AI scams, resulting in more than ₹805 crores lost in just UPI frauds till November last year, with only a 6% recovery rate." — Analytics with Anand team

On Root Causes vs. Symptoms

"The root cause is scams happening from outside the country... Why are you not identifying the telecom ID itself, the country ID, and give a notification?" — Jury Member

"The hacker will hack the caller ID itself... The problem is the caller ID itself is a problem." — Jury Member

On Collaboration Over Competition

"I think you know you all... make one team and work on the solution. The best brains need to come together to solve the problem." — Jury Member

"Our country is too big. We can have multiple companies. Nothing to worry. Our country is a huge country." — Government of India Ministry of Education representative

On Long-Term Thinking

"You are thinking for today... I am envisioning it for next two years... Whatever solution you give today will have no value after 3 months... Are you envisioning that after five years this thing will work?" — Jury Member

"Please don't create anything for today. You need to create for next 10 to 15 years." — Jury Member (to Sentinel Mavericks)

On Chunking Challenge

"When you hear Modi ji... he will wait, machine doesn't know where to stop... Half of the chunk maybe one word... So chunking is one of the biggest challenges in audio." — Jury Member


Speakers & Organizations Mentioned

Finalist Teams & Key Members

  1. Kartav — Anurag Manik (solo developer, used ChatGPT/Perplexity/Claude/Gemini for ideation, built REST API in 40–50 hours)
  2. Walker Penguins — S Krishnan (team of 3; presenter built CNN model alone, teammates contributed data; name inspired by penguin meme)
  3. Analytics with Anand — Shubhham (lead GenAI engineer), Subrachi (Python developer), founder/CEO not named in final round; 34,000+ YouTube subscribers; EdTech startup (3 years old)
  4. Sentinel Mavericks — Team of students/young professionals (specific names not fully captured in transcript)

Government & Institutions

  • Government of India, Ministry of Education: Representative speaking about supporting all startups and students
  • IIT Bombay: Mentioned as institution where jury member conducts PhD research on CPU-only AI models
  • TRAI (Telecom Regulatory Authority of India; rendered as "AIEL" in the transcript): Referenced regarding call recording approval requirements in India
  • ASVspoof Challenge: International audio spoofing detection benchmark used by Walker Penguins

Third-Party Services & Platforms

  • OpenAI: Whisper (transcription), GPT (not directly used by finalists)
  • Google: Gemini 2.5 Pro / Gemini 2.5 Flash (used by Kartav; free credits mentioned)
  • Anthropic: Claude (mentioned by Kartav for ideation)
  • ElevenLabs: Leading voice generation/cloning service; tested by multiple teams for AI sample generation
  • Perplexity AI: Used by Kartav for ideation/research
  • Truecaller (rendered as "TrueCall" in the transcript): Existing caller ID fraud detection app (gray area in India regarding call recording legality)
  • Replit: Platform used by Kartav for development/coding
  • Nvidia: Recent audio innovations mentioned by jury for potential chunking solutions

Indian AI/ML Models Referenced

  • Bhashini — Indian LLM mentioned as alternative to Gemini (financial constraints prevented adoption by teams)

Policy/Legal References

  • AIEL Approval — Required in India for live call recording and analysis
  • UPI Fraud Statistics — ₹805 crores lost (as of November 2024); basis for problem statement

Technical Concepts & Resources

Audio Processing & Feature Extraction

  • LFCC (Linear Frequency Cepstral Coefficients): Used by Walker Penguins to convert MP3 → audio spectrum image for CNN analysis
  • Spectral Analysis: Analyzing frequency characteristics, breathing patterns, background noise to identify AI vs. human
  • Chunking/Segmentation: Dividing audio into smaller pieces; challenge of pause detection vs. time-based splits
  • Speaker Diarization: Vector database approach to identify individual speakers among multiple voices (mentioned by Analytics with Anand)
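
To make the LFCC step above concrete, here is a minimal, standard-library-only sketch of how one audio frame could be turned into linear-frequency cepstral coefficients and how frames could be stacked into a 2D array of the kind a CNN might consume. It is not Walker Penguins' implementation: the function names, frame length, hop size, filter count, and coefficient count are all assumptions chosen for illustration, and the input is assumed to be already-decoded PCM samples rather than an MP3 file.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT (fine for short frames)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def linear_filterbank(num_filters, num_bins):
    """Triangular filters whose centres are spaced linearly across the frequency bins."""
    edges = [i * (num_bins - 1) / (num_filters + 1) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            if left <= b <= centre:
                row.append((b - left) / (centre - left))
            elif centre < b <= right:
                row.append((right - b) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def lfcc(frame, num_filters=20, num_coeffs=13):
    """LFCC vector for one frame: power spectrum, linear filterbank, log, DCT-II."""
    spec = power_spectrum(frame)
    bank = linear_filterbank(num_filters, len(spec))
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_coeffs)]


def lfcc_image(signal, frame_len=256, hop=128, **kwargs):
    """Stack per-frame LFCC vectors into a 2D list (frames x coefficients)."""
    return [lfcc(signal[s:s + frame_len], **kwargs)
            for s in range(0, len(signal) - frame_len + 1, hop)]
```

The only difference from the more familiar MFCC pipeline is the filter spacing: the filter centres here are equally spaced in frequency instead of being warped onto the mel scale. A production system would normally use an FFT-based library rather than the naive DFT shown here.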

Machine Learning Models & Architectures

  • CNN (Convolutional Neural Networks): Walker Penguins' 2MB lightweight model for binary classification
  • wav2vec: Pre-trained model used by Sentinel Mavericks (fine-tuned on 5 Indian languages)
  • Whisper (OpenAI): Speech-to-text transcription used by Kartav
  • Gemini 2.5 Pro / Gemini 2.5 Flash: Large language models used by Kartav for analysis fallback
  • ElevenLabs TTS: Leading text-to-speech generator for synthetic voice training data
  • Federated Learning: Proposed approach to train models on device without centralizing voice data
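
The federated learning idea in the last bullet can be illustrated with a small federated-averaging sketch. This is a generic illustration, not something any team presented: the tiny logistic-regression model, the learning rate, and the function names are assumptions. The point it shows is that each device trains on its own data and only model weights, never raw voice samples, are sent back for averaging.

```python
import math


def local_update(weights, examples, lr=0.1):
    """One pass of SGD on a single device for a tiny logistic-regression model.

    `examples` is a list of (feature_vector, label) pairs that never leave the device.
    """
    w = list(weights)
    for features, label in examples:
        z = sum(wi * xi for wi, xi in zip(w, features))
        pred = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(features):
            w[i] -= lr * (pred - label) * xi
    return w


def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighting each client by its example count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]


def federated_round(global_weights, clients):
    """One round: every client trains locally, then the server averages the results."""
    updates = [local_update(global_weights, data) for data in clients]
    return federated_average(updates, [len(data) for data in clients])
```

Real deployments typically add secure aggregation and client sampling on top of this loop; those are omitted here to keep the sketch short.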

Datasets & Training

  • ASVspoof Dataset: International audio spoofing/deepfake detection competition dataset
  • 1 Lakh (100K+) TTS Samples: Generated locally by Analytics team for initial training
  • 10+ TTS Models: Trained on samples from ElevenLabs, custom generation, regional language data
  • 5 Indian Languages: Minimum target for models (Hindi, English, Malayalam, Telugu, Kannada noted)
  • 10+ Regional Dialects: Tested for robustness (Chhattisgarhi and Tulu mentioned)

Performance Metrics & Benchmarks

  • 98% Accuracy (Walker Penguins, upload option; 90% on live recording)
  • 5ms Latency (Walker Penguins, 1.8–2MB model)
  • 3–5 Second Latency (Analytics with Anand, 50-second audio files)
  • ~90% Accuracy (Kartav, on ElevenLabs samples; 18/20 detected as AI)
  • 75–80% Accuracy (Kartav, live recording option)

Privacy & Security Mechanisms

  • Base64 Encoding: Mentioned by Sentinel Mavericks (insufficient for privacy; encryption recommended)
  • Federated Learning: Proposed by jury member for local model updates without data centralization
  • Vector Databases: Encrypted speaker embeddings for diarization
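
The point about Base64 can be shown in a few lines. The sketch below uses only Python's standard `base64` module; the payload bytes are a made-up placeholder, not real audio. It demonstrates that anyone holding the encoded text can recover the original bytes without any key, which is why Base64 provides no confidentiality.

```python
import base64


def base64_round_trip(payload: bytes) -> bytes:
    """Encode then decode a payload; no key or secret is involved at any step."""
    encoded = base64.b64encode(payload)
    return base64.b64decode(encoded)


# Hypothetical stand-in for a recorded voice clip.
voice_clip = b"placeholder-voice-bytes"
recovered = base64_round_trip(voice_clip)
print(recovered == voice_clip)
```

Protecting stored voice data would instead require real encryption with managed keys, for example authenticated encryption from a vetted cryptography library, with Base64 used only as a transport encoding on top if needed.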

Infrastructure & Deployment

  • REST API: Kartav's architecture (criticized for over/under-fetching issues and 2G/3G incompatibility)
  • Edge Deployment: All solutions targeting local/on-device processing to avoid cloud dependency
  • CPU-Only Targets: Judges emphasized avoiding GPU/TPU dependency for Indian scale
  • Replit: Development platform used by Kartav
  • IVR Integration: Target deployment point for banking solutions (real-time or post-call analysis)

Regulatory/Technical Standards Referenced

  • AIEL Call Recording Approval: Legal requirement in India for live analysis
  • Caller ID Spoofing Detection: Mentioned as root cause but not directly addressed by voice detection alone
  • SIM Authentication: Suggested as complementary to voice detection
  • Nvidia Recent Inventions: Jury member referenced recent extraordinary audio solutions from Nvidia (specifics not detailed in transcript)

Fallback & Robustness Strategies

  • Multiple TTS Model Coverage: Training on 10+ generators to handle new voice synthesis methods
  • Continuous Model Updates: Planned retraining as new voice generators emerge (ElevenLabs updates referenced)
  • Emotion/Intonation Analysis: Detecting monotonous/unnatural delivery patterns inconsistent with stated emotions
  • Keyword Mapping: Financial fraud terminology detection (used by Sentinel Mavericks)
  • Repetition Detection: Identifying repetitive patterns in generated speech
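
As a rough illustration of the keyword mapping and repetition detection bullets, the sketch below scores a call transcript on both signals. It is not Sentinel Mavericks' code: the keyword list, the n-gram size, and the function names are hypothetical, and the simple tokenizer only handles Latin-script text, so transcripts in Indian-language scripts would need a different tokenization step.

```python
import re
from collections import Counter

# Hypothetical keyword list; a real system would curate this per language.
FRAUD_KEYWORDS = {"otp", "kyc", "upi", "pin", "blocked", "refund"}


def tokenize(transcript):
    """Lowercase the transcript and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", transcript.lower())


def keyword_hits(transcript, keywords=FRAUD_KEYWORDS):
    """Return the sorted fraud-related keywords that appear in the transcript."""
    return sorted(set(tokenize(transcript)) & set(keywords))


def repetition_ratio(transcript, n=3):
    """Fraction of word n-grams that repeat an n-gram seen earlier in the transcript."""
    tokens = tokenize(transcript)
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(grams)
```

For example, `keyword_hits("Share your OTP to unblock UPI")` returns `['otp', 'upi']`. Either score on its own is a weak signal; the bullets above describe them as layers combined with audio analysis.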

Open Problems & Challenges

  • Chunking Boundary Detection: Determining optimal split points without breaking semantic units
  • Similar Voice Discrimination: Distinguishing between two naturally similar voices
  • Robotic Human Voices: False positives when humans naturally sound mechanical
  • Continuous Learning: Keeping models current as synthesis technology evolves
  • Latency-Accuracy Trade-off: Real-time detection (< 1 second) vs. high accuracy (95%+)
  • Bandwidth Constraints: REST API inefficiency on 2G/3G; need for offline-first architecture
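
The chunking problem in the first bullet can be made concrete with a simple energy-based splitter. This is a sketch under stated assumptions, not a solution any team presented: the frame length, energy threshold, and minimum pause length are hypothetical parameters. It cuts only when a pause lasts for several consecutive quiet frames, so a brief hesitation does not split the chunk.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def split_on_pauses(samples, frame_len=160, threshold=1e-4, min_pause_frames=3):
    """Return (start, end) sample indices of voiced chunks.

    A chunk is closed only after `min_pause_frames` consecutive quiet frames,
    so short pauses inside a phrase stay within the same chunk.
    """
    energies = frame_energies(samples, frame_len)
    chunks, start, quiet = [], None, 0
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx          # first voiced frame of a new chunk
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                # End the chunk at the first quiet frame of this pause.
                chunks.append((start * frame_len, (idx - quiet + 1) * frame_len))
                start, quiet = None, 0
    if start is not None:
        # Close the final chunk, trimming any trailing quiet frames.
        chunks.append((start * frame_len, (len(energies) - quiet) * frame_len))
    return chunks
```

The difficulty the jury described shows up in the choice of `min_pause_frames`: a speaker who pauses deliberately mid-sentence will exceed any fixed value, which is why a fixed threshold like this one does not settle the problem.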

Additional Context

Event Context

  • India AI Impact Buildathon 2026 — Competitive hackathon/pitch event with 6 finalists selected from larger pool
  • Summit Theme: AI for Social Good & Cyber Safety
  • Timing: Presented as urgent national problem (scam crisis in India)
  • Judging Panel: Multiple jury members from academia (IIT Bombay), industry, and government

Recurring Themes in Jury Feedback

  1. Domestic AI First: Use Indian models (Bhashini, future Indian LLMs) instead of American platforms
  2. Edge/Offline Critical: CPU-based, 2G/3G-compatible solutions mandatory for India's scale
  3. Collaborate, Don't Compete: Merge teams to avoid siloed solutions; problem too large for individual teams
  4. Think Long-Term: Build for 10–15 years, not today; technology evolves rapidly
  5. Root Cause Analysis: Address caller ID spoofing + telecom regulation, not just voice detection
  6. Privacy & Consent: Encryption, federated learning, and explicit user consent required
  7. Real-World Integration: Solutions must plug into existing banking/telecom infrastructure seamlessly
  8. Capacity Planning: 500M+ concurrent users at 3–4 interactions/day works out to billions of requests; has scalability been stress-tested at that volume?
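
The capacity figure in the last item can be checked with back-of-the-envelope arithmetic. The sketch below takes the 500M users and 3–4 interactions per day from the jury feedback and converts them to an average request rate; spreading traffic evenly over 24 hours is a simplifying assumption, so real peak load would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_requests(users, interactions_per_user):
    """Total requests per day for a given user base and usage rate."""
    return users * interactions_per_user


def average_rps(requests_per_day):
    """Average requests per second, assuming traffic is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


for per_user in (3, 4):
    total = daily_requests(500_000_000, per_user)
    print(f"{per_user}/day -> {total:,} requests/day, ~{average_rps(total):,.0f} req/s")
# 3/day -> 1,500,000,000 requests/day, ~17,361 req/s
# 4/day -> 2,000,000,000 requests/day, ~23,148 req/s
```

So the load is roughly 1.5 to 2 billion requests per day, or about 17,000 to 23,000 requests per second on average.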

Implications & Recommendations (Synthesis)

For Startups/Builders:

  • Secure government/institutional support to access Indian AI stacks
  • Prioritize edge deployment and CPU-native models from day one
  • Partner