India AI Impact Summit 2026

About

This library was built entirely with AI — from pulling 500+ session recordings to generating structured summaries, topic tags, and sector deep dives.

01

Video Discovery

Over 500 session recordings from the India AI Impact Summit 2026 were catalogued, capturing video IDs, titles, durations, and view counts into a structured dataset.

Session recordings playlist → videos.json
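As a sketch, the cataloguing step reduces each playlist entry to the four fields kept in videos.json. The field names and sample values below are illustrative, assuming metadata in the shape a flat-playlist dump typically produces:

```python
import json

def catalogue(entries):
    """Keep only the fields stored in videos.json for each playlist entry."""
    return [
        {
            "id": e["id"],
            "title": e["title"],
            "duration": e.get("duration"),
            "view_count": e.get("view_count"),
        }
        for e in entries
    ]

# Two raw entries with hypothetical values; extra fields are dropped.
raw = [
    {"id": "abc123", "title": "Opening Keynote", "duration": 1800,
     "view_count": 4200, "uploader": "summit-channel"},
    {"id": "def456", "title": "AI in Health", "duration": 2400,
     "view_count": 1100},
]

with open("videos.json", "w") as f:
    json.dump(catalogue(raw), f, indent=2)
```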
02

Transcription

Auto-generated subtitles were accessed for each session recording. Where subtitles weren't available, recordings were skipped. Raw subtitle files were cleaned of timestamps and formatting artifacts to produce plain-text transcripts.

Session recording subtitles → data/transcripts/
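The cleanup pass can be sketched as a small filter over the raw subtitle file, assuming WebVTT input; auto-generated captions also tend to repeat a line across consecutive cues, so duplicates are collapsed:

```python
import re

def clean_vtt(raw: str) -> str:
    """Strip WEBVTT headers, cue timestamps, and inline tags; keep spoken text."""
    lines = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or line == "WEBVTT" or "-->" in line or line.isdigit():
            continue                            # headers, timestamps, cue numbers
        line = re.sub(r"<[^>]+>", "", line)     # inline <c>/timing tags
        if lines and lines[-1] == line:
            continue                            # rolling-caption repeats
        lines.append(line)
    return " ".join(lines)

sample = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the summit.

00:00:04.000 --> 00:00:07.000
Welcome to the summit.

00:00:07.000 --> 00:00:10.000
<c>Let's begin.</c>
"""
print(clean_vtt(sample))  # → Welcome to the summit. Let's begin.
```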
03

Summarization

Each transcript was passed to Claude Haiku with a structured prompt asking for an executive summary, key topics, key takeaways, notable speakers and organizations, and technical concepts covered. Summaries were generated concurrently across 5 workers to process all 500+ talks efficiently. The Leadership keynote was handled separately — see step 04.

Claude Haiku (claude-haiku-4-5) → data/summaries/
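The fan-out across 5 workers can be sketched with a thread pool; summarize below is a stand-in for the real Claude Haiku call, whose prompt and response shape are not reproduced here:

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(transcript: str) -> dict:
    # Placeholder for the actual Claude Haiku request (executive summary,
    # key topics, takeaways, speakers, technical concepts).
    return {"summary": transcript[:40]}

def summarize_all(transcripts: dict[str, str], workers: int = 5) -> dict[str, dict]:
    """Run summarize() concurrently over video_id -> transcript pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {vid: pool.submit(summarize, text)
                   for vid, text in transcripts.items()}
        return {vid: f.result() for vid, f in futures.items()}
```

Because each worker is I/O-bound (waiting on an API response), threads rather than processes are the natural fit here.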
04

Leadership Talk Processing

The Leadership keynote — a 7-hour session split across two recordings — required a dedicated multi-pass pipeline. Each transcript was chunked and summarized with Claude Sonnet, then the chunk summaries were synthesized into a single master summary by Claude Opus. Finally, the two recording summaries were combined into one holistic brief covering the full opening day.

Claude Sonnet (chunks) + Claude Opus (synthesis) → data/summaries/WgW7cC-kHgY.json
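The chunking pass can be sketched as word-bounded splitting; the 3,000-word chunk size below is an assumption, not the pipeline's actual setting:

```python
def chunk_transcript(text: str, max_words: int = 3000) -> list[str]:
    """Split a long transcript into word-bounded chunks for per-chunk summarization."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# Each chunk would go to Claude Sonnet; the resulting chunk summaries are
# then handed to Claude Opus in a single synthesis call.
```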
05

Tagging & Classification

Talks were classified into topic tags across three buckets — Industry & Sectors, Technology & Infrastructure, and Governance & Society — using Claude Haiku in batches. An initial broad set of tags was refined through multiple classification passes, including splitting a broad "AI Governance & Ethics" tag into three focused sub-tags and clustering previously untagged talks to surface new sectors like Energy & Power, Semiconductors & Hardware, and Cybersecurity.

Claude Haiku batch classification → video_tags.md → Supabase
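The three-bucket structure and the batching can be sketched as follows; the tag-to-bucket assignments shown are illustrative, not the full refined tag set:

```python
from itertools import islice

# Illustrative subset of tags per bucket (assignments assumed, not exhaustive).
TAG_BUCKETS = {
    "Industry & Sectors": ["Healthcare", "Agriculture", "Energy & Power"],
    "Technology & Infrastructure": ["Semiconductors & Hardware", "Compute & Cloud"],
    "Governance & Society": ["AI Safety", "Policy & Regulation", "Cybersecurity"],
}

def batched(items, size):
    """Yield fixed-size batches of talks for one classification request each."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```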
06

Sector Deep Dives

For each topic tag, all tagged talks were synthesized into a sector-level brief using Claude Sonnet. The brief covers an overview, key themes, notable initiatives, challenges, and recommendations. Each claim in the brief is linked to a specific source talk via inline citations — clicking a citation opens the source talk directly.

Claude Sonnet (claude-sonnet-4-6) → sector_summaries table
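The citation linking can be sketched as a marker rewrite; the [video_id] marker format and the /talks/ route below are assumptions for illustration:

```python
import re

def link_citations(brief: str, base_url: str = "/talks/") -> str:
    """Turn inline [video_id] citation markers into markdown links.

    Requires ids of 6+ chars so ordinary bracketed words are left alone.
    """
    return re.sub(
        r"\[([A-Za-z0-9_-]{6,})\]",
        lambda m: f"[{m.group(1)}]({base_url}{m.group(1)})",
        brief,
    )
```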
07

Glossary Generation

Technical concepts were extracted from the "Technical Concepts & Resources" section of every session summary in two passes. First, raw terms were pulled from all 500+ summaries. Then Claude Sonnet normalized names, merged duplicates, and wrote clean 1–2 sentence definitions. Leadership keynote terms were extracted separately in a dedicated pass and merged into the same glossary table.

Claude Sonnet (claude-sonnet-4-6) → glossary table
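The merge step can be sketched as grouping case and spacing variants of a term under one normalized key; the definitions themselves come from Claude Sonnet, not from this pass:

```python
def merge_terms(raw_terms: list[str]) -> dict[str, list[str]]:
    """Group duplicate term spellings under a normalized key (lowercase,
    collapsed whitespace) so each glossary entry is written once."""
    merged: dict[str, list[str]] = {}
    for term in raw_terms:
        key = " ".join(term.lower().split())
        merged.setdefault(key, []).append(term)
    return merged
```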
08

Search

Search is powered by Postgres full-text search via Supabase, running across talks, glossary terms, and sector deep dives simultaneously. Queries use Postgres's websearch_to_tsquery syntax, enabling natural phrase and keyword matching without any external embedding service.

Supabase FTS (websearch) → talks, glossary, sector_summaries
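The shape of such a query, with illustrative table and column names (the actual schema is not specified here), looks like:

```sql
-- Match quoted phrases and bare keywords the way a search box would.
select id, title
from talks
where to_tsvector('english', title || ' ' || summary)
      @@ websearch_to_tsquery('english', '"sovereign ai" compute');
```

In practice a stored tsvector column with a GIN index would serve the same query without recomputing vectors per row.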
09

The Website

The session library was built with Next.js and Tailwind CSS, backed by Supabase for data storage. The site supports filtering by topic tag (grouped into the three buckets above), full-text search across talks, the glossary, and sector briefs, per-talk summaries with collapsible sections, sector deep dives with clickable citations, a glossary of 500+ technical terms, and a talk detail modal that keeps your place while exploring a sector deep dive.

Next.js + Tailwind + Supabase