1,000+ videos. 2,757 auto-generated links. $1.50 in API costs. Here’s how we built it.
By the Dotzlaw Team
The Achievement
We turned 1,000+ YouTube videos into structured, interconnected Obsidian notes — complete with YAML frontmatter, key takeaways, timestamped sections, and 2,757 bidirectional links that wire the entire vault into a navigable knowledge graph. The total API cost was $1.50. The total processing time was 30 minutes. Every note connects to 3-5 semantically similar notes, every tag maps to a curated taxonomy, and the whole thing runs from a single web interface where you paste a URL and get back a finished note.
This article is an overview of how that system works and why we made the choices we did. The four articles that follow go deep on each component.
The Numbers
| Metric | Value |
|---|---|
| Notes processed | 1,000+ |
| Auto-generated links | 2,757 |
| Curated tags | 1,040 (from 1,280 chaotic originals) |
| API cost savings | 50% (using Batch API) |
| Processing time (1,000 notes) | ~30 minutes |
| Total cost for vault cleanup | ~$1.50 |
The Problem
YouTube is arguably the world’s largest repository of technical knowledge — tutorials, conference talks, deep dives, expert interviews. The catch is that you have to watch it, and watching takes forever. Even when you do take notes, they end up scattered: a note here about LangGraph, another there about trading patterns, a third about Docker deployment. No connections. No context. Just isolated fragments that never surface when you need them.
We needed a system that could extract the knowledge from video transcripts, format it for Obsidian, and automatically wire each note into the rest of the vault.
Figure 1 — The information black hole: isolated clusters of notes with no semantic linking between them.
How It Works
The pipeline has four stages. A YouTube URL goes in; a fully-linked Obsidian note comes out.
Figure 2 — The four-stage content pipeline: from YouTube URL input through extraction, AI processing, semantic indexing, and tag resolution to a fully linked Obsidian note.
Transcript Extraction
The system uses yt-dlp to pull video metadata and transcripts directly from YouTube. No API keys are required for public videos. It works with both auto-generated captions and manual transcripts. The extractor captures the title, channel, duration, thumbnail URL, and the full transcript with timestamps — everything needed to produce a complete note without ever watching the video.
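The extraction stage can be sketched with yt-dlp's Python API. The helper names and the note-field layout below are illustrative assumptions; the info-dict keys (`title`, `channel`, `duration`, `subtitles`, `automatic_captions`) are the ones yt-dlp actually emits.

```python
def fetch_video_info(url: str) -> dict:
    """Pull metadata and caption info for a public video (no API key needed)."""
    import yt_dlp  # imported here so the pure helper below needs no extras

    opts = {
        "skip_download": True,       # metadata and captions only, no video file
        "writesubtitles": True,      # manual transcripts, when present
        "writeautomaticsub": True,   # fall back to auto-generated captions
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        return ydl.extract_info(url, download=False)


def to_note_fields(info: dict) -> dict:
    """Select the fields a note template needs from yt-dlp's info dict."""
    return {
        "title": info.get("title"),
        "channel": info.get("channel") or info.get("uploader"),
        "duration_s": info.get("duration"),
        "thumbnail": info.get("thumbnail"),
        "has_manual_transcript": bool(info.get("subtitles")),
        "has_auto_captions": bool(info.get("automatic_captions")),
    }
```

Splitting the network call from the field selection keeps the second half trivially testable without touching YouTube.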
AI Processing
Raw transcripts are noisy. The AI processing stage uses Anthropic’s Claude to transform them into structured markdown notes that read like they were written by someone who actually watched the video. The processing uses a multi-turn conversation pattern: a system prompt establishes the persona of a note-taker, initial instructions set extraction rules (no summarizing, varied formatting, 5-7 minute timestamp spacing), and the model processes the transcript with context maintained across the entire conversation.
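A minimal sketch of that conversation setup, in Python: the exact prompt wording and the model alias are illustrative assumptions, but the `messages.create()` call matches Anthropic's Python SDK.

```python
SYSTEM_PROMPT = "You are a meticulous note-taker. Do not summarize; restructure."

EXTRACTION_RULES = (
    "Extract every substantive point. Vary formatting (lists, tables, "
    "callouts). Insert timestamped section headers every 5-7 minutes."
)


def build_messages(transcript: str) -> list[dict]:
    """First turn sets the extraction rules, a second turn carries the
    transcript, so rules and content stay distinct in the conversation."""
    return [
        {"role": "user", "content": EXTRACTION_RULES},
        {"role": "assistant", "content": "Understood. Send the transcript."},
        {"role": "user", "content": transcript},
    ]


def process_transcript(transcript: str) -> str:
    import anthropic  # requires ANTHROPIC_API_KEY in the environment

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        messages=build_messages(transcript),
    )
    return response.content[0].text
```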
For single videos, the system uses synchronous API calls for immediate results. For batches of two or more videos, it switches to Anthropic’s Batch API, which cuts costs by 50%. Processing 1,000 notes through the Batch API cost roughly $1.50 — a number that made vault-wide reprocessing trivially cheap. The full Batch API architecture, including the indexing bug that nearly corrupted 782 files, is covered in Anthropic Batch API in Production.
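The batch submission can be sketched as follows. Keying each request by a stable `custom_id` (the target filename here, an assumption) rather than by list position is the safeguard that lets results be matched back to the right file.

```python
def build_batch_requests(notes: dict[str, str]) -> list[dict]:
    """notes maps a stable id (e.g. the target filename) to a transcript."""
    return [
        {
            "custom_id": note_id,
            "params": {
                "model": "claude-3-5-haiku-latest",
                "max_tokens": 4096,
                "messages": [{"role": "user", "content": transcript}],
            },
        }
        for note_id, transcript in notes.items()
    ]


def submit_batch(notes: dict[str, str]) -> str:
    import anthropic  # requires ANTHROPIC_API_KEY in the environment

    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=build_batch_requests(notes))
    return batch.id  # poll this id, then match results by custom_id
```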
KEY INSIGHT: Building for both interactive single-item and batch modes from the start means you never have to choose between fast iteration during development and cost efficiency at scale.
Semantic Indexing
Figure 3 — Vector search as the semantic glue: notes are automatically linked when their embedding similarity exceeds the 0.70 threshold.
Every note gets embedded using OpenAI’s text-embedding-3-small model and stored in Qdrant, a self-hosted vector database. When a new note is saved, the system finds semantically similar notes above a 0.70 similarity threshold, then adds bidirectional wiki-links: Note A links to Note B, and Note B links back to Note A. This bidirectional approach is what turns a folder of files into a dense knowledge graph — every note becomes discoverable from multiple entry points. The vector search implementation and auto-linking algorithm are detailed in Building a Semantic Note Network.
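The auto-linking logic looks roughly like this. In production the vectors come from text-embedding-3-small and Qdrant performs the similarity search server-side; the plain-Python cosine filter below shows the same logic in miniature, and the helper names are our own.

```python
import math

SIM_THRESHOLD = 0.70


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_related(new_vec, vault: dict[str, list[float]], threshold=SIM_THRESHOLD):
    """Titles of existing notes whose similarity clears the threshold,
    best match first."""
    scored = [(title, cosine(new_vec, vec)) for title, vec in vault.items()]
    return [t for t, s in sorted(scored, key=lambda x: -x[1]) if s >= threshold]


def link_both_ways(links: dict[str, set[str]], a: str, b: str) -> None:
    """Bidirectional wiki-links: note A gains [[B]] and note B gains [[A]]."""
    links.setdefault(a, set()).add(f"[[{b}]]")
    links.setdefault(b, set()).add(f"[[{a}]]")
```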
Tag Resolution
AI-generated tags are a liability without curation. Left unchecked, the same concept ends up tagged three different ways — machine-learning, ML, machine_learning — and the tag system becomes noise instead of signal. We maintain a curated taxonomy of 1,040+ hierarchical tags. When a note is created, suggested tags are matched against this taxonomy using semantic similarity, ensuring consistency across the entire vault. The full tag curation story, including how we collapsed 1,280 chaotic tags into a clean hierarchy in 30 minutes, is in Obsidian Vault Curation at Scale.
KEY INSIGHT: A curated tag taxonomy beats random AI-generated tags every time. Consistency across 1,000 notes matters more than precision on any single note.
Figure 4 — Curated taxonomy beats AI hallucination: unconstrained tags on the left produce duplicates and inconsistencies, while the curated hierarchy on the right enforces clean, consistent categorization across 1,040 tags.
Two Workflows, One System
The application serves two distinct needs through a single interface.
YouTube to Markdown converts video transcripts into complete notes. The AI generates both frontmatter and body content. Two processing modes are available: summary mode for quick overviews, and detailed mode for comprehensive notes that preserve the video's depth.
Obsidian Notes Processing adds metadata to notes already in your vault — web articles, saved references, anything you have collected. The AI generates frontmatter only, preserving every word of the original content. This is how we retrofitted 1,000+ existing notes with proper tags, titles, and descriptions without touching the body text.
Both workflows feed into the same semantic indexing pipeline, so every note — whether generated from a YouTube video or enriched from an existing file — gets linked into the knowledge graph.
Figure 5 — The web interface: paste a YouTube URL, choose a workflow, and generate Obsidian-compatible markdown.
Figure 6 — Anatomy of a processed note: YAML frontmatter with clean tags (ai/agents, coding/python), an AI-generated description, and automatic bidirectional links connecting this note to 20+ semantically related notes.
The Tech Stack
This project started as a Python script and grew into a full-stack application. Every technology choice was driven by a specific need we hit during development.
Figure 7 — Full-stack type safety and developer experience: FastAPI with Pydantic validation, React + Vite + TypeScript for the frontend, React Query for server state, and Tailwind CSS v4 + shadcn/ui for styling.
FastAPI won the backend because of Pydantic. Defining a Pydantic model once gives you request validation, response serialization, and auto-generated Swagger docs at /docs — all for free. After years of Flask and Django, the combination of type hints everywhere, native async/await, and automatic documentation felt like a generational leap. The entire API contract lives in the models, which means the frontend team (also us) always knows exactly what the backend expects and returns.
React + TypeScript + Vite powers the frontend. Vite delivers instant hot reload during development and fast production builds. TypeScript catches errors before runtime, which matters when you are wiring up 50+ API endpoints. But the real win is React Query, which eliminated roughly 90% of the server state boilerplate we would have written by hand. Polling for batch job status, for example, is a single refetchInterval option — poll every 5 seconds, stop when the job completes, no manual cleanup. It handles caching, refetching, and loading states with almost no code.
Tailwind CSS v4 + shadcn/ui handles styling. shadcn/ui is not a component library in the traditional sense — it is a collection of copy-paste components built on Radix UI primitives. You run a CLI command, it drops the component source code into your project, and you own it. No npm dependency, no version conflicts, no waiting for a library maintainer to fix a bug. Copy-paste beats npm dependencies when you need to move fast and customize freely.
Dark mode first was a deliberate choice. Retrofitting dark mode onto a light-mode application means auditing every color, every border, every shadow. Starting dark and adding a light theme later is dramatically easier. We default to dark because most developers prefer it, and the Tailwind dark class makes toggling trivial.
The two-panel list-detail layout — list on the left, detail on the right — is the UI pattern that drives the entire application. Users see their vault files or batch results in a scrollable list and click to see details in the adjacent panel. It is a pattern people understand immediately from email clients and file managers, and it works well for any productivity application where you are scanning many items and drilling into one at a time.
PostgreSQL stores relational data: tags, batches, note metadata, and processing history. Qdrant handles vector search, self-hosted on Proxmox so we control the infrastructure and avoid per-query cloud costs. Anthropic Claude (Haiku 3.5) does the heavy AI lifting for cost efficiency, while OpenAI is used exclusively for embeddings via text-embedding-3-small.
On the performance side, three decisions made the biggest difference: processing up to 3 videos concurrently instead of sequentially for a 3x speedup, reducing polling frequency from 2 seconds to 5 seconds to cut API call volume by 60%, and incremental indexing that only re-embeds notes when their content actually changes.
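The concurrency cap can be sketched with an `asyncio.Semaphore` holding in-flight work to three items at a time; the function and parameter names are illustrative.

```python
import asyncio


async def process_all(items, worker, limit: int = 3):
    """Run worker over all items, at most `limit` concurrently,
    returning results in input order."""
    sem = asyncio.Semaphore(limit)

    async def bounded(item):
        async with sem:   # at most `limit` workers hold the semaphore at once
            return await worker(item)

    return await asyncio.gather(*(bounded(i) for i in items))
```

`asyncio.gather` preserves input order, so results line up with the original URL list even though completion order varies.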
The Knowledge Graph
Figure 8 — From isolated fragments to a neural network: 1,024 files connected by 2,757 auto-generated bidirectional links.
The real payoff is not individual notes — it is the connections between them. After processing the entire vault, 2,757 auto-generated links connected 1,024 files. Notes that previously existed in isolation now connect to 3-5 related notes on average. A note about LangGraph links to notes about agent architectures, tool calling, and graph-based workflows. A note about Docker deployment links to notes about CI/CD, container orchestration, and production debugging.
The system that makes this possible — vector embeddings, similarity thresholds, and the auto-linking algorithm — is the subject of Building a Semantic Note Network. The chatbot that lets you query this graph in natural language is covered in Ask Your Vault Anything.
The Series
This is Part 1 of a 5-part series on building an AI-powered knowledge management system:
- From YouTube to Knowledge Graph (this article) — Turning 1,000+ videos into an interconnected knowledge base for $1.50
- Anthropic Batch API in Production — 50% cost savings at scale, and the bug that almost corrupted everything
- Building a Semantic Note Network — Vector search turned 1,024 isolated notes into a dense knowledge graph
- Obsidian Vault Curation at Scale — Three years of tag chaos, fixed in 30 minutes for $1.50
- Ask Your Vault Anything — A RAG chatbot that answers from your notes in 2.5 seconds
Next: Anthropic Batch API in Production — How we cut API costs in half, and the indexing bug that nearly corrupted 782 files