Video Data Extraction API
Define a schema. Get structured JSON. Extract entities, insights, and facts from any video or uploaded file. Supports YouTube, Instagram, TikTok, Facebook, X, Rumble, Vimeo, and Dailymotion — powered by a 2-phase AI pipeline with intelligent prompt caching.
- What is the Video Data Extraction API?
- The Video Data Extraction API is a schema-driven endpoint that turns any video into Pydantic-validated structured JSON. You define the fields you need; VidNavigator transcribes, analyzes, and extracts them through a 2-phase AI pipeline with a 2-hour shared prompt cache — replacing brittle LLM prompt engineering with guaranteed schema conformance.
Why Extract API?
- Custom schema extraction — define exactly which fields you want back (String, Number, Integer, Boolean, Array, Object, Enum)
- Built-in auto-transcription — videos without platform captions (Instagram, TikTok, Facebook, X, etc.) are automatically transcribed via speech-to-text, no separate API call needed
- Video metadata included — every response includes video info (title, channel, duration, views, publish date) alongside your extracted data
- Prompt caching — your compiled extraction prompt is cached for 2 hours and reused across videos, making repeat extractions faster and cheaper
- Broad platform support — extract from online videos on YouTube, Instagram, TikTok, Facebook, X, Rumble, Vimeo, Dailymotion, and Loom, or use /v1/extract/file for uploaded audio/video files
- Multilingual output — results are returned in the same language as your schema descriptions and field definitions, supporting 99+ languages
How It Works — 2-Phase Pipeline
Prompt Compilation (Cached)
- You send a schema + optional what_to_extract instruction
- An AI prompt engineer generates an optimized system prompt and user prompt template
- The compiled plan is cached with a 2-hour TTL based on a fingerprint of your schema + instructions — within that window, identical schemas reuse the same plan instantly
Structured Extraction (Per video)
- The cached prompt is hydrated with the video transcript
- AI extracts data following your schema strictly (Pydantic-enforced output)
- You receive clean, validated JSON matching your exact schema
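The two phases above can be sketched roughly as follows. This is an illustration only: the fingerprinting scheme (SHA-256 over canonical JSON), the `{transcript}` placeholder, and the function names `fingerprint` and `hydrate` are assumptions, not VidNavigator's actual implementation.

```python
import hashlib
import json


def fingerprint(schema: dict, what_to_extract: str = "") -> str:
    """Hypothetical cache key: a hash of the schema plus the instructions."""
    # sort_keys makes the key independent of dict key ordering.
    canonical = json.dumps(
        {"schema": schema, "what_to_extract": what_to_extract},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def hydrate(user_prompt_template: str, transcript: str) -> str:
    """Phase 2 (assumed shape): fill the compiled template with one transcript."""
    return user_prompt_template.replace("{transcript}", transcript)


# Same schema, different key order: expected to map to the same cached plan.
schema_a = {"speaker": {"type": "String", "description": "Main speaker"}}
schema_b = {"speaker": {"description": "Main speaker", "type": "String"}}

key_a = fingerprint(schema_a, "Focus on the host")
key_b = fingerprint(schema_b, "Focus on the host")
key_c = fingerprint(schema_a, "Focus on the guests")  # changed instructions

prompt = hydrate("Extract fields from:\n{transcript}", "Hello and welcome.")
```

In this sketch `key_a` and `key_b` are equal while `key_c` differs, which mirrors the described behavior: identical schema and instructions reuse one plan, and any change produces a new one.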
Use Case Templates
Define your extraction schema in JSON or YAML. Here are ready-to-use templates for common use cases.
- Lead Generation
- Market Research
- Content & Creator Analysis
- AI Pipeline / RAG Ingestion
- Brand & E-Commerce Monitoring
- Fact-Checking & Claim Extraction
Quickstart
Call the Extract API with a video URL and your schema. The response contains structured JSON matching your exact field definitions. Each call processes one video at a time — to extract from multiple videos, make one API call per video.
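A minimal sketch of building such a request with the Python standard library is shown here. The host `api.example.com`, the Bearer-token header, and the `schema` payload key are placeholders assumed for illustration; only the `/v1/extract/video` path and the `video_url`, `what_to_extract`, and `transcribe` names come from this page. Check the official reference for the real request shape.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # hypothetical host, not the real one
API_KEY = "YOUR_API_KEY"              # hypothetical credential placeholder


def build_extract_request(
    video_url: str,
    schema: dict,
    what_to_extract: Optional[str] = None,
    transcribe: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) one extraction request for one video."""
    payload = {"video_url": video_url, "schema": schema, "transcribe": transcribe}
    if what_to_extract:
        payload["what_to_extract"] = what_to_extract
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/extract/video",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # auth scheme is assumed
        },
        method="POST",
    )


schema = {"topic": {"type": "String", "description": "Main topic of the video"}}
video_urls = [
    "https://www.youtube.com/watch?v=VIDEO_ID_1",
    "https://vimeo.com/VIDEO_ID_2",
]

# One call per video: build a separate request for each URL.
requests_to_send = [build_extract_request(url, schema) for url in video_urls]
```

Sending is left out on purpose; passing each request to `urllib.request.urlopen` would perform the call once the real host and credentials are filled in.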
Extract from Uploaded Files
The /v1/extract/file endpoint works exactly like /v1/extract/video but takes a file_id instead of a video_url. Files must be uploaded and transcribed first using the transcribe endpoint.
Built-In Auto-Transcription
For non-YouTube platforms, the Extract API automatically transcribes the video audio when no platform transcript exists. This is enabled by default via the transcribe parameter — no separate API call needed.
- Speech-to-text credits are charged based on the video's duration, at the same rate as /v1/transcribe.
- Transcripts are cached; subsequent extractions on the same video reuse the cached transcript at no extra cost.
- If either transcription or extraction fails, all charges are reverted automatically.
- Set transcribe=false to disable auto-transcription and require an existing transcript.
- YouTube videos rely on platform captions; auto-transcription is not available for YouTube.
Schema Rules
- Max 10 root fields
- Max 3 nesting levels (level 3 must be primitive)
- Max 10 subfields per Object
- Every field needs type and description
- Supported types: String, Number, Boolean, Integer, Array, Object, Enum
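The rules above could be checked client-side before sending a request. The sketch below assumes a schema shape that this page does not specify: a dict of field name to `{"type", "description"}`, with Object children under a hypothetical `properties` key and Array item fields under `items.properties`. Root fields are counted as level 1.

```python
SUPPORTED = {"String", "Number", "Integer", "Boolean", "Array", "Object", "Enum"}
PRIMITIVE = SUPPORTED - {"Array", "Object"}
MAX_ROOT_FIELDS = 10
MAX_SUBFIELDS = 10
MAX_DEPTH = 3


def validate_schema(schema: dict) -> list:
    """Return a list of human-readable rule violations (empty if none)."""
    errors = []
    if len(schema) > MAX_ROOT_FIELDS:
        errors.append(f"too many root fields: {len(schema)} > {MAX_ROOT_FIELDS}")
    for name, field in schema.items():
        _check_field(name, field, 1, errors)
    return errors


def _check_field(path: str, field: dict, level: int, errors: list) -> None:
    ftype = field.get("type")
    if ftype not in SUPPORTED:
        errors.append(f"{path}: missing or unsupported type {ftype!r}")
    if not field.get("description"):
        errors.append(f"{path}: missing description")
    # Level 3 is the deepest allowed level and must hold a primitive.
    if level == MAX_DEPTH and ftype in SUPPORTED and ftype not in PRIMITIVE:
        errors.append(f"{path}: level {MAX_DEPTH} fields must be primitive")
        return
    if ftype == "Object":
        children = field.get("properties", {})  # assumed key name
    elif ftype == "Array":
        children = field.get("items", {}).get("properties", {})  # assumed
    else:
        children = {}
    if len(children) > MAX_SUBFIELDS:
        errors.append(f"{path}: too many subfields: {len(children)}")
    for child_name, child in children.items():
        _check_field(f"{path}.{child_name}", child, level + 1, errors)


good = {
    "company": {
        "type": "Object",
        "description": "Company mentioned in the video",
        "properties": {"name": {"type": "String", "description": "Company name"}},
    }
}
bad = {
    "a": {"type": "Object", "description": "level 1", "properties": {
        "b": {"type": "Object", "description": "level 2", "properties": {
            "c": {"type": "Object", "description": "level 3", "properties": {}},
        }},
    }},
    "no_desc": {"type": "String"},
}
```

Here `validate_schema(good)` returns an empty list, while `validate_schema(bad)` reports the non-primitive level-3 field `a.b.c` and the missing description on `no_desc`.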
Example Response
The API returns clean, validated JSON that matches your schema exactly.
Prompt Caching Deep Dive
The compiled prompt (system prompt + user prompt template) is cached with a 2-hour TTL based on a fingerprint of your schema and instructions.
The first call with a new schema adds ~2-3 s of overhead to generate an optimized extraction prompt from your schema. Subsequent calls within the 2-hour cache window skip compilation entirely.
Within the 2-hour TTL window, every call with the same schema and instructions reuses the cached prompt instantly.
Changing the schema or instructions creates a fresh cached plan automatically.
Define once, extract from thousands of videos within the 2-hour cache window. Cached plans are not tied to a single video.
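The caching behavior described in this section can be modeled with a small TTL cache keyed by fingerprint. The class below is a hypothetical sketch, not the service's implementation; in particular it assumes the 2-hour window starts at compilation time and is not extended by later hits, which this page does not state.

```python
import time
from typing import Callable, Dict, Tuple

TTL_SECONDS = 2 * 60 * 60  # the 2-hour window described above


class PromptPlanCache:
    """Hypothetical model of a fingerprint-keyed plan cache with a fixed TTL."""

    def __init__(
        self,
        compile_fn: Callable[[str], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compile = compile_fn
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.compilations = 0  # counts how often the slow path ran

    def get_plan(self, fingerprint: str) -> str:
        now = self._clock()
        entry = self._entries.get(fingerprint)
        if entry is not None and now < entry[0]:
            return entry[1]  # cache hit: compilation is skipped
        plan = self._compile(fingerprint)  # slow path (~2-3 s in the text)
        self.compilations += 1
        self._entries[fingerprint] = (now + TTL_SECONDS, plan)
        return plan


# A fake clock keeps the example deterministic.
fake_now = [0.0]
cache = PromptPlanCache(lambda fp: f"plan-for-{fp}", clock=lambda: fake_now[0])

cache.get_plan("schema-A")     # first call: compiles
cache.get_plan("schema-A")     # same schema, any video: cache hit
cache.get_plan("schema-B")     # changed schema: fresh plan
fake_now[0] = TTL_SECONDS + 1  # move past the 2-hour window
cache.get_plan("schema-A")     # expired: compiles again
```

After these four calls `cache.compilations` is 3: one for each new fingerprint plus one recompilation after expiry. Because the key carries no video identifier, the same plan serves every video processed within the window.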
Pricing
Each extraction counts as 1 video analysis for standard-length videos. For longer transcripts, billing scales as ceil(total_tokens / 15,000) analysis credits. If auto-transcription is triggered, speech-to-text hours are also charged based on video duration. All charges are reverted if the request fails.
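As a worked example of the formula above, the helper below computes analysis credits from a token count. The function name is hypothetical, and the floor of one credit per extraction is an interpretation of "counts as 1 video analysis for standard-length videos"; speech-to-text charges are not modeled because the rate is not given here.

```python
TOKENS_PER_CREDIT = 15_000


def analysis_credits(total_tokens: int) -> int:
    """ceil(total_tokens / 15,000), with an assumed minimum of 1 credit."""
    # Integer ceiling division avoids floating-point rounding issues.
    credits = (total_tokens + TOKENS_PER_CREDIT - 1) // TOKENS_PER_CREDIT
    return max(1, credits)


short_video = analysis_credits(9_000)    # 1 credit
at_boundary = analysis_credits(15_000)   # 1 credit
just_over = analysis_credits(15_001)     # 2 credits
long_video = analysis_credits(40_000)    # 3 credits
```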