Video Data Extraction API
Define a schema. Get structured JSON. Extract entities, insights, and facts from any video or uploaded file — powered by a 2-phase AI pipeline with intelligent prompt caching.
Why Extract API?
- Custom schema extraction: define exactly which fields you want back (String, Number, Boolean, Integer, Array, Object, Enum)
- Prompt caching: your compiled extraction prompt is cached for 2 hours and reused across videos, making repeat extractions faster and cheaper
- Two endpoints: /v1/extract/video for online videos and /v1/extract/file for uploaded audio/video files
- Multilingual output: results are returned in the same language as your schema descriptions and field definitions, supporting 99+ languages
How It Works — 2-Phase Pipeline
Prompt Compilation (cached)
- You send a schema plus an optional what_to_extract instruction
- An AI prompt engineer generates an optimized system prompt and user prompt template
- The compiled plan is cached with a 2-hour TTL by fingerprint (SHA-256 of schema + instructions) — within that window, identical schemas reuse the same plan instantly
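The fingerprint step can be sketched as follows. This is a minimal illustration, assuming the schema and instructions are serialized to canonical JSON before hashing; the service's actual serialization is not documented here, and the function name plan_fingerprint is hypothetical.

```python
import hashlib
import json


def plan_fingerprint(schema: dict, what_to_extract: str = "") -> str:
    """Return a SHA-256 hex digest over the schema and the instructions.

    Assumption: canonical JSON (sorted keys, fixed separators) is used so
    that reordering keys in the schema does not change the fingerprint.
    """
    canonical = json.dumps(
        {"schema": schema, "what_to_extract": what_to_extract},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this assumption, two requests whose schemas differ only in key order map to the same cached plan, while any change to a description or to the instructions produces a new fingerprint.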
Structured Extraction (per video)
- The cached prompt is hydrated with the video transcript
- AI extracts data following your schema strictly (Pydantic-enforced output)
- You receive clean, validated JSON matching your exact schema
Use Case Templates
Define your extraction schema in JSON or YAML. Here are ready-to-use templates for common use cases.
- Lead Generation
- Market Research
- Content & Creator Analysis
- AI Pipeline / RAG Ingestion
- Brand & E-Commerce Monitoring
Quickstart
Call the Extract API with a video URL and your schema. The response contains structured JSON matching your exact field definitions. Each call processes one video at a time — to extract from multiple videos, make one API call per video.
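A call of this kind might look like the sketch below, which uses only the Python standard library. Several details are assumptions made for illustration and should be checked against the official reference: the host in BASE_URL is a placeholder, the X-API-Key header is a guess at the authentication scheme, and the body key "schema" and the per-field layout are hypothetical. Only the path /v1/extract/video and the names video_url and what_to_extract come from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def build_extract_request(video_url, schema, api_key, what_to_extract=None):
    """Build (but do not send) a POST request for /v1/extract/video."""
    body = {"video_url": video_url, "schema": schema}  # "schema" key is assumed
    if what_to_extract:
        body["what_to_extract"] = what_to_extract
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/extract/video",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # assumed auth header
        },
        method="POST",
    )


def extract(video_url, schema, api_key, what_to_extract=None):
    """Send the request and return the decoded JSON response."""
    req = build_extract_request(video_url, schema, api_key, what_to_extract)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


# One call per video: loop over URLs to process several videos.
example_schema = {
    "company_name": {"type": "String", "description": "Company mentioned in the video"},
}
```

Because each call handles a single video, a batch job is simply a loop over URLs that reuses the same schema, which also keeps the compiled prompt in cache.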
Extract from Uploaded Files
The /v1/extract/file endpoint works exactly like /v1/extract/video but takes a file_id instead of a video_url. Files must be uploaded and transcribed first using the transcribe endpoint.
No Transcript? Use /transcribe First
The /v1/extract/video endpoint requires the video to already have a transcript. Many videos on platforms like Instagram, TikTok, and some Facebook pages don't have native captions or subtitles.
1. Call /v1/transcribe with the video URL to generate a transcript via speech-to-text.
2. The generated transcript is cached, so you only pay for transcription once per video.
3. Call /v1/extract/video with the same URL and your schema; extraction now works as expected.
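The transcribe-then-extract sequence can be sketched as a small helper. The post callable is a hypothetical transport function (path and JSON body in, decoded JSON out) that you would supply yourself, and the body keys video_url and schema are assumptions about the request format.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: takes an endpoint path and a JSON body,
# returns the decoded JSON response.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def transcribe_then_extract(
    post: PostFn, video_url: str, schema: Dict[str, Any]
) -> Dict[str, Any]:
    """Generate a transcript first, then run extraction on the same URL."""
    # Step 1: speech-to-text; the result is cached server-side per video.
    post("/v1/transcribe", {"video_url": video_url})
    # Step 3: extraction now finds a transcript for this URL.
    return post("/v1/extract/video", {"video_url": video_url, "schema": schema})
```

Passing the transport in as a parameter keeps the helper easy to test with a stub and independent of any particular HTTP client.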
Schema Rules
- Max 10 root fields
- Max 3 nesting levels (level 3 must be primitive)
- Max 10 subfields per Object
- Every field needs type and description
- Supported types: String, Number, Boolean, Integer, Array, Object, Enum
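These rules can be checked client-side before sending a request. The sketch below assumes a hypothetical schema layout in which each field is a mapping with "type" and "description", and Object fields list their subfields under "properties"; the real wire format may differ. It also assumes "primitive" means any type other than Array or Object, and it does not inspect Array element definitions.

```python
SUPPORTED_TYPES = {"String", "Number", "Boolean", "Integer", "Array", "Object", "Enum"}
CONTAINER_TYPES = {"Array", "Object"}  # assumed to be the non-primitive types
MAX_ROOT_FIELDS = 10
MAX_SUBFIELDS = 10
MAX_DEPTH = 3


def validate_schema(schema: dict) -> list:
    """Return a list of human-readable rule violations (empty if none)."""
    errors = []
    if len(schema) > MAX_ROOT_FIELDS:
        errors.append(f"too many root fields: {len(schema)} > {MAX_ROOT_FIELDS}")
    for name, field in schema.items():
        _check_field(name, field, 1, errors)
    return errors


def _check_field(path: str, field: dict, level: int, errors: list) -> None:
    ftype = field.get("type")
    if ftype not in SUPPORTED_TYPES:
        errors.append(f"{path}: missing or unsupported type {ftype!r}")
    if not field.get("description"):
        errors.append(f"{path}: missing description")
    # Fields at the deepest allowed level may not contain further structure.
    if level == MAX_DEPTH and ftype in CONTAINER_TYPES:
        errors.append(f"{path}: level {MAX_DEPTH} fields must be primitive")
        return
    if ftype == "Object":
        props = field.get("properties", {})  # "properties" key is assumed
        if len(props) > MAX_SUBFIELDS:
            errors.append(f"{path}: too many subfields: {len(props)} > {MAX_SUBFIELDS}")
        for sub_name, sub_field in props.items():
            _check_field(f"{path}.{sub_name}", sub_field, level + 1, errors)
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation in one pass.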
Example Response
The API returns clean, validated JSON that matches your schema exactly.
Prompt Caching Deep Dive
The compiled prompt (system prompt + user prompt template) is cached with a 2-hour TTL based on a SHA-256 fingerprint of your schema and instructions.
The first call with a new schema adds roughly 2-3 s of overhead to generate an optimized extraction prompt from your schema. Subsequent calls within the 2-hour cache window skip compilation entirely.
Within the 2-hour TTL window, every call with the same schema and instructions reuses the cached prompt instantly.
Changing the schema or instructions creates a fresh cached plan automatically.
Define once, extract from thousands of videos within the 2-hour cache window. Cached plans are not tied to a single video.
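The caching behavior described in this section can be modeled with a small in-memory cache keyed by fingerprint. This is an illustrative sketch, not the service's implementation: the class name PlanCache is hypothetical, and it assumes the 2-hour window starts at compilation time and is not extended by later hits.

```python
import time
from typing import Any, Callable, Dict, Tuple

TTL_SECONDS = 2 * 60 * 60  # 2-hour cache window


class PlanCache:
    """Cache compiled plans by fingerprint with a fixed time-to-live."""

    def __init__(self, ttl: float = TTL_SECONDS, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # fingerprint -> (expiry, plan)

    def get_or_compile(self, fingerprint: str, compile_plan: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(fingerprint)
        if entry is not None and now < entry[0]:
            return entry[1]  # cache hit: compilation is skipped
        plan = compile_plan()  # cache miss or expired: compile again
        self._entries[fingerprint] = (now + self._ttl, plan)
        return plan
```

Since the key is derived from the schema and instructions alone, the same entry serves every video processed within the window, and a changed schema simply lands under a new key.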
Pricing
Each extraction counts as 1 video analysis. With VidNavigator, 1 credit = 100 video analyses, making large-scale extraction highly cost-effective.
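The credit arithmetic can be written out as a one-line helper. This sketch assumes usage is counted proportionally (so fractions of a credit are possible); how partial credits are actually billed is not stated here.

```python
ANALYSES_PER_CREDIT = 100  # 1 credit = 100 video analyses


def credits_used(num_extractions: int) -> float:
    """Each extraction counts as one video analysis."""
    return num_extractions / ANALYSES_PER_CREDIT
```

For example, credits_used(2500) evaluates to 25.0, meaning 2,500 extractions consume the equivalent of 25 credits.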