
Video Data Extraction API

Define a schema. Get structured JSON. Extract entities, insights, and facts from any video or uploaded file. Supports YouTube, Instagram, TikTok, Facebook, X, Rumble, Vimeo, Dailymotion, and Loom — powered by a 2-phase AI pipeline with intelligent prompt caching.

What is the Video Data Extraction API?
The Video Data Extraction API is a schema-driven endpoint that turns any video into Pydantic-validated structured JSON. You define the fields you need; VidNavigator transcribes, analyzes, and extracts them through a 2-phase AI pipeline with a 2-hour shared prompt cache — replacing brittle LLM prompt engineering with guaranteed schema conformance.

Why the Extract API?

  • Custom schema extraction — define exactly which fields you want back (String, Number, Integer, Boolean, Array, Object, Enum)
  • Built-in auto-transcription — videos without platform captions (Instagram, TikTok, Facebook, X, etc.) are automatically transcribed via speech-to-text, no separate API call needed
  • Video metadata included — every response includes video info (title, channel, duration, views, publish date) alongside your extracted data
  • Prompt caching — your compiled extraction prompt is cached for 2 hours and reused across videos, making repeat extractions faster and cheaper
  • Broad platform support — extract from online videos on YouTube, Instagram, TikTok, Facebook, X, Rumble, Vimeo, Dailymotion, and Loom, or use /v1/extract/file for uploaded audio/video files
  • Multilingual output — results are returned in the same language as your schema descriptions and field definitions, supporting 99+ languages

How It Works — 2-Phase Pipeline

Phase 1: Prompt Compilation (cached)
  1. You send a schema + optional what_to_extract instruction
  2. An AI prompt engineer generates an optimized system prompt and user prompt template
  3. The compiled plan is cached with a 2-hour TTL based on a fingerprint of your schema + instructions — within that window, identical schemas reuse the same plan instantly
Phase 2: Structured Extraction (per video)
  1. The cached prompt is hydrated with the video transcript
  2. AI extracts data following your schema strictly (Pydantic-enforced output)
  3. You receive clean, validated JSON matching your exact schema

Use Case Templates

Define your extraction schema in JSON or YAML. Here are ready-to-use templates for common use cases.

Lead Generation

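An illustrative lead-generation schema following the rules below; the field names and descriptions are ours, not prescribed by the API:

```json
{
  "company_name": { "type": "String", "description": "Company or brand the speaker represents" },
  "contact_details": { "type": "String", "description": "Email, phone, or handle mentioned for contact" },
  "products_mentioned": { "type": "Array", "description": "Products or services promoted in the video" },
  "pain_points": { "type": "Array", "description": "Audience problems the speaker addresses" },
  "call_to_action": { "type": "String", "description": "What viewers are asked to do next" }
}
```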

Market Research

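A sketch of a market-research schema. Field names are illustrative, and the `values` key used to hold the allowed Enum options is an assumption:

```json
{
  "product_category": { "type": "String", "description": "Category of product being reviewed or discussed" },
  "competitors_mentioned": { "type": "Array", "description": "Competing brands or products named" },
  "sentiment": { "type": "Enum", "description": "Overall sentiment toward the product", "values": ["positive", "neutral", "negative"] },
  "price_mentioned": { "type": "Boolean", "description": "Whether a price is stated in the video" },
  "feature_requests": { "type": "Array", "description": "Features the speaker wishes the product had" }
}
```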

Content & Creator Analysis

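An illustrative creator-analysis schema (field names are ours):

```json
{
  "main_topics": { "type": "Array", "description": "Topics covered in the video" },
  "hook": { "type": "String", "description": "Opening line or device used to grab attention" },
  "content_format": { "type": "String", "description": "Format, e.g. tutorial, vlog, interview, reaction" },
  "has_sponsorship": { "type": "Boolean", "description": "Whether the video contains a paid sponsorship" },
  "target_audience": { "type": "String", "description": "Audience the content appears aimed at" }
}
```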

AI Pipeline / RAG Ingestion

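A sketch of an ingestion schema for RAG pipelines. The nested `fields` key used to declare Object subfields is an assumption; field names are illustrative:

```json
{
  "summary": { "type": "String", "description": "Concise summary suitable for embedding" },
  "key_facts": { "type": "Array", "description": "Standalone factual statements from the transcript" },
  "entities": {
    "type": "Object",
    "description": "Named entities mentioned in the video",
    "fields": {
      "people": { "type": "Array", "description": "People mentioned by name" },
      "organizations": { "type": "Array", "description": "Companies or organizations mentioned" }
    }
  },
  "notable_quotes": { "type": "Array", "description": "Verbatim quotes worth indexing" }
}
```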

Brand & E-Commerce Monitoring

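An illustrative brand-monitoring schema (field names are ours; the `values` key for Enum options is an assumption):

```json
{
  "brands_mentioned": { "type": "Array", "description": "Brand names that appear in the video" },
  "products_shown": { "type": "Array", "description": "Specific products shown or demonstrated" },
  "discount_codes": { "type": "Array", "description": "Promo or discount codes read out or shown" },
  "overall_sentiment": { "type": "Enum", "description": "Tone toward the brand", "values": ["positive", "neutral", "negative"] },
  "is_paid_promotion": { "type": "Boolean", "description": "Whether the video discloses a paid partnership" }
}
```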

Fact-Checking & Claim Extraction

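An illustrative claim-extraction schema (field names are ours):

```json
{
  "claims": { "type": "Array", "description": "Checkable factual claims made in the video" },
  "statistics_cited": { "type": "Array", "description": "Numbers or statistics stated, with their context" },
  "sources_named": { "type": "Array", "description": "Sources, studies, or outlets the speaker cites" },
  "speaker_confidence": { "type": "String", "description": "How certain the speaker sounds about the claims" }
}
```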

Quickstart

Call the Extract API with a video URL and your schema. The response contains structured JSON matching your exact field definitions. Each call processes one video at a time — to extract from multiple videos, make one API call per video.

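A minimal sketch of assembling such a request in Python. The base URL and Authorization header are placeholders (use the values from your account), and the request is built but deliberately not sent here; `video_url`, `schema`, and `what_to_extract` are the parameters described in this document.

```python
import json
import urllib.request

# Placeholder endpoint -- substitute the real VidNavigator base URL.
API_URL = "https://api.example.com/v1/extract/video"

schema = {
    "summary": {"type": "String", "description": "One-paragraph summary of the video"},
    "key_points": {"type": "Array", "description": "Main takeaways as short strings"},
}

payload = {
    "video_url": "https://www.youtube.com/watch?v=VIDEO_ID",
    "schema": schema,
    "what_to_extract": "Summarize the talk and list its main points",
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
)
# One video per call: to process many videos, loop over URLs and POST once each.
# response = urllib.request.urlopen(request)  # not executed in this sketch
```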

Extract from Uploaded Files

The /v1/extract/file endpoint works exactly like /v1/extract/video but takes a file_id instead of a video_url. Files must be uploaded and transcribed first using the transcribe endpoint.

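The request body differs only in its source field: a sketch under the assumption that `file_id` simply replaces `video_url` (the example id is made up):

```python
import json

payload = {
    "file_id": "file_abc123",  # returned earlier by the transcribe endpoint
    "schema": {
        "action_items": {"type": "Array", "description": "Tasks mentioned in the recording"},
    },
}
body = json.dumps(payload)  # POST this to /v1/extract/file as in the video quickstart
```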

Built-In Auto-Transcription

For non-YouTube platforms, the Extract API automatically transcribes the video audio when no platform transcript exists. This is enabled by default via the transcribe parameter — no separate API call needed.

  • Speech-to-text credits are charged based on the video's duration — same rate as /v1/transcribe.
  • Transcripts are cached — subsequent extractions on the same video reuse it at no extra cost.
  • If either transcription or extraction fails, all charges are reverted automatically.
  • Set transcribe=false to disable auto-transcription and require an existing transcript.
  • YouTube videos rely on platform captions — auto-transcription is not available for YouTube.

Schema Rules

  • Max 10 root fields
  • Max 3 nesting levels (level 3 must be primitive)
  • Max 10 subfields per Object
  • Every field needs type and description
  • Supported types: String, Number, Boolean, Integer, Array, Object, Enum
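These rules are easy to check client-side before calling the API. A minimal validator sketch, assuming subfields of an Object live under a `fields` key (that key name is our assumption):

```python
SUPPORTED_TYPES = {"String", "Number", "Boolean", "Integer", "Array", "Object", "Enum"}

def validate(schema: dict, level: int = 1) -> list[str]:
    """Return a list of rule violations for a schema dict (empty list = valid)."""
    errors = []
    if level == 1 and len(schema) > 10:
        errors.append("more than 10 root fields")
    for name, field in schema.items():
        if field.get("type") not in SUPPORTED_TYPES:
            errors.append(f"{name}: unsupported or missing type")
        if not field.get("description"):
            errors.append(f"{name}: missing description")
        sub = field.get("fields")  # subfield key name is an assumption
        if sub:
            if level >= 3:
                errors.append(f"{name}: level 3 must be primitive")
            elif len(sub) > 10:
                errors.append(f"{name}: more than 10 subfields")
            else:
                errors.extend(validate(sub, level + 1))
    return errors
```

Running this before each request catches schema mistakes locally instead of burning an API call on a rejected payload.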

Example Response

The API returns clean, validated JSON that matches your schema exactly.

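An illustrative response for a two-field schema. The metadata fields (title, channel, duration, views, publish date) are the ones this document lists; the envelope key names and all values shown are our assumptions:

```json
{
  "video_info": {
    "title": "How We Grew to 100k Subscribers",
    "channel": "Example Creator",
    "duration": 912,
    "views": 48210,
    "publish_date": "2024-07-01"
  },
  "data": {
    "summary": "The creator explains the three strategies behind their channel growth.",
    "key_points": [
      "Consistent weekly uploads",
      "Thumbnails tested before publishing",
      "Community posts between videos"
    ]
  }
}
```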

Prompt Caching Deep Dive

The compiled prompt (system prompt + user prompt template) is cached with a 2-hour TTL based on a fingerprint of your schema and instructions.

  • First call compiles: ~2-3 s of overhead to generate an optimized extraction prompt from your schema.
  • 🔁 Subsequent calls skip compilation: within the 2-hour TTL window, every call with the same schema and instructions reuses the cached prompt instantly.
  • 🆕 New schema = new plan: changing the schema or instructions creates a fresh cached plan automatically.
  • 📦 Shared across all videos: define once, then extract from thousands of videos within the 2-hour cache window. Cached plans are not tied to a single video.

Pricing

Each extraction counts as 1 video analysis for standard-length videos. For longer transcripts, billing scales as ceil(total_tokens / 15,000) analysis credits. If auto-transcription is triggered, speech-to-text hours are also charged based on video duration. All charges are reverted if the request fails.
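The token-based part of the bill is just the stated ceiling formula; a one-line sketch (the minimum of one credit reflects "each extraction counts as 1 video analysis"):

```python
import math

def analysis_credits(total_tokens: int) -> int:
    """Analysis credits for one extraction: ceil(total_tokens / 15,000), minimum 1."""
    return max(1, math.ceil(total_tokens / 15_000))
```

For example, a 12,000-token transcript costs 1 credit, while a 40,000-token transcript costs 3.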
