Video Data Extraction API

Define a schema. Get structured JSON. Extract entities, insights, and facts from any video or uploaded file — powered by a 2-phase AI pipeline with intelligent prompt caching.

Why Extract API?

  • Custom schema extraction: define exactly which fields you want back (String, Number, Boolean, Integer, Array, Object, Enum)
  • Prompt caching: your compiled extraction prompt is cached for 2 hours and reused across videos, making repeat extractions faster and cheaper
  • Two endpoints: /v1/extract/video for online videos and /v1/extract/file for uploaded audio/video files
  • Multilingual output: results are returned in the same language as your schema descriptions and field definitions, supporting 99+ languages

How It Works — 2-Phase Pipeline

Phase 1: Prompt Compilation (cached)

  1. You send a schema + optional what_to_extract instruction
  2. An AI prompt engineer generates an optimized system prompt and user prompt template
  3. The compiled plan is cached with a 2-hour TTL by fingerprint (SHA-256 of schema + instructions) — within that window, identical schemas reuse the same plan instantly

Phase 2: Structured Extraction (per video)

  1. The cached prompt is hydrated with the video transcript
  2. AI extracts data following your schema strictly (Pydantic-enforced output)
  3. You receive clean, validated JSON matching your exact schema
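
The fingerprint used as the cache key in phase 1 could be computed roughly as follows. This is a minimal sketch, not the service's implementation: the function name plan_fingerprint and the canonical-JSON serialization are assumptions, since the page only states that the key is a SHA-256 of schema + instructions.

```python
import hashlib
import json


def plan_fingerprint(schema: dict, what_to_extract: str = "") -> str:
    """Return a SHA-256 hex digest over the schema and the instruction text.

    Hypothetical helper: the real service may serialize its inputs differently.
    """
    # Canonical JSON (sorted keys, fixed separators) so that two schemas that
    # differ only in key order produce the same fingerprint.
    canonical = json.dumps(
        {"schema": schema, "what_to_extract": what_to_extract},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, any change to a field definition or to the what_to_extract text yields a different digest, which is what makes a changed schema map to a fresh plan.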

Use Case Templates

Define your extraction schema in JSON or YAML. Ready-to-use templates are available for these common use cases:

  • Lead Generation
  • Market Research
  • Content & Creator Analysis
  • AI Pipeline / RAG Ingestion
  • Brand & E-Commerce Monitoring

Quickstart

Call the Extract API with a video URL and your schema. The response contains structured JSON matching your exact field definitions. Each call processes one video at a time — to extract from multiple videos, make one API call per video.
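
A request to /v1/extract/video might be assembled as in the sketch below. Only the endpoint path and the video_url, schema, and what_to_extract names come from this page; the base URL, the Bearer-token header, and the exact shape of the schema object are placeholders to be replaced with the values from your account and the API reference.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the real API host.
BASE_URL = "https://api.example.com"


def build_extract_video_request(video_url, schema, api_key, what_to_extract=None):
    """Build (but do not send) a POST request for a single video."""
    payload = {"video_url": video_url, "schema": schema}
    if what_to_extract:
        payload["what_to_extract"] = what_to_extract
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/extract/video",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check the API reference for the real header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Illustrative schema shape: every field carries a type and a description.
example_schema = {
    "company_name": {"type": "String", "description": "Company mentioned in the video"},
    "mentions_pricing": {"type": "Boolean", "description": "Whether pricing is discussed"},
}
```

Passing the returned object to urllib.request.urlopen sends the call. Because each call handles one video, a batch is simply a loop that builds one request per URL with the same schema.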

Extract from Uploaded Files

The /v1/extract/file endpoint works exactly like /v1/extract/video but takes a file_id instead of a video_url. Files must be uploaded and transcribed first using the transcribe endpoint.
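
Since the two endpoints differ only in how the source is identified, a client can choose the path and body from whichever identifier it has. The helper below is a sketch; extraction_call is a hypothetical name, and the body layout beyond video_url, file_id, and schema is an assumption.

```python
def extraction_call(schema, video_url=None, file_id=None):
    """Return (endpoint path, JSON body) for one extraction call."""
    # Exactly one source identifier must be given.
    if (video_url is None) == (file_id is None):
        raise ValueError("pass exactly one of video_url or file_id")
    if video_url is not None:
        return "/v1/extract/video", {"video_url": video_url, "schema": schema}
    return "/v1/extract/file", {"file_id": file_id, "schema": schema}
```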

No Transcript? Use /transcribe First

The /v1/extract/video endpoint requires the video to already have a transcript. Many videos on platforms like Instagram, TikTok, and some Facebook pages don't have native captions or subtitles.

  1. Call /v1/transcribe with the video URL to generate a transcript via speech-to-text.
  2. The generated transcript is cached, so you only pay for transcription once per video.
  3. Call /v1/extract/video with the same URL and your schema. Extraction now works as expected.
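
The three steps above can be wrapped in a small fallback routine. This is a sketch under stated assumptions: post stands for whatever function your client uses to send a JSON body to a path, TranscriptMissing is a hypothetical exception your client raises when the API reports that no transcript exists (this page does not say how that condition is signalled), and the video_url field for /v1/transcribe is assumed.

```python
from typing import Any, Callable, Dict

# post(path, body) -> parsed JSON response; supplied by the caller.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class TranscriptMissing(Exception):
    """Hypothetical error for 'this video has no transcript yet'."""


def extract_with_fallback(post: Post, video_url: str, schema: dict) -> Dict[str, Any]:
    """Try extraction; if no transcript exists, transcribe once and retry."""
    body = {"video_url": video_url, "schema": schema}
    try:
        return post("/v1/extract/video", body)
    except TranscriptMissing:
        # Step 1: generate the transcript via speech-to-text.
        post("/v1/transcribe", {"video_url": video_url})
        # Step 3: retry extraction with the same URL and schema.
        return post("/v1/extract/video", body)
```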

Schema Rules

  • Max 10 root fields
  • Max 3 nesting levels (level 3 must be primitive)
  • Max 10 subfields per Object
  • Every field needs type and description
  • Supported types: String, Number, Boolean, Integer, Array, Object, Enum
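
These rules can be checked locally before sending a request. The validator below is a sketch that assumes a schema is a mapping of field name to a definition with type, description, and (for Object fields) a properties mapping; that layout is illustrative, not the documented wire format. It also assumes "primitive" means anything other than Array or Object, and it does not inspect Array item definitions.

```python
SUPPORTED_TYPES = {"String", "Number", "Boolean", "Integer", "Array", "Object", "Enum"}
CONTAINER_TYPES = {"Array", "Object"}
MAX_ROOT_FIELDS = 10
MAX_SUBFIELDS = 10
MAX_DEPTH = 3


def validate_schema(schema: dict) -> list:
    """Return a list of rule violations; an empty list means the schema passes."""
    errors = []
    if len(schema) > MAX_ROOT_FIELDS:
        errors.append(f"too many root fields: {len(schema)} > {MAX_ROOT_FIELDS}")
    for name, field in schema.items():
        _check_field(name, field, 1, errors)
    return errors


def _check_field(path: str, field: dict, depth: int, errors: list) -> None:
    # Every field needs both a type and a description.
    for key in ("type", "description"):
        if not field.get(key):
            errors.append(f"{path}: missing {key}")
    ftype = field.get("type")
    if ftype and ftype not in SUPPORTED_TYPES:
        errors.append(f"{path}: unsupported type {ftype!r}")
    # Fields at the deepest allowed level must be primitive.
    if depth == MAX_DEPTH and ftype in CONTAINER_TYPES:
        errors.append(f"{path}: level {MAX_DEPTH} fields must be primitive")
        return
    if ftype == "Object":
        props = field.get("properties", {})
        if len(props) > MAX_SUBFIELDS:
            errors.append(f"{path}: too many subfields: {len(props)} > {MAX_SUBFIELDS}")
        for sub_name, sub_field in props.items():
            _check_field(f"{path}.{sub_name}", sub_field, depth + 1, errors)
```

Collecting every violation in a list, rather than raising on the first one, lets you fix a schema in a single pass.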

Example Response

The API returns clean, validated JSON that matches your schema exactly.

Prompt Caching Deep Dive

The compiled prompt (system prompt + user prompt template) is cached with a 2-hour TTL based on a SHA-256 fingerprint of your schema and instructions.

First call compiles

Generating an optimized extraction prompt from your schema adds roughly 2-3 s of overhead to the first call.

🔁
Subsequent calls skip compilation

Within the 2-hour TTL window, every call with the same schema and instructions reuses the cached prompt instantly.

🆕
New schema = new plan

Changing the schema or instructions creates a fresh cached plan automatically.

📦
Shared across all videos

Define once, extract from thousands of videos within the 2-hour cache window. Cached plans are not tied to a single video.
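
The caching behaviour described in this section can be modelled with a small TTL cache keyed by fingerprint. This is an illustrative sketch, not the service's code: PlanCache and its methods are hypothetical names, and it assumes the 2-hour window starts at compilation and is not extended on reuse, which this page does not specify.

```python
import time
from typing import Callable, Dict, Tuple

TTL_SECONDS = 2 * 60 * 60  # the 2-hour window


class PlanCache:
    """Cache compiled plans by fingerprint until their TTL expires."""

    def __init__(self, ttl: float = TTL_SECONDS, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # fingerprint -> (expiry, plan)

    def get_or_compile(self, fingerprint: str, compile_plan: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._entries.get(fingerprint)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: compilation is skipped
        plan = compile_plan()  # first call or expired entry: compile again
        self._entries[fingerprint] = (now + self._ttl, plan)
        return plan
```

Because the key depends only on the schema and instructions, the same entry serves every video processed within the window.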

Pricing

Each extraction counts as 1 video analysis. With VidNavigator, 1 credit = 100 video analyses, making large-scale extraction highly cost-effective.
