Video Data Extraction Examples: Five Real Schemas You Can Copy
Five copy-and-run schemas that turn spoken video into validated rows in a table.

Why schemas beat prompts
The traditional way to get structured data out of a video is to transcribe it, paste the transcript into an LLM, and prompt for JSON. It is also the brittle way: you spend a day iterating on "please return valid JSON" until it mostly works, and then the model drifts on the 400th run and your ingestion pipeline silently corrupts a day of data.
Schema-driven extraction removes the prompt engineering step. You describe the shape you want — types, enums, required fields, per-field descriptions — and the API returns JSON that is validated against that shape. Fields cannot drift. Enum values cannot become free text. Downstream code stays simple.
Below are five schemas that cover categories people actually build on top of video data. Each is a full round-trip: schema, API call, validated output, takeaway. Copy any of them, change the URL, and the rest is the same contract you would get from any typed API.
Example 1 — Product reviews into a price-and-verdict database
Source: Creator review videos (YouTube, TikTok)
You want to index hundreds of product-review videos as rows in a structured table — product name, price, verdict, pros, cons — so users can filter and compare without watching every review end-to-end.
Schema
{
"type": "object",
"required": ["product_name", "verdict"],
"properties": {
"product_name": { "type": "string" },
"price_usd": { "type": "number", "description": "MSRP the reviewer cites, if any" },
"verdict": { "type": "string", "enum": ["buy", "skip", "situational"] },
"pros": { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
"cons": { "type": "array", "items": { "type": "string" }, "maxItems": 5 },
"timestamp_of_verdict": { "type": "number", "description": "seconds into the video where the verdict is stated" }
}
}
API call
curl -X POST \
https://api.vidnavigator.com/v1/extract/video \
-H 'Content-Type: application/json' \
-H 'X-API-Key: YOUR_API_KEY' \
-d '{
"video_url": "https://www.youtube.com/watch?v=REVIEW_ID",
"schema": { /* schema above */ }
}'
Validated output
{
"status": "success",
"data": {
"product_name": "Sony WH-1000XM5",
"price_usd": 399,
"verdict": "buy",
"pros": [
"best-in-class noise cancelling",
"30-hour battery life",
"lightweight case"
],
"cons": [
"plastic build feels cheaper than XM4",
"no aptX Lossless support"
],
"timestamp_of_verdict": 742
}
}
Takeaway: Every row in the table is validated against your schema, so the verdict column is guaranteed to be one of buy / skip / situational — downstream filters and comparisons never break on a free-text variant.
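Because the verdict enum is guaranteed, downstream filters reduce to exact string comparisons. A minimal Python sketch; the sample rows are invented, but each dict has the same shape as the "data" object returned above:

```python
# Invented sample rows; in practice each dict is the "data" object
# returned by one extraction call.
rows = [
    {"product_name": "Sony WH-1000XM5", "price_usd": 399, "verdict": "buy"},
    {"product_name": "Budget Buds X", "price_usd": 129, "verdict": "skip"},
    {"product_name": "Studio Cans Y", "price_usd": 249, "verdict": "situational"},
]

def buys_under(rows, max_price):
    """Products the reviewer recommended at or below max_price."""
    return [
        r["product_name"]
        for r in rows
        # Exact match is safe: the schema guarantees the enum value.
        if r["verdict"] == "buy" and r.get("price_usd", float("inf")) <= max_price
    ]
```

No normalization pass, no fuzzy matching on "Buy!" vs "strong buy" — the enum constraint did that work at extraction time.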
Example 2 — Earnings calls into a structured highlights record
Source: Publicly posted earnings-call recordings (YouTube, company IR pages)
Equity-research teams re-listen to earnings calls for forward guidance, reaffirmed or revised numbers, and named product initiatives. The same 45-minute call summarized by five different analysts produces five inconsistent memos. A schema fixes that.
Schema
{
"type": "object",
"required": ["ticker", "fiscal_period"],
"properties": {
"ticker": { "type": "string" },
"fiscal_period": { "type": "string", "description": "e.g. Q1-2026" },
"revenue_guidance_usd_m": { "type": "number", "description": "Midpoint of forward revenue guidance in millions" },
"guidance_direction": { "type": "string", "enum": ["raised", "reaffirmed", "cut", "not_given"] },
"named_initiatives": {
"type": "array",
"items": {
"type": "object",
"required": ["name"],
"properties": {
"name": { "type": "string" },
"description": { "type": "string" },
"timestamp": { "type": "number" }
}
}
},
"risk_flags": { "type": "array", "items": { "type": "string" } }
}
}
API call
curl -X POST \
https://api.vidnavigator.com/v1/extract/video \
-H 'Content-Type: application/json' \
-H 'X-API-Key: YOUR_API_KEY' \
-d '{
"video_url": "https://www.youtube.com/watch?v=CALL_ID",
"schema": { /* schema above */ }
}'
Validated output
{
"status": "success",
"data": {
"ticker": "ACME",
"fiscal_period": "Q1-2026",
"revenue_guidance_usd_m": 1250,
"guidance_direction": "raised",
"named_initiatives": [
{
"name": "Enterprise AI Platform",
"description": "Launching GA next quarter; $50m committed pipeline",
"timestamp": 1284
},
{
"name": "APAC expansion",
"description": "Two new country launches, sales team hired",
"timestamp": 1910
}
],
"risk_flags": [
"FX headwind on EU revenue",
"GPU capacity constraint flagged for H2"
]
}
}
Takeaway: The same schema applied to 40 calls a quarter gives you a consistent longitudinal dataset — compare guidance_direction across your portfolio with one SQL query instead of re-reading memos.
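The "one SQL query" claim is worth making concrete. A sketch using Python's built-in sqlite3, assuming each quarter's extracted record was inserted as one row; the sample data is invented:

```python
import sqlite3

# In-memory table standing in for your warehouse; one row per extracted call.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (ticker TEXT, fiscal_period TEXT, guidance_direction TEXT)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("ACME", "Q1-2026", "raised"),
        ("ACME", "Q4-2025", "reaffirmed"),
        ("GLOBX", "Q1-2026", "cut"),
    ],
)

# Who raised guidance this quarter? One query, because the enum is consistent.
raised = [
    row[0]
    for row in conn.execute(
        "SELECT ticker FROM calls WHERE fiscal_period = 'Q1-2026' "
        "AND guidance_direction = 'raised'"
    )
]
```

The query only works because guidance_direction can never be "hiked" or "up slightly" — the schema pinned it to four values before the row ever reached the table.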
Example 3 — Conference panels into topic-tagged moment cards
Source: Podcast interviews, conference panels, fireside chats
Panels and interviews are still valuable extraction sources even without diarization. Instead of trying to guarantee who said what, extract the highest-signal moments with a quote, a topic tag, and timestamps so editors and researchers can jump straight to the evidence and add attribution manually when needed.
Schema
{
"type": "object",
"required": ["panel_topic", "moments"],
"properties": {
"panel_topic": { "type": "string" },
"moments": {
"type": "array",
"items": {
"type": "object",
"required": ["quote", "topic_tag", "start_sec"],
"properties": {
"quote": { "type": "string", "description": "High-signal quote or moment from the panel" },
"topic_tag": { "type": "string", "description": "Short label for the topic discussed" },
"summary": { "type": "string", "description": "One-sentence explanation of why this moment matters" },
"start_sec": { "type": "number" },
"end_sec": { "type": "number" }
}
}
}
}
}
API call
curl -X POST \
https://api.vidnavigator.com/v1/extract/video \
-H 'Content-Type: application/json' \
-H 'X-API-Key: YOUR_API_KEY' \
-d '{
"video_url": "https://www.youtube.com/watch?v=PANEL_ID",
"schema": { /* schema above */ }
}'
Validated output
{
"status": "success",
"data": {
"panel_topic": "Building agentic AI products in 2026",
"moments": [
{
"quote": "Everyone says agents, nobody defines them. What counts as an agent on your team?",
"topic_tag": "definitions",
"summary": "The panel opens by framing the disagreement around what should count as an agent.",
"start_sec": 62,
"end_sec": 71
},
{
"quote": "An agent is a loop with memory and the ability to call tools — that's the minimum bar.",
"topic_tag": "definitions",
"summary": "A concise working definition of an agent for product teams.",
"start_sec": 78,
"end_sec": 89
},
{
"quote": "We stopped calling them agents and started calling them workflows with conditional branches. Investors hated it but our customers got it.",
"topic_tag": "positioning",
"summary": "A practical product-marketing lesson about language and buyer understanding.",
"start_sec": 105,
"end_sec": 120
}
]
}
}
Takeaway: This pattern keeps the extraction grounded in what the transcript can support reliably today: the quote itself, the topic, and the exact moment in the video. It is ideal for newsletters, event recaps, research notes, and editorial workflows where a human can add speaker attribution later if needed.
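Those start_sec values turn directly into deep links for a recap. A sketch using YouTube's standard t= seek parameter; the sample moments mirror the validated output above:

```python
def deep_link(video_url, start_sec):
    """Append a seek offset so the link opens at the quoted moment."""
    sep = "&" if "?" in video_url else "?"
    return f"{video_url}{sep}t={int(start_sec)}s"

moments = [
    {"topic_tag": "definitions", "start_sec": 62},
    {"topic_tag": "positioning", "start_sec": 105},
]
links = [
    deep_link("https://www.youtube.com/watch?v=PANEL_ID", m["start_sec"])
    for m in moments
]
```

Each moment card in a newsletter or research note then links to the exact second of the evidence, not just the video.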
Example 4 — Cooking videos into step-by-step JSON recipes
Source: Food YouTube, TikTok recipe creators, Instagram cooking reels
Recipe aggregators, meal-planning apps, and grocery-list integrations all need a normalized recipe shape — title, ingredients with quantities, ordered steps, timings. A 6-minute video scraped into an unstructured transcript is useless; a validated recipe record is an asset.
Schema
{
"type": "object",
"required": ["title", "ingredients", "steps"],
"properties": {
"title": { "type": "string" },
"yield_servings": { "type": "number" },
"total_time_min": { "type": "number" },
"ingredients": {
"type": "array",
"items": {
"type": "object",
"required": ["name", "quantity"],
"properties": {
"name": { "type": "string" },
"quantity": { "type": "string" },
"notes": { "type": "string" }
}
}
},
"steps": {
"type": "array",
"items": {
"type": "object",
"required": ["order", "instruction"],
"properties": {
"order": { "type": "integer" },
"instruction": { "type": "string" },
"duration_min": { "type": "number" },
"timestamp": { "type": "number" }
}
}
}
}
}
API call
curl -X POST \
https://api.vidnavigator.com/v1/extract/video \
-H 'Content-Type: application/json' \
-H 'X-API-Key: YOUR_API_KEY' \
-d '{
"video_url": "https://www.youtube.com/watch?v=RECIPE_ID",
"schema": { /* schema above */ }
}'
Validated output
{
"status": "success",
"data": {
"title": "Weeknight Miso Pasta",
"yield_servings": 2,
"total_time_min": 25,
"ingredients": [
{ "name": "dried spaghetti", "quantity": "200 g" },
{ "name": "white miso paste", "quantity": "2 tbsp" },
{ "name": "garlic", "quantity": "3 cloves", "notes": "minced" },
{ "name": "unsalted butter", "quantity": "30 g" },
{ "name": "pasta water", "quantity": "1/2 cup", "notes": "reserved" }
],
"steps": [
{ "order": 1, "instruction": "Boil pasta in salted water until al dente", "duration_min": 9, "timestamp": 40 },
{ "order": 2, "instruction": "Foam butter in pan and bloom the minced garlic", "duration_min": 2, "timestamp": 165 },
{ "order": 3, "instruction": "Whisk miso with a ladle of pasta water to loosen", "duration_min": 1, "timestamp": 220 },
{ "order": 4, "instruction": "Add drained pasta, toss with miso butter, emulsify with water", "duration_min": 3, "timestamp": 260 }
]
}
}
Takeaway: Drop the extracted record straight into a recipe CMS or an import API — the schema matches schema.org's Recipe shape closely enough that you can also emit rich-result JSON-LD from the same payload.
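The JSON-LD emission is a straight field mapping. A sketch of that mapping in Python — the schema.org property names (name, recipeYield, totalTime, recipeIngredient, recipeInstructions, HowToStep) are real, but the mapping itself is an assumption based on the schema above, not a complete rich-results implementation:

```python
def to_json_ld(record):
    """Map an extracted recipe record onto a schema.org Recipe object."""
    return {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": record["title"],
        "recipeYield": f'{record.get("yield_servings", "")} servings',
        # schema.org expects an ISO 8601 duration, e.g. PT25M.
        "totalTime": f'PT{record.get("total_time_min", 0)}M',
        "recipeIngredient": [
            f'{i["quantity"]} {i["name"]}' for i in record["ingredients"]
        ],
        "recipeInstructions": [
            {"@type": "HowToStep", "text": s["instruction"]}
            for s in sorted(record["steps"], key=lambda s: s["order"])
        ],
    }

# Trimmed version of the validated output above.
record = {
    "title": "Weeknight Miso Pasta",
    "yield_servings": 2,
    "total_time_min": 25,
    "ingredients": [{"name": "dried spaghetti", "quantity": "200 g"}],
    "steps": [{"order": 1, "instruction": "Boil pasta in salted water until al dente"}],
}
json_ld = to_json_ld(record)
```

Serialize json_ld into a script tag of type application/ld+json and the same extraction powers both your app and your search snippets.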
Example 5 — Recorded customer calls flagged for compliance keywords
Source: Sales-call recordings, customer-support sessions, fintech onboarding videos
Regulated industries (fintech, healthcare, insurance) require either a human listener or a keyword spotter on every recorded conversation. A schema that returns risk-flag categories with timestamps replaces that manual review step with a reviewable audit trail, even when speaker attribution is not available.
Schema
{
"type": "object",
"required": ["call_id", "flags"],
"properties": {
"call_id": { "type": "string" },
"flags": {
"type": "array",
"items": {
"type": "object",
"required": ["category", "quote", "start_sec"],
"properties": {
"category": { "type": "string", "enum": ["guarantee", "personal_data", "complaint", "competitor_mention", "pricing_deviation"] },
"quote": { "type": "string" },
"start_sec": { "type": "number" },
"end_sec": { "type": "number" }
}
}
},
"overall_risk": { "type": "string", "enum": ["low", "medium", "high"] }
}
}
API call
curl -X POST \
https://api.vidnavigator.com/v1/extract/video \
-H 'Content-Type: application/json' \
-H 'X-API-Key: YOUR_API_KEY' \
-d '{
"video_url": "https://storage.example.com/calls/2026-04-17/call-8812.mp4",
"schema": { /* schema above */ }
}'
Validated output
{
"status": "success",
"data": {
"call_id": "call-8812",
"flags": [
{
"category": "guarantee",
"quote": "I can promise you'll see returns of at least 8% in the first year",
"start_sec": 412,
"end_sec": 421
},
{
"category": "pricing_deviation",
"quote": "We can waive the setup fee if you sign today",
"start_sec": 560,
"end_sec": 568
}
],
"overall_risk": "high"
}
}
Takeaway: Route records with overall_risk = "high" to a human reviewer automatically. Each flag carries its own timestamp so the reviewer jumps straight to the 9-second window in question — audit review time drops from minutes per call to seconds, without pretending the system can always identify the speaker correctly.
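The routing rule itself is a few lines once the record is validated. A sketch, assuming the low/medium/high enum above; the queue names and the "guarantees always escalate" policy are hypothetical:

```python
def route(record):
    """Return the review queue for a validated compliance record."""
    if record["overall_risk"] == "high":
        return "human_review"
    # Hypothetical policy: guarantee flags escalate regardless of risk tier.
    if any(f["category"] == "guarantee" for f in record.get("flags", [])):
        return "human_review"
    return "auto_archive"

record = {
    "call_id": "call-8812",
    "overall_risk": "high",
    "flags": [{"category": "guarantee", "start_sec": 412}],
}
```

Because both overall_risk and category are schema-enforced enums, this routing logic never needs a fallback branch for unexpected strings.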
Patterns you will keep reusing
- Enums for categorical fields. Verdicts, risk tiers, guidance directions — if a field only has a handful of valid values, make it an enum. That single constraint prevents 90% of the "my downstream query broke" bugs.
- Timestamps on every evidence field. Add start_sec/timestamp even when you think you will not use it. Your future self will want to render deep-links and run retrieval evaluations — the moment you need it and do not have it, re-indexing is painful.
- Required vs optional, stated clearly. Required fields fail loudly when evidence is missing. That is what you want — silent nulls at ingestion time corrupt downstream joins and dashboards.
- Human-readable descriptions on each field. The extractor reads them as hints. A clear description improves recall more than any amount of prompt engineering on the caller side.
- Nested objects, not stringified JSON. If you need a structured sub-object (like the quotes array or the flags array above), express it as a real object in the schema. Do not fake structure with a string that later needs to be parsed again.
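The first pattern — fail loudly on anything outside the enum — can be sketched as a minimal ingestion guard, assuming the verdict enum from Example 1:

```python
VERDICTS = {"buy", "skip", "situational"}

def ingest(row):
    """Reject any row whose verdict is not one of the schema's enum values."""
    if row.get("verdict") not in VERDICTS:
        raise ValueError(f"invalid verdict: {row.get('verdict')!r}")
    return row
```

With schema-validated extraction the guard should never fire; keeping it anyway means that if an upstream contract ever changes, the pipeline stops with a clear error instead of quietly storing "Strong Buy" next to "buy".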
Recap
- Schema-driven extraction replaces prompt engineering for structured video data.
- Five starter schemas above cover reviews, earnings calls, panel moments, recipes, and compliance.
- Pattern: enums, timestamps, clear required/optional split, nested objects, field descriptions.
- Input is a public URL or an uploaded file — the call shape is the same either way.
- Costs scale per video, not per field; concurrency is handled server-side.
Read more about the underlying endpoint in the Video Data Extraction solution or in the Video Data Extraction API deep-dive.