Video tools, ready to be LLM tool-calls
Stop gluing platform scrapers and speech-to-text APIs into your agent graph. VidNavigator gives your agents one API key and eight clean primitives — transcript retrieval, speech-to-text, semantic search, analysis, structured extraction, tweet-claim analysis, cached follow-ups, and usage introspection — across YouTube, TikTok, Instagram, Facebook, X, and four more platforms.
What is VidNavigator for AI agents?
VidNavigator for AI agents is a video-intelligence API and MCP server purpose-built to be registered as tools in LLM agent frameworks. It gives agents eight first-class primitives — transcript retrieval, speech-to-text, semantic search, analysis, structured extraction, tweet-claim analysis, cached follow-ups, and usage introspection — across nine platforms, returning JSON that tool-use loops can reason over directly.
Eight tools your agent can call
Each primitive below is exposed both as a REST endpoint (for direct HTTP tool-calls) and as an MCP tool via the hosted MCP server at https://api.vidnavigator.com/mcp, so it snaps into Claude Desktop, Cursor, Windsurf, Continue, or any framework that speaks MCP.
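For MCP clients that take a JSON config file, registration can be as small as pointing at the hosted server. The shape below follows the mcp.json convention used by clients like Cursor; the "headers" block and the X-API-Key header name are assumptions for illustration — check your client's docs and the VidNavigator docs for the exact auth mechanism.

```json
{
  "mcpServers": {
    "vidnavigator": {
      "url": "https://api.vidnavigator.com/mcp",
      "headers": {
        "X-API-Key": "YOUR_VIDNAVIGATOR_API_KEY"
      }
    }
  }
}
```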
get_video_transcript(video_url)
Raw timestamped transcript for any public video URL — YouTube, TikTok, Instagram, Facebook, X, Vimeo, Rumble, Loom, Dailymotion. One endpoint, routed automatically, with optional language hint and metadata-only mode.
transcribe_video(video_url)
Speech-to-text for non-YouTube videos (Instagram, TikTok, Facebook, X, Vimeo, Rumble, Loom, Dailymotion). Handles 99+ languages and Instagram carousels in a single call — no yt-dlp or GPU setup on your side.
search_videos(query, filters)
AI-powered video discovery across the open web. The agent passes a natural-language query plus optional filters (year, duration, focus, purpose) and gets back ranked videos with transcripts, summaries, key subjects, and best-moment timestamps.
analyze_video(video_url, analysis_type)
Summary or free-form Q&A over a previously ingested video. The transcript is cached under the hood, so follow-ups are cheap. Pass analysis_type="question" with a custom question to get answers grounded in the spoken content.
extract_video_data(video_url, schema)
Schema-driven structured extraction. Pass a JSON Schema or simplified field map — get typed fields like { product_name, price_usd, claims[], mentioned_people[] }. Auto-transcribes non-YouTube videos when no transcript exists.
get_tweet_statement(tweet_id)
Given an X/Twitter tweet ID, pulls the tweet plus any attached video, transcribes the media, and extracts a concise claim as a statement_query. Feed it straight into search_videos with purpose="support" or "oppose" to fact-check the claim.
answer_followup_question(video_url, question)
Cheap, stateless follow-up Q&A for a video the agent has already analyzed. Reuses the cached transcript and summary, so you can drive multi-turn reasoning loops without re-paying for ingestion on every question.
get_usage()
Free diagnostic tool: monthly credits, per-service activity counts, storage, and channels indexed. Lets your agent graph reason about its own budget and back off before hitting a 402 Payment Required at runtime.
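The back-off-before-402 pattern reduces to a pure check over the usage payload. A minimal sketch follows; the field name "credits_remaining" is an illustrative assumption, not the documented get_usage() response shape — map it to whatever the real payload returns.

```python
# Budget-aware back-off sketch. "credits_remaining" is an assumed field
# name for illustration; adapt it to the actual get_usage() response.

def should_back_off(usage: dict, reserve: int = 10) -> bool:
    """Return True when remaining credits dip below a safety reserve."""
    remaining = usage.get("credits_remaining", 0)
    return remaining < reserve

# Example with a stubbed usage payload:
usage = {"credits_remaining": 4}
if should_back_off(usage):
    # Skip speculative tool-calls (e.g. extra search_videos fan-outs)
    # and keep only the calls the task strictly needs.
    pass
```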
Registering the transcript tool in LangChain
A couple dozen lines give your LangChain agent native video-URL ingestion. The same pattern translates to Autogen register_function, CrewAI BaseTool, and Claude/OpenAI tool declarations.
```python
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field
import httpx, os

class TranscriptInput(BaseModel):
    url: str = Field(description="URL of the video to transcribe")

def fetch_transcript(url: str) -> dict:
    r = httpx.post(
        "https://api.vidnavigator.com/v1/transcript/youtube",
        headers={"X-API-Key": os.environ["VIDNAVIGATOR_API_KEY"]},
        json={"video_url": url, "language": "en"},
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["data"]

transcript_tool = StructuredTool.from_function(
    name="fetch_transcript",
    description="Fetch a timestamped transcript for any video URL.",
    func=fetch_transcript,
    args_schema=TranscriptInput,
)
```

Proven agent patterns
Research agent
Browser tool discovers candidate videos → get_video_transcript fans out in parallel → synthesizer agent writes a cited summary with deep-links to timestamps.
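The parallel fan-out step can be sketched with asyncio, treating the transcript fetcher as an injected async callable. The stub below stands in for a real get_video_transcript HTTP call; the segment shape ({"t", "text"}) is an illustrative assumption.

```python
import asyncio

async def fan_out_transcripts(urls, fetch):
    """Fetch transcripts for candidate videos concurrently.

    `fetch` is any async callable mapping a video URL to a transcript
    dict; failures come back as exceptions so one bad URL does not
    sink the whole batch.
    """
    results = await asyncio.gather(
        *(fetch(u) for u in urls), return_exceptions=True
    )
    return {u: r for u, r in zip(urls, results)
            if not isinstance(r, Exception)}

# Stub fetcher standing in for a real get_video_transcript call:
async def fake_fetch(url):
    return {"url": url, "segments": [{"t": 0.0, "text": "hello"}]}

transcripts = asyncio.run(
    fan_out_transcripts(["https://youtu.be/a", "https://youtu.be/b"], fake_fetch)
)
```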
Fact-check agent
Tweet in → get_tweet_statement extracts the claim → search_videos with purpose="support" or "oppose" finds supporting / contradicting timestamps → extract_video_data pulls structured quote + speaker into JSON.
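The chain above can be sketched as one function with the three tool-calls injected as parameters, which keeps the pipeline testable without network access. The callables below are stand-ins for get_tweet_statement, search_videos, and extract_video_data; only statement_query is a documented field — the search/extract argument and result shapes are illustrative assumptions.

```python
def fact_check_tweet(tweet_id, get_statement, search, extract):
    """Chain the three tool-calls of the fact-check pattern.

    get_statement, search, and extract stand in for the real
    get_tweet_statement, search_videos, and extract_video_data calls.
    """
    claim = get_statement(tweet_id)["statement_query"]
    evidence = []
    for purpose in ("support", "oppose"):
        for video in search(query=claim, purpose=purpose):
            record = extract(video["url"])
            record["purpose"] = purpose
            evidence.append(record)
    return {"claim": claim, "evidence": evidence}

# Stubs so the pipeline runs offline:
def fake_statement(tweet_id):
    return {"statement_query": "the moon landing was staged"}

def fake_search(query, purpose):
    return [{"url": f"https://example.com/{purpose}"}]

def fake_extract(url):
    return {"url": url, "speaker": "unknown"}

result = fact_check_tweet("1234567890", fake_statement, fake_search, fake_extract)
```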
Content-ops agent
search_videos maps a topic across the open web → extract_video_data pulls titles / hooks / CTA patterns into structured JSON → agent drafts scripts grounded in observed data.
Support / QA agent
User submits a bug-report video → transcribe_video turns it into text → extract_video_data pulls {os, error_message, steps_to_reproduce} against your schema → routed to the right queue.
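The schema half of that flow can be sketched as a request-body builder for extract_video_data. The field names come from the pattern above; the top-level keys mirror the documented tool parameters, but the inner JSON-Schema encoding and the example URL are illustrative assumptions.

```python
def build_bug_report_extraction(video_url: str) -> dict:
    """Build a request body for schema-driven extraction of a
    bug-report video. The inner JSON Schema is one plausible encoding
    of {os, error_message, steps_to_reproduce}."""
    return {
        "video_url": video_url,
        "schema": {
            "type": "object",
            "properties": {
                "os": {"type": "string"},
                "error_message": {"type": "string"},
                "steps_to_reproduce": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["error_message"],
        },
    }

payload = build_bug_report_extraction("https://loom.com/share/example")
```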
Give your agents video natively.
One API key. Eight primitives. Nine platforms. REST + MCP. Ready to register as tool-calls today.