For AI Agents

Video tools, ready to be LLM tool-calls

Stop gluing platform scrapers and speech-to-text APIs into your agent graph. VidNavigator gives your agents one API key and eight clean primitives — transcript retrieval, speech-to-text, semantic search, analysis, structured extraction, tweet-claim analysis, cached follow-ups, and usage introspection — across YouTube, TikTok, Instagram, Facebook, X, and four more platforms.

What is VidNavigator for AI agents?
VidNavigator for AI agents is a video-intelligence API and MCP server purpose-built to be registered as tools in LLM agent frameworks. It gives agents eight first-class primitives — transcript retrieval, speech-to-text, semantic search, analysis, structured extraction, tweet-claim analysis, cached follow-ups, and usage introspection — across nine platforms, returning JSON that tool-use loops can reason over directly.

Eight tools your agent can call

Each primitive below is exposed both as a REST endpoint (for direct HTTP tool-calls) and as an MCP tool via the hosted MCP server at https://api.vidnavigator.com/mcp, so it snaps into Claude Desktop, Cursor, Windsurf, Continue, or any framework that speaks MCP.

get_video_transcript(video_url)

Raw timestamped transcript for any public video URL — YouTube, TikTok, Instagram, Facebook, X, Vimeo, Rumble, Loom, Dailymotion. One endpoint, routed automatically, with optional language hint and metadata-only mode.

Explore the API →
transcribe_video(video_url)

Speech-to-text for non-YouTube videos (Instagram, TikTok, Facebook, X, Vimeo, Rumble, Loom, Dailymotion). Handles 99+ languages and Instagram carousels in a single call — no yt-dlp or GPU setup on your side.

Explore the API →
search_videos(query, filters)

AI-powered video discovery across the open web. The agent passes a natural-language query plus optional filters (year, duration, focus, purpose) and gets back ranked videos with transcripts, summaries, key subjects, and best-moment timestamps.

Explore the API →
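The documented filters (year, duration, focus, purpose) map naturally onto a small payload builder your agent can call before hitting the endpoint. A minimal sketch, assuming a flat JSON body — the real wire format may differ:

```python
def build_search_payload(query: str, **filters) -> dict:
    """Build the JSON body for a search_videos tool-call.

    Only the documented filters are allowed through, so a model that
    hallucinates an extra argument fails fast instead of silently.
    """
    allowed = {"year", "duration", "focus", "purpose"}
    unknown = set(filters) - allowed
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    return {"query": query, **{k: v for k, v in filters.items() if v is not None}}
```

For example, `build_search_payload("e-bike battery fire claims", year=2024, purpose="support")` yields a body ready to POST as the tool-call arguments.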
analyze_video(video_url, analysis_type)

Summary or free-form Q&A over a previously-ingested video. Transcript is cached under the hood so follow-ups are cheap. Use analysis_type="question" with a custom question to get grounded answers over the spoken content.

Explore the API →
extract_video_data(video_url, schema)

Schema-driven structured extraction. Pass a JSON Schema or simplified field map — get typed fields like { product_name, price_usd, claims[], mentioned_people[] }. Auto-transcribes non-YouTube videos when no transcript exists.

Explore the API →
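The typed-fields example above can be expressed as a plain JSON Schema, and the agent can sanity-check what comes back before acting on it. A hedged sketch — the exact schema wire format is an assumption, and this checker is deliberately not a full JSON Schema validator:

```python
# JSON Schema for the example fields named in the card above.
product_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price_usd": {"type": "number"},
        "claims": {"type": "array", "items": {"type": "string"}},
        "mentioned_people": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["product_name"],
}

def check_extraction(result: dict, schema: dict) -> list[str]:
    """Minimal sanity check on an extraction result: required fields
    present, array fields actually lists. Just enough signal for an
    agent to decide whether to retry the extraction."""
    problems = [f"missing {k}" for k in schema.get("required", []) if k not in result]
    for key, spec in schema["properties"].items():
        if spec["type"] == "array" and key in result and not isinstance(result[key], list):
            problems.append(f"{key} should be a list")
    return problems
```

An empty list from `check_extraction` means the result is safe to hand to the next node in the graph.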
get_tweet_statement(tweet_id)

Given an X/Twitter tweet ID, pulls the tweet plus any attached video, transcribes the media, and extracts a concise claim as a statement_query. Feed it straight into search_videos with purpose="support" or "oppose" to fact-check the claim.

Explore the API →
answer_followup_question(video_url, question)

Cheap, stateless follow-up Q&A for a video the agent has already analyzed. Reuses the cached transcript and summary, so you can drive multi-turn reasoning loops without re-paying for ingestion on every question.

Explore the API →
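Because follow-ups reuse the cached transcript, a multi-turn loop stays flat-cost per question. A sketch with an injected `call_tool` client (HTTP or MCP) so the loop itself stays testable — the `answer` response field is an assumption:

```python
def followup_loop(video_url: str, questions: list[str], call_tool) -> dict[str, str]:
    """Ask several follow-up questions about one already-analyzed video.

    call_tool(name, **args) is whatever client wraps the API; each call
    hits answer_followup_question, which reuses the cached transcript,
    so ingestion is paid once no matter how many questions follow.
    """
    return {
        q: call_tool("answer_followup_question", video_url=video_url, question=q)["answer"]
        for q in questions
    }
```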
get_usage()

Free diagnostic tool: monthly credits, per-service activity counts, storage, and channels indexed. Lets your agent graph reason about its own budget and back off before hitting a 402 Payment Required at runtime.

Explore the API →
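The budget check itself is one small guard in the agent loop. A sketch — the `credits_remaining` field name is an assumption, so adapt it to the real get_usage() response shape:

```python
def should_call(usage: dict, cost: int = 1, reserve: int = 5) -> bool:
    """Decide whether the agent can afford another tool-call.

    `usage` is the JSON returned by get_usage(). `reserve` keeps a
    safety buffer so a long-running graph degrades gracefully instead
    of dying on a 402 mid-plan.
    """
    remaining = usage.get("credits_remaining", 0)
    return remaining - cost >= reserve
```

Call it before each billable tool-call; when it returns False, the graph can summarize what it has instead of issuing one more request.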

Registering the transcript tool in LangChain

A couple dozen lines give your LangChain agent native video-URL ingestion. The same pattern translates to Autogen's register_function, CrewAI's BaseTool, and Claude/OpenAI tool declarations.

from langchain.tools import StructuredTool
from pydantic import BaseModel, Field
import httpx, os

class TranscriptInput(BaseModel):
    url: str = Field(description="URL of the video to transcribe")

def fetch_transcript(url: str) -> dict:
    r = httpx.post(
        "https://api.vidnavigator.com/v1/transcript/youtube",
        headers={"X-API-Key": os.environ["VIDNAVIGATOR_API_KEY"]},
        json={"video_url": url, "language": "en"},
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["data"]

transcript_tool = StructuredTool.from_function(
    name="fetch_transcript",
    description="Fetch a timestamped transcript for any video URL.",
    func=fetch_transcript,
    args_schema=TranscriptInput,
)
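The same tool expressed as an OpenAI-style function declaration — the shape below follows the public OpenAI tools spec, and Claude's format is near-identical (`input_schema` in place of `parameters`):

```python
# Framework-agnostic declaration of the fetch_transcript tool; pass it
# in the `tools` array of a chat-completion request and dispatch the
# resulting tool-call to the fetch_transcript function defined above.
transcript_tool_decl = {
    "type": "function",
    "function": {
        "name": "fetch_transcript",
        "description": "Fetch a timestamped transcript for any video URL.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "URL of the video to transcribe",
                }
            },
            "required": ["url"],
        },
    },
}
```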

Proven agent patterns

Research agent

Browser tool discovers candidate videos → get_video_transcript fans out in parallel → synthesizer agent writes a cited summary with deep-links to timestamps.
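The fan-out step above is a standard bounded-concurrency pattern. A sketch with the transcript fetcher injected as a coroutine (e.g. an `httpx.AsyncClient` wrapper around get_video_transcript), and a semaphore so the burst stays inside the API's rate limits:

```python
import asyncio
from typing import Awaitable, Callable

async def fan_out_transcripts(
    urls: list[str],
    fetch: Callable[[str], Awaitable[dict]],
    limit: int = 4,
) -> list[dict]:
    """Fetch transcripts for many candidate videos concurrently.

    Results come back in the same order as `urls`, so the synthesizer
    agent can zip them back against its candidate list.
    """
    sem = asyncio.Semaphore(limit)

    async def bounded(url: str) -> dict:
        async with sem:
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```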

Fact-check agent

Tweet in → get_tweet_statement extracts the claim → search_videos with purpose="support" or "oppose" finds supporting / contradicting timestamps → extract_video_data pulls structured quote + speaker into JSON.
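The chain above can be sketched as one orchestration function. `call_tool(name, **args)` is an injected client (HTTP or MCP) so the pipeline is testable without network; the tool and parameter names follow this page, while the response field shapes are assumptions:

```python
def fact_check_tweet(tweet_id: str, call_tool) -> dict:
    """Extract a tweet's claim, then gather evidence on both sides.

    Returns the claim plus a support/oppose evidence dict that a
    downstream node can turn into structured quotes via
    extract_video_data.
    """
    claim = call_tool("get_tweet_statement", tweet_id=tweet_id)["statement_query"]
    evidence = {
        side: call_tool("search_videos", query=claim, purpose=side)
        for side in ("support", "oppose")
    }
    return {"claim": claim, "evidence": evidence}
```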

Content-ops agent

search_videos maps a topic across the open web → extract_video_data pulls titles / hooks / CTA patterns into structured JSON → agent drafts scripts grounded in observed data.

Support / QA agent

User submits a bug-report video → transcribe_video turns it into text → extract_video_data pulls {os, error_message, steps_to_reproduce} against your schema → routed to the right queue.
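The final routing step is plain logic over the extracted fields. A sketch assuming the {os, error_message, steps_to_reproduce} schema from the pattern above — the queue names here are illustrative, not part of the API:

```python
def route_bug_report(fields: dict) -> str:
    """Route an extracted bug report to a support queue.

    `fields` is the dict returned by extract_video_data against the
    bug-report schema; unknown or missing values fall through to triage.
    """
    os_name = (fields.get("os") or "").lower()
    if "crash" in (fields.get("error_message") or "").lower():
        return "p1-crashes"
    if os_name.startswith(("ios", "android")):
        return "mobile"
    return "triage"
```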


Give your agents video natively.

One API key. Eight primitives. Nine platforms. REST + MCP. Ready to register as tool-calls today.
