Video-first retrieval, without the infra bill
A cleanly normalized transcript is the hardest part of a video-RAG system. We do that part — for YouTube, TikTok, Instagram, Facebook, X, Rumble, Vimeo, Dailymotion, and Loom — at unit economics that make indexing 100k videos actually feasible.
- What is VidNavigator for RAG pipelines?
- VidNavigator for RAG pipelines is a video-first ingestion and retrieval layer that converts any video URL into timestamped transcript segments, normalized across nine platforms and ready for your chunker, embedder, and vector database. Per-transcript pricing can be as little as $0.000025 on the $300 credit pack, keeping per-video cost near zero so video RAG stays economically viable at scale.
Where VidNavigator fits in your pipeline
Ingest
POST a video URL. Receive segmented transcript JSON with start / end timestamps, language, and metadata. Nine platforms covered behind one endpoint, 99+ languages supported, one consistent response shape whether the source is captioned or not.
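A response shaped like the one the quick-start script further down consumes might look like the sketch below. Only the fields that script actually reads (`data.video_id` and `data.segments[*].start/end/text`) are grounded in this page; the sample values and any other fields are placeholders, not an API contract.

```python
# Hypothetical response body, consistent with the fields the
# quick-start script reads; values are illustrative only.
sample_response = {
    "data": {
        "video_id": "abc123",
        "language": "en",
        "segments": [
            {"start": 0.0, "end": 2.4, "text": "Welcome back to the channel."},
            {"start": 2.4, "end": 5.1, "text": "Today: vector search for video."},
        ],
    }
}

data = sample_response["data"]
print(data["video_id"], len(data["segments"]))  # → abc123 2
```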
Chunk
Our segments are naturally 2–4 seconds each — small enough for any chunker. Group them into 300–600 token windows with overlap and carry {video_id, start_sec, end_sec} as metadata on every chunk.
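One way to implement that grouping, as a sketch: the ~4-characters-per-token heuristic and the 3-segment overlap below are our assumptions for illustration, not part of the API.

```python
def chunk_segments(segments, video_id, max_chars=2000, overlap=3):
    """Group small transcript segments into ~500-token windows,
    carrying {video_id, start_sec, end_sec} on every chunk."""
    chunks, buf = [], []

    def flush():
        chunks.append({
            "video_id": video_id,
            "start_sec": buf[0]["start"],
            "end_sec": buf[-1]["end"],
            "text": " ".join(s["text"] for s in buf),
        })

    for seg in segments:
        buf.append(seg)
        if sum(len(s["text"]) for s in buf) > max_chars:
            flush()
            buf = buf[-overlap:]  # keep a few segments so windows overlap
    if buf:
        flush()  # don't drop the trailing partial window
    return chunks
```

Because each chunk keeps its first segment's `start` and last segment's `end`, timestamps survive chunking with no extra bookkeeping.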
Embed
Plug into text-embedding-3-large, voyage-3, BGE-M3, or whatever your stack runs today. We deliberately return normalized plain text so embedder quality is the only moving piece.
Retrieve
Index in pgvector, Qdrant, Pinecone, Weaviate — whichever already lives in your stack. Your retrieval layer pairs naturally with BM25 because spoken content benefits from hybrid search.
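A minimal way to combine a dense result list with a BM25 result list is reciprocal rank fusion. The sketch below assumes you already have two ranked lists of chunk IDs (from your vector DB and your BM25 index respectively); `k=60` is the commonly used smoothing constant.

```python
def rrf(dense_ids, bm25_ids, k=60):
    """Reciprocal rank fusion: merge two ranked ID lists without
    calibrating their raw scores against each other."""
    scores = {}
    for ranking in (dense_ids, bm25_ids):
        for rank, cid in enumerate(ranking):
            scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# A chunk ranked well by both retrievers wins overall.
print(rrf(["a", "b", "c"], ["b", "c", "d"]))  # → ['b', 'c', 'a', 'd']
```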
Ground
Timestamps carry through every stage, so your generation prompt can cite [video_id:start_sec] and your UI can render a deep-link back into the exact second that produced the answer.
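For a YouTube source, the deep link is just the watch URL plus a `t` parameter; the `cite` helper and its citation format below are an illustration we made up, not part of the API.

```python
def cite(video_id: str, start_sec: float) -> str:
    """Render an inline [video_id:start_sec] citation plus a
    deep link into the exact second that produced the answer."""
    t = int(start_sec)
    link = f"https://www.youtube.com/watch?v={video_id}&t={t}s"
    return f"[{video_id}:{t}] {link}"

print(cite("abc123", 87.4))
# → [abc123:87] https://www.youtube.com/watch?v=abc123&t=87s
```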
From URL to indexed chunks in ~40 lines
import os

import httpx
import psycopg
from openai import OpenAI
from pgvector.psycopg import register_vector

client = OpenAI()
conn = psycopg.connect(os.environ["PG_URL"])
register_vector(conn)
cur = conn.cursor()

def ingest(video_url: str):
    # 1. Transcript
    r = httpx.post(
        "https://api.vidnavigator.com/v1/transcript/youtube",
        headers={"X-API-Key": os.environ["VIDNAVIGATOR_API_KEY"]},
        json={"video_url": video_url, "language": "en"},
        timeout=120,
    )
    r.raise_for_status()
    data = r.json()["data"]
    segments = data["segments"]
    video_id = data["video_id"]

    # 2. Chunk (~500 tokens per window, 3-segment overlap)
    chunks, buf, buf_start = [], [], segments[0]["start"]

    def flush():
        chunks.append({
            "video_id": video_id,
            "start_sec": buf_start,
            "end_sec": buf[-1]["end"],
            "text": " ".join(x["text"] for x in buf),
        })

    for s in segments:
        buf.append(s)
        if sum(len(x["text"]) for x in buf) > 2000:  # ~500 tokens at ~4 chars/token
            flush()
            buf, buf_start = buf[-3:], buf[-3]["start"]
    if buf:
        flush()  # keep the trailing partial window

    # 3. Embed + insert
    for c in chunks:
        emb = client.embeddings.create(
            model="text-embedding-3-large",
            input=c["text"],
        ).data[0].embedding
        cur.execute(
            "INSERT INTO video_chunks (video_id, start_sec, end_sec, text, embedding)"
            " VALUES (%s, %s, %s, %s, %s)",
            (c["video_id"], c["start_sec"], c["end_sec"], c["text"], emb),
        )
    conn.commit()
Drop video into your RAG stack without writing the ingestion layer.
One API key, nine platforms, segmented, timestamped JSON that your existing chunker and vector DB speak natively.