Embeddings

Semantic search and vector representations for AI applications

Embedding models for semantic search, RAG, and clustering

Embedding models turn text (or sometimes images, code, or audio) into a fixed-length vector of floating-point numbers. Similar inputs land close together in the embedding space; dissimilar inputs land far apart. Reach for embeddings when building semantic search, retrieval-augmented generation (RAG), recommendations, or clustering.
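To make "close together" concrete, here is a minimal sketch that ranks two documents against a query. It assumes cosine similarity as the closeness measure, which is one common choice rather than the only one, and the 4-dimensional vectors are made up for illustration; real models return hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for this example only.
query = [0.9, 0.1, 0.0, 0.2]
documents = {
    "doc_close": [0.8, 0.2, 0.1, 0.3],
    "doc_far": [0.0, 0.9, 0.8, 0.1],
}

# Sort document names by similarity to the query, best match first.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
print(ranked)
```

In a real application the vectors would come from an embedding model and the sort would usually be replaced by a vector index, but the ranking idea is the same.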

Top embeddings picks

Hand-picked across four common criteria — resolved against the live catalog so the picks track price and performance changes.

Best overall
Text Embedding 3 Large

OpenAI's most powerful embedding model. 3072 dimensions for maximum accuracy.

Cheapest
Jina Embeddings v3 (Multilingual)

Jina's frontier multilingual embedding model. 570M params, 8192 ctx, 89 languages, Matryoshka dims 128-1024.

Highest dimensions
Voyage AI voyage-3

Voyage's general-purpose embedding model. 1024 dims, 32k context, strong retrieval performance.

Fastest
Text Embedding 3 Small

OpenAI's compact embedding model. 1536 dimensions, great for semantic search and RAG.


Pricing is per-token, similar to text generation but typically 10-100× cheaper. Flagship models (OpenAI text-embedding-3-large, Voyage 3, Cohere Embed v3) cost €0.05-€0.15 per million tokens. Open-weights options (Jina V3, BGE, MxBai) carry no per-token fee when you run them on your own infrastructure, so you pay only for the compute. A typical RAG corpus of 10 million tokens (around 20,000 documents) costs €0.50-€1.50 to embed once. Re-embedding on every model upgrade is the main long-tail cost: vectors from different models are not comparable, so switching models means embedding the whole corpus again.
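The corpus estimate above is simple arithmetic, sketched here so you can plug in your own numbers. The function name is hypothetical, and the prices are the per-million-token range quoted in the text, not a live price list.

```python
def embedding_cost_eur(total_tokens, price_per_million_eur):
    """One-off cost of embedding a corpus at a flat per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million_eur


corpus_tokens = 10_000_000   # corpus size from the example in the text
corpus_documents = 20_000    # document count from the same example

low = embedding_cost_eur(corpus_tokens, 0.05)
high = embedding_cost_eur(corpus_tokens, 0.15)
avg_tokens_per_document = corpus_tokens / corpus_documents

print(f"One full embedding pass: €{low:.2f} to €{high:.2f}")
print(f"Average document length: {avg_tokens_per_document:.0f} tokens")
```

Multiply the result by the number of times you expect to re-embed the corpus to get a rough lifetime figure.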

The trade-off is between dimensionality, recall, and price. Higher-dimensional embeddings (3,072 or 4,096 dims) capture more nuance but cost more to store and search. Lower-dimensional models (256-768 dims) cost roughly ten times less and still recover the right document 90-95% of the time on most workloads. Use the high-dim flagship when retrieval quality is mission-critical (legal search, medical Q&A); use a budget model when you can tolerate the occasional missed result.
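The storage side of that trade-off can be estimated with a quick calculation. This sketch assumes each dimension is stored as a 4-byte float32 and counts raw vector data only; index structures, metadata, and replication would add to it. The one-million-vector corpus is a hypothetical figure.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vectors alone (assumes float32 by default, no index overhead)."""
    return num_vectors * dims * bytes_per_value


num_vectors = 1_000_000  # hypothetical number of embedded chunks

for dims in (768, 3072):
    gigabytes = raw_vector_bytes(num_vectors, dims) / 1e9
    print(f"{dims} dims: {gigabytes:.3f} GB")
```

Raw storage grows linearly with the dimension count, so the 3,072-dim vectors here take four times the space of the 768-dim ones before any index overhead.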

Watch out for chunk size: most embedding models perform best on chunks of 200-500 tokens. Embed an entire 50-page document as one vector and you lose the per-section meaning. Chunk too small (under 50 tokens) and individual chunks become noisy. Pick a chunker that respects paragraph boundaries and adds a small overlap (10-20%) between chunks.
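A paragraph-aware chunker with overlap might look like the sketch below. It makes two simplifying assumptions: paragraphs are separated by blank lines, and whitespace-separated words stand in for tokens (a real pipeline would count tokens with the embedding model's own tokenizer). The function name and defaults are illustrative.

```python
def chunk_paragraphs(text, max_tokens=400, overlap_ratio=0.15):
    """Group whole paragraphs into chunks, carrying a tail of each chunk into the next.

    Words are used as a rough proxy for tokens. A chunk can exceed max_tokens
    when a single paragraph is longer than the limit or when the carried-over
    tail plus the next paragraph goes past it; paragraphs are never split.
    """
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = []  # words collected for the chunk being built
    for words in paragraphs:
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            # Start the next chunk with the tail of the one just closed.
            overlap = int(len(current) * overlap_ratio)
            current = current[-overlap:] if overlap > 0 else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "First paragraph about pricing.\n\nSecond paragraph about recall.\n\nThird paragraph about chunking."
for chunk in chunk_paragraphs(sample, max_tokens=8, overlap_ratio=0.25):
    print(repr(chunk))
```

Joining on single spaces drops the paragraph breaks inside a chunk, which is usually harmless for embedding; keep the original text alongside each chunk if you need to display it later.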

Watch out for multilingual mismatch: not every embedding model handles every language equally well. If your corpus is multilingual, pick a model whose training data covers your languages. Jina V3, Cohere Multilingual, and Voyage Multilingual are the safe defaults.

Top picks above cover the highest-recall flagship, the cheapest production model, the highest-dimensional option, and the fastest indexer.
