275+ AI Models

AI Models

Browse and explore all available AI models.

All AI models, one API — your unified catalog

Railwail gives you a single OpenAI-compatible endpoint that talks to every major AI model on the market — GPT-5 and Claude 4.6 Sonnet for reasoning, Gemini 3 Pro for the longest contexts, FLUX 1.1 Pro for photorealistic images, Veo 3 for video with synced audio, Whisper Large V3 for speech-to-text, ElevenLabs V3 and Cartesia Sonic for voice, Voyage 3 for embeddings, π-0 and OpenVLA for robotics. You pick a model, change one parameter in your request, and ship. No new SDK, no new auth flow, no provider lock-in — the catalog above lists every model we route to, with live per-token prices in EUR and the SLAs we observe in production.

The pricing is transparent and on-demand: you see the input and output rate before you call, you pay per token (or per call, or per second for video and audio), and there are no monthly minimums, no seat fees, and no surprise overage charges. Every new account starts with free credits so you can run real workloads — not just hello-world prompts — before deciding which model fits your product. Switching between flagships is a one-line change: replace `model: "gpt-5"` with `model: "claude-4-6-sonnet"` or `model: "gemini-3-pro"` and the rest of your code keeps working. That same surface covers cheap, fast budget tiers like GPT-5 Mini, Claude Haiku, Gemini Flash, DeepSeek V3, and Qwen 2.5 Coder when latency or unit cost matters more than peak quality.

The infrastructure runs in EU data centers under a DPA — DSGVO-compliant by default, no training on customer prompts, and per-provider data-residency guarantees listed on every model card so your compliance team can sign off without a six-week review. Compared with OpenRouter or Together AI, the differentiation is European hosting, EUR billing, and provider-failover routing that automatically reroutes a request to a healthy backend when a single provider has a regional incident. The catalog covers eight categories — text, image, video, audio, speech-to-text, embeddings, code, multimodal, and vision-language-action robotics — so a single integration handles your chatbot, your image pipeline, your transcripts, and your RAG retriever without juggling five SDKs.

Top picks across all categories

Cheapest
GPT-5.4 Mini

€1.00 / 1M input tokens

Learn more
Fastest
Text Embedding 3 Small

500ms p50 latency

Learn more
Most popular
Claude Opus 4

Featured this month

Learn more

275 models available

Claude Opus 4

Text & ChatAnthropic
NewPopular

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free5.0s
flagshipreasoningagentic

Claude Opus 4.7

MultimodalAnthropic
NewPopular

Anthropic's April 2026 flagship. 87.6% on SWE-bench Verified, 3x higher image resolution, output self-verification, vision + reasoning.

Free
anthropicflagshipreasoning

Claude Sonnet 4

Text & ChatAnthropic
Popular

Anthropic's most capable model. Excellent for complex analysis, coding, math, and creative writing.

Free3.0s
popularcodinganalysis

Claude Sonnet 4.6

MultimodalAnthropic
NewPopular

Anthropic's balanced mid-tier model from February 2026. Best price/performance for production workloads: 5x cheaper than Opus, near-flagship quality.

Free
anthropicbalancedproduction

Codestral

CodeMistral AI
NewPopular

Mistral's code-specialized model. Optimized for code generation, completion, and understanding across 80+ languages.

Free1.5s
codingfastmultilanguage

DeepSeek V3.1

Text & ChatDeepSeek
Popular

DeepSeek's refreshed V3.1 release. 671B MoE / 37B active. Tops open-weights leaderboards on coding and reasoning.

Free
deepseekopen-weightsmoe

DeepSeek V4 Pro

Text & ChatDeepSeek
NewPopular

DeepSeek's April 2026 flagship. 1.6T MoE / 49B active params, 1M context, rivals top closed-source models on STEM and coding at a fraction of the price.

Free
deepseekopen-weightsmoe

Depth Anything v2

MultimodalReplicate
Popular

Monocular depth-estimation model trained on 595k labeled and 62M unlabeled images. Strong zero-shot generalization in indoor and outdoor scenes.

€0.005
replicatedepthvision-understanding

ElevenLabs Multilingual V2

TTSElevenLabs
Popular

ElevenLabs' most natural-sounding TTS model. Supports 29 languages with emotional range.

€1.003.0s
naturalmultilingualpopular

Flux 1.1 Pro Ultra

ImageBlack Forest Labs
Popular

FLUX 1.1 Pro in ultra mode. Up to 4 megapixel images with raw mode for photorealism.

€0.6015.0s
high-qualityphotorealistic

Flux Dev

ImageBlack Forest Labs
Popular

Black Forest Labs' development model. Fast, high-quality image generation with LoRA support.

€0.5010.0s
popularfastlora

Gemini 2.0 Flash

Text & ChatGoogle DeepMind
NewPopular

Google's fastest multimodal model. Supports text, images, audio, and video input.

Free1.2s
fastmultimodalaffordable

Gemini 2.5 Pro

Text & ChatGoogle DeepMind
NewPopular

Google's latest thinking model. Excels at reasoning, coding, math, and science with massive context window.

Free4.0s
reasoningcodingmultimodal

Gemini 3 Flash

MultimodalGoogle DeepMind
NewPopular

Google's April 2026 fast multimodal model. Combines Gemini 3 Pro's reasoning with Flash-tier latency and price. Default model in the Gemini app.

Free
googledeepmindbalanced

Gemini 3.1 Pro

MultimodalGoogle DeepMind
NewPopular

Google DeepMind's February 2026 flagship. 2M-token context, native multimodal (text/image/audio/video), Deep Think reasoning.

Free
googledeepmindflagship

Google Imagen 4

ImageGoogle DeepMind
Popular

Google's Imagen 4. Text-to-image with strong photorealism and improved typography support.

€0.04
googleimagentext-to-image

Google Imagen 4 Ultra

ImageGoogle DeepMind
Popular

Premium Imagen 4 tier. Highest fidelity, prompt adherence and typography quality from Google.

€0.06
googleimagentext-to-image

Google Veo 2

VideoGoogle DeepMind
Popular

Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.

€5.00120.0s
high-qualitypopular

Google Veo 3

VideoGoogle DeepMind
Popular

Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.

€0.7592.0s
googleveotext-to-video

Google Veo 3.1

VideoGoogle DeepMind
NewPopular

Latest Veo with image-to-video and context-aware audio

€6.0092.0s
popularaudioi2v

GPT-4.1

Text & ChatOpenAI
NewPopular

OpenAI's newest flagship model. Improved reasoning, instruction following, and coding over GPT-4o.

Free2.5s
popularcodingreasoning

GPT-4o

Text & ChatOpenAI
Popular

OpenAI's most capable multimodal model. Excellent for complex reasoning, coding, and creative tasks.

Free2.0s
popularfastmultimodal

GPT-5.4

MultimodalOpenAI
NewPopular

OpenAI's unified flagship combining GPT and o-series reasoning into one model. 1M context, multimodal, top SWE-Bench Pro and OSWorld scores.

Free
openaiflagshipreasoning

GPT-5.4 Mini

MultimodalOpenAI
NewPopular

OpenAI's efficient mid-tier model. 2x faster than its predecessor, 400k context, approaches GPT-5.4 quality on SWE-Bench Pro at a fraction of the cost.

Free
openaibalancedcost-efficient

Grok 4

Text & ChatxAI
Popular

xAI's flagship reasoning model with vision and tool use. 256k context, strong at complex reasoning and STEM tasks.

Free
xaiflagshipreasoning

Grok 4.3

MultimodalxAI
NewPopular

xAI's May 2026 flagship. 1M context, vision, always-on reasoning, real-time X/web retrieval via DeepSearch.

Free
xaiflagshipreasoning

Ideogram 3.0

ImageIdeogram
Popular

Ideogram's flagship text-to-image model with industry-leading text rendering and prompt adherence.

€0.0915.0s
ideogramtext-to-imagetypography

Kimi K2 (Moonshot)

Text & ChatCustom
Popular

Moonshot AI's 1T-parameter MoE model. Industry-leading agentic coding and tool-use benchmarks.

Free
moonshotkimimoe

Kling v3

VideoReplicate
NewPopular

Cinematic video up to 15s with multi-shot and native audio

€2.00120.0s
popularaudioi2v

Kling v3 Omni

VideoReplicate
NewPopular

Most versatile: multi-reference images, video editing, native audio

€2.50120.0s
popularaudioi2v

Midjourney V7

ImageReplicate
NewPopular

The latest Midjourney model. Industry-leading aesthetic quality and prompt adherence for image generation.

€3.0030.0s
high-qualityaestheticpopular

MiniMax-01

Text & ChatMinimax
Popular

MiniMax's 456B hybrid lightning-attention model with native 4M-token context. Industry-leading long-context.

Free
minimaxlong-contextlightning-attention

MusicGen

AudioMeta
Popular

Meta's music generation model. Generate up to 1 minute of music from text descriptions.

€1.5030.0s
musicpopular

o3-mini

Text & ChatOpenAI
NewPopular

OpenAI's reasoning model optimized for STEM tasks, coding, and math. Uses chain-of-thought reasoning.

Free10.0s
reasoningcodingmath

OpenAI Sora 2

VideoOpenAI
Popular

OpenAI's second-generation Sora video model. Realistic motion, improved physics, audio support.

€0.50
openaisoratext-to-video

Perplexity Sonar Pro

Text & ChatCustom
Popular

Perplexity's premium web-grounded search model with multi-step reasoning over live sources.

Free
perplexityweb-searchcitations

Qwen 3 235B Instruct

Text & ChatAlibaba / Qwen
Popular

Alibaba's Qwen 3 flagship MoE: 235B total / 22B active. Strong reasoning and tool use, open-weights.

Free
qwenalibabamoe

Runway Gen 4.5

VideoReplicate
NewPopular

Top-ranked for motion quality and visual fidelity

€1.0030.0s
populartop-quality

SAM 2 (Segment Anything 2)

MultimodalMeta
Popular

Meta Segment Anything 2. Promptable segmentation across images and video with temporal memory. Zero-shot, point/box/mask prompts, fast on a single H100.

€0.01
replicatesegmentationmeta

Sora

VideoOpenAI
NewPopular

OpenAI video generation model. Create realistic and imaginative videos from text prompts up to 20 seconds.

€1.00180.0s
popularhigh-qualityopenai

Text Embedding 3 Large

EmbeddingOpenAI
Popular

OpenAI's most powerful embedding model. 3072 dimensions for maximum accuracy.

Free600ms
high-quality

Voyage AI voyage-3

EmbeddingCustom
Popular

Voyage's general-purpose embedding model. 1024 dims, 32k context, strong retrieval performance.

Free
voyageembeddingretrieval

Whisper Large V3

STTOpenAI
Popular

OpenAI's Whisper model. State-of-the-art speech recognition supporting 99+ languages.

€0.305.0s
multilingualpopular

Whisper Large v3 Turbo

STTOpenAI
Popular

OpenAI's distilled Whisper Large v3. ~216x realtime, 99+ languages, MIT-licensed weights.

€0.006
openaiwhisperstt

AI21 Jamba 1.5 Large

Text & ChatCustom

AI21's flagship hybrid Mamba-Transformer model with a 256k context window for long-document tasks.

Free
ai21long-contextmamba

AI21 Jamba 1.5 Mini

Text & ChatCustom

Cost-efficient hybrid Mamba-Transformer model with 256k context. Tuned for high-throughput RAG.

Free
ai21long-contextmamba

AnimateDiff

VideoCommunity

Plug-and-play motion module that animates personalized Stable Diffusion models without further training. 16-frame clips at 512x512.

€0.04
replicateanimationanimatediff

AnimateDiff Evolved

VideoReplicate

Community fork of AnimateDiff with improved motion modules, beta scheduler control and ControlNet integration for richer animation control.

€0.05
replicateanimationanimatediff

AnimateDiff Lightning

VideoByteDance

ByteDance distillation of AnimateDiff. 4-step sampling for over 10x faster inference at comparable quality to multi-step base model.

€0.02
replicateanimationbytedance

AudioCraft

TTSReplicate

Meta's AudioCraft framework wrapping MusicGen, AudioGen and EnCodec. Unified text-to-audio research toolkit for music and sound effects.

€0.01
metamusic-generationsound-effects

AudioLDM 2

TTSAudioLDM

Latent-diffusion model for general-purpose text-to-audio. Generates speech, music, and sound effects with a unified prior.

€0.01
audioldmmusic-generationdiffusion

AuraFlow v0.3

ImageFal.ai

fal.ai's fully open-source 6.8B flow-based text-to-image model. Up to 1536x1536 resolution.

Free
auraflowtext-to-imageopen-weights

Bark

AudioSuno

Suno's text-to-audio model. Generates realistic speech, music, and sound effects.

€0.5015.0s
speechsound-effects

BRIA RMBG-1.4

ImageReplicate

BRIA's first commercial-safe background-removal model. Trained on fully-licensed data, suitable for production e-commerce and design pipelines.

€0.03
replicatebackground-removalbria

BRIA RMBG-2.0

ImageReplicate

BRIA's professional background-removal model trained on fully-licensed data. Commercial-safe.

€0.04
briaimage-editbackground-removal

Cartesia Sonic

TTSCustom

Cartesia's ultra-low-latency TTS (~90ms TTFB). State-space model with voice cloning support.

Free
cartesiattslow-latency

CCSR (Content-Consistent SR)

ImageReplicate

Content-Consistent Super-Resolution model. Reduces hallucination compared to typical diffusion-based upscalers while keeping perceptual quality high.

€0.04
replicateupscalingimage-restore

Champ Human Animation

VideoCommunity

Champ controllable human image animation. Uses 3D parametric guidance (SMPL) for realistic full-body motion transfer from a single reference image.

€0.12
replicateanimationhuman-motion

Clarity Upscaler

ImageCommunity

High-resolution image upscaler with creative detail re-imagination via SD-based hallucination. Strong for photography and product shots.

€0.04
replicateupscalingcreative

Claude Haiku 3.5

Text & ChatAnthropic

Anthropic's fast and affordable model. Great for quick tasks, summarization, and simple coding.

Free1.0s
fastaffordable

Claude Haiku 4.5

MultimodalAnthropic
New

Anthropic's fastest and cheapest 4.x model. Strong vision and tool use at ultra-low latency, ideal for high-concurrency workloads.

Free
anthropiccost-efficientlow-latency

CodeFormer

ImageCommunity

Robust face-restoration model using a transformer-based codebook prior. Handles severe degradation, occlusion, and old-photo restoration with adjustable fidelity-quality tradeoff.

€0.002
replicateface-restoreupscaling

CogVideoX-5B (open)

VideoReplicate

Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.

Free
zhiputsinghuacogvideox

CogVLM2 19B

MultimodalReplicate

Tsinghua CogVLM2 19B with Llama-3 8B base plus 11B vision expert. Strong document understanding and visual reasoning, 8k context.

€0.01
replicatemultimodalvision-understanding

Cohere Aya 23 35B

Text & ChatCustom

Open-weights multilingual research model from Cohere covering 23 languages. 35B parameters.

Free
coheremultilingualopen-weights

Cohere Command Light (legacy)

Text & ChatCohere

Cohere's fast lightweight chat model (deprecated Sep 2025). Kept as comparison tombstone.

Free
coherelegacydeprecated

Cohere Command R (08-2024)

Text & ChatCohere

Cohere's mid-tier RAG/tool model. Cost-efficient sibling of Command R+ with 128k context.

Free
cohereragtools

Cohere Command R+ (08-2024)

Text & ChatCohere

Cohere's flagship RAG- and tool-optimized chat model. 128k context, refreshed August 2024.

Free
cohereragtools

Cohere embed-multilingual-v3

EmbeddingCustom

Cohere's multilingual embedding model. Supports 100+ languages with separate search and classification modes.

Free
cohereembeddingmultilingual

ControlNet Canny

ImageReplicate

ControlNet conditioned on Canny edge maps. Preserves composition and outlines while restyling with Stable Diffusion 1.5 or SDXL backbones.

€0.01
replicatestyle-transferimage-edit

ControlNet Depth

ImageReplicate

ControlNet conditioned on depth maps. Preserves the 3D scene layout while letting the prompt change style, lighting and content.

€0.01
replicatestyle-transferimage-edit

DALL-E 3

ImageOpenAI

OpenAI's latest image generation model. Excellent at following complex prompts with high fidelity.

€4.0015.0s
high-qualityprompt-following

Deepgram Nova-3

STTCustom

Deepgram's flagship STT. First to offer realtime multilingual transcription with self-serve customization.

€0.004
deepgramstttranscription

DeepSeek Coder V2

CodeDeepSeek

DeepSeek's specialized coding model. Excellent at code generation, debugging, and explanation.

Free2.0s
codingaffordable

DeepSeek R1

Text & ChatDeepSeek
New

DeepSeek's reasoning model with chain-of-thought capabilities. Excellent for complex problem-solving.

Free8.0s
reasoningmath

DeepSeek V3

Text & ChatDeepSeek

Powerful open-weight model from DeepSeek. Strong at coding, math, and Chinese/English tasks.

Free2.0s
affordablecoding

DeepSeek V4 Flash

Text & ChatDeepSeek
New

Efficiency-optimized variant of DeepSeek V4. 284B MoE / 13B active, 1M context, ultra-low pricing for high-throughput workloads.

Free
deepseekopen-weightsmoe

DeepSeek-VL 7B

MultimodalReplicate

DeepSeek-VL 7B chat model. Vision-language model with hybrid vision encoder and strong real-world visual question answering performance.

€0.008
replicatemultimodalvision-understanding

Detectron2

MultimodalReplicate

Meta Detectron2 object-detection and segmentation toolkit. Mask R-CNN, Cascade R-CNN, panoptic FPN and many other model variants in one wrapper.

€0.008
replicatesegmentationvision-understanding

DINOv2

MultimodalReplicate

Meta DINOv2 self-supervised vision backbone. Pretrained features for classification, segmentation and depth without task-specific fine-tuning.

€0.005
replicatesegmentationvision-understanding

Donut Document

MultimodalReplicate

Naver CLOVA Donut OCR-free document-understanding transformer. End-to-end JSON extraction from forms, receipts and invoices without explicit OCR.

€0.008
replicateocrvision-understanding

Dots OCR

MultimodalReplicate

Rednote Hilab Dots OCR. End-to-end document parsing model with layout, text and reading-order prediction in one transformer.

€0.008
replicateocrvision-understanding

DreamGaussian

ImageReplicate

Generative Gaussian-splatting model for fast image-to-3D synthesis. Produces textured meshes in two minutes via differentiable rasterization.

€0.09
replicate3d-generationimage-to-3d

DreamGaussian 4D

VideoReplicate

4D Gaussian-splatting generator extending DreamGaussian to video. Image-conditioned dynamic 3D scenes with view-consistent motion.

€0.18
replicateanimation4d

DWPose

MultimodalReplicate

DWPose whole-body 2D pose estimator. Two-stage knowledge-distilled model with strong accuracy on face, hands and body keypoints simultaneously.

€0.005
replicateposevision-understanding

DynamiCrafter

VideoCommunity

Tencent DynamiCrafter. Animates still images into short videos preserving texture and structure, with strong open-domain coverage.

€0.09
replicateanimationimage-to-video

EasyOCR

MultimodalReplicate

JaidedAI EasyOCR. Simple Python OCR wrapper supporting 80+ languages with deep-learning text detection and recognition.

€0.002
replicateocrvision-understanding

EchoMimic

VideoReplicate

Ant Group EchoMimic. Lifelike audio-driven portrait animation with editable landmark conditioning for fine-grained motion control.

€0.10
replicatelipsyncant-group

Edge TTS

TTSCustom

Microsoft Edge neural voices accessed via the open-source edge-tts wrapper. 400+ voices across 100+ locales, suitable for batch generation.

Free
microsoftttsmultilingual

ElevenLabs Scribe v1

STTElevenLabs

ElevenLabs' STT. 99 languages, word-level timestamps, speaker diarization, audio-event tagging.

€0.004
elevenlabsscribestt

ElevenLabs v3 (alpha)

TTSElevenLabs

ElevenLabs' v3 alpha TTS. Most expressive voice model with audio tags and laughter, higher latency.

Free
elevenlabsttsexpressive

ESRGAN Classic

ImageReplicate

Enhanced Super-Resolution GAN, the original 2018 architecture. Produces sharp 4x upscales with strong perceptual quality on natural images.

€0.001
replicateupscalingesrgan

F5-TTS

TTSReplicate

Open-source flow-matching TTS with strong zero-shot voice cloning. Code MIT, weights CC-BY-NC.

Free
f5ttsopen-weights

FILM Frame Interpolation

VideoGoogle Research

Google FILM frame interpolation. Synthesizes high-quality intermediate frames between near-duplicate inputs, designed for large motion gaps.

€0.01
replicateupscaleframe-interpolation

Florence-2 Large

MultimodalMicrosoft

Microsoft Florence-2 Large. Unified prompt-based vision foundation model for captioning, detection, segmentation and OCR with a single 770M-param backbone.

€0.008
replicatemultimodalvision-understanding

Florence-2 Segmentation

MultimodalCommunity

Microsoft Florence-2 unified vision model with referring expression segmentation. Text-prompted region and mask generation in one model.

€0.009
replicatesegmentationvision-understanding

Flux Schnell

ImageBlack Forest Labs

The fastest Flux model. Generate images in under 2 seconds. Great for prototyping.

€0.032.0s
fastaffordable

FLUX.1 [Schnell]

ImageBlack Forest Labs

Black Forest Labs' fastest open-weights image model. Apache-2.0 licensed, ~1-4 step inference.

€0.003
fluxblack-forest-labsopen-weights

FLUX.1 Canny

ImageReplicate

FLUX structural control via Canny edge maps. Preserve composition while restyling.

€0.05
fluxblack-forest-labsimage-edit

FLUX.1 Depth

ImageReplicate

FLUX structural control via depth maps. Keep 3D scene layout while changing style/content.

€0.05
fluxblack-forest-labsimage-edit

FLUX.1 Fill

ImageReplicate

Black Forest Labs' inpainting/outpainting model for FLUX. Fill masked regions with prompt-guided content.

€0.05
fluxblack-forest-labsimage-edit

FLUX.1 Redux

ImageReplicate

FLUX image-variation adapter. Generate variations and remixes from a reference image.

€0.03
fluxblack-forest-labsimage-edit

Gemini Robotics (2025)

RoboticsGoogle DeepMind

Google DeepMind's vision-language-action model based on Gemini 2.0. Generalist robot policy with strong dexterity.

Free
googledeepmindgemini

Gemini Robotics-ER

RoboticsGoogle DeepMind

Embodied-reasoning variant of Gemini Robotics. Enhanced 3D spatial reasoning and trajectory planning.

Free
googledeepmindgemini

Get3D (NVIDIA)

ImageCustom

NVIDIA GET3D generative model for textured 3D shapes. Trained on category-specific datasets producing meshes with high-quality textures.

Free
nvidia3d-generationopen-weights

GFPGAN v1.4

ImageTencent ARC

Tencent ARC face-restoration GAN. Reconstructs realistic facial detail in low-quality or compressed photos using a pretrained StyleGAN2 prior.

€0.002
replicateface-restoreupscaling

GLPN Depth

MultimodalReplicate

Global-Local Path Networks depth-estimation model. Combines hierarchical transformer encoder with selective feature fusion for sharp boundaries.

€0.004
replicatedepthvision-understanding

Google RT-2-X

RoboticsGoogle DeepMind

Google's VLA from RT-X collaboration. Trained on Open-X-Embodiment (22 robots, 527 skills), positive transfer.

Free
googlevlarobotics

Google Veo 3 Fast

VideoGoogle DeepMind
New

Faster cheaper Veo 3 with audio

€3.2059.0s
fastaudio

Google Veo 3.1 Fast

VideoGoogle DeepMind
New

Faster Veo 3.1 with image-to-video and audio

€3.2059.0s
fastaudioi2v

GOT-OCR 2.0

MultimodalReplicate

StepFun GOT-OCR 2.0. Unified end-to-end OCR-2.0 model handling text, formulas, charts, sheet music and geometric shapes in one architecture.

€0.009
replicateocrvision-understanding

GPT-4o Mini

Text & ChatOpenAI

Small, fast, and affordable model for lightweight tasks. Great balance of speed and capability.

Free800ms
fastaffordable

GPT-5.4 Nano

MultimodalOpenAI
New

OpenAI's smallest and cheapest GPT-5.4 variant. Built for high-volume classification, extraction and coding subagents at edge-grade latency.

Free
openaicost-efficientlow-latency

Granite Code 20B

CodeReplicate

IBM Granite 20B Code Instruct. Larger Granite code model balancing quality and inference cost for enterprise CI/CD code-review automation.

€0.006
replicatecode-generationibm

Granite Code 34B

CodeReplicate

IBM Granite 34B Code Instruct. Largest Granite code-instruction model. Top-tier among Apache-2.0 code LLMs on HumanEval, MBPP and MultiPL-E.

€0.01
replicatecode-generationibm

Granite Code 3B

CodeReplicate

IBM Granite 3B Code Instruct. Apache-2.0 small code-instruction model. Strong on Python, Java, JavaScript and Go for enterprise IDE integrations.

€0.002
replicatecode-generationibm

Granite Code 8B

CodeReplicate

IBM Granite 8B Code Instruct. Trained on permissively-licensed code, strong on multi-language code completion and instruction-following.

€0.004
replicatecode-generationibm

Grok 2 Vision

MultimodalxAI

xAI's vision-capable Grok 2 snapshot. Image-in, text-out with strong multilingual instruction following.

Free
xaivisionlegacy

Grok 3

Text & ChatxAI
New

xAI's flagship model. Strong at reasoning, coding, and real-time knowledge with web search capabilities.

Free3.0s
reasoningreal-time

Grok 4.1 Fast

MultimodalxAI
New

xAI's cost-efficient high-throughput model. 2M context, optional reasoning, optimized for agentic loops and real-time apps.

Free
xaicost-efficientvision

Grok Imagine Video

VideoReplicate
New

xAI video with native audio and lip-sync, up to 15s

€1.5090.0s
audioi2vxai

Grounded-SAM

MultimodalReplicate

Grounding DINO plus SAM. Open-vocabulary text-prompted detection and segmentation in one pipeline for fully-automatic mask generation.

€0.01
replicatesegmentationvision-understanding

Hailuo / MiniMax Video-01

VideoCustom

MiniMax's Hailuo video-01. 6s 1280x720 clips with strong cinematic motion and physical realism.

€0.43
minimaxhailuotext-to-video

Hailuo 2.3

VideoMinimax
New

Minimax model for realistic human motion and VFX

€0.5060.0s
i2v1080p

HRNet Pose

MultimodalReplicate

Microsoft HRNet high-resolution pose-estimation backbone. Parallel multi-resolution streams yield strong accuracy on COCO keypoint benchmarks.

€0.005
replicateposevision-understanding

Hunyuan3D 2.0

ImageTencent

Tencent's Hunyuan3D 2.0 image-to-3D pipeline. Two-stage shape and texture generation producing high-resolution textured meshes.

€0.21
replicate3d-generationimage-to-3d

Hunyuan3D 2.1

ImageTencent
New

Refreshed Hunyuan3D 2.1 with improved texture fidelity and PBR-material support. Image-to-3D with textured GLB output.

€0.24
replicate3d-generationimage-to-3d

HunyuanVideo

VideoTencent

Tencent's 13B open-weights video diffusion transformer. SOTA among open video models at release.

Free
tencenthunyuantext-to-video

HunyuanVideo

VideoTencent

Tencent's open-source video generation model. Strong visual quality with diverse style support.

€2.00120.0s
open-source

Idefics3 8B

MultimodalReplicate

Hugging Face Idefics3 8B. Llama-3 based open-source vision-language model with strong document QA and chart-understanding performance.

€0.007
replicatemultimodalvision-understanding

Ideogram 2.0 Turbo

ImageIdeogram

Ideogram's fast text-to-image variant. Strong typography and logo rendering at low latency.

€0.05
ideogramtext-to-imagetypography

InstantMesh

ImageReplicate

Image-to-3D mesh generator from sparse-view diffusion. Produces textured meshes in under one minute on a single A100.

€0.12
replicate3d-generationimage-to-3d

InstructPix2Pix

ImageReplicate

Berkeley InstructPix2Pix. Edits an image from natural-language instructions in a single forward pass. Trained on GPT-3 plus Stable Diffusion synthetic pairs.

€0.01
replicatestyle-transferimage-edit

InternVL 2.5

MultimodalReplicate

OpenGVLab InternVL 2.5 78B. Open-source vision-language model approaching GPT-4o on MMMU, OCRBench and Math-Vista benchmarks.

€0.03
replicatemultimodalvision-understanding

IP-Adapter FaceID Plus v2

ImageReplicate

Tencent's face-identity conditioning adapter for SD/SDXL. Face embedding + CLIP for ID-consistent generation.

Free
tencentimage-editface-id

Janus Pro 7B

ImageReplicate

DeepSeek's unified multimodal model. Decouples vision encoding for both understanding and generation tasks.

Free
deepseekjanusopen-weights

Jina Embeddings v3 (Multilingual)

EmbeddingCustom

Jina's frontier multilingual embedding model. 570M params, 8192 ctx, 89 languages, Matryoshka dims 128-1024.

Free
jinaembeddingmultilingual

Kling 1.6 Pro

VideoKuaishou

Kuaishou's Kling 1.6 Pro. Premium cinematic motion and physics realism, ~$0.07/sec.

€0.35
kuaishouklingtext-to-video

Kokoro TTS 82M

TTSReplicate

Open-weights 82M-parameter TTS. Punches above its size class on naturalness benchmarks at a fraction of the inference cost of larger models.

€0.002
kokorottsopen-weights

Kuaishou Kolors

ImageReplicate

Kuaishou's bilingual (CN/EN) latent diffusion text-to-image model with strong text rendering.

Free
kuaishoutext-to-imageopen-weights

LayoutLMv3

MultimodalMicrosoft

Microsoft LayoutLMv3 multimodal document model. Unified text/image masking pretraining for form understanding, receipts and document QA.

€0.007
replicateocrvision-understanding

LeRobot SmolVLA

RoboticsCustom

HuggingFace's 450M VLA pretrained on 487 community LeRobot datasets. Runs on consumer GPUs.

Free
huggingfacelerobotvla

LivePortrait

VideoCommunity

Kuaishou LivePortrait. Efficient portrait animation driven by reference videos with stitching, retargeting and motion-control parameters.

€0.08
replicatelipsynckuaishou

Llama 3.2 90B Vision (multimodal)

MultimodalMeta

Meta's flagship vision-language model. 90B parameters, image understanding + chat, strong VQA performance.

Free
metallamamultimodal

Llama 3.2 Vision 90B

MultimodalMeta

Meta Llama 3.2 90B Vision. Largest open-weights Llama vision model. Strong visual reasoning, chart, OCR and document understanding.

€0.02
replicatemultimodalvision-understanding

Llama 3.3 70B

Text & ChatMeta

Meta's open-source 70B parameter model. Strong all-around performance with multilingual support.

Free2.5s
open-sourcepopular

LLaVA-OneVision 72B

MultimodalReplicate

LMMs-Lab LLaVA-OneVision 72B. Unified single-image, multi-image and video instruction-tuned VLM with task-transfer across modalities.

€0.02
replicatemultimodalvision-understanding

Lotus-G

MultimodalReplicate

Lotus generative depth model. Treats depth as a generation task using a diffusion model, producing higher-fidelity depth on textured surfaces.

€0.01
replicatedepthvision-understanding

LTX-Video (Lightricks)

VideoReplicate

Lightricks' 2B DiT video model. Realtime generation on consumer GPUs (~6s @ H100, 24fps).

Free
lightricksltxtext-to-video

Luma Dream Machine v1.6

VideoCustom

Luma's Dream Machine 1.6. 720p text/image-to-video with strong motion and camera control.

€0.40
lumatext-to-videoimage-to-video

Luma Ray Flash 2

VideoLuma AI
New

Fast affordable video with I2V support

€0.5045.0s
fastbudgeti2v

M2M-100 12B

Text & ChatMeta

Meta M2M-100 12B many-to-many translation model. Direct translation between 100 languages without pivoting through English.

€0.006
replicatetranslationmeta

MADLAD-400 3B

Text & ChatGoogle DeepMind

Google MADLAD-400 3B multilingual translation model. 419 languages supported, trained on a 5T-token multilingual corpus with strong low-resource performance.

€0.004
replicatetranslationgoogle

MagicAnimate

VideoCommunity

ByteDance MagicAnimate. Temporally consistent human-image animation driven by a DensePose motion sequence with strong identity preservation.

€0.10
replicateanimationhuman-motion

Magicoder S CL 7B

CodeReplicate

UIUC Magicoder S CL 7B. CodeLlama-7B fine-tuned with OSS-Instruct synthetic data. Strong HumanEval Plus and MBPP Plus performance per parameter.

€0.003
replicatecode-generationopen-weights

MAGNeT MusicGen

TTSReplicate

Meta MAGNeT non-autoregressive music generator. Up to 7x faster than MusicGen with comparable quality via masked generative transformers.

€0.007
metamusic-generationmagnet

Magnific-Style Upscaler

ImageReplicate

Detail-hallucinating upscaler in the Magnific style. Adds plausible high-frequency texture using a Stable Diffusion refiner conditioned on the low-res input.

€0.06
replicateupscalingcreative

Marigold

MultimodalReplicate

ETH Zurich Marigold. Diffusion-based monocular depth-estimation model fine-tuned from Stable Diffusion with strong fine-detail recovery.

€0.01
replicatedepthvision-understanding

Marker PDF Extract

MultimodalReplicate

Marker PDF-to-Markdown conversion pipeline. Combines layout, OCR and equation models to produce clean Markdown with preserved tables and formulas.

€0.008
replicateocrvision-understanding

Mask2Former

MultimodalReplicate

Meta Mask2Former universal image-segmentation transformer. Single architecture for panoptic, instance and semantic segmentation tasks.

€0.009
replicatesegmentationvision-understanding

mBART 50 Many-to-Many

Text & ChatMeta

Meta mBART-50 many-to-many translation model. 50 supported languages with strong performance on news and conversational text.

€0.003
replicatetranslationmeta

MediaPipe Pose

MultimodalGoogle DeepMind

Google MediaPipe Pose. Lightweight on-device-friendly 33-keypoint 3D pose estimator with optional segmentation mask output.

€0.003
replicateposevision-understanding

Microsoft Phi-3.5 MoE Instruct

Text & ChatMicrosoft

Mixture-of-experts Phi-3.5: 42B total / 6.6B active params. 128k context, multilingual.

Free
microsoftopen-weightsmoe

MiDaS v3.1

MultimodalReplicate

Intel MiDaS v3.1 relative depth-estimation model. Robust zero-shot single-image depth across diverse domains and resolutions.

€0.004
replicatedepthvision-understanding

MiniCPM-V 2.6

MultimodalReplicate

OpenBMB MiniCPM-V 2.6. 8B vision-language model with strong single-image, multi-image and video understanding plus OCR capabilities.

€0.008
replicatemultimodalvision-understanding

Minimax Video

VideoMinimax

MiniMax's video generation model. Fast, high-quality video output with text-to-video capabilities.

€2.5090.0s
fastaffordable

Mistral Large

Text & ChatMistral AI

Mistral's flagship model. Strong reasoning, multilingual, and coding capabilities.

Free2.5s
multilingualcoding

Mistral OCR

MultimodalMistral AI

Mistral OCR API. Document-understanding model with strong table and equation extraction, and structured JSON output.

€0.001
mistralocrvision-understanding

Mistral Pixtral Large (124B)

MultimodalMistral AI

Mistral's 124B multimodal flagship. 123B decoder + 1B vision encoder, 128k ctx, up to 30 images per request.

Free
mistralpixtralmultimodal

MMPose

MultimodalReplicate

OpenMMLab MMPose toolbox. Wraps RTMPose, HRNet, HigherHRNet and many other pose models behind a unified inference API.

€0.006
replicateposevision-understanding

Mochi 1

VideoGenmo

Genmo's 10B open-weights text-to-video model. AsymmDiT architecture, 5.4s @ 480p.

Free
genmomochitext-to-video

MOFA-Video

VideoReplicate

Motion-Field-Adapter video generator. Controllable image animation from trajectories, keypoints or audio with a strong identity preservation prior.

€0.10
replicatelipsyncanimation

MuseTalk

VideoCommunity

Tencent MuseTalk real-time lip-sync model. Audio-driven mouth-region editing in latent space at 30+ fps on a single GPU.

€0.06
replicatelipsynctencent

MusicGen Large

TTSMeta

Meta's 3.3B-parameter MusicGen Large. Text-conditioned music generation with single-stage autoregressive transformer, supports melody conditioning.

€0.02
metamusic-generationopen-weights

MusicGen Medium

TTSMeta

Meta MusicGen Medium (1.5B params). Strong quality-to-speed tradeoff for text-to-music with optional melody guidance.

€0.01
metamusic-generationopen-weights

MusicGen Small

TTSMeta

Meta MusicGen Small (300M params). Fast text-to-music generation suitable for prototyping and low-latency demos.

€0.006
metamusic-generationopen-weights

mxbai-embed-large-v1

EmbeddingCustom

Mixedbread's open-source 335M embedding model. Top MTEB benchmark for English retrieval at release.

Free
mixedbreadembeddingopen-weights

NLLB-200 3B

Text & ChatMeta

Meta's No Language Left Behind 3.3B translation model. Direct translation between any pair of 200+ languages including many low-resource African and Asian languages.

€0.003
replicatetranslationmeta

NLLB-200 Distilled 600M

Text & ChatMeta

Meta's distilled 600M NLLB. Same 200-language coverage as the 3B model with a fraction of the parameters, ideal for edge or high-throughput deployment.

€0.002
replicatetranslationmeta

Nous Hermes 3 405B

Text & ChatTogether AI

Full-parameter fine-tune of Llama 3.1 405B by Nous Research. Steerable, uncensored, strong tool use.

Free
nousopen-weightstools

Nous Hermes 3 70B

Text & ChatTogether AI

Llama-3.1-70B fine-tune from Nous Research with strong tool/agent capabilities and uncensored alignment.

Free
nousopen-weightstools

NVIDIA Cosmos-Predict-1

RoboticsCustom

NVIDIA's world foundation model for physical AI. Diffusion-based video prediction for robotics simulation.

Free
nvidiacosmosvla

Octo Base

RoboticsUC Berkeley

Berkeley/Stanford 93M transformer diffusion policy. Pretrained on 800k Open-X-Embodiment episodes.

Free
berkeleystanfordvla

Octo Small

RoboticsUC Berkeley

Compact 27M variant of Octo. Faster inference on consumer GPUs, designed for low-latency control.

Free
berkeleyvlarobotics

olmOCR

MultimodalReplicate

Allen AI olmOCR. Open-source 7B vision-language model fine-tuned for high-fidelity document parsing including math, code and tables.

€0.01
replicateocrvision-understanding

OpenAI TTS-1

TTSOpenAI

OpenAI's text-to-speech model. Six built-in voices with natural intonation.

€0.602.0s
fastaffordable

OpenAI TTS-1 HD

TTSOpenAI

OpenAI's high-definition TTS model. Better quality for production use cases.

€1.204.0s
high-quality

OpenPose

MultimodalReplicate

CMU OpenPose multi-person 2D pose estimator. Real-time keypoint detection for body, hand, face and foot using Part Affinity Fields.

€0.005
replicateposevision-understanding

OpenVLA-7B

RoboticsOpenVLA

Stanford/Berkeley open VLA trained on 970k Open-X-Embodiment episodes. Supports LoRA fine-tuning.

Free
stanfordberkeleyvla

OpenVoice v1

TTSReplicate

MyShell OpenVoice v1. Cross-lingual voice cloning with flexible style control: emotion, accent, rhythm, pauses, and intonation.

€0.004
myshellttsvoice-cloning

OpenVoice v2

TTSReplicate

MyShell OpenVoice v2. Multilingual zero-shot voice cloning with accurate tone-color reproduction and style/emotion control.

€0.004
myshellttsvoice-cloning

PaddleOCR v3

MultimodalReplicate

Baidu PaddleOCR v3 PP-OCR pipeline. Lightweight detector plus recognizer optimized for production use with 80+ language support.

€0.003
replicateocrvision-understanding

Parler-TTS

TTSReplicate

Hugging Face Parler-TTS Mini. Lightweight TTS conditioned on a natural-language style description for fine-grained control over voice characteristics.

€0.003
parlerttshuggingface

Parler-TTS Large

TTSReplicate

Parler-TTS Large v1. 2.2B parameters, natural-language style prompting and improved prosody over the Mini variant.

€0.005
parlerttshuggingface

Perplexity Sonar

Text & ChatCustom

Perplexity's fastest and cheapest web-grounded chat model. Live-source citations included.

Free
perplexityweb-searchcitations

Perplexity Sonar Reasoning

Text & ChatCustom

Perplexity's reasoning model with chain-of-thought and integrated web search.

Free
perplexityweb-searchreasoning

Phi-3.5 Vision

MultimodalMicrosoft

Microsoft Phi-3.5 Vision Instruct. Small (4.2B) multimodal model with strong document, OCR and multi-image reasoning at low cost.

€0.005
replicatemultimodalvision-understanding

Phind CodeLlama 34B v2

CodeReplicate

Phind CodeLlama 34B v2. Highly tuned CodeLlama variant focused on retrieval-augmented developer assistant workflows.

€0.009
replicatecode-generationphind

PhotoMaker

ImageTencent ARC

Tencent ARC PhotoMaker. Identity-preserving stylized photo generation from a stacked-ID embedding. Realistic re-styling of a subject in seconds.

€0.03
replicatestyle-transferimage-edit

Physical Intelligence Pi-0-FAST

RoboticsPhysical Intelligence

Autoregressive π-0 variant using FAST action tokenizer. Faster inference at competitive task success.

Free
physical-intelligencevlarobotics

Physical Intelligence π-0

RoboticsPhysical Intelligence

Physical Intelligence's flagship VLA flow-matching policy. Generalist robot control, pretrained on 10k+ hrs robot data.

Free
physical-intelligencevlarobotics

Physical Intelligence π-0.5

RoboticsPhysical Intelligence

Upgraded π-0 with open-world generalization via knowledge insulation. Weights and fine-tuning open-sourced.

Free
physical-intelligencevlarobotics

Pika 2.0 (Official)

VideoPika

Pika Labs' 2.0 release. Cinematic text/image-to-video with scene composition controls.

€0.20
pikatext-to-videoimage-to-video

PixVerse v5.6

VideoReplicate
New

Physics-accurate video generation up to 1080p

€0.5060.0s
i2v1080pphysics

Playground v3 (Design)

ImagePlayground AI

Playground's text-to-image model focused on graphic design aesthetics and embedded typography.

Free
playgroundtext-to-imagedesign

PlayHT 2.0

TTSCustom

PlayHT's 2.0 generative voice model. Multi-lingual expressive speech synthesis with sub-second latency and high-fidelity voice cloning.

Free
playhtttsvoice-cloning

Point-E

ImageOpenAI

OpenAI Point-E text-to-point-cloud system. Fast 3D point-cloud generation from text, optionally lifted to a mesh via marching cubes.

€0.03
replicate3d-generationopenai

Qwen 2.5 72B

Text & ChatAlibaba / Qwen

Alibaba's powerful open-source model. Excellent at coding, math, and multilingual tasks.

Free2.5s
open-sourcecodingmultilingual

Qwen 2.5-Max

Text & ChatCustom

Alibaba's flagship pretrained MoE model. Top-tier reasoning and code performance via DashScope API.

Free
qwenalibabamoe

Qwen2-VL-72B Instruct

MultimodalAlibaba / Qwen

Alibaba's 72B vision-language model with M-RoPE and dynamic resolution. Strong document and video understanding.

Free
qwenalibabamultimodal

RDT-1B

RoboticsCustom

Tsinghua's 1B diffusion-transformer bimanual manipulation policy. Predicts next 64 actions per inference.

Free
tsinghuavlarobotics

Real-CUGAN

VideoCommunity

Real-CUGAN anime-focused upscaler. 2x/3x/4x super-resolution tuned for animation, line-art, and illustrated content.

€0.01
replicateupscaleanime

Real-ESRGAN 4x

ImageCommunity

AI-Upscaler that increases image resolution up to 4x while preserving texture and detail. Trained on synthetic and real data to reduce common ESRGAN artifacts.

€0.001
replicateupscalingimage-restore

Real-ESRGAN Anime 4x

ImageReplicate

Real-ESRGAN variant fine-tuned for anime, manga, and illustrated artwork. 4x upscaling with cartoon-aware artifact suppression.

€0.001
replicateupscalinganime

Recraft V3

ImageReplicate
New

State-of-the-art image generation optimized for design and branding. SVG vector output support.

€0.6012.0s
designvectorbranding

Recraft V3 Realistic

ImageRecraft

Recraft's high-prompt-adherence raster image model. Strong layout control and brand-style consistency.

€0.04
recrafttext-to-imagedesign

Recraft V3 SVG

ImageRecraft

Recraft's vector/SVG generation model. Editable illustrations and icons from text.

€0.08
recrafttext-to-svgvector

Reka Core

MultimodalCustom

Reka's frontier multimodal model supporting text, image, video and audio inputs.

Free
rekamultimodalvideo-understanding

Reka Edge

MultimodalCustom

Reka's small on-device-friendly multimodal model. ~7B parameters, 16k context.

Free
rekamultimodaledge

Reka Flash

MultimodalCustom

Reka's 21B dense multimodal model balancing speed and quality. Up to 128k context.

Free
rekamultimodalcost-efficient

Rembg

ImageCommunity

Open-source background-removal tool wrapping U2Net. Produces alpha mattes for photos, products and people with no manual masking.

€0.001
replicatebackground-removalmatting

RIFE Frame Interpolation

VideoReplicate

Real-Time Intermediate Flow Estimation. Doubles or quadruples FPS of an existing video via learned optical-flow-based frame interpolation.

€0.01
replicateupscaleframe-interpolation

Riffusion

TTSRiffusion

Stable-Diffusion-based real-time music generator. Operates on spectrogram images then resynthesizes audio, enables seamless transitions and looping.

€0.008
riffusionmusic-generationopen-weights

Runway Gen-3 Alpha Turbo

VideoCustom

Runway's faster, cheaper Gen-3 variant. Image-to-video at 5 credits/sec (~$0.05/sec).

€0.05
runwayimage-to-videofast

RVC Voice Conversion

TTSCommunity

Retrieval-based Voice Conversion. Converts a source recording into a target speaker's voice, preserving pitch, prosody and rhythm.

€0.006
rvcvoice-conversionvoice-cloning

SadTalker

VideoCommunity

Stylized audio-driven talking-head generator. Synthesizes 3D motion coefficients from audio to animate a single portrait image with natural head movements.

€0.07
replicatelipsynctalking-head

SAM HQ

MultimodalReplicate

ETH Zurich SAM-HQ. High-quality mask refinement on top of SAM. Sharper edges and finer structure than the original Segment Anything model.

€0.01
replicatesegmentationvision-understanding

SeamlessM4T v2 Large (Speech)

STTMeta

Meta SeamlessM4T v2 Large speech mode. Speech-to-speech, speech-to-text, and text-to-speech translation across 100+ languages in a single unified model.

€0.01
replicatetranslationmeta

SeamlessM4T v2 Large (Text)

Text & ChatMeta

Meta SeamlessM4T v2 Large. Universal multilingual translation across 100+ languages with text-to-text mode for documents and chat.

€0.006
replicatetranslationmeta

Seedance Lite

VideoByteDance
New

Budget ByteDance video, fast and cheap

€0.5070.0s
budgeti2vfast

Seedance Pro

VideoByteDance
New

ByteDance video with T2V and I2V, up to 1080p

€1.0095.0s
i2v1080p

Segformer B5

MultimodalReplicate

NVIDIA SegFormer-B5 semantic segmentation. Hierarchical transformer encoder with lightweight MLP decoder, strong ADE20k and Cityscapes results.

€0.007
replicatesegmentationvision-understanding

Shap-E (OpenAI)

ImageOpenAI

OpenAI Shap-E text/image to 3D. Generates implicit neural representations renderable as textured meshes or NeRFs.

€0.04
replicate3d-generationopenai

Snowflake Arctic Instruct

Text & ChatCustom

Snowflake's open MoE model: 480B total / 17B active params with dense+MoE hybrid architecture.

Free
snowflakemoeopen-weights

Spark TTS

TTSReplicate

Spark efficient TTS with disentangled control over speaker, content and style. Strong cross-lingual zero-shot performance.

€0.004
sparkttsvoice-cloning

Stable Audio 2

TTSUdio

Stability AI's Stable Audio 2.0. Text-to-music up to 3 minutes of full-length, structured tracks at 44.1 kHz.

Free
stabilitymusic-generationpricing-tbd

Stable Diffusion 3.5 Large (Stability)

ImageCustom

Stability AI's 8B-parameter flagship SD3.5 model. Strong prompt adherence and aesthetic quality.

€0.07
stabilitytext-to-imageopen-weights

Stable Diffusion 3.5 Large Turbo

ImageCustom

Distilled 4-step variant of SD3.5 Large. 8B params, ~4x faster inference at competitive quality.

€0.04
stabilitytext-to-imageopen-weights

Stable Diffusion 3.5 Medium

ImageCustom

Stability AI's 2.5B-parameter SD3.5 with strong quality/speed trade-off. Consumer-GPU friendly.

€0.04
stabilitytext-to-imageopen-weights

Stable Diffusion XL

ImageStability AI

Stability AI's SDXL model via Replicate. High-quality image generation with extensive customization.

€0.208.0s
open-sourcecustomizable

StarCoder2 15B

CodeBigCode

BigCode StarCoder2 15B code-generation flagship. Trained on 4T tokens of Stack v2 data with grouped-query attention and 16k context.

€0.005
replicatecode-generationbigcode

StarCoder2 3B

CodeBigCode

BigCode StarCoder2 3B code-generation model. Trained on The Stack v2, supports 600+ programming languages. Apache-2.0 licensed for commercial use.

€0.002
replicatecode-generationbigcode

StarCoder2 7B

CodeBigCode

BigCode StarCoder2 7B code-generation model. 16k context, 600+ programming languages, strong fill-in-the-middle (FIM) performance.

€0.003
replicatecode-generationbigcode

StreamingT2V

VideoReplicate

Picsart StreamingT2V. Generates long, consistent videos by chaining short autoregressive clips with motion and appearance memory.

€0.15
replicateanimationlong-form

StyleTTS 2

TTSReplicate

Style-based TTS using diffusion and adversarial training. Human-level naturalness in zero-shot voice synthesis from a 3-5s reference clip.

€0.004
stylettsttsvoice-cloning

Suno Bark

TTSSuno

Suno's text-prompted generative audio model. Speech, music, ambient sound and effects with non-verbal cues like laughter or sighs.

€0.01
sunobarkmusic-generation

SUPIR Upscaler

ImageCommunity

SUPIR (Scaling-Up Image Restoration) photo-real restoration model. Combines SDXL prior with language-guided controls for severely degraded inputs.

€0.06
replicateupscalingimage-restore

Swin2SR

ImageReplicate

Transformer-based image super-resolution using Swin-V2 attention. Handles classical, lightweight, real-world, and compressed-input variants with 2x/4x upscaling.

€0.002
replicateupscalingtransformer

SwinIR Video

VideoCommunity

SwinIR transformer-based super-resolution and denoising applied per-frame to video. Handles classic, real-world and lightweight upscaling.

€0.02
replicateupscaletransformer

T2I-Adapter Color

ImageReplicate

Tencent T2I-Adapter color-guided generation for SDXL. Lightweight adapter that conditions image generation on a color reference image.

€0.009
replicatestyle-transferimage-edit

Text Embedding 3 Small

EmbeddingOpenAI

OpenAI's compact embedding model. 1536 dimensions, great for semantic search and RAG.

Free500ms
affordablecompact

TII Falcon 180B Chat

Text & ChatTogether AI

TII's 180B causal decoder chat model fine-tuned on Ultrachat, Platypus and Airoboros.

Free
tiiopen-weightslegacy

ToonCrafter

VideoCommunity

Tencent ToonCrafter generative cartoon interpolation model. Synthesizes smooth in-between frames between two cartoon keyframes.

€0.08
replicateanimationtooncrafter

Tortoise TTS

TTSCommunity

Multi-voice expressive TTS. Slow but high-quality with strong prosody and natural intonation. Trained for long-form narration use cases.

€0.01
tortoisettsexpressive

TowerInstruct 13B

Text & ChatReplicate

Unbabel TowerInstruct 13B. Llama-2-based multilingual translation and post-editing model. Strong terminology consistency for enterprise localization.

€0.005
replicatetranslationunbabel

Transparent Background

ImageReplicate

PyTorch background-removal tool supporting multiple modes: base, fast and high-quality. Produces RGBA outputs and is suitable for batch processing.

€0.001
replicatebackground-removalopen-source

TRELLIS (3D)

ImageReplicate

Microsoft TRELLIS image-to-3D model. Generates textured 3D assets in GLB or Gaussian-splat format from a single reference image.

€0.18
replicate3d-generationimage-to-3d

TripoSR

ImageReplicate

Stability AI and Tripo single-image 3D reconstruction model. Generates 3D meshes from a single image in roughly half a second.

€0.03
replicate3d-generationimage-to-3d

TrOCR Large

MultimodalMicrosoft

Microsoft TrOCR large transformer-based OCR. End-to-end visual encoder plus text decoder, trained on synthetic and printed real-world data.

€0.004
replicateocrvision-understanding

U2Net Saliency

ImageReplicate

Salient-object detection network used for background removal and matting. Nested U-Net architecture trained on DUTS-TR for general scenes.

€0.001
replicatebackground-removalsaliency

Udio V1.5

AudioReplicate
New

AI music generation with studio-quality output. Generate full songs with vocals, instruments, and production.

€2.0060.0s
musicvocalshigh-quality

V-Express

VideoTencent

Tencent V-Express. Audio-driven portrait animation with progressive training, weak-condition learning, and expressive lip sync.

€0.09
replicatelipsynctencent

VideoCrafter

VideoCommunity

Tencent VideoCrafter latent video diffusion. Text-to-video and image-to-video generation up to 2s at 1024x576 with strong motion fidelity.

€0.07
replicateupscalevideo-generation

ViTPose

MultimodalReplicate

ViTPose plain-vision-transformer pose estimator. State-of-the-art keypoint accuracy on MS-COCO with a minimal architecture.

€0.006
replicateposevision-understanding

Voyage AI voyage-code-3

EmbeddingCustom

Voyage's code-specialized embedding model. Up to 32k context, Matryoshka 256-2048 dims, int8/binary support.

Free
voyageembeddingcode

Wan 2.1 (Alibaba)

VideoReplicate

Alibaba's Wan 2.1 open-weights video diffusion model. 14B MoE-based, supports T2V and I2V.

Free
alibabawantext-to-video

Wan 2.2 Image-to-Video

VideoReplicate
New

Ultra-cheap I2V. Upload image and animate it.

€0.1030.0s
budgeti2vfast

Wan 2.2 Text-to-Video

VideoReplicate
New

Ultra-cheap T2V for pennies

€0.1030.0s
budgetfast

Wav2Lip

VideoCommunity

Lip-sync model that re-syncs a target video's lip movement to an arbitrary audio track. Robust to identity and language with a lip-sync discriminator loss.

€0.05
replicatelipsyncvideo-edit

WizardCoder 33B

CodeReplicate

WizardLM WizardCoder 33B v1.1. Evol-Instruct fine-tune of DeepSeek-Coder-33B with strong code-generation benchmark performance.

€0.009
replicatecode-generationwizardlm

XTTS v2

TTSCommunity

Coqui's XTTS v2 multilingual TTS with voice cloning from 6 seconds of reference audio. Supports 17 languages and emotion transfer.

€0.005
coquittsvoice-cloning

Yi Large

Text & ChatCustom

01.AI's larger general-purpose chat model with 32k context window and strong bilingual performance.

Free
01aichinesebilingual

Yi-Coder 9B

Code01.AI

01.AI Yi-Coder 9B chat model. Strong multilingual code completion and chat, 128k context, competitive with code-specialized models 2x its size.

€0.004
replicatecode-generation01ai

Yi-VL 34B

Multimodal01.AI

01.AI Yi-VL 34B vision-language model. Bilingual (CN/EN) image understanding, strong CMMMU and MMMU performance among open-weights VLMs.

€0.02
replicatemultimodalvision-understanding

ZoeDepth

MultimodalReplicate

Intel ZoeDepth metric depth-estimation model. Combines relative-depth pretraining with metric fine-tuning for absolute distance in real units.

€0.005
replicatedepthvision-understanding

Frequently asked questions

Start Building with AI

Access all models through a single API. OpenAI-compatible, no vendor lock-in.