Replicate

San Francisco, USAFounded 2019
166 models

Replicate is an open-model hosting platform that serves thousands of open-source models including Flux, Stable Diffusion, Llama, and Whisper variants via a unified API.

166 models from Replicate on Railwail

Access every Replicate model through Railwail's OpenAI-compatible API.

166 models available

Depth Anything v2

MultimodalReplicate
Popular

Monocular depth-estimation model trained on 595k labeled and 62M unlabeled images. Strong zero-shot generalization in indoor and outdoor scenes.

€0.005
replicatedepthvision-understanding

Flux 1.1 Pro Ultra

ImageBlack Forest Labs
Popular

FLUX 1.1 Pro in ultra mode. Up to 4 megapixel images with raw mode for photorealism.

€0.6015.0s
high-qualityphotorealistic

Flux Dev

ImageBlack Forest Labs
Popular

Black Forest Labs' development model. Fast, high-quality image generation with LoRA support.

€0.5010.0s
popularfastlora

Google Veo 2

VideoGoogle DeepMind
Popular

Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.

€5.00120.0s
high-qualitypopular

Google Veo 3.1

VideoGoogle DeepMind
NewPopular

Latest Veo with image-to-video and context-aware audio

€6.0092.0s
popularaudioi2v

Kling v3

VideoReplicate
NewPopular

Cinematic video up to 15s with multi-shot and native audio

€2.00120.0s
popularaudioi2v

Kling v3 Omni

VideoReplicate
NewPopular

Most versatile: multi-reference images, video editing, native audio

€2.50120.0s
popularaudioi2v

Midjourney V7

ImageReplicate
NewPopular

The latest Midjourney model. Industry-leading aesthetic quality and prompt adherence for image generation.

€3.0030.0s
high-qualityaestheticpopular

MusicGen

AudioMeta
Popular

Meta's music generation model. Generate up to 1 minute of music from text descriptions.

€1.5030.0s
musicpopular

Runway Gen 4.5

VideoReplicate
NewPopular

Top-ranked for motion quality and visual fidelity

€1.0030.0s
populartop-quality

SAM 2 (Segment Anything 2)

MultimodalMeta
Popular

Meta Segment Anything 2. Promptable segmentation across images and video with temporal memory. Zero-shot, point/box/mask prompts, fast on a single H100.

€0.01
replicatesegmentationmeta

AnimateDiff

VideoCommunity

Plug-and-play motion module that animates personalized Stable Diffusion models without further training. 16-frame clips at 512x512.

€0.04
replicateanimationanimatediff

AnimateDiff Evolved

VideoReplicate

Community fork of AnimateDiff with improved motion modules, beta scheduler control and ControlNet integration for richer animation control.

€0.05
replicateanimationanimatediff

AnimateDiff Lightning

VideoByteDance

ByteDance distillation of AnimateDiff. 4-step sampling for over 10x faster inference at comparable quality to multi-step base model.

€0.02
replicateanimationbytedance

AudioCraft

TTSReplicate

Meta's AudioCraft framework wrapping MusicGen, AudioGen and EnCodec. Unified text-to-audio research toolkit for music and sound effects.

€0.01
metamusic-generationsound-effects

AudioLDM 2

TTSAudioLDM

Latent-diffusion model for general-purpose text-to-audio. Generates speech, music, and sound effects with a unified prior.

€0.01
audioldmmusic-generationdiffusion

AuraFlow v0.3

ImageFal.ai

fal.ai's fully open-source 6.8B flow-based text-to-image model. Up to 1536x1536 resolution.

Free
auraflowtext-to-imageopen-weights

Bark

AudioSuno

Suno's text-to-audio model. Generates realistic speech, music, and sound effects.

€0.5015.0s
speechsound-effects

BRIA RMBG-1.4

ImageReplicate

BRIA's first commercial-safe background-removal model. Trained on fully-licensed data, suitable for production e-commerce and design pipelines.

€0.03
replicatebackground-removalbria

BRIA RMBG-2.0

ImageReplicate

BRIA's professional background-removal model trained on fully-licensed data. Commercial-safe.

€0.04
briaimage-editbackground-removal

CCSR (Content-Consistent SR)

ImageReplicate

Content-Consistent Super-Resolution model. Reduces hallucination compared to typical diffusion-based upscalers while keeping perceptual quality high.

€0.04
replicateupscalingimage-restore

Champ Human Animation

VideoCommunity

Champ controllable human image animation. Uses 3D parametric guidance (SMPL) for realistic full-body motion transfer from a single reference image.

€0.12
replicateanimationhuman-motion

Clarity Upscaler

ImageCommunity

High-resolution image upscaler with creative detail re-imagination via SD-based hallucination. Strong for photography and product shots.

€0.04
replicateupscalingcreative

CodeFormer

ImageCommunity

Robust face-restoration model using a transformer-based codebook prior. Handles severe degradation, occlusion, and old-photo restoration with adjustable fidelity-quality tradeoff.

€0.002
replicateface-restoreupscaling

CogVideoX-5B (open)

VideoReplicate

Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.

Free
zhiputsinghuacogvideox

CogVLM2 19B

MultimodalReplicate

Tsinghua CogVLM2 19B with Llama-3 8B base plus 11B vision expert. Strong document understanding and visual reasoning, 8k context.

€0.01
replicatemultimodalvision-understanding

ControlNet Canny

ImageReplicate

ControlNet conditioned on Canny edge maps. Preserves composition and outlines while restyling with Stable Diffusion 1.5 or SDXL backbones.

€0.01
replicatestyle-transferimage-edit

ControlNet Depth

ImageReplicate

ControlNet conditioned on depth maps. Preserves the 3D scene layout while letting the prompt change style, lighting and content.

€0.01
replicatestyle-transferimage-edit

DeepSeek-VL 7B

MultimodalReplicate

DeepSeek-VL 7B chat model. Vision-language model with hybrid vision encoder and strong real-world visual question answering performance.

€0.008
replicatemultimodalvision-understanding

Detectron2

MultimodalReplicate

Meta Detectron2 object-detection and segmentation toolkit. Mask R-CNN, Cascade R-CNN, panoptic FPN and many other model variants in one wrapper.

€0.008
replicatesegmentationvision-understanding

DINOv2

MultimodalReplicate

Meta DINOv2 self-supervised vision backbone. Pretrained features for classification, segmentation and depth without task-specific fine-tuning.

€0.005
replicatesegmentationvision-understanding

Donut Document

MultimodalReplicate

Naver CLOVA Donut OCR-free document-understanding transformer. End-to-end JSON extraction from forms, receipts and invoices without explicit OCR.

€0.008
replicateocrvision-understanding

Dots OCR

MultimodalReplicate

Rednote Hilab Dots OCR. End-to-end document parsing model with layout, text and reading-order prediction in one transformer.

€0.008
replicateocrvision-understanding

DreamGaussian

ImageReplicate

Generative Gaussian-splatting model for fast image-to-3D synthesis. Produces textured meshes in two minutes via differentiable rasterization.

€0.09
replicate3d-generationimage-to-3d

DreamGaussian 4D

VideoReplicate

4D Gaussian-splatting generator extending DreamGaussian to video. Image-conditioned dynamic 3D scenes with view-consistent motion.

€0.18
replicateanimation4d

DWPose

MultimodalReplicate

DWPose whole-body 2D pose estimator. Two-stage knowledge-distilled model with strong accuracy on face, hands and body keypoints simultaneously.

€0.005
replicateposevision-understanding

DynamiCrafter

VideoCommunity

Tencent DynamiCrafter. Animates still images into short videos preserving texture and structure, with strong open-domain coverage.

€0.09
replicateanimationimage-to-video

EasyOCR

MultimodalReplicate

JaidedAI EasyOCR. Simple Python OCR wrapper supporting 80+ languages with deep-learning text detection and recognition.

€0.002
replicateocrvision-understanding

EchoMimic

VideoReplicate

Ant Group EchoMimic. Lifelike audio-driven portrait animation with editable landmark conditioning for fine-grained motion control.

€0.10
replicatelipsyncant-group

ESRGAN Classic

ImageReplicate

Enhanced Super-Resolution GAN, the original 2018 architecture. Produces sharp 4x upscales with strong perceptual quality on natural images.

€0.001
replicateupscalingesrgan

F5-TTS

TTSReplicate

Open-source flow-matching TTS with strong zero-shot voice cloning. Code MIT, weights CC-BY-NC.

Free
f5ttsopen-weights

FILM Frame Interpolation

VideoGoogle Research

Google FILM frame interpolation. Synthesizes high-quality intermediate frames between near-duplicate inputs, designed for large motion gaps.

€0.01
replicateupscaleframe-interpolation

Florence-2 Large

MultimodalMicrosoft

Microsoft Florence-2 Large. Unified prompt-based vision foundation model for captioning, detection, segmentation and OCR with a single 770M-param backbone.

€0.008
replicatemultimodalvision-understanding

Florence-2 Segmentation

MultimodalCommunity

Microsoft Florence-2 unified vision model with referring expression segmentation. Text-prompted region and mask generation in one model.

€0.009
replicatesegmentationvision-understanding

Flux Schnell

ImageBlack Forest Labs

The fastest Flux model. Generate images in under 2 seconds. Great for prototyping.

€0.032.0s
fastaffordable

FLUX.1 [Schnell]

ImageBlack Forest Labs

Black Forest Labs' fastest open-weights image model. Apache-2.0 licensed, ~1-4 step inference.

€0.003
fluxblack-forest-labsopen-weights

FLUX.1 Canny

ImageReplicate

FLUX structural control via Canny edge maps. Preserve composition while restyling.

€0.05
fluxblack-forest-labsimage-edit

FLUX.1 Depth

ImageReplicate

FLUX structural control via depth maps. Keep 3D scene layout while changing style/content.

€0.05
fluxblack-forest-labsimage-edit

FLUX.1 Fill

ImageReplicate

Black Forest Labs' inpainting/outpainting model for FLUX. Fill masked regions with prompt-guided content.

€0.05
fluxblack-forest-labsimage-edit

FLUX.1 Redux

ImageReplicate

FLUX image-variation adapter. Generate variations and remixes from a reference image.

€0.03
fluxblack-forest-labsimage-edit

GFPGAN v1.4

ImageTencent ARC

Tencent ARC face-restoration GAN. Reconstructs realistic facial detail in low-quality or compressed photos using a pretrained StyleGAN2 prior.

€0.002
replicateface-restoreupscaling

GLPN Depth

MultimodalReplicate

Global-Local Path Networks depth-estimation model. Combines hierarchical transformer encoder with selective feature fusion for sharp boundaries.

€0.004
replicatedepthvision-understanding

Google Veo 3 Fast

VideoGoogle DeepMind
New

Faster cheaper Veo 3 with audio

€3.2059.0s
fastaudio

Google Veo 3.1 Fast

VideoGoogle DeepMind
New

Faster Veo 3.1 with image-to-video and audio

€3.2059.0s
fastaudioi2v

GOT-OCR 2.0

MultimodalReplicate

StepFun GOT-OCR 2.0. Unified end-to-end OCR-2.0 model handling text, formulas, charts, sheet music and geometric shapes in one architecture.

€0.009
replicateocrvision-understanding

Granite Code 20B

CodeReplicate

IBM Granite 20B Code Instruct. Larger Granite code model balancing quality and inference cost for enterprise CI/CD code-review automation.

€0.006
replicatecode-generationibm

Granite Code 34B

CodeReplicate

IBM Granite 34B Code Instruct. Largest Granite code-instruction model. Top-tier among Apache-2.0 code LLMs on HumanEval, MBPP and MultiPL-E.

€0.01
replicatecode-generationibm

Granite Code 3B

CodeReplicate

IBM Granite 3B Code Instruct. Apache-2.0 small code-instruction model. Strong on Python, Java, JavaScript and Go for enterprise IDE integrations.

€0.002
replicatecode-generationibm

Granite Code 8B

CodeReplicate

IBM Granite 8B Code Instruct. Trained on permissively-licensed code, strong on multi-language code completion and instruction-following.

€0.004
replicatecode-generationibm

Grok Imagine Video

VideoReplicate
New

xAI video with native audio and lip-sync, up to 15s

€1.5090.0s
audioi2vxai

Grounded-SAM

MultimodalReplicate

Grounding DINO plus SAM. Open-vocabulary text-prompted detection and segmentation in one pipeline for fully-automatic mask generation.

€0.01
replicatesegmentationvision-understanding

Hailuo 2.3

VideoMinimax
New

Minimax model for realistic human motion and VFX

€0.5060.0s
i2v1080p

HRNet Pose

MultimodalReplicate

Microsoft HRNet high-resolution pose-estimation backbone. Parallel multi-resolution streams yield strong accuracy on COCO keypoint benchmarks.

€0.005
replicateposevision-understanding

Hunyuan3D 2.0

ImageTencent

Tencent's Hunyuan3D 2.0 image-to-3D pipeline. Two-stage shape and texture generation producing high-resolution textured meshes.

€0.21
replicate3d-generationimage-to-3d

Hunyuan3D 2.1

ImageTencent
New

Refreshed Hunyuan3D 2.1 with improved texture fidelity and PBR-material support. Image-to-3D with textured GLB output.

€0.24
replicate3d-generationimage-to-3d

HunyuanVideo

VideoTencent

Tencent's 13B open-weights video diffusion transformer. SOTA among open video models at release.

Free
tencenthunyuantext-to-video

HunyuanVideo

VideoTencent

Tencent's open-source video generation model. Strong visual quality with diverse style support.

€2.00120.0s
open-source

Idefics3 8B

MultimodalReplicate

Hugging Face Idefics3 8B. Llama-3 based open-source vision-language model with strong document QA and chart-understanding performance.

€0.007
replicatemultimodalvision-understanding

InstantMesh

ImageReplicate

Image-to-3D mesh generator from sparse-view diffusion. Produces textured meshes in under one minute on a single A100.

€0.12
replicate3d-generationimage-to-3d

InstructPix2Pix

ImageReplicate

Berkeley InstructPix2Pix. Edits an image from natural-language instructions in a single forward pass. Trained on GPT-3 plus Stable Diffusion synthetic pairs.

€0.01
replicatestyle-transferimage-edit

InternVL 2.5

MultimodalReplicate

OpenGVLab InternVL 2.5 78B. Open-source vision-language model approaching GPT-4o on MMMU, OCRBench and Math-Vista benchmarks.

€0.03
replicatemultimodalvision-understanding

IP-Adapter FaceID Plus v2

ImageReplicate

Tencent's face-identity conditioning adapter for SD/SDXL. Face embedding + CLIP for ID-consistent generation.

Free
tencentimage-editface-id

Janus Pro 7B

ImageReplicate

DeepSeek's unified multimodal model. Decouples vision encoding for both understanding and generation tasks.

Free
deepseekjanusopen-weights

Kokoro TTS 82M

TTSReplicate

Open-weights 82M-parameter TTS. Punches above its size class on naturalness benchmarks at a fraction of the inference cost of larger models.

€0.002
kokorottsopen-weights

Kuaishou Kolors

ImageReplicate

Kuaishou's bilingual (CN/EN) latent diffusion text-to-image model with strong text rendering.

Free
kuaishoutext-to-imageopen-weights

LayoutLMv3

MultimodalMicrosoft

Microsoft LayoutLMv3 multimodal document model. Unified text/image masking pretraining for form understanding, receipts and document QA.

€0.007
replicateocrvision-understanding

LivePortrait

VideoCommunity

Kuaishou LivePortrait. Efficient portrait animation driven by reference videos with stitching, retargeting and motion-control parameters.

€0.08
replicatelipsynckuaishou

Llama 3.2 Vision 90B

MultimodalMeta

Meta Llama 3.2 90B Vision. Largest open-weights Llama vision model. Strong visual reasoning, chart, OCR and document understanding.

€0.02
replicatemultimodalvision-understanding

LLaVA-OneVision 72B

MultimodalReplicate

LMMs-Lab LLaVA-OneVision 72B. Unified single-image, multi-image and video instruction-tuned VLM with task-transfer across modalities.

€0.02
replicatemultimodalvision-understanding

Lotus-G

MultimodalReplicate

Lotus generative depth model. Treats depth as a generation task using a diffusion model, producing higher-fidelity depth on textured surfaces.

€0.01
replicatedepthvision-understanding

LTX-Video (Lightricks)

VideoReplicate

Lightricks' 2B DiT video model. Realtime generation on consumer GPUs (~6s @ H100, 24fps).

Free
lightricksltxtext-to-video

Luma Ray Flash 2

VideoLuma AI
New

Fast affordable video with I2V support

€0.5045.0s
fastbudgeti2v

M2M-100 12B

Text & ChatMeta

Meta M2M-100 12B many-to-many translation model. Direct translation between 100 languages without pivoting through English.

€0.006
replicatetranslationmeta

MADLAD-400 3B

Text & ChatGoogle DeepMind

Google MADLAD-400 3B multilingual translation model. 419 languages supported, trained on a 5T-token multilingual corpus with strong low-resource performance.

€0.004
replicatetranslationgoogle

MagicAnimate

VideoCommunity

ByteDance MagicAnimate. Temporally consistent human-image animation driven by a DensePose motion sequence with strong identity preservation.

€0.10
replicateanimationhuman-motion

Magicoder S CL 7B

CodeReplicate

UIUC Magicoder S CL 7B. CodeLlama-7B fine-tuned with OSS-Instruct synthetic data. Strong HumanEval Plus and MBPP Plus performance per parameter.

€0.003
replicatecode-generationopen-weights

MAGNeT MusicGen

TTSReplicate

Meta MAGNeT non-autoregressive music generator. Up to 7x faster than MusicGen with comparable quality via masked generative transformers.

€0.007
metamusic-generationmagnet

Magnific-Style Upscaler

ImageReplicate

Detail-hallucinating upscaler in the Magnific style. Adds plausible high-frequency texture using a Stable Diffusion refiner conditioned on the low-res input.

€0.06
replicateupscalingcreative

Marigold

MultimodalReplicate

ETH Zurich Marigold. Diffusion-based monocular depth-estimation model fine-tuned from Stable Diffusion with strong fine-detail recovery.

€0.01
replicatedepthvision-understanding

Marker PDF Extract

MultimodalReplicate

Marker PDF-to-Markdown conversion pipeline. Combines layout, OCR and equation models to produce clean Markdown with preserved tables and formulas.

€0.008
replicateocrvision-understanding

Mask2Former

MultimodalReplicate

Meta Mask2Former universal image-segmentation transformer. Single architecture for panoptic, instance and semantic segmentation tasks.

€0.009
replicatesegmentationvision-understanding

mBART 50 Many-to-Many

Text & ChatMeta

Meta mBART-50 many-to-many translation model. 50 supported languages with strong performance on news and conversational text.

€0.003
replicatetranslationmeta

MediaPipe Pose

MultimodalGoogle DeepMind

Google MediaPipe Pose. Lightweight on-device-friendly 33-keypoint 3D pose estimator with optional segmentation mask output.

€0.003
replicateposevision-understanding

MiDaS v3.1

MultimodalReplicate

Intel MiDaS v3.1 relative depth-estimation model. Robust zero-shot single-image depth across diverse domains and resolutions.

€0.004
replicatedepthvision-understanding

MiniCPM-V 2.6

MultimodalReplicate

OpenBMB MiniCPM-V 2.6. 8B vision-language model with strong single-image, multi-image and video understanding plus OCR capabilities.

€0.008
replicatemultimodalvision-understanding

Minimax Video

VideoMinimax

MiniMax's video generation model. Fast, high-quality video output with text-to-video capabilities.

€2.5090.0s
fastaffordable

MMPose

MultimodalReplicate

OpenMMLab MMPose toolbox. Wraps RTMPose, HRNet, HigherHRNet and many other pose models behind a unified inference API.

€0.006
replicateposevision-understanding

Mochi 1

VideoGenmo

Genmo's 10B open-weights text-to-video model. AsymmDiT architecture, 5.4s @ 480p.

Free
genmomochitext-to-video

MOFA-Video

VideoReplicate

Motion-Field-Adapter video generator. Controllable image animation from trajectories, keypoints or audio with a strong identity preservation prior.

€0.10
replicatelipsyncanimation

MuseTalk

VideoCommunity

Tencent MuseTalk real-time lip-sync model. Audio-driven mouth-region editing in latent space at 30+ fps on a single GPU.

€0.06
replicatelipsynctencent

MusicGen Large

TTSMeta

Meta's 3.3B-parameter MusicGen Large. Text-conditioned music generation with single-stage autoregressive transformer, supports melody conditioning.

€0.02
metamusic-generationopen-weights

MusicGen Medium

TTSMeta

Meta MusicGen Medium (1.5B params). Strong quality-to-speed tradeoff for text-to-music with optional melody guidance.

€0.01
metamusic-generationopen-weights

MusicGen Small

TTSMeta

Meta MusicGen Small (300M params). Fast text-to-music generation suitable for prototyping and low-latency demos.

€0.006
metamusic-generationopen-weights

NLLB-200 3B

Text & ChatMeta

Meta's No Language Left Behind 3.3B translation model. Direct translation between any pair of 200+ languages including many low-resource African and Asian languages.

€0.003
replicatetranslationmeta

NLLB-200 Distilled 600M

Text & ChatMeta

Meta's distilled 600M NLLB. Same 200-language coverage as the 3B model with a fraction of the parameters, ideal for edge or high-throughput deployment.

€0.002
replicatetranslationmeta

olmOCR

MultimodalReplicate

Allen AI olmOCR. Open-source 7B vision-language model fine-tuned for high-fidelity document parsing including math, code and tables.

€0.01
replicateocrvision-understanding

OpenPose

MultimodalReplicate

CMU OpenPose multi-person 2D pose estimator. Real-time keypoint detection for body, hand, face and foot using Part Affinity Fields.

€0.005
replicateposevision-understanding

OpenVoice v1

TTSReplicate

MyShell OpenVoice v1. Cross-lingual voice cloning with flexible style control: emotion, accent, rhythm, pauses, and intonation.

€0.004
myshellttsvoice-cloning

OpenVoice v2

TTSReplicate

MyShell OpenVoice v2. Multilingual zero-shot voice cloning with accurate tone-color reproduction and style/emotion control.

€0.004
myshellttsvoice-cloning

PaddleOCR v3

MultimodalReplicate

Baidu PaddleOCR v3 PP-OCR pipeline. Lightweight detector plus recognizer optimized for production use with 80+ language support.

€0.003
replicateocrvision-understanding

Parler-TTS

TTSReplicate

Hugging Face Parler-TTS Mini. Lightweight TTS conditioned on a natural-language style description for fine-grained control over voice characteristics.

€0.003
parlerttshuggingface

Parler-TTS Large

TTSReplicate

Parler-TTS Large v1. 2.2B parameters, natural-language style prompting and improved prosody over the Mini variant.

€0.005
parlerttshuggingface

Phi-3.5 Vision

MultimodalMicrosoft

Microsoft Phi-3.5 Vision Instruct. Small (4.2B) multimodal model with strong document, OCR and multi-image reasoning at low cost.

€0.005
replicatemultimodalvision-understanding

Phind CodeLlama 34B v2

CodeReplicate

Phind CodeLlama 34B v2. Highly tuned CodeLlama variant focused on retrieval-augmented developer assistant workflows.

€0.009
replicatecode-generationphind

PhotoMaker

ImageTencent ARC

Tencent ARC PhotoMaker. Identity-preserving stylized photo generation from a stacked-ID embedding. Realistic re-styling of a subject in seconds.

€0.03
replicatestyle-transferimage-edit

PixVerse v5.6

VideoReplicate
New

Physics-accurate video generation up to 1080p

€0.5060.0s
i2v1080pphysics

Point-E

ImageOpenAI

OpenAI Point-E text-to-point-cloud system. Fast 3D point-cloud generation from text, optionally lifted to a mesh via marching cubes.

€0.03
replicate3d-generationopenai

Real-CUGAN

VideoCommunity

Real-CUGAN anime-focused upscaler. 2x/3x/4x super-resolution tuned for animation, line-art, and illustrated content.

€0.01
replicateupscaleanime

Real-ESRGAN 4x

ImageCommunity

AI-Upscaler that increases image resolution up to 4x while preserving texture and detail. Trained on synthetic and real data to reduce common ESRGAN artifacts.

€0.001
replicateupscalingimage-restore

Real-ESRGAN Anime 4x

ImageReplicate

Real-ESRGAN variant fine-tuned for anime, manga, and illustrated artwork. 4x upscaling with cartoon-aware artifact suppression.

€0.001
replicateupscalinganime

Recraft V3

ImageReplicate
New

State-of-the-art image generation optimized for design and branding. SVG vector output support.

€0.6012.0s
designvectorbranding

Rembg

ImageCommunity

Open-source background-removal tool wrapping U2Net. Produces alpha mattes for photos, products and people with no manual masking.

€0.001
replicatebackground-removalmatting

RIFE Frame Interpolation

VideoReplicate

Real-Time Intermediate Flow Estimation. Doubles or quadruples FPS of an existing video via learned optical-flow-based frame interpolation.

€0.01
replicateupscaleframe-interpolation

Riffusion

TTSRiffusion

Stable-Diffusion-based real-time music generator. Operates on spectrogram images then resynthesizes audio, enables seamless transitions and looping.

€0.008
riffusionmusic-generationopen-weights

RVC Voice Conversion

TTSCommunity

Retrieval-based Voice Conversion. Converts a source recording into a target speaker's voice, preserving pitch, prosody and rhythm.

€0.006
rvcvoice-conversionvoice-cloning

SadTalker

VideoCommunity

Stylized audio-driven talking-head generator. Synthesizes 3D motion coefficients from audio to animate a single portrait image with natural head movements.

€0.07
replicatelipsynctalking-head

SAM HQ

MultimodalReplicate

ETH Zurich SAM-HQ. High-quality mask refinement on top of SAM. Sharper edges and finer structure than the original Segment Anything model.

€0.01
replicatesegmentationvision-understanding

SeamlessM4T v2 Large (Speech)

STTMeta

Meta SeamlessM4T v2 Large speech mode. Speech-to-speech, speech-to-text, and text-to-speech translation across 100+ languages in a single unified model.

€0.01
replicatetranslationmeta

SeamlessM4T v2 Large (Text)

Text & ChatMeta

Meta SeamlessM4T v2 Large. Universal multilingual translation across 100+ languages with text-to-text mode for documents and chat.

€0.006
replicatetranslationmeta

Seedance Lite

VideoByteDance
New

Budget ByteDance video, fast and cheap

€0.5070.0s
budgeti2vfast

Seedance Pro

VideoByteDance
New

ByteDance video with T2V and I2V, up to 1080p

€1.0095.0s
i2v1080p

Segformer B5

MultimodalReplicate

NVIDIA SegFormer-B5 semantic segmentation. Hierarchical transformer encoder with lightweight MLP decoder, strong ADE20k and Cityscapes results.

€0.007
replicatesegmentationvision-understanding

Shap-E (OpenAI)

ImageOpenAI

OpenAI Shap-E text/image to 3D. Generates implicit neural representations renderable as textured meshes or NeRFs.

€0.04
replicate3d-generationopenai

Spark TTS

TTSReplicate

Spark efficient TTS with disentangled control over speaker, content and style. Strong cross-lingual zero-shot performance.

€0.004
sparkttsvoice-cloning

Stable Diffusion XL

ImageStability AI

Stability AI's SDXL model via Replicate. High-quality image generation with extensive customization.

€0.208.0s
open-sourcecustomizable

StarCoder2 15B

CodeBigCode

BigCode StarCoder2 15B code-generation flagship. Trained on 4T tokens of Stack v2 data with grouped-query attention and 16k context.

€0.005
replicatecode-generationbigcode

StarCoder2 3B

CodeBigCode

BigCode StarCoder2 3B code-generation model. Trained on The Stack v2, supports 600+ programming languages. Apache-2.0 licensed for commercial use.

€0.002
replicatecode-generationbigcode

StarCoder2 7B

CodeBigCode

BigCode StarCoder2 7B code-generation model. 16k context, 600+ programming languages, strong fill-in-the-middle (FIM) performance.

€0.003
replicatecode-generationbigcode

StreamingT2V

VideoReplicate

Picsart StreamingT2V. Generates long, consistent videos by chaining short autoregressive clips with motion and appearance memory.

€0.15
replicateanimationlong-form

StyleTTS 2

TTSReplicate

Style-based TTS using diffusion and adversarial training. Human-level naturalness in zero-shot voice synthesis from a 3-5s reference clip.

€0.004
stylettsttsvoice-cloning

Suno Bark

TTSSuno

Suno's text-prompted generative audio model. Speech, music, ambient sound and effects with non-verbal cues like laughter or sighs.

€0.01
sunobarkmusic-generation

SUPIR Upscaler

ImageCommunity

SUPIR (Scaling-Up Image Restoration) photo-real restoration model. Combines SDXL prior with language-guided controls for severely degraded inputs.

€0.06
replicateupscalingimage-restore

Swin2SR

ImageReplicate

Transformer-based image super-resolution using Swin-V2 attention. Handles classical, lightweight, real-world, and compressed-input variants with 2x/4x upscaling.

€0.002
replicateupscalingtransformer

SwinIR Video

VideoCommunity

SwinIR transformer-based super-resolution and denoising applied per-frame to video. Handles classic, real-world and lightweight upscaling.

€0.02
replicateupscaletransformer

T2I-Adapter Color

ImageReplicate

Tencent T2I-Adapter color-guided generation for SDXL. Lightweight adapter that conditions image generation on a color reference image.

€0.009
replicatestyle-transferimage-edit

ToonCrafter

VideoCommunity

Tencent ToonCrafter generative cartoon interpolation model. Synthesizes smooth in-between frames between two cartoon keyframes.

€0.08
replicateanimationtooncrafter

Tortoise TTS

TTSCommunity

Multi-voice expressive TTS. Slow but high-quality with strong prosody and natural intonation. Trained for long-form narration use cases.

€0.01
tortoisettsexpressive

TowerInstruct 13B

Text & ChatReplicate

Unbabel TowerInstruct 13B. Llama-2-based multilingual translation and post-editing model. Strong terminology consistency for enterprise localization.

€0.005
replicatetranslationunbabel

Transparent Background

ImageReplicate

PyTorch background-removal tool supporting multiple modes: base, fast and high-quality. Produces RGBA outputs and is suitable for batch processing.

€0.001
replicatebackground-removalopen-source

TRELLIS (3D)

ImageReplicate

Microsoft TRELLIS image-to-3D model. Generates textured 3D assets in GLB or Gaussian-splat format from a single reference image.

€0.18
replicate3d-generationimage-to-3d

TripoSR

ImageReplicate

Stability AI and Tripo single-image 3D reconstruction model. Generates 3D meshes from a single image in roughly half a second.

€0.03
replicate3d-generationimage-to-3d

TrOCR Large

MultimodalMicrosoft

Microsoft TrOCR large transformer-based OCR. End-to-end visual encoder plus text decoder, trained on synthetic and printed real-world data.

€0.004
replicateocrvision-understanding

U2Net Saliency

ImageReplicate

Salient-object detection network used for background removal and matting. Nested U-Net architecture trained on DUTS-TR for general scenes.

€0.001
replicatebackground-removalsaliency

Udio V1.5

AudioReplicate
New

AI music generation with studio-quality output. Generate full songs with vocals, instruments, and production.

€2.0060.0s
musicvocalshigh-quality

V-Express

VideoTencent

Tencent V-Express. Audio-driven portrait animation with progressive training, weak-condition learning, and expressive lip sync.

€0.09
replicatelipsynctencent

VideoCrafter

VideoCommunity

Tencent VideoCrafter latent video diffusion. Text-to-video and image-to-video generation up to 2s at 1024x576 with strong motion fidelity.

€0.07
replicateupscalevideo-generation

ViTPose

MultimodalReplicate

ViTPose plain-vision-transformer pose estimator. State-of-the-art keypoint accuracy on MS-COCO with a minimal architecture.

€0.006
replicateposevision-understanding

Wan 2.1 (Alibaba)

VideoReplicate

Alibaba's Wan 2.1 open-weights video diffusion model. 14B MoE-based, supports T2V and I2V.

Free
alibabawantext-to-video

Wan 2.2 Image-to-Video

VideoReplicate
New

Ultra-cheap I2V. Upload image and animate it.

€0.1030.0s
budgeti2vfast

Wan 2.2 Text-to-Video

VideoReplicate
New

Ultra-cheap T2V for pennies

€0.1030.0s
budgetfast

Wav2Lip

VideoCommunity

Lip-sync model that re-syncs a target video's lip movement to an arbitrary audio track. Robust to identity and language with a lip-sync discriminator loss.

€0.05
replicatelipsyncvideo-edit

WizardCoder 33B

CodeReplicate

WizardLM WizardCoder 33B v1.1. Evol-Instruct fine-tune of DeepSeek-Coder-33B with strong code-generation benchmark performance.

€0.009
replicatecode-generationwizardlm

XTTS v2

TTSCommunity

Coqui's XTTS v2 multilingual TTS with voice cloning from 6 seconds of reference audio. Supports 17 languages and emotion transfer.

€0.005
coquittsvoice-cloning

Yi-Coder 9B

Code01.AI

01.AI Yi-Coder 9B chat model. Strong multilingual code completion and chat, 128k context, competitive with code-specialized models 2x its size.

€0.004
replicatecode-generation01ai

Yi-VL 34B

Multimodal01.AI

01.AI Yi-VL 34B vision-language model. Bilingual (CN/EN) image understanding, strong CMMMU and MMMU performance among open-weights VLMs.

€0.02
replicatemultimodalvision-understanding

ZoeDepth

MultimodalReplicate

Intel ZoeDepth metric depth-estimation model. Combines relative-depth pretraining with metric fine-tuning for absolute distance in real units.

€0.005
replicatedepthvision-understanding

Frequently asked questions

How is Replicate pricing handled on Railwail?
Railwail uses transparent per-call or per-token credit pricing for all Replicate models. You pay only for what you use — no monthly minimums, no upfront commitments. Pricing for every individual Replicate model is shown on its detail page.
Are there rate limits when using Replicate via Railwail?
Default rate limits depend on your account tier and the underlying Replicate capacity. Free-tier accounts get sensible defaults for development; paid accounts can request higher limits. Contact support if you need dedicated throughput or burst capacity.
Which regions does Replicate support through Railwail?
Replicate models are served from Railwail's globally distributed edge infrastructure. EU, US, and Asia-Pacific traffic is automatically routed to the nearest available provider region. GDPR-compliant EU-only routing is available on request.
Is there a sandbox or free tier to test Replicate models?
Yes — every new Railwail account receives free credits that work across all providers, including Replicate. No credit card is required to start. You can try every model in the catalog before committing to a paid plan.
Categories Replicate works in

Start building with Replicate today

Free credits on sign-up. No credit card required. Access Replicate and 27+ other providers through a single API.