AI Models

BAAI (Beijing Academy of AI) open-weight English embedding model with 335M parameters. Returns 1024-dim vectors and was a top MTEB English retrieval model on release. The v1.5 update improved similarity distribution so it works well without a query instruction prefix for symmetric tasks. A widely used open alternative to hosted embeddings.

BGE-M3 (Multilingual)

BAAI multilingual embedding model covering 100+ languages with an 8192-token context. M3 stands for its multi-functionality (dense, sparse and ColBERT-style multi-vector retrieval), multilinguality and multi-granularity over long documents. Returns 1024-dim dense vectors and is a strong open choice for cross-lingual and long-text retrieval.
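As a rough illustration of the ColBERT-style multi-vector mode mentioned above, the sketch below scores a document by letting each query token keep its best-matching document token (often called MaxSim). The function names and toy vectors are hypothetical, and this is a generic late-interaction sketch, not BGE-M3's own scoring code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best similarity to any document token."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dim token vectors (hypothetical, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.1]]              # covers only the first one

print(maxsim_score(query, doc_a) > maxsim_score(query, doc_b))
```

In a hybrid setup this score would typically be combined with the dense and sparse scores; how to weight them is a tuning choice not specified here.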

Bio_ClinicalBERT

The original Bio_ClinicalBERT from Alsentzer et al., a BERT model initialized from BioBERT and further pretrained on all MIMIC-III clinical notes. Served as a fill-mask endpoint, it predicts masked tokens in clinical text and produces clinical embeddings. It is the standard encoder backbone behind many downstream clinical NLP fine-tunes.

Biomedical NER (all entities)

Token-classification model from d4data that tags 84 biomedical entity types in clinical and medical text, including disease, sign, symptom, medication, dosage, lab value, body part and procedure. Trained on the Maccrobat clinical case corpus on a DistilBERT base, so it runs cheaply for high-volume tagging.

BLIP

Multimodal · Salesforce

Salesforce BLIP. Vision-language model for image captioning and visual question answering. Given an image it writes a short natural-language caption, or answers a question about the image when one is supplied. A widely used baseline for automatic captioning.

replicate · blip · captioning

Claude 3.5 Sonnet (vision)

Anthropic Claude 3.5 Sonnet with image input. 200k context, strong on dense documents, tables, charts and handwriting. Reliable structured extraction from screenshots and scans.

anthropic · vision · multimodal

Claude Opus 4

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free · 5.0s

flagship · reasoning · agentic

Claude Opus 4.7

Anthropic's April 2026 flagship. 87.6% on SWE-bench Verified, 3x higher image resolution, output self-verification, vision + reasoning.

anthropic · flagship · reasoning

Claude Opus 4.8

Anthropic's most capable Opus-tier model. State of the art on long-horizon agentic work, coding and knowledge tasks, with a 1M-token context window at standard pricing.

anthropic · claude · opus

Claude Sonnet 4

Anthropic's most capable model. Excellent for complex analysis, coding, math, and creative writing.

Free · 3.0s

popular · coding · analysis

Claude Sonnet 4.6

Anthropic's balanced mid-tier model from February 2026. Best price/performance for production workloads: 5x cheaper than Opus, near-flagship quality.

anthropic · balanced · production

CLIP Interrogator

pharmapsychotic's CLIP Interrogator. Takes an image and produces a Stable-Diffusion-style text prompt by combining BLIP captioning with CLIP to rank likely subjects, artists, mediums and styles. Commonly used to reverse-engineer a prompt from an existing picture.

replicate · clip-interrogator · captioning

Codestral

Code · Mistral AI

Mistral's code-specialized model. Optimized for code generation, completion, and understanding across 80+ languages.

Free · 1.5s

coding · fast · multilanguage

DeepSeek V3.1

DeepSeek's refreshed V3.1 release. 671B MoE / 37B active. Tops open-weights leaderboards on coding and reasoning.

deepseek · open-weights · moe

DeepSeek V4 Pro

DeepSeek's April 2026 flagship. 1.6T MoE / 49B active params, 1M context, rivals top closed-source models on STEM and coding at a fraction of the price.

deepseek · open-weights · moe

Depth Anything v2

Monocular depth-estimation model trained on 595k labeled and 62M unlabeled images. Strong zero-shot generalization in indoor and outdoor scenes.

€0.005

ElevenLabs Multilingual V2

TTS · ElevenLabs

natural · multilingual · popular

ElevenLabs' most natural-sounding TTS model. Supports 29 languages with emotional range.

€1.00 · 3.0s

ESM-2 650M (Protein Embeddings)

Meta AI 650M-parameter protein language model trained on UniRef50 sequences. Feed it an amino-acid sequence and the per-residue hidden states act as learned protein embeddings, used for structure prediction, variant-effect and function tasks. This 33-layer checkpoint is the common balance of quality and cost in the ESM-2 family.

FLUX 1.1 Pro

Black Forest Labs' flagship text-to-image model. Generates faster than FLUX.1 Pro with better prompt adherence, strong photorealism and reliable spatial composition. Runs as a hosted Replicate model.

Flux 1.1 Pro Ultra

high-quality · photorealistic

FLUX 1.1 Pro in ultra mode. Up to 4 megapixel images with raw mode for photorealism.

€0.60 · 15.0s

FLUX 1.1 Pro Ultra

FLUX 1.1 Pro in Ultra mode by Black Forest Labs. Generates up to 4 megapixel images with a raw mode for less processed, more natural-looking photography. Best FLUX option when output resolution and fine detail matter.

Flux Dev

Black Forest Labs' development model. Fast, high-quality image generation with LoRA support.

€0.50 · 10.0s

popular · fast · lora

Gemini 1.5 Pro (vision)

Google Gemini 1.5 Pro with native multimodal input. Reads images, long PDFs, audio and video in up to a 2M-token context, useful for whole-document and long-video understanding.

Text & Chat · Google DeepMind

google · gemini · vision

Gemini 2.0 Flash

Text & Chat · Google DeepMind

Google's fastest multimodal model. Supports text, images, audio, and video input.

Free · 1.2s

fast · multimodal · affordable

Gemini 2.5 Pro

reasoning · coding · multimodal

Google's latest thinking model. Excels at reasoning, coding, math, and science, with a massive context window.

Free · 4.0s

Gemini 3 Flash

Google's April 2026 fast multimodal model. Combines Gemini 3 Pro's reasoning with Flash-tier latency and price. Default model in the Gemini app.

google · deepmind · balanced

Gemini 3.1 Pro

Google DeepMind's February 2026 flagship. 2M-token context, native multimodal (text/image/audio/video), Deep Think reasoning.

google · deepmind · flagship

Google Imagen 4

Image · Google DeepMind

Google DeepMind's Imagen 4 text-to-image model, hosted on Replicate. Sharp detail, accurate text rendering, and strong prompt adherence across photographic and illustrated styles. Outputs up to 2K resolution.

replicate · google · imagen

Google Imagen 4

Image · Google DeepMind

Google's Imagen 4. Text-to-image with strong photorealism and improved typography support.

google · imagen · text-to-image

Google Imagen 4 Ultra

Image · Google DeepMind

Premium Imagen 4 tier. Highest fidelity, prompt adherence and typography quality from Google.

google · imagen · text-to-image

Google Veo 2

Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.

€5.00 · 120.0s

high-quality · popular

Google Veo 3

Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.

€0.75 · 92.0s

google · veo · text-to-video

Google Veo 3 (Replicate)

Google's Veo 3 served via Replicate. Text-to-video with native synchronized audio generation. High-fidelity motion and scene coherence in short clips.

€8.00

replicate · google · veo

Google Veo 3.1

The latest Veo release, with image-to-video and context-aware audio.

€6.00 · 92.0s

popular · audio · i2v

GPT-4.1

OpenAI's newest flagship model. Improved reasoning, instruction following, and coding over GPT-4o.

popular · coding · reasoning

GPT-4o

OpenAI's most capable multimodal model. Excellent for complex reasoning, coding, and creative tasks.

Free · 2.0s

popular · fast · multimodal

GPT-4o (vision)

OpenAI's GPT-4o with native image input. Handles text and images in a single context, 128k window, strong on chart reading, document QA, screenshots and visual reasoning.

openai · vision · multimodal

GPT-5.4

OpenAI's unified flagship combining GPT and o-series reasoning into one model. 1M context, multimodal, top SWE-Bench Pro and OSWorld scores.

openai · flagship · reasoning

GPT-5.4 Mini

OpenAI's efficient mid-tier model. 2x faster than its predecessor, 400k context, approaches GPT-5.4 quality on SWE-Bench Pro at a fraction of the cost.

openai · balanced · cost-efficient

GPT-5.5

OpenAI's current flagship chat model (released April 2026). Strongest general reasoning, coding and tool use in the GPT-5 line, with vision input and a large context window.

openai · gpt-5 · flagship

Grok 4

xAI's flagship reasoning model with vision and tool use. 256k context, strong at complex reasoning and STEM tasks.

xai · flagship · reasoning

Grok 4.20 Reasoning

xAI's Grok 4.20 reasoning snapshot. Runs an extended thinking pass before answering for multi-step analysis, math and STEM, with a 1M token context window and strong agentic tool calling.

xai · grok · reasoning

Grok 4.3

Multimodal · xAI

xAI's May 2026 flagship. 1M context, vision, always-on reasoning, real-time X/web retrieval via DeepSearch.

xai · flagship · reasoning

HunyuanVideo

Video · Tencent

Tencent's HunyuanVideo, a 13B open-weights text-to-video diffusion transformer. Produces high-motion, photorealistic clips with smooth temporal consistency and was one of the first open models to rival closed systems on motion quality.

€5.00 · 120.0s

replicate · tencent · hunyuan

Icons (SDXL Flat Pop)

SDXL fine-tune by galleri5 for slick flat icons and pop constructivist graphics with thick edges. Trained on Bing generations, it produces clean single-subject icon art that suits app icons, badges and UI glyphs. Raster output, not true vector.

replicate · icon · logo

Ideogram 3.0

ideogram · text-to-image · typography

Ideogram's flagship text-to-image model with industry-leading text rendering and prompt adherence.

€0.09 · 15.0s

Ideogram v3 Quality

The highest-quality tier of Ideogram v3. Improved photorealism and prompt adherence over v2 while keeping Ideogram's best-in-class text rendering. Supports style references and inline text layout.

replicate · ideogram · text-to-image

Incredibly Fast Whisper

Whisper Large v3 wrapped with Hugging Face Transformers optimizations (batched inference, flash attention) for very high throughput. Transcribes hours of audio in minutes on a single GPU. Maintained by Vaibhav Srivastav. Good when you need bulk transcription fast.

replicate · whisper · stt

InstantID

InstantID makes realistic portraits of a real person from a single reference photo without per-user training. Combines a face encoder with an IdentityNet adapter on SDXL to keep identity and pose while following a text prompt, so it is fast and tuning-free.

avatar · portrait · instant-id

Kather100K Colorectal Tissue Classifier (ResNet50)

ResNet50 from the TIA Toolbox model zoo, trained on the Kather100K dataset of 100,000 hematoxylin-and-eosin colorectal histology patches. It classifies a tissue tile into one of nine categories: tumor epithelium, stroma, lymphocytes, mucus, smooth muscle, debris, adipose, background and normal mucosa. Research use only, not a diagnostic device.

Kimi K2 (Moonshot)

Moonshot AI's 1T-parameter MoE model. Industry-leading agentic coding and tool-use benchmarks.

moonshot · kimi · moe

Kling v2.1

Kuaishou's Kling v2.1, generating 5 and 10 second videos at 720p or 1080p from text or an image. Known for cinematic camera work and realistic physical motion, available on Replicate via the official KwaiVGI account.

€6.00

replicate · kuaishou · kling

Kling v2.1 Master

Kuaishou's premium Kling v2.1 Master. Generates 1080p 5s and 10s clips from text or an image with strong dynamics and prompt adherence. The top tier of the Kling 2.1 family.

€6.00

replicate · kuaishou · kling

Kling v3

Cinematic video up to 15s with multi-shot and native audio.

€2.00 · 120.0s

popular · audio · i2v

Kling v3 Omni

The most versatile Kling v3 variant: multi-reference images, video editing and native audio.

€2.50 · 120.0s

popular · audio · i2v

Medical NER (DeBERTa)

Token-classification model that extracts 41 medical entity types from clinical text, such as disease, medication, dosage, frequency, lab test, sign and symptom. Fine-tuned on a DeBERTa v3 base, which gives more accurate spans than older BERT-based taggers on the same corpus.

Midjourney V7

high-quality · aesthetic · popular

The latest Midjourney model. Industry-leading aesthetic quality and prompt adherence for image generation.

€3.00 · 30.0s

MiniMax Hailuo 02

Video · Minimax

MiniMax Hailuo 02 on Replicate. Text-to-video and image-to-video producing 6s or 10s clips at 768p standard or 1080p pro. Known for accurate real-world physics and stable motion.

replicate · minimax · hailuo

MiniMax-01

Text & Chat · Minimax

MiniMax's 456B hybrid lightning-attention model with native 4M-token context. Industry-leading long-context.

minimax · long-context · lightning-attention

MusicGen

Audio · Meta

Meta's music generation model. Generate up to 1 minute of music from text descriptions.

€1.50 · 30.0s

music · popular

Nomic Embed Text v1.5

Nomic AI open embedding model with a fully reproducible training pipeline (open weights, data and code). Supports an 8192-token context and Matryoshka representation learning, so you can truncate the 768-dim output down to 64 dims with graceful quality loss. Uses task prefixes like search_query and search_document.
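The Matryoshka truncation described above can be sketched as keeping the leading components of a vector and rescaling to unit length so cosine and dot-product scores stay comparable. This is a minimal generic sketch with toy numbers; the function name is hypothetical, and Nomic's own usage notes may prescribe extra steps (such as a normalization before slicing) that are not shown here.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale them to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    # An all-zero prefix cannot be normalized, so it is returned unchanged.
    return [x / norm for x in head] if norm else head


# Toy 4-dim vector standing in for a 768-dim model output.
full = [3.0, 4.0, 1.0, 2.0]
short = truncate_embedding(full, 2)
print(short)
```

Queries and documents need to be truncated to the same size before they are compared.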

o3-mini

OpenAI's reasoning model optimized for STEM tasks, coding, and math. Uses chain-of-thought reasoning.

Free · 10.0s

reasoning · coding · math

OpenAI Sora 2

Video · OpenAI

OpenAI's second-generation Sora video model. Realistic motion, improved physics, audio support.

€0.50

openai · sora · text-to-video

OpenAI text-embedding-3-large

Embedding · OpenAI

OpenAI's highest-quality embedding model. Returns 3072-dim vectors by default and supports reducing dimensions via the dimensions parameter. Outperforms text-embedding-3-small and the older ada-002 on MTEB and multilingual MIRACL retrieval benchmarks, for cases where accuracy matters more than cost.

Free · 600ms

openai · embedding · retrieval

OpenAI text-embedding-3-small

Embedding · OpenAI

OpenAI's small, low-cost embedding model. Returns 1536-dim vectors by default and supports shortening output dimensions via the dimensions parameter without retraining. Replaced text-embedding-ada-002 with better retrieval quality at a fraction of the price, and is the default choice for general-purpose semantic search and RAG.
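To show where the dimensions parameter fits, the sketch below builds a JSON request body for an embeddings call without sending it. The helper name and the example text are hypothetical; `model`, `input` and `dimensions` are the field names used by the embeddings endpoint, and anything beyond those (authentication, endpoint URL, response handling) is left out.

```python
import json


def embedding_request(text, model="text-embedding-3-small", dimensions=None):
    """Build the JSON body for an embeddings request; nothing is sent."""
    body = {"model": model, "input": text}
    # Only include `dimensions` when a shorter vector is wanted;
    # omitting it leaves the model's default output size in place.
    if dimensions is not None:
        body["dimensions"] = dimensions
    return body


payload = embedding_request("reset my password", dimensions=256)
print(json.dumps(payload, indent=2))
```

Whatever size is chosen, stored document vectors and query vectors should use the same value.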

Free · 500ms

openai · embedding · retrieval

Perplexity Sonar Pro

Perplexity's premium web-grounded search model with multi-step reasoning over live sources.

perplexity · web-search · citations

Professional Headshot (FLUX Kontext)

Turns any single selfie into a clean professional headshot using FLUX Kontext image editing. Keeps the person's face while swapping to business attire, a studio background and even lighting. Aimed at LinkedIn-style profile photos.

avatar · portrait · headshot

PubMedBERT Embeddings (NeuML)

Sentence-transformers model fine-tuned from Microsoft PubMedBERT on PubMed title-abstract pairs by the NeuML team. Produces 768-dim sentence embeddings tuned for biomedical semantic search and similarity, and is the embedding backbone behind the paperai and txtai medical search tools.

Recraft 20B SVG

Recraft's faster, cheaper vector model. Outputs editable SVG paths instead of raster pixels, so logos, icons and flat illustrations scale to any size without blur. Defaults to a vector_illustration style and supports line art and engraving looks. Hosted API only.

replicate · recraft · svg

Recraft V3

Recraft's text-to-image model that topped the Hugging Face text-to-image arena at release. Strong long-text rendering, brand-style consistency, and precise control over image dimensions and color palettes.

replicate · recraft · text-to-image

Recraft Vectorize

Recraft's raster-to-vector converter. Takes a PNG or JPG and traces it into a clean SVG with precise vector paths, aimed at logos, icons and graphics that need to scale. Image-to-SVG counterpart to Recraft's text-to-SVG models.

svg · vector · recraft

Runway Gen 4.5

Top-ranked for motion quality and visual fidelity.

€1.00 · 30.0s

popular · top-quality

Runway Gen-4 Turbo

Runway's Gen-4 Turbo on Replicate. Fast image-to-video generation producing 5s and 10s clips at 720p with strong character and scene consistency across shots.

replicate · runway · gen-4

SAM 2 (Segment Anything 2)

Multimodal · Meta

Meta Segment Anything 2. Promptable segmentation across images and video with temporal memory. Zero-shot, point/box/mask prompts, fast on a single H100.

replicate · segmentation · meta

Sora

Video · OpenAI

popular · high-quality · openai

OpenAI video generation model. Create realistic and imaginative videos from text prompts up to 20 seconds.

€1.00 · 180.0s

SPECTER (Scientific Paper Embeddings)

AllenAI document-level embedding model for scientific papers. Built on SciBERT and trained on the citation graph so that papers citing each other land close together. Feed it a title plus abstract and it returns one 768-dim vector per paper, useful for recommendation, clustering and citation-based retrieval.

Stable Diffusion XL

Stability AI's SDXL 1.0 with the optional refiner. The 3.5B-parameter base model, or a 6.6B-parameter ensemble pipeline when paired with the refiner, became the default open image model before FLUX. Good for fine-tuning and LoRAs, with broad community support.

replicate · sdxl · stability-ai

Sticker Maker

fofr's sticker generator that outputs graphics with transparent backgrounds, so the result drops straight into chat apps or print sheets. Runs an SDXL-based pipeline at high speed (default 17 steps) and returns die-cut style art without manual background removal.

replicate · sticker · transparent

ViT Chest X-ray Classifier

Vision Transformer (ViT) fine-tuned on chest x-ray images for multi-class thoracic findings. Given a single frontal chest radiograph it returns class probabilities across several disease categories. One of the more downloaded chest x-ray classifiers on the Hugging Face Hub. Research and education only, not a diagnostic tool.

Voyage AI voyage-3

Voyage's general-purpose embedding model. 1024 dims, 32k context, strong retrieval performance.

voyage · embedding · retrieval

Whisper

STT · OpenAI

OpenAI's Whisper running on Replicate. General-purpose speech recognition trained on 680k hours of multilingual audio. Transcribes and translates 99 languages, robust to accents and background noise, and outputs plain text, segments, or word-level timestamps.

replicate · openai · whisper

Whisper Large V3

STT · OpenAI

OpenAI's Whisper model. State-of-the-art speech recognition supporting 99+ languages.

€0.30 · 5.0s

multilingual · popular

Whisper Large v3 Turbo

STT · OpenAI

OpenAI's pruned and fine-tuned Whisper Large v3, with the decoder cut from 32 layers to 4. ~216x realtime, 99+ languages, MIT-licensed weights.

openai · whisper · stt

851-Labs Background Remover

Background removal model from 851-Labs that outputs a clean cutout with a transparent alpha channel. One of the most-run background removers on Replicate, handles people, products and objects on busy backgrounds.

background-removal · 851-labs · cutout

Ad Inpaint (Product Photo)

Product advertising photo generator. You upload a cut-out product shot and a prompt describing the scene; it places the product on a new generated background with matching lighting and shadows, so a plain packshot becomes an ecommerce or ad-ready hero image without a photo studio.

product · ecommerce · product-photo

AI21 Jamba 1.5 Large

AI21's flagship hybrid Mamba-Transformer model with a 256k context window for long-document tasks.

ai21 · long-context · mamba

AI21 Jamba 1.5 Mini

Cost-efficient hybrid Mamba-Transformer model with 256k context. Tuned for high-throughput RAG.

ai21 · long-context · mamba

AnimateDiff

Plug-and-play motion module that animates personalized Stable Diffusion models without further training. 16-frame clips at 512x512.

replicate · animation · animatediff

AnimateDiff Lightning

ByteDance distillation of AnimateDiff. 4-step sampling for over 10x faster inference at quality comparable to the multi-step base model.

replicate · animation · bytedance

AudioLDM 2

TTS · AudioLDM

Latent-diffusion model for general-purpose text-to-audio. Generates speech, music, and sound effects with a unified prior.

audioldm · music-generation · diffusion

AuraFlow v0.3

fal.ai's fully open-source 6.8B flow-based text-to-image model. Up to 1536x1536 resolution.

auraflow · text-to-image · open-weights

Bark

Audio · Suno

Suno's text-to-audio model. Generates realistic speech, music, and sound effects.

€0.50 · 15.0s

speech · sound-effects

BioBERT Disease NER (NCBI)

BioBERT fine-tuned on the NCBI Disease corpus for disease-name recognition. Given biomedical text it tags spans that mention a disease or condition, using BIO labels. Useful for pulling diagnoses and disease mentions out of abstracts, case reports and clinical notes.
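Because the output is per-token BIO labels, a small post-processing step is usually needed to turn tags back into disease mentions. The sketch below groups B-/I- tagged tokens into spans; the label name `Disease`, the function name and the example sentence are assumptions for illustration, and real tokenizer output (sub-word pieces, offsets) would need extra handling.

```python
def bio_to_spans(tokens, tags):
    """Group B-/I- tagged tokens into (label, text) spans."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any span that was still open.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # Continuation of the open span.
            current.append(token)
        else:
            # "O" or a stray I- tag with no matching open span ends the entity.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans


tokens = ["Patient", "has", "type", "2", "diabetes", "and", "asthma", "."]
tags = ["O", "O", "B-Disease", "I-Disease", "I-Disease", "O", "B-Disease", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)
```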

BioBERT v1.2 (Biomedical Embeddings)

DMIS-Lab (Korea University) BERT-base initialized from English BERT and further pretrained on PubMed abstracts. Used as a feature extractor it yields 768-dim contextual embeddings tuned for biomedical text mining tasks such as NER, relation extraction and biomedical question answering.

BiomedBERT (PubMedBERT abstract)

Microsoft BiomedBERT (formerly PubMedBERT) pretrained from scratch on PubMed abstracts with a domain-specific vocabulary, rather than adapting a general model. As a feature extractor it gives 768-dim biomedical embeddings and set the original state of the art on the BLURB biomedical NLP benchmark.

BiRefNet Background Removal

BiRefNet high-resolution dichotomous image segmentation for background removal. Bilateral reference network that produces sharp matting on fine detail like hair, fur and thin structures, often cleaner than older U2Net or rembg models.

background-removal · birefnet · segmentation

BLIP Image Captioning Large

Multimodal · huggingface

Salesforce BLIP large checkpoint for image captioning, served through Hugging Face Inference. Given a photo it returns a short English caption. The large variant gives more accurate captions than the base model and is a common drop-in for alt-text and image indexing.

huggingface · blip · captioning

Bone Fracture Detection (X-ray)

Image classifier by prithivMLmods that labels a bone x-ray as Fractured or Not Fractured. Given a single radiograph it returns binary class scores. One of the more downloaded fracture classifiers on the Hub. Research and education only, not a diagnostic tool.

BRIA Remove Background

BRIA AI's commercial background removal model trained on fully licensed data. Produces accurate cutouts for e-commerce and design, with attention to clean edges around products and people.

background-removal · bria · ecommerce

BRIA RMBG-1.4

BRIA's first commercial-safe background-removal model. Trained on fully-licensed data, suitable for production e-commerce and design pipelines.

replicate · background-removal · bria

BRIA RMBG-2.0

BRIA's professional background-removal model trained on fully-licensed data. Commercial-safe.

bria · image-edit · background-removal

Bringing Old Photos Back to Life

Image · Microsoft

Microsoft Research pipeline by Ziyu Wan et al. that restores scanned old photos, removing scratches, dust and fading and optionally enhancing faces in one pass.

restore · old-photo · scratch-removal

ByteDance Seedance 1 Pro

Video · ByteDance

ByteDance's Seedance 1 Pro on Replicate. Text-to-video and image-to-video producing 5s or 10s clips at 480p or 1080p. Strong motion quality and prompt following from the Seedance family.

replicate · bytedance · seedance

Cartesia Sonic

TTS · Custom

Cartesia's ultra-low-latency TTS (~90ms TTFB). State-space model with voice cloning support.

cartesia · tts · low-latency

Cartoonify

catacolabs Cartoonify turns a photo into a flat cartoon illustration. Takes a single image and returns a stylized cartoon version with clean shapes and bold outlines. Straightforward one-input model for avatars and profile pictures.

avatar · portrait · cartoon

CCSR (Content-Consistent SR)

Content-Consistent Super-Resolution model. Reduces hallucination compared to typical diffusion-based upscalers while keeping perceptual quality high.

replicate · upscaling · image-restore

Champ Human Animation

replicate · animation · human-motion

Champ controllable human image animation. Uses 3D parametric guidance (SMPL) for realistic full-body motion transfer from a single reference image.

€0.12

Chatterbox

Resemble AI's open Chatterbox TTS. Zero-shot voice cloning from a short audio prompt with an exaggeration control for emotion intensity, plus CFG weight to balance pacing and fidelity.

replicate · resemble-ai · tts

Clarity Upscaler

High-resolution image upscaler with creative detail re-imagination via SD-based hallucination. Strong for photography and product shots.

replicate · upscaling · creative

Claude Haiku 3.5

Anthropic's fast and affordable model. Great for quick tasks, summarization, and simple coding.

Free · 1.0s

Claude Haiku 4.5

Anthropic's fastest and cheapest 4.x model. Strong vision and tool use at ultra-low latency, ideal for high-concurrency workloads.

anthropic · cost-efficient · low-latency

Clinical Assertion and Negation BERT

Text-classification model from Betty van Aken that decides whether a medical condition mentioned in a clinical note is present, absent or possible. The target entity is marked in the input with [entity] tags, and the model returns the assertion status. Built on Bio_ClinicalBERT and trained on the 2010 i2b2/VA assertion data.
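Since the target condition has to be marked in the input, a small helper can wrap a character span in the marker before the note is sent to the model. This is a minimal sketch: the helper name and example sentence are hypothetical, and the exact spacing around the `[entity]` marker is an assumption to check against the model card.

```python
def mark_entity(text, start, end, marker="[entity]"):
    """Wrap text[start:end] in marker tokens, leaving the rest untouched."""
    if not 0 <= start < end <= len(text):
        raise ValueError("invalid span")
    return f"{text[:start]}{marker} {text[start:end]} {marker}{text[end:]}"


note = "The patient denies any shortness of breath."
target = "shortness of breath"
start = note.index(target)
marked = mark_entity(note, start, start + len(target))
print(marked)
```

Each condition in a note is marked and classified separately, so a sentence with several conditions yields several inputs.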

Clinical NER (problem, test, treatment)

Token-classification model that tags the three core i2b2 clinical entity types in patient notes: problem, test and treatment. Given a sentence from a discharge summary or progress note it marks which spans are medical problems, which are diagnostic tests and which are treatments or medications.

Code Llama 13B Instruct

Meta's 13B Code Llama tuned for instruction following. A faster mid-size option for code generation and completion, supporting infilling for inserting code at a cursor position. Served on Replicate per call.

Code Llama 34B Instruct

Meta's 34B Code Llama tuned for instruction following. A balance of size and quality for code generation, completion, and explanation, with strong coverage of Python, JavaScript, and other common languages. Runs on Replicate per call.

Code Llama 70B Instruct

Meta's largest Code Llama, a 70B Llama-2 derivative specialized for programming and tuned to follow instructions in chat form. Handles code generation, completion, and explanation across common languages. Served on Replicate as a per-call endpoint.

Code Llama 7B Instruct

Meta's smallest Code Llama at 7B parameters, tuned for instruction following. The cheapest and fastest member of the family for quick code generation, completion, and infilling. Served on Replicate per call.

CodeFormer

Robust face-restoration model using a transformer-based codebook prior. Handles severe degradation, occlusion, and old-photo restoration with adjustable fidelity-quality tradeoff.

replicate · face-restore · upscaling

CodeGen 350M Mono

350M autoregressive code generation model from Salesforce, the smallest of the original CodeGen family. The mono variant was further trained on Python so it is well suited for short Python completions and program synthesis from a natural-language or code prompt.

code · huggingface · salesforce

CogVideoX-5B

CogVideoX-5B from Tsinghua/Zhipu AI, an open 5B-parameter text-to-video diffusion transformer. Generates 6-second 720x480 clips with coherent motion and is widely used in research for its open weights and reproducibility.

replicate · cogvideox · zhipu

CogVideoX-5B (open)

Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.

zhipu · tsinghua · cogvideox

CogVLM2 19B

Tsinghua CogVLM2 19B with Llama-3 8B base plus 11B vision expert. Strong document understanding and visual reasoning, 8k context.

Cohere Aya 23 35B

Open-weights multilingual research model from Cohere covering 23 languages. 35B parameters.

cohere · multilingual · open-weights

Cohere Command Light (legacy)

Text & Chat · Cohere

Cohere's fast lightweight chat model (deprecated Sep 2025). Kept as a tombstone entry for comparison.

cohere · legacy · deprecated

Cohere Command R (08-2024)

Text & Chat · Cohere

Cohere's mid-tier RAG/tool model. Cost-efficient sibling of Command R+ with 128k context.

cohere · rag · tools

Cohere Command R+ (08-2024)

Text & Chat · Cohere

Cohere's flagship RAG- and tool-optimized chat model. 128k context, refreshed August 2024.

cohere · rag · tools

Cohere embed-multilingual-v3

Cohere's multilingual embedding model. Supports 100+ languages with separate search and classification modes.

cohere · embedding · multilingual

Consistent Character

fofr's model generates the same character in many poses and angles from one reference image. Useful for building an avatar set or character sheet where the face and design stay consistent across outputs. Can produce a grid or individual images.

avatar · portrait · character

ControlNet Canny

ControlNet conditioned on Canny edge maps. Preserves composition and outlines while restyling with Stable Diffusion 1.5 or SDXL backbones.

ControlNet Depth

ControlNet conditioned on depth maps. Preserves the 3D scene layout while letting the prompt change style, lighting and content.

high-quality · prompt-following

DALL-E 3

Image · OpenAI

OpenAI's latest image generation model. Excellent at following complex prompts with high fidelity.

€4.00 · 15.0s

DDColor

DDColor by Xiaoyang Kang et al. colorizes black-and-white photos using dual decoders that jointly learn pixel colors and semantic color queries, giving vivid and natural results on old images.

colorize · restore · ddcolor

Deepgram Nova-3

STT · Custom

Deepgram's flagship STT. First to offer realtime multilingual transcription with self-serve customization.

deepgram · stt · transcription

DeepSeek Coder 1.3B Instruct

1.3B instruction-tuned code model from DeepSeek, trained on 2 trillion tokens of code and natural language across 87 languages with a 16k context window. One of the strongest tiny coders for its size, handling generation, completion and short coding instructions.

code · huggingface · deepseek

DeepSeek Coder 33B Instruct (GGUF)

Quantized GGUF build of DeepSeek's 33B code model, trained on roughly 2T tokens that are about 87 percent code. Designed for repository-level completion and project-aware generation thanks to a 16k context window. Runs on Replicate as a per-call endpoint.

deepseek · coding · instruct

DeepSeek Coder V2

Code · DeepSeek

DeepSeek's specialized coding model. Excellent at code generation, debugging, and explanation.

Free · 2.0s

coding · affordable

DeepSeek R1

DeepSeek's reasoning model with chain-of-thought capabilities. Excellent for complex problem-solving.

Free · 8.0s

reasoning · math

DeepSeek V3

Powerful open-weight model from DeepSeek. Strong at coding, math, and Chinese/English tasks.

Free · 2.0s

affordable · coding

DeepSeek V4 Flash

Efficiency-optimized variant of DeepSeek V4. 284B MoE / 13B active, 1M context, ultra-low pricing for high-throughput workloads.

deepseek · open-weights · moe

DeepSeek-VL 7B

DeepSeek-VL 7B chat model. Vision-language model with hybrid vision encoder and strong real-world visual question answering performance.

DINOv2 Skin Disease Classifier

DINOv2-base backbone fine-tuned for skin-disease image classification across 31 conditions, including basal cell carcinoma, lichen planus, lupus, herpes simplex, impetigo, leprosy variants and several genodermatoses. Broader than melanoma-only models. Research and educational use only, not a diagnostic tool.

Donut Document

Naver CLOVA Donut OCR-free document-understanding transformer. End-to-end JSON extraction from forms, receipts and invoices without explicit OCR.

Dots OCR

Rednote Hilab Dots OCR. End-to-end document parsing model with layout, text and reading-order prediction in one transformer.

DreamGaussian

Generative Gaussian-splatting model for fast image-to-3D synthesis. Produces textured meshes in two minutes via differentiable rasterization.

€0.09

DynamiCrafter

replicate · animation · image-to-video

Tencent DynamiCrafter. Animates still images into short videos preserving texture and structure, with strong open-domain coverage.

€0.09

EasyOCR

JaidedAI EasyOCR. Simple Python OCR wrapper supporting 80+ languages with deep-learning text detection and recognition.

EchoMimic

replicate · lipsync · ant-group

Ant Group EchoMimic. Lifelike audio-driven portrait animation with editable landmark conditioning for fine-grained motion control.

€0.10

Ecommerce Virtual Try-On

Try-on pipeline aimed at ecommerce listings. You give it a photo containing clothing on a body pose plus a separate face image; it composes a person wearing that clothing with the supplied face, controllable by prompt, CFG, and output size. Useful for generating on-model product shots from a flat garment image.

product · virtual-try-on · vton

Edge TTS

TTS · Custom

Microsoft Edge neural voices accessed via the open-source edge-tts wrapper. 400+ voices across 100+ locales, suitable for batch generation.

microsoft · tts · multilingual

ESRGAN Classic

Enhanced Super-Resolution GAN, the original 2018 architecture. Produces sharp 4x upscales with strong perceptual quality on natural images.

replicate · upscaling · esrgan

F5-TTS

Open-source flow-matching TTS with strong zero-shot voice cloning. Code MIT, weights CC-BY-NC.

f5 · tts · open-weights

F5-TTS

F5-TTS, a flow-matching TTS that clones a voice from a reference clip plus its transcript and reads new text in that voice. Fast non-autoregressive synthesis with optional silence removal.

replicate · f5-tts · tts

Face to Many

fofr's face stylizer converts a face photo into 3D render, emoji, pixel art, video-game character, claymation or toy styles. Uses InstantID plus style LoRAs on SDXL to keep the likeness while applying a chosen art style. Popular for fun avatars.

avatar · portrait · stylize

Face to Sticker

fofr's model turns a face photo into a die-cut sticker with a white border and transparent background. Uses InstantID to hold the likeness and outputs a clean PNG suitable for chat stickers or print. Simple single-image input.

avatar · portrait · sticker

FILM Frame Interpolation

Video · Google Research

Google FILM frame interpolation. Synthesizes high-quality intermediate frames between near-duplicate inputs, designed for large motion gaps.

replicate · upscale · frame-interpolation

Florence-2 Large

Microsoft Florence-2 Large. Unified prompt-based vision foundation model for captioning, detection, segmentation and OCR with a single 770M-param backbone.

Florence-2 Segmentation

Microsoft Florence-2 unified vision model with referring expression segmentation. Text-prompted region and mask generation in one model.

FLUX PuLID

PuLID identity customization running on FLUX.1-dev. Inserts a face from one reference photo into prompt-driven scenes using contrastive alignment, giving higher likeness and detail than SDXL-era ID adapters. Good for realistic avatars and character portraits.

avatar · portrait · pulid

Flux Schnell

The fastest Flux model. Generate images in under 2 seconds. Great for prototyping.

€0.03 · 2.0s

FLUX.1 [dev]

The open-weight 12B rectified-flow transformer from Black Forest Labs. Close to FLUX Pro quality with a guidance-distilled checkpoint released under a non-commercial license. The most widely fine-tuned base in the FLUX family.

FLUX.1 [schnell]

The fastest FLUX model from Black Forest Labs, distilled to produce images in 1 to 4 steps. Apache 2.0 licensed for commercial use. Built for high-volume generation and real-time previews.

FLUX.1 [Schnell]

Black Forest Labs' fastest open-weights image model. Apache-2.0 licensed, ~1-4 step inference.

flux · black-forest-labs · open-weights

FLUX.1 Canny

FLUX structural control via Canny edge maps. Preserve composition while restyling.

FLUX.1 Canny [dev]

Open-weight edge-guided FLUX model from Black Forest Labs. Extracts Canny edges from a control image and regenerates it from your prompt while holding the original composition and outlines, so you can restyle a scene without changing its structure.

FLUX.1 Depth

FLUX structural control via depth maps. Keep 3D scene layout while changing style/content.

FLUX.1 Depth [dev]

Open-weight depth-guided FLUX model from Black Forest Labs. Derives a depth map from the control image and regenerates from your prompt while preserving 3D spatial layout, useful for re-texturing rooms, products, or scenes without moving objects.

FLUX.1 Fill

Black Forest Labs' inpainting/outpainting model for FLUX. Fill masked regions with prompt-guided content.

FLUX.1 Fill [dev]

Black Forest Labs' open-weight inpainting and outpainting model, guidance-distilled from FLUX.1 Fill [pro]. You supply an image plus a mask and a prompt; it fills the masked region or extends the canvas with prompt-guided content that matches lighting and texture.

FLUX.1 Kontext [dev]

Open-weight version of FLUX.1 Kontext by Black Forest Labs. Instruction-based editing: pass an input image and a plain text edit ('change the jacket to red', 'remove the person on the left') and it applies the change while keeping the rest of the scene and identity consistent.

flux · kontext · black-forest-labs

FLUX.1 Redux

FLUX image-variation adapter. Generate variations and remixes from a reference image.

FLUX.1-dev Inpainting

FLUX.1-dev inpainting wrapper that fills masked parts of an image from a prompt. Useful when you want FLUX-quality fills with a simple image plus mask plus prompt interface and adjustable mask strength.

flux · image-edit · inpainting

Gemini 1.5 Flash (vision)

Google Gemini 1.5 Flash, the fast low-cost multimodal model. 1M-token context, image/audio/video input, good for high-volume captioning, classification and long-video skim tasks.

google · gemini · vision

Gemini Robotics (2025)

Robotics · Google DeepMind

Google DeepMind's vision-language-action model based on Gemini 2.0. Generalist robot policy with strong dexterity.

google · deepmind · gemini

Gemini Robotics-ER

Robotics · Google DeepMind

Embodied-reasoning variant of Gemini Robotics. Enhanced 3D spatial reasoning and trajectory planning.

google · deepmind · gemini

Get3D (NVIDIA)

NVIDIA GET3D generative model for textured 3D shapes. Trained on category-specific datasets, it produces meshes with high-quality textures.

nvidia · 3d-generation · open-weights

GFPGAN v1.4

Image · Tencent ARC

Tencent ARC face-restoration GAN. Reconstructs realistic facial detail in low-quality or compressed photos using a pretrained StyleGAN2 prior.

replicate · face-restore · upscaling

GLPN Depth

Global-Local Path Networks depth-estimation model. Combines a hierarchical transformer encoder with selective feature fusion for sharp boundaries.

Google RT-2-X

Robotics · Google DeepMind

Google's VLA from the RT-X collaboration. Trained on the Open X-Embodiment dataset (22 robots, 527 skills) and shows positive transfer across robot embodiments.

google · vla · robotics

Google Veo 3 Fast

Faster, cheaper Veo 3 variant with audio.

€3.20 · 59.0s

fast · audio

Google Veo 3.1 Fast

Faster Veo 3.1 variant with image-to-video and audio.

€3.20 · 59.0s

fast · audio · i2v

GOT-OCR 2.0

StepFun GOT-OCR 2.0. Unified end-to-end OCR-2.0 model handling text, formulas, charts, sheet music and geometric shapes in one architecture.

GPT-4o Mini

Small, fast, and affordable model for lightweight tasks. Great balance of speed and capability.

Free · 800ms

GPT-4o mini (vision)

OpenAI's small multimodal model with image input. Much cheaper than GPT-4o, 128k context, good for high-volume captioning, OCR-style reads, tagging and screenshot understanding.

openai · vision · multimodal

GPT-5 Mini

Smaller, faster, cheaper member of OpenAI's GPT-5 family. Tuned for high-throughput chat, classification and extraction where the full flagship is overkill.

openai · gpt-5 · cost-efficient

GPT-5.1

OpenAI GPT-5.1 chat model (November 2025). An earlier GPT-5 point release kept available for compatibility. Good general-purpose reasoning and coding.

openai · gpt-5 · reasoning

GPT-5.4 Nano

OpenAI's smallest and cheapest GPT-5.4 variant. Built for high-volume classification, extraction and coding subagents at edge-grade latency.

openai · cost-efficient · low-latency

Granite Code 20B

IBM Granite 20B Code Instruct. Larger Granite code model balancing quality and inference cost for enterprise CI/CD code-review automation.

replicate · code-generation · ibm

Granite Code 8B

IBM Granite 8B Code Instruct. Trained on permissively-licensed code, strong on multi-language code completion and instruction-following.

replicate · code-generation · ibm

Grok 2 Vision

Multimodal · xAI

xAI's vision-capable Grok 2 snapshot. Image-in, text-out with strong multilingual instruction following.

xai · vision · legacy

Grok 3

xAI's flagship model. Strong at reasoning, coding, and real-time knowledge with web search capabilities.

Free · 3.0s

reasoning · real-time

Grok 4.1 Fast

Multimodal · xAI

xAI's cost-efficient high-throughput model. 2M context, optional reasoning, optimized for agentic loops and real-time apps.

xai · cost-efficient · vision

Grok 4.20 (Non-Reasoning)

xAI's Grok 4.20 standard snapshot. Skips the extended thinking pass for lower-latency answers on tasks that do not need deep deliberation. 1M token context window.

xai · grok · long-context

Grok 4.20 Multi-Agent

xAI's Grok 4.20 multi-agent snapshot. Coordinates several specialized agents under one call to handle multi-step workflows that mix research, tool use and synthesis. 1M token context window.

xai · grok · multi-agent

Grok Build 0.1

Code · xAI

xAI's Grok coding-focused model. Tuned for code generation and software development tasks with a 256k token context window for working over large codebases.

xai · grok · code

Grok Imagine Video

xAI video model with native audio and lip-sync, clips up to 15s.

€1.50 · 90.0s

audio · i2v · xai

Grounded-SAM

Grounding DINO plus SAM. Open-vocabulary text-prompted detection and segmentation in one pipeline for fully-automatic mask generation.

GTE Large EN v1.5

Alibaba (Tongyi Lab) general text embedding model. The v1.5 release extends the context to 8192 tokens and returns 1024-dim vectors, scoring competitively on MTEB while handling much longer inputs than typical 512-token encoders. A practical open model when documents exceed the usual short-context limit.

Hailuo / MiniMax Video-01

Video · Custom

MiniMax's Hailuo video-01. 6s 1280x720 clips with strong cinematic motion and physical realism.

€0.43

minimax · hailuo · text-to-video

Hailuo 2.3

Video · Minimax

MiniMax model for realistic human motion and VFX.

€0.50 · 60.0s

i2v · 1080p

Hunyuan3D 2.0

Image · Tencent

Tencent's Hunyuan3D 2.0 image-to-3D pipeline. Two-stage shape and texture generation producing high-resolution textured meshes.

€0.21

Hunyuan3D 2.1

Refreshed Hunyuan3D 2.1 with improved texture fidelity and PBR-material support. Image-to-3D with textured GLB output.

€0.24

HunyuanVideo

Video · Tencent

Tencent's 13B open-weights video diffusion transformer. SOTA among open video models at release.

tencent · hunyuan · text-to-video

IC-Light (Product Relighting)

Lvmin Zhang's IC-Light packaged by zsxkib. Relights a product or portrait from a text prompt or a chosen light direction while keeping the subject's shape and detail intact, so a flat product photo can be given studio, window, or dramatic side lighting without re-shooting.

product · ic-light · relight

Idefics3 8B

Hugging Face Idefics3 8B. Llama-3 based open-source vision-language model with strong document QA and chart-understanding performance.

Ideogram 2.0 Turbo

Ideogram's fast text-to-image variant. Strong typography and logo rendering at low latency.

ideogram · text-to-image · typography

Ideogram v2

Ideogram's text-to-image model known for accurate in-image text and typography. Handles posters, logos, and signage where other models garble lettering. Supports magic prompt expansion and multiple aspect ratios.

replicate · ideogram · text-to-image

Ideogram v3 Turbo

Ideogram's fast v3 model, the fastest and cheapest tier of the v3 family. Known for accurate in-image text rendering and reliable typography, which most diffusion models still get wrong. Hosted API only.

replicate · ideogram · text-to-image

IDM-VTON (Virtual Try-On)

IDM-VTON virtual try-on from the CVPR 2024 paper. You give it a photo of a person and a garment image; it dresses the person in that garment while preserving pose, body shape, and the garment's pattern and text. Good for showing a clothing product on a model for an ecommerce listing.

product · virtual-try-on · vton

InstantMesh

Image-to-3D mesh generator from sparse-view diffusion. Produces textured meshes in under one minute on a single A100.

€0.12

InstructPix2Pix

Berkeley InstructPix2Pix. Edits an image from natural-language instructions in a single forward pass. Trained on GPT-3 plus Stable Diffusion synthetic pairs.

IP-Adapter FaceID Plus v2

Tencent's face-identity conditioning adapter for SD/SDXL. Face embedding + CLIP for ID-consistent generation.

tencent · image-edit · face-id

Janus Pro 7B

DeepSeek's unified multimodal model. Decouples visual encoding into separate pathways for understanding and generation while sharing a single transformer.

deepseek · janus · open-weights

Jina Embeddings v3 (Multilingual)

Jina's frontier multilingual embedding model. 570M params, 8192 ctx, 89 languages, Matryoshka dims 128-1024.

jina · embedding · multilingual
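Matryoshka-style embeddings are usually shortened by keeping the leading dimensions and re-normalizing the result. The sketch below illustrates that idea in plain Python, assuming the endpoint returns the full vector as a list of floats. `truncate_embedding` is a hypothetical helper for illustration, not part of Jina's API.

```python
import math
from typing import List


def truncate_embedding(vec: List[float], dim: int) -> List[float]:
    """Keep the first `dim` values of an embedding and L2-normalize them.

    Hypothetical helper: it assumes the model was trained so that leading
    dimensions carry the most information (the Matryoshka property).
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize
    return [x / norm for x in head]


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0]              # toy stand-in for a 1024-dim vector
    print(truncate_embedding(full, 2))   # shorter, unit-length vector
```

Smaller dimensions reduce index size and speed up similarity search, usually at some cost in retrieval quality, so the cut-off is worth checking on your own data.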

Kling 1.6 Pro

Video · Kuaishou

Kuaishou's Kling 1.6 Pro. Premium cinematic motion and physics realism, ~$0.07/sec.

€0.35

kuaishou · kling · text-to-video

Kling v1.6 Pro

Kuaishou's Kling v1.6 Pro on Replicate. Generates 5s and 10s clips in 1080p from text or an image, with cinematic motion and physics realism. The widely used pro tier of the 1.6 generation.

replicate · kuaishou · kling

Kokoro TTS 82M

Open-weights 82M-parameter TTS. Punches above its size class on naturalness benchmarks at a fraction of the inference cost of larger models.

kokoro · tts · open-weights

Kuaishou Kolors

Kuaishou's bilingual (CN/EN) latent diffusion text-to-image model with strong text rendering.

kuaishou · text-to-image · open-weights

LeRobot SmolVLA

Robotics · Custom

Hugging Face's 450M VLA pretrained on 487 community LeRobot datasets. Runs on consumer GPUs.

huggingface · lerobot · vla

LivePortrait

Kuaishou LivePortrait. Efficient portrait animation driven by reference videos with stitching, retargeting and motion-control parameters.

€0.08

replicate · lipsync · kuaishou

Llama 3.2 Vision 11B (Ollama)

Meta Llama 3.2 11B Vision served via Ollama on Replicate. Open-weights multimodal model for image captioning, document and chart reading, and visual question answering.

replicate · meta · llama

Llama 3.2 Vision 90B

Meta Llama 3.2 90B Vision. Largest open-weights Llama vision model. Strong visual reasoning, chart, OCR and document understanding.

Llama 3.3 70B

Text & Chat · Meta

Meta's open-source 70B parameter model. Strong all-around performance with multilingual support.

open-source · popular

LLaVA 1.6 Vicuna 13B

LLaVA 1.6 (LLaVA-NeXT) with a Vicuna-13B language backbone. Open vision-language chat model that describes images, answers questions, reads charts and reasons about scenes. Version 1.6 adds higher input resolution and better OCR and reasoning than LLaVA 1.5.

replicate · llava · captioning

LLaVA v1.6 34B

LLaVA v1.6 on a Nous-Hermes-2 34B base, served on Replicate. Open-source vision-language assistant for image question answering, description and visual reasoning at higher resolution.

replicate · llava · vision-understanding

LogoAI (SDXL Logo Generator)

SDXL fine-tune by mejiabrayan aimed at logo generation. Produces simple, centered mark and wordmark style logos from a text prompt. Useful for quick brand concepts and mockups. Raster PNG output, not vector.

replicate · logo · icon

Lotus-G

Lotus generative depth model. Treats depth as a generation task using a diffusion model, producing higher-fidelity depth on textured surfaces.

LTX-Video (Lightricks)

Video · Lightricks

Lightricks' 2B DiT video model. Realtime generation on consumer GPUs (~6s @ H100, 24fps).

lightricks · ltx · text-to-video

Luma Dream Machine v1.6

Video · Custom

Luma's Dream Machine 1.6. 720p text/image-to-video with strong motion and camera control.

€0.40

luma · text-to-video · image-to-video

Luma Ray Flash 2

Video · Luma AI

Fast, affordable video generation with image-to-video (I2V) support.

€0.50 · 45.0s

fast · budget · i2v

Luma Ray-2 720p

Video · Luma AI

Luma Labs' Ray-2 at 720p on Replicate. Text and image-to-video producing 5s and 9s clips with fast, coherent motion and strong camera control. Successor to Dream Machine.

replicate · luma · ray-2

MagicAnimate

replicate · animation · human-motion

ByteDance MagicAnimate. Temporally consistent human-image animation driven by a DensePose motion sequence with strong identity preservation.

€0.10

Magicoder S CL 7B

Code · Community

UIUC Magicoder S CL 7B. CodeLlama-7B fine-tuned with OSS-Instruct synthetic data. Strong HumanEval Plus and MBPP Plus performance per parameter.

replicate · code-generation · open-weights

MAGNeT

Audio · Community

MAGNeT is Meta's masked, non-autoregressive audio generator. Instead of predicting tokens left to right it fills masked audio tokens in parallel over a few decoding steps, so generation is faster than autoregressive MusicGen at similar quality. This Replicate packaging exposes the text-to-music and text-to-sound variants.

meta · magnet · non-autoregressive

MAGNeT MusicGen

Meta MAGNeT non-autoregressive music generator. Up to 7x faster than MusicGen with comparable quality via masked generative transformers.

meta · music-generation · magnet

Magnific-Style Upscaler

Detail-hallucinating upscaler in the Magnific style. Adds plausible high-frequency texture using a Stable Diffusion refiner conditioned on the low-res input.

replicate · upscaling · creative

Marigold

ETH Zurich Marigold. Diffusion-based monocular depth-estimation model fine-tuned from Stable Diffusion with strong fine-detail recovery.

Marker PDF Extract

Marker PDF-to-Markdown conversion pipeline. Combines layout, OCR and equation models to produce clean Markdown with preserved tables and formulas.

Mask2Former

Meta Mask2Former universal image-segmentation transformer. Single architecture for panoptic, instance and semantic segmentation tasks.

MiDaS v3.1

Intel MiDaS v3.1 relative depth-estimation model. Robust zero-shot single-image depth across diverse domains and resolutions.

MiniCPM-V 2.6

OpenBMB MiniCPM-V 2.6. 8B vision-language model with strong single-image, multi-image and video understanding plus OCR capabilities.

Minimax Video

Video · Minimax

MiniMax's video generation model. Fast, high-quality video output with text-to-video capabilities.

€2.50 · 90.0s

Mistral Large

Text & Chat · Mistral AI

Mistral's flagship model. Strong reasoning, multilingual, and coding capabilities.

multilingual · coding

Mochi 1

Genmo's 10B open-weights text-to-video model. AsymmDiT architecture, 5.4s @ 480p.

genmo · mochi · text-to-video

Mochi 1

Genmo's Mochi 1, an open text-to-video model with high-fidelity motion built on a 10B Asymmetric Diffusion Transformer. Released under Apache 2.0, it was the largest open video model at launch and is strong on smooth, physically plausible movement.

replicate · genmo · mochi

Molmo 7B

Allen AI Molmo 7B-D on Replicate. Open vision-language model trained on the PixMo data, notable for pointing at and locating objects in images, not just describing them.

replicate · allenai · molmo

Moondream2

Moondream2 small vision-language model on Replicate. About 1.9B params, designed to run on edge devices, handles captioning, visual QA and short OCR-style reads at very low cost.

replicate · moondream · vision-understanding

Multilingual E5 Large

Microsoft E5 multilingual embedding model with 560M parameters, initialized from XLM-RoBERTa-large and trained with weakly supervised contrastive learning. Covers around 100 languages and returns 1024-dim vectors. It expects query: and passage: prefixes on inputs and is a popular open model for multilingual semantic search.
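As a minimal sketch of the prefix convention described above, the helper below prepends `query: ` and `passage: ` to raw strings before they are sent to an embedding endpoint. The name `add_e5_prefixes` is hypothetical and not part of any E5 library. Only the two prefixes come from the model's input format.

```python
from typing import List, Tuple


def add_e5_prefixes(
    queries: List[str], passages: List[str]
) -> Tuple[List[str], List[str]]:
    """Return copies of the inputs with the E5 role prefixes prepended.

    Illustrative helper only: search queries get "query: " and the
    documents being indexed get "passage: ".
    """
    prefixed_queries = [f"query: {q.strip()}" for q in queries]
    prefixed_passages = [f"passage: {p.strip()}" for p in passages]
    return prefixed_queries, prefixed_passages


if __name__ == "__main__":
    qs, ps = add_e5_prefixes(
        ["open multilingual embedding model"],
        ["Multilingual E5 Large returns 1024-dim vectors."],
    )
    print(qs[0])
    print(ps[0])
```

Apply the same prefixes at indexing time and at query time, since mixing prefixed and unprefixed text tends to degrade retrieval quality.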

MuseTalk

Tencent MuseTalk real-time lip-sync model. Audio-driven mouth-region editing in latent space at 30+ fps on a single GPU.

replicate · lipsync · tencent

MusicGen Large

TTS · Meta

Meta's 3.3B-parameter MusicGen Large. Text-conditioned music generation with a single-stage autoregressive transformer; supports melody conditioning.

meta · music-generation · open-weights

mxbai-embed-large-v1

Mixedbread's open-source 335M embedding model. Top MTEB benchmark for English retrieval at release.

mixedbread · embedding · open-weights

NCT-CRC-HE Tissue Classifier (ResNet50)

ResNet50 fine-tuned on the NCT-CRC-HE-45K colorectal histology dataset. It sorts an H&E tissue patch into nine classes: adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal mucosa (NORM), stroma (STR) and tumor epithelium (TUM). Research use only, not a diagnostic device.

NVIDIA Cosmos-Predict-1

Robotics · Custom

NVIDIA's world foundation model for physical AI. Diffusion-based video prediction for robotics simulation.

nvidia · cosmos · vla

Octo Base

Robotics · UC Berkeley

Berkeley/Stanford 93M transformer diffusion policy. Pretrained on 800k Open-X-Embodiment episodes.

berkeley · stanford · vla

Octo Small

Robotics · UC Berkeley

Compact 27M variant of Octo. Faster inference on consumer GPUs, designed for low-latency control.

berkeley · vla · robotics

olmOCR

Allen AI olmOCR. Open-source 7B vision-language model fine-tuned for high-fidelity document parsing including math, code and tables.

OOTDiffusion (Try-On)

OOTDiffusion virtual try-on. Takes a clear photo of a model and an upper-body garment and renders the garment onto the person using an outfitting-fusion diffusion approach that keeps the garment's texture and the model's pose. A lightweight alternative to IDM-VTON for clothing previews.

product · virtual-try-on · vton

OpenAI o3

OpenAI's o3 reasoning model. Spends compute on a private chain of thought before answering, strong at math, science and hard coding problems that benefit from deliberate reasoning.

openai · o-series · reasoning

OpenAI o4-mini

OpenAI's o4-mini reasoning model. A cost-efficient reasoning model that trades some depth for much lower price and latency, good for high-volume math and code tasks.

openai · o-series · reasoning

OpenAI TTS-1

TTS · OpenAI

OpenAI's text-to-speech model. Six built-in voices with natural intonation.

€0.60 · 2.0s

OpenAI TTS-1 HD

TTS · OpenAI

OpenAI's high-definition TTS model. Better quality for production use cases.

€1.20 · 4.0s

high-quality

OpenPose

replicate · pose · vision-understanding

CMU OpenPose multi-person 2D pose estimator. Real-time keypoint detection for body, hand, face and foot using Part Affinity Fields.

€0.005

OpenVLA-7B

Robotics · OpenVLA

Stanford/Berkeley open VLA trained on 970k Open-X-Embodiment episodes. Supports LoRA fine-tuning.

stanford · berkeley · vla

OpenVoice v2

MyShell OpenVoice v2. Multilingual zero-shot voice cloning with accurate tone-color reproduction and style/emotion control.

myshell · tts · voice-cloning

PaddleOCR v3

Baidu PaddleOCR v3 PP-OCR pipeline. Lightweight detector plus recognizer optimized for production use with 80+ language support.

Parler-TTS

Hugging Face Parler-TTS Mini. Lightweight TTS conditioned on a natural-language style description for fine-grained control over voice characteristics.

parler · tts · huggingface

PCam Lymph-Node Tumor Detector (ResNet18)

ResNet18 from the TIA Toolbox zoo, trained on the PatchCamelyon (PCam) dataset of lymph-node histology patches from breast-cancer metastasis screening. It performs binary classification of a 96x96 H&E tile as tumor (metastatic tissue present) or normal. Research use only, not a diagnostic device.

Perplexity Sonar

Perplexity's fastest and cheapest web-grounded chat model. Live-source citations included.

perplexity · web-search · citations

Perplexity Sonar Reasoning

Perplexity's reasoning model with chain-of-thought and integrated web search.

perplexity · web-search · reasoning

Phind CodeLlama 34B v2

Phind CodeLlama 34B v2. Highly tuned CodeLlama variant focused on retrieval-augmented developer assistant workflows.

replicate · code-generation · phind

PhotoMaker

Image · Tencent ARC

Tencent ARC PhotoMaker. Identity-preserving stylized photo generation from a stacked-ID embedding. Realistic re-styling of a subject in seconds.

Physical Intelligence Pi-0-FAST

Robotics · Physical Intelligence

Autoregressive π-0 variant using FAST action tokenizer. Faster inference at competitive task success.

physical-intelligence · vla · robotics

Physical Intelligence π-0

Robotics · Physical Intelligence

Physical Intelligence's flagship VLA flow-matching policy. Generalist robot control, pretrained on 10k+ hours of robot data.

physical-intelligence · vla · robotics

Physical Intelligence π-0.5

Robotics · Physical Intelligence

Upgraded π-0 with open-world generalization via knowledge insulation. Weights and fine-tuning open-sourced.

physical-intelligence · vla · robotics

Pika 2.0 (Official)

Video · Pika

Pika Labs' 2.0 release. Cinematic text/image-to-video with scene composition controls.

€0.20

pika · text-to-video · image-to-video

PixVerse v5.6

Physics-accurate video generation up to 1080p

€0.50 · 60.0s

i2v · 1080p · physics

Playground v2.5 (1024px Aesthetic)

Image · Playground AI

Playground AI's diffusion model tuned for aesthetics. SDXL-based architecture trained on the EDM formulation, rated by users as more visually pleasing than SDXL in their study. Strong on vivid color and contrast.

replicate · playground · text-to-image

Playground v3 (Design)

Image · Playground AI

Playground's text-to-image model focused on graphic design aesthetics and embedded typography.

playground · text-to-image · design

PlayHT 2.0

TTS · Custom

PlayHT's 2.0 generative voice model. Multi-lingual expressive speech synthesis with sub-second latency and high-fidelity voice cloning.

playht · tts · voice-cloning

Point-E

OpenAI Point-E text-to-point-cloud system. Fast 3D point-cloud generation from text, optionally lifted to a mesh via marching cubes.

replicate · 3d-generation · openai

Qwen 2.5 72B

Text & Chat · Alibaba / Qwen

Alibaba's powerful open-source model. Excellent at coding, math, and multilingual tasks.

open-source · coding · multilingual

Qwen 2.5-Max

Alibaba's flagship pretrained MoE model. Top-tier reasoning and code performance via DashScope API.

qwen · alibaba · moe

Qwen-Image-Edit

Image · Alibaba / Qwen

Alibaba Qwen's instruction-driven image editor. Extends Qwen-Image's text-rendering ability to editing, so it handles both semantic edits (swap objects, change style) and precise text edits inside the image while preserving the original layout and unedited regions.

qwen · alibaba · image-edit

Qwen2-VL 7B Instruct

Alibaba Qwen2-VL 7B served on Replicate. Open-weights vision-language model that chats about images and video, with dynamic resolution and strong OCR and document QA for its size.

replicate · qwen · alibaba

Qwen2.5-Coder 32B Instruct

Alibaba's largest open Qwen2.5-Coder model. Trained on a code-heavy corpus, it matches or beats much larger general models on code generation and repair benchmarks like HumanEval and MBPP, and supports over 40 programming languages with fill-in-the-middle completion.

qwen · alibaba · coding

Qwen2.5-Coder 7B Instruct

The 7B instruct member of Alibaba's Qwen2.5-Coder family. A lighter, faster option for code completion, generation, and bug fixing across 40+ languages, with a 128k context and fill-in-the-middle support. Good price-to-quality balance for everyday coding tasks.

qwen · alibaba · coding

Qwen2.5-VL 7B Instruct (HF)

Multimodal · huggingface

Alibaba Qwen2.5-VL 7B via Hugging Face Inference. Open-weights image-text-to-text model with improved OCR, chart and table reading, object grounding and long-document understanding.

huggingface · qwen · alibaba

RDT-1B

Robotics · Custom

Tsinghua's 1B diffusion-transformer bimanual manipulation policy. Predicts next 64 actions per inference.

tsinghua · vla · robotics

Real-ESRGAN 4x

AI upscaler that increases image resolution up to 4x while preserving texture and detail. Trained on synthetic and real data to reduce common ESRGAN artifacts.

replicate · upscaling · image-restore

Real-ESRGAN Anime 4x

Real-ESRGAN variant fine-tuned for anime, manga, and illustrated artwork. 4x upscaling with cartoon-aware artifact suppression.

replicate · upscaling · anime

Recraft V3

State-of-the-art image generation optimized for design and branding. SVG vector output support.

€0.60 · 12.0s

design · vector · branding

Recraft V3 Realistic

Image · Recraft

Recraft's high-prompt-adherence raster image model. Strong layout control and brand-style consistency.

recraft · text-to-image · design

Recraft v3 SVG

Recraft's v3 variant that outputs vector SVG instead of raster pixels. Generates clean, editable logos, icons and illustrations that scale without quality loss, which is unusual among image models. Hosted API only.

replicate · recraft · text-to-image

Recraft V3 SVG

Image · Recraft

Recraft's vector/SVG generation model. Editable illustrations and icons from text.

€0.08

recraft · text-to-svg · vector

Recraft V4 SVG

Recraft V4 SVG turns a text prompt into production-ready SVG vector art with clean geometry and structured, editable layers. Newer generation than V3 with improved design quality on logos, icons and flat illustration. Returns true vector paths, not a traced bitmap.

svg · vector · recraft

Reka Core

Multimodal · Custom

Reka's frontier multimodal model supporting text, image, video and audio inputs.

reka · multimodal · video-understanding

Reka Edge

Multimodal · Custom

Reka's small on-device-friendly multimodal model. ~7B parameters, 16k context.

reka · multimodal · edge

Reka Flash

Multimodal · Custom

Reka's 21B dense multimodal model balancing speed and quality. Up to 128k context.

reka · multimodal · cost-efficient

Rembg

Open-source background-removal tool wrapping U2Net. Produces alpha mattes for photos, products and people with no manual masking.

replicate · background-removal · matting

Remove Background (lucataco)

Lucataco's remove-bg, a rembg-based background removal model that returns the foreground subject on a transparent background. A popular, low-cost option for quick product and portrait cutouts.

background-removal · lucataco · rembg

Remove Object (LaMa)

Object removal and cleanup using LaMa inpainting. Paint a mask over an unwanted object, logo or person and the model fills the area with plausible background, erasing it from the photo.

background-removal · object-removal · lama

Replit Code v1 3B

Replit's 3B code-completion model, trained on a permissively licensed code subset of the Stack across 20 programming languages. Built for low-latency autocomplete rather than chat. Served on Replicate per call.

replit · coding · completion

Replit Code v1.5 3B

3B code completion model from Replit trained on roughly 1 trillion tokens of permissively licensed code across 30 programming languages, with a 4k context window. Designed for autocomplete-style code generation and fill-in-the-middle.

code · huggingface · replit

RIFE Frame Interpolation

Real-Time Intermediate Flow Estimation. Doubles or quadruples FPS of an existing video via learned optical-flow-based frame interpolation.

replicate · upscale · frame-interpolation

Riffusion

TTS · Riffusion

Stable-Diffusion-based real-time music generator. Operates on spectrogram images and then resynthesizes audio from them, which enables seamless transitions and looping.

riffusion · music-generation · open-weights

Runway Gen-3 Alpha Turbo

Video · Custom

Runway's faster, cheaper Gen-3 variant. Image-to-video at 5 credits/sec (~$0.05/sec).

runway · image-to-video · fast
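The listing's rate of 5 credits/sec at roughly $0.05/sec implies about $0.01 per credit. The sketch below turns that into a rough cost estimate for a clip. The function name and constants are illustrative assumptions derived from those two figures, not Runway's billing API, and actual pricing may differ.

```python
from typing import Tuple

CREDITS_PER_SECOND = 5                         # rate quoted in the listing
USD_PER_CREDIT = 0.05 / CREDITS_PER_SECOND     # implied by ~$0.05/sec (approximate)


def estimate_clip_cost(seconds: float) -> Tuple[float, float]:
    """Return (credits, approximate USD) for a clip of the given length.

    Hypothetical helper: a back-of-the-envelope estimate only.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    credits = seconds * CREDITS_PER_SECOND
    return credits, round(credits * USD_PER_CREDIT, 4)


if __name__ == "__main__":
    credits, usd = estimate_clip_cost(10)
    print(f"10s clip: {credits} credits, about ${usd:.2f}")
```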

RVC Voice Conversion

Retrieval-based Voice Conversion. Converts a source recording into a target speaker's voice, preserving pitch, prosody and rhythm.

rvc · voice-conversion · voice-cloning

SadTalker

replicate · lipsync · talking-head

Stylized audio-driven talking-head generator. Synthesizes 3D motion coefficients from audio to animate a single portrait image with natural head movements.

€0.07

SciBERT (scivocab uncased)

AllenAI BERT-base pretrained from scratch on 1.14M scientific papers (mostly biomedical and computer science) with its own scientific WordPiece vocabulary. Used as a feature extractor it gives 768-dim contextual embeddings tuned to scientific text, outperforming general BERT on tasks like NER and relation extraction in research corpora.

SDXL Emoji

SDXL fine-tune by fofr trained on Apple emoji art. Generates rounded, glossy emoji and icon style graphics from a text prompt, useful for custom reactions, app glyphs and playful icon sets. Raster output.

replicate · emoji · icon

SDXL Inpainting

SDXL inpainting built on the Hugging Face Diffusers inpaint pipeline. Replaces or removes masked regions of an image with prompt-conditioned content at SDXL resolution. A cheap, well-understood baseline for object removal and local edits.

sdxl, stability-ai, image-edit

SeamlessM4T

Meta's SeamlessM4T multimodal translation model. Takes speech or text input and produces transcription or translation across about 100 languages, including speech-to-text and speech-to-speech. One model covers ASR plus cross-lingual translation without chaining separate systems.

replicate, meta, seamless

SeamlessM4T v2 Large (Speech)

Meta SeamlessM4T v2 Large speech mode. Speech-to-speech, speech-to-text, and text-to-speech translation across 100+ languages in a single unified model.

replicate, translation, meta

SeamlessM4T v2 Large (Text)

Text & Chat · Community

Meta SeamlessM4T v2 Large. Universal multilingual translation across 100+ languages with text-to-text mode for documents and chat.

replicate, translation, meta

Seedance Lite

Video · ByteDance

Budget ByteDance video model, fast and cheap.

€0.50 · 70.0s

budget, i2v, fast

Seedance Pro

Video · ByteDance

ByteDance video model with T2V and I2V, up to 1080p.

€1.00 · 95.0s

i2v, 1080p

Segformer B5

NVIDIA SegFormer-B5 semantic segmentation. Hierarchical transformer encoder with a lightweight MLP decoder, with strong results on ADE20K and Cityscapes.

Shap-E (OpenAI)

OpenAI Shap-E text/image to 3D. Generates implicit neural representations renderable as textured meshes or NeRFs.

replicate, 3d-generation, openai

Skin Cancer Classifier (Swin, ISIC)

Swin Transformer skin-lesion classifier trained on an ISIC-style skin cancer dataset. Predicts eight lesion classes including melanoma, basal cell carcinoma, squamous cell carcinoma, actinic keratosis, nevus, dermatofibroma, benign keratosis and vascular lesion. Research and educational use only, not for diagnosis.

Skin Cancer Image Classification (ViT, HAM10000)

Vision Transformer fine-tuned on the HAM10000 dermatoscopy dataset. Classifies a skin-lesion image into seven categories: melanoma, melanocytic nevi, basal cell carcinoma, actinic keratoses, benign keratosis-like lesions, dermatofibroma and vascular lesions. Research and educational use only, not a diagnostic tool.

Skin Type Image Detection (ViT)

ViT image classifier by dima806 that labels a facial or skin photo as dry, normal or oily skin type. Aimed at skincare and cosmetics research rather than disease detection, and not a medical diagnostic. Research and educational use only.

Snowflake Arctic Instruct

Snowflake's open MoE model: 480B total / 17B active params with dense+MoE hybrid architecture.

snowflake, moe, open-weights

Spark TTS

Spark-TTS, an efficient TTS model with disentangled control over speaker, content and style. Strong cross-lingual zero-shot performance.

spark, tts, voice-cloning

SPIDER Colorectal Pathology Classifier

Patch-level colorectal pathology classifier from HistAI, built on the Hibou-L foundation model and trained on the SPIDER colorectal dataset with expert-annotated labels. It classifies a 1120x1120 H&E patch into pathology classes such as high- and low-grade adenocarcinoma, normal mucosa and other tissue types. Research use only, not a diagnostic device.

Stable Audio 2

TTS · Udio

Stability AI's Stable Audio 2.0. Text-to-music that generates full-length, structured tracks of up to 3 minutes at 44.1 kHz.

stability, music-generation, pricing-tbd

Stable Audio Open 1.0

Audio · Replicate

Stability AI's Stable Audio Open generates short audio from text prompts, tuned for sound effects, drum loops, instrument riffs and production elements rather than full songs. Open weights, latent diffusion over a 44.1 kHz audio autoencoder, with a configurable seconds_total up to about 47 seconds.

stability-ai, stable-audio, sound-effects

Stable Code Instruct 3B

Instruction-tuned 3B code model from Stability AI, fine-tuned from stable-code-3b for chat-style coding tasks. Handles code generation, explanation and fix-up across multiple languages and was competitive with larger code models on benchmarks at release.

code, huggingface, stability-ai

Stable Diffusion 3.5 Large

Stability AI's 8B MMDiT-based flagship. Open weights, roughly 1-megapixel output, and improved typography and prompt adherence over SDXL. The largest model in the SD 3.5 release line.

replicate, stability-ai, stable-diffusion

Stable Diffusion 3.5 Large (Stability)

stability, text-to-image, open-weights

Stability AI's 8B-parameter flagship SD3.5 model. Strong prompt adherence and aesthetic quality.

€0.07

Stable Diffusion 3.5 Large Turbo

Distilled, 4-step version of SD 3.5 Large from Stability AI. Keeps most of the large model's quality and text rendering at a fraction of the inference time. Open weights under the Stability Community License.

replicate, stability-ai, stable-diffusion

Stable Diffusion 3.5 Large Turbo

Distilled 4-step variant of SD3.5 Large. 8B params, ~4x faster inference at competitive quality.

stability, text-to-image, open-weights

Stable Diffusion 3.5 Medium

Stability AI's 2.5B-parameter SD3.5 with strong quality/speed trade-off. Consumer-GPU friendly.

stability, text-to-image, open-weights

Stable Diffusion XL

replicate, code-generation, bigcode

Stability AI's SDXL model via Replicate. High-quality image generation with extensive customization.

€0.20 · 8.0s

open-source, customizable

StarCoder2 15B

Code · Community

BigCode StarCoder2 15B code-generation flagship. Trained on more than 4T tokens from The Stack v2, with grouped-query attention and a 16k context.

€0.005

StarVector 8B (image-to-SVG)

StarVector 8B is a multimodal model that generates SVG code directly from an input image. Rather than tracing pixels, it predicts the SVG markup token by token, which can produce compact, semantically structured paths for icons and simple graphics. Research model from the StarVector project.

svg, vector, starvector

StreamingT2V

replicate, animation, long-form

Picsart StreamingT2V. Generates long, consistent videos by chaining short autoregressive clips with motion and appearance memory.

€0.15

StyleTTS 2

Style-based TTS using diffusion and adversarial training. Human-level naturalness in zero-shot voice synthesis from a 3-5s reference clip.

styletts, tts, voice-cloning

Suno Bark

TTS · Suno

Suno's text-prompted generative audio model. Speech, music, ambient sound and effects with non-verbal cues like laughter or sighs.

suno, bark, music-generation

SUPIR

SUPIR by Fanghua Yu et al. is a large diffusion-based restoration model that recovers photorealistic detail from heavily degraded images and can be steered with a text prompt describing the scene.

restore, super-resolution, supir

SUPIR Upscaler

SUPIR (Scaling-Up Image Restoration) photo-real restoration model. Combines SDXL prior with language-guided controls for severely degraded inputs.

replicate, upscaling, image-restore

Swin2SR

Transformer-based image super-resolution using Swin-V2 attention. Handles classical, lightweight, real-world, and compressed-input variants with 2x/4x upscaling.

replicate, upscaling, transformer

SwinIR Video

SwinIR transformer-based super-resolution and denoising applied per-frame to video. Handles classic, real-world and lightweight upscaling.

replicate, upscale, transformer

ToonCrafter

replicate, animation, tooncrafter

Tencent ToonCrafter generative cartoon interpolation model. Synthesizes smooth in-between frames between two cartoon keyframes.

€0.08

Tortoise TTS

Multi-voice expressive TTS. Slow but high-quality with strong prosody and natural intonation. Trained for long-form narration use cases.

tortoise, tts, expressive

TRELLIS (3D)

Microsoft TRELLIS image-to-3D model. Generates textured 3D assets in GLB or Gaussian-splat format from a single reference image.

€0.18

TripoSR

Stability AI and Tripo single-image 3D reconstruction model. Generates 3D meshes from a single image in roughly half a second.

Udio V1.5

Audio · Replicate

AI music generation with studio-quality output. Generate full songs with vocals, instruments, and production.

€2.00 · 60.0s

music, vocals, high-quality

V-Express

Tencent V-Express. Audio-driven portrait animation with progressive training, weak-condition learning, and expressive lip sync.

€0.09

replicate, lipsync, tencent

Vectorizer (VTracer)

PNG/JPG to SVG vectorizer built on VTracer, the open-source raster-to-vector engine. Traces a bitmap into layered color regions and clean paths with controls for color count, area threshold and path simplification. Fast, deterministic alternative to model-based vectorizers.

svg, vector, vtracer

VideoCrafter

replicate, upscale, video-generation

Tencent VideoCrafter latent video diffusion. Text-to-video and image-to-video generation up to 2s at 1024x576 with strong motion fidelity.

€0.07

ViT Brain Tumor MRI Classifier

ViT base fine-tuned on brain MRI slices to classify tumor type. Given an MRI image it returns scores for glioma, meningioma, pituitary tumor or no tumor. A frequently downloaded brain-tumor classifier. Research and education only, not a diagnostic tool.

ViT Chest X-ray Pneumonia

Vision Transformer fine-tuned on the Kaggle chest x-ray pneumonia dataset. Given a frontal chest radiograph it predicts NORMAL versus PNEUMONIA with class scores. A widely used baseline for pneumonia screening experiments. Research and education only, not a diagnostic tool.

ViT COVID-19 CT Scan Classifier

ViT base (patch16-224, ImageNet-21k pretrained) fine-tuned on lung CT scans to flag COVID-19 findings. Takes a CT slice image and returns COVID versus non-COVID class scores. Built for research on CT-based COVID screening. Research and education only, not a diagnostic tool.

ViT Diabetic Retinopathy Grading

Vision Transformer fine-tuned on retinal fundus photographs to grade diabetic retinopathy severity. Given a fundus image it returns scores across the five-level scale (0 no DR through 4 proliferative DR). A frequently downloaded retinopathy classifier. Research and education only, not a diagnostic tool.

ViT HAM10000 Sharpened Skin Lesion Classifier

ViT-base classifier fine-tuned on HAM10000 dermatoscopy images with a sharpening preprocessing step. Predicts the seven standard HAM10000 lesion classes (akiec, bcc, bkl, df, mel, nv, vasc) for a single skin-lesion image. Research and educational use only, not a medical diagnostic.
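The seven HAM10000 class codes are terse, so a small lookup helps when presenting results. This is a minimal sketch, assuming the classifier's output has already been collected into a dict of code-to-score pairs. That dict shape and the `top_prediction` helper are hypothetical and not this endpoint's documented response format.

```python
# Standard HAM10000 class codes and their expanded names.
HAM10000_LABELS = {
    "akiec": "actinic keratoses / intraepithelial carcinoma",
    "bcc": "basal cell carcinoma",
    "bkl": "benign keratosis-like lesions",
    "df": "dermatofibroma",
    "mel": "melanoma",
    "nv": "melanocytic nevi",
    "vasc": "vascular lesions",
}


def top_prediction(scores: dict[str, float]) -> tuple[str, str, float]:
    """Return (code, readable name, score) for the highest-scoring class.

    `scores` is an assumed shape: one float per HAM10000 class code.
    """
    if not scores:
        raise ValueError("scores is empty")
    unknown = set(scores) - set(HAM10000_LABELS)
    if unknown:
        raise ValueError(f"unexpected class codes: {sorted(unknown)}")
    code = max(scores, key=scores.get)
    return code, HAM10000_LABELS[code], scores[code]


if __name__ == "__main__":
    print(top_prediction({"nv": 0.7, "mel": 0.2, "bkl": 0.1}))
```

The top class is only a convenience for display. The model remains research and educational use only.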

Voyage AI voyage-code-3

Voyage's code-specialized embedding model. Up to 32k context, Matryoshka 256-2048 dims, int8/binary support.

voyage, embedding, code
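Matryoshka-style embeddings are generally trained so that a leading prefix of the vector is still usable on its own. The sketch below shows the usual client-side pattern of keeping the first `dim` values and re-normalizing to unit length. It assumes the embedding is already available as a plain list of floats. No Voyage API call is shown, and whether client-side truncation matches the provider's own reduced-dimension output is an assumption.

```python
import math


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and L2-normalize the result."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


if __name__ == "__main__":
    # Toy 3-dim vector standing in for a real 2048-dim embedding.
    short = truncate_embedding([3.0, 4.0, 12.0], 2)
    print(short)
```

Re-normalizing matters because cosine or dot-product search usually expects unit-length vectors, and a truncated prefix is no longer unit length.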

Wan 2.1 (Alibaba)

Alibaba's Wan 2.1 open-weights video diffusion model. 14B diffusion-transformer model, supports T2V and I2V.

alibaba, wan, text-to-video

Wan 2.1 I2V 720p

Image-to-video variant of Alibaba's Wan 2.1 14B at 720p, accelerated by WaveSpeedAI. Animates a still input image into a short clip driven by a text prompt, keeping the source composition while adding motion.

replicate, wan, alibaba

Wan 2.1 T2V 720p (Accelerated)

Accelerated inference for Alibaba's Wan 2.1 14B text-to-video at 720p, hosted by WaveSpeedAI on Replicate. Wan 2.1 is an open suite of video foundation models; this endpoint offers high-resolution output with faster generation.

replicate, alibaba, wan

Wan 2.2 Image-to-Video

Ultra-cheap I2V. Upload an image and animate it.

€0.10 · 30.0s

budget, i2v, fast

Wan 2.2 Text-to-Video

Ultra-cheap text-to-video.

€0.10 · 30.0s

budget, fast

Wav2Lip

Lip-sync model that re-syncs a target video's lip movement to an arbitrary audio track. Robust to identity and language with a lip-sync discriminator loss.

replicate, lipsync, video-edit

Whisper Diarization

Whisper Large v3 Turbo combined with pyannote 4.0 for speaker diarization. Built by Thomas Mol. Returns clean JSON of speaker-labeled, timestamped segments (who said what), handy for meeting notes, interviews and podcasts.

replicate, whisper, stt

WhisperX

STT · Replicate

WhisperX (Large v3) with forced alignment for accurate word-level timestamps plus optional speaker diarization. Uses VAD to cut long files into segments and a wav2vec2 aligner to pin each word to its exact time. Useful for subtitles and per-speaker transcripts.

replicate, whisperx, stt
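Word-level timestamps like those described for WhisperX map naturally onto subtitle cues. This is a minimal sketch, assuming the words arrive as a list of dicts with `word`, `start` and `end` keys (times in seconds) and that every word carries both times. That input shape and the helper names are assumptions for illustration, not the endpoint's documented output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"].strip() for w in chunk)
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        {"word": "hello", "start": 0.0, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 1.0},
    ]
    print(words_to_srt(demo))
```

Splitting on a fixed word count is the simplest policy. Splitting on pauses or speaker changes would need the gaps between words or the diarization labels.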

White Blood Cell Classifier (ViT)

Vision-transformer classifier for peripheral-blood smear images. It labels a single white-blood-cell crop as one of four leukocyte types: eosinophil, lymphocyte, monocyte or neutrophil. Trained on a public blood-cell image dataset and meant for research and teaching, not clinical hematology.