Image Generation

Black Forest Labs' flagship text-to-image model. Generates faster than FLUX.1 Pro with better prompt adherence, strong photorealism and reliable spatial composition. Runs as a hosted Replicate model.

Flux 1.1 Pro Ultra

high-quality · photorealistic

FLUX 1.1 Pro in ultra mode. Up to 4 megapixel images with raw mode for photorealism.

€0.60 · 15.0s

FLUX 1.1 Pro Ultra

FLUX 1.1 Pro in Ultra mode by Black Forest Labs. Generates up to 4 megapixel images with a raw mode for less processed, more natural-looking photography. Best FLUX option when output resolution and fine detail matter.

€5.00
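To make the 4 megapixel ceiling concrete, the sketch below works out the largest output size for a given aspect ratio. It assumes 4 MP means 4,000,000 pixels and that both sides are rounded down to a multiple of 32; `fit_to_pixel_budget` and both of those values are illustrative assumptions, not documented model parameters. Because each side is rounded separately, the resulting ratio is only approximate.

```python
import math

def fit_to_pixel_budget(aspect_w, aspect_h, max_pixels=4_000_000, multiple=32):
    """Largest (width, height) near the given aspect ratio within max_pixels.

    Rounding each side down to `multiple` is an assumption for illustration,
    not a documented requirement of the model.
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio terms must be positive")
    # Solve width * height = max_pixels with width / height = aspect_w / aspect_h.
    height = math.sqrt(max_pixels * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h
    # Round down so the result never exceeds the pixel budget.
    width = int(width) // multiple * multiple
    height = int(height) // multiple * multiple
    return width, height

for ratio in [(1, 1), (16, 9), (3, 2)]:
    w, h = fit_to_pixel_budget(*ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {w}x{h} ({w * h / 1e6:.2f} MP)")
```

Under these assumptions a square image comes out at 1984x1984 and a 16:9 image at 2656x1472, both just under the budget.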

Flux Dev

Black Forest Labs' development model. Fast, high-quality image generation with LoRA support.

€0.5010.0s

popular · fast · lora

Google Imagen 4

Image · Google DeepMind

Google DeepMind's Imagen 4 text-to-image model, hosted on Replicate. Sharp detail, accurate text rendering, and strong prompt adherence across photographic and illustrated styles. Outputs up to 2K resolution.

replicate · google · imagen

Google Imagen 4

Image · Google DeepMind

Google's Imagen 4. Text-to-image with strong photorealism and improved typography support.

google · imagen · text-to-image

Google Imagen 4 Ultra

Image · Google DeepMind

google · imagen · text-to-image

Premium Imagen 4 tier. Highest fidelity, prompt adherence and typography quality from Google.

€0.06

Icons (SDXL Flat Pop)

SDXL fine-tune by galleri5 for slick flat icons and pop constructivist graphics with thick edges. Trained on Bing generations, it produces clean single-subject icon art that suits app icons, badges and UI glyphs. Raster output, not true vector.

replicate · icon · logo

Ideogram 3.0

ideogram · text-to-image · typography

Ideogram's flagship text-to-image model with industry-leading text rendering and prompt adherence.

€0.09 · 15.0s

Ideogram v3 Quality

The highest-quality tier of Ideogram v3. Improved photorealism and prompt adherence over v2 while keeping Ideogram's best-in-class text rendering. Supports style references and inline text layout.

replicate · ideogram · text-to-image

InstantID

InstantID makes realistic portraits of a real person from a single reference photo without per-user training. Combines a face encoder with an IdentityNet adapter on SDXL to keep identity and pose while following a text prompt, so it is fast and tuning-free.

avatar · portrait · instant-id

Kather100K Colorectal Tissue Classifier (ResNet50)

ResNet50 from the TIA Toolbox model zoo, trained on the Kather100K dataset of 100,000 hematoxylin-and-eosin colorectal histology patches. It classifies a tissue tile into one of nine categories: tumor epithelium, stroma, lymphocytes, mucus, smooth muscle, debris, adipose, background and normal mucosa. Research use only, not a diagnostic device.

Midjourney V7

high-quality · aesthetic · popular

New · Popular

The latest Midjourney model. Industry-leading aesthetic quality and prompt adherence for image generation.

€3.00 · 30.0s

Professional Headshot (FLUX Kontext)

Turns any single selfie into a clean professional headshot using FLUX Kontext image editing. Keeps the person's face while swapping to business attire, a studio background and even lighting. Aimed at LinkedIn-style profile photos.

avatar · portrait · headshot

Recraft 20B SVG

Recraft's faster, cheaper vector model. Outputs editable SVG paths instead of raster pixels, so logos, icons and flat illustrations scale to any size without blur. Defaults to a vector_illustration style and supports line art and engraving looks. Hosted API only.

replicate · recraft · svg
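The "scales to any size without blur" point follows from how SVG separates geometry from rendered size. As a minimal sketch, the snippet below enlarges a hand-written icon by changing only the `width` and `height` attributes; the `viewBox` and path data stay the same. The icon markup and the `resize_svg` helper are hypothetical, not output from the model, and the helper assumes unitless numeric `width` and `height` values.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize as plain <svg>, not <ns0:svg>

# Hypothetical icon standing in for a model's SVG output.
ICON = (
    f'<svg xmlns="{SVG_NS}" width="24" height="24" viewBox="0 0 24 24">'
    '<path d="M4 12 L12 4 L20 12 L12 20 Z" fill="#222"/>'
    "</svg>"
)

def resize_svg(svg_text, factor):
    """Scale the rendered size; the viewBox and path data are left untouched."""
    root = ET.fromstring(svg_text)
    for attr in ("width", "height"):
        # Assumes the attribute is a bare number such as "24" (no px/em units).
        root.set(attr, f"{float(root.get(attr)) * factor:g}")
    return ET.tostring(root, encoding="unicode")

big = resize_svg(ICON, 20)
print(big)
```

A raster image enlarged 20x would need resampling; here the renderer simply redraws the same path at the new size.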

Recraft V3

Recraft's text-to-image model that topped the Hugging Face text-to-image arena at release. Strong long-text rendering, brand-style consistency, and precise control over image dimensions and color palettes.

replicate · recraft · text-to-image

Recraft Vectorize

Recraft's raster-to-vector converter. Takes a PNG or JPG and traces it into a clean SVG with precise vector paths, aimed at logos, icons and graphics that need to scale. Image-to-SVG counterpart to Recraft's text-to-SVG models.

svg · vector · recraft

Stable Diffusion XL

Stability AI's SDXL 1.0 with the optional refiner. A 3.5B-parameter base model, or a 6.6B-parameter ensemble pipeline when the refiner is added, that became the default open image model before FLUX. Good for fine-tuning and LoRAs, with broad community support.

replicate · sdxl · stability-ai

Sticker Maker

fofr's sticker generator that outputs graphics with transparent backgrounds, so the result drops straight into chat apps or print sheets. Runs an SDXL-based pipeline at high speed (default 17 steps) and returns die-cut style art without manual background removal.

replicate · sticker · transparent

ViT Chest X-ray Classifier

Vision Transformer (ViT) fine-tuned on chest x-ray images for multi-class thoracic findings. Given a single frontal chest radiograph it returns class probabilities across several disease categories. One of the more downloaded chest x-ray classifiers on the Hugging Face Hub. Research and education only, not a diagnostic tool.

851-Labs Background Remover

Background removal model from 851-Labs that outputs a clean cutout with a transparent alpha channel. One of the most-run background removers on Replicate, handles people, products and objects on busy backgrounds.

background-removal · 851-labs · cutout
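A cutout with an alpha channel is usually placed onto a new backdrop with the standard "over" blend, out = alpha * foreground + (1 - alpha) * background. The sketch below applies that formula to plain Python lists so it runs without an imaging library; `composite_over` and the sample pixels are illustrative and are not part of any background remover's API.

```python
def composite_over(foreground, background):
    """Blend RGBA foreground pixels over opaque RGB background pixels.

    Each image is a list of rows; channel values are 0-255 integers.
    """
    out = []
    for fg_row, bg_row in zip(foreground, background):
        row = []
        for (r, g, b, a), backdrop in zip(fg_row, bg_row):
            alpha = a / 255  # 0.0 is fully transparent, 1.0 fully opaque
            row.append(tuple(
                round(alpha * f + (1 - alpha) * k)
                for f, k in zip((r, g, b), backdrop)
            ))
        out.append(row)
    return out

# One opaque, one half-transparent and one fully transparent pixel.
cutout = [[(200, 30, 30, 255), (200, 30, 30, 128), (0, 0, 0, 0)]]
white = [[(255, 255, 255)] * 3]
print(composite_over(cutout, white))
```

The half-transparent pixel is where matting quality shows: soft alpha values along hair or fur blend into the new background instead of leaving a hard fringe.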

Ad Inpaint (Product Photo)

Product advertising photo generator. You upload a cut-out product shot and a prompt describing the scene; it places the product on a new generated background with matching lighting and shadows, so a plain packshot becomes an ecommerce or ad-ready hero image without a photo studio.

product · ecommerce · product-photo

AuraFlow v0.3

fal.ai's fully open-source 6.8B flow-based text-to-image model. Up to 1536x1536 resolution.

auraflow · text-to-image · open-weights

BiRefNet Background Removal

BiRefNet high-resolution dichotomous image segmentation for background removal. Bilateral reference network that produces sharp matting on fine detail like hair, fur and thin structures, often cleaner than older U2Net or rembg models.

background-removal · birefnet · segmentation

Bone Fracture Detection (X-ray)

Image classifier by prithivMLmods that labels a bone x-ray as Fractured or Not Fractured. Given a single radiograph it returns binary class scores. One of the more downloaded fracture classifiers on the Hub. Research and education only, not a diagnostic tool.

BRIA Remove Background

BRIA AI's commercial background removal model trained on fully licensed data. Produces accurate cutouts for e-commerce and design, with attention to clean edges around products and people.

background-removal · bria · ecommerce

BRIA RMBG-1.4

BRIA's first commercial-safe background-removal model. Trained on fully-licensed data, suitable for production e-commerce and design pipelines.

replicate · background-removal · bria

BRIA RMBG-2.0

BRIA's professional background-removal model trained on fully-licensed data. Commercial-safe.

bria · image-edit · background-removal

Bringing Old Photos Back to Life

Image · Microsoft

Microsoft Research pipeline by Ziyu Wan et al. that restores scanned old photos, removing scratches, dust and fading and optionally enhancing faces in one pass.

restore · old-photo · scratch-removal

Cartoonify

catacolabs Cartoonify turns a photo into a flat cartoon illustration. Takes a single image and returns a stylized cartoon version with clean shapes and bold outlines. Straightforward one-input model for avatars and profile pictures.

avatar · portrait · cartoon

CCSR (Content-Consistent SR)

Content-Consistent Super-Resolution model. Reduces hallucination compared to typical diffusion-based upscalers while keeping perceptual quality high.

replicate · upscaling · image-restore

Clarity Upscaler

High-resolution image upscaler with creative detail re-imagination via SD-based hallucination. Strong for photography and product shots.

replicate · upscaling · creative

CodeFormer

replicate · face-restore · upscaling

Robust face-restoration model using a transformer-based codebook prior. Handles severe degradation, occlusion, and old-photo restoration with adjustable fidelity-quality tradeoff.

€0.002

Consistent Character

fofr's model generates the same character in many poses and angles from one reference image. Useful for building an avatar set or character sheet where the face and design stay consistent across outputs. Can produce a grid or individual images.

avatar · portrait · character

ControlNet Canny

ControlNet conditioned on Canny edge maps. Preserves composition and outlines while restyling with Stable Diffusion 1.5 or SDXL backbones.

€0.01
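To show what an edge-map control image looks like, here is a deliberately simplified sketch that thresholds the gradient magnitude of a grayscale image. It is a stand-in, not the full Canny algorithm: there is no Gaussian smoothing, non-maximum suppression or hysteresis, and `edge_map` and its threshold are illustrative choices.

```python
def edge_map(gray, threshold=64):
    """Return a binary edge map (0 or 255) from rows of grayscale ints.

    Simplified gradient threshold; border pixels are left at 0.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate horizontal and vertical gradients.
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges

# A dark left half and a bright right half: the edge runs down the middle.
image = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
for row in edge_map(image):
    print(row)
```

The resulting white-on-black outline is the kind of image a ControlNet receives as conditioning, which is why composition survives while colors and textures are free to change.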

ControlNet Depth

ControlNet conditioned on depth maps. Preserves the 3D scene layout while letting the prompt change style, lighting and content.

€0.01

high-quality · prompt-following

DALL-E 3

Image · OpenAI

OpenAI's DALL-E 3 image generation model. Follows complex prompts with high fidelity.

€4.00 · 15.0s

DDColor

DDColor by Xiaoyang Kang et al. colorizes black-and-white photos using dual decoders that jointly learn pixel colors and semantic color queries, giving vivid and natural results on old images.

colorize · restore · ddcolor

DINOv2 Skin Disease Classifier

DINOv2-base backbone fine-tuned for skin-disease image classification across 31 conditions, including basal cell carcinoma, lichen planus, lupus, herpes simplex, impetigo, leprosy variants and several genodermatoses. Broader than melanoma-only models. Research and educational use only, not a diagnostic.

DreamGaussian

Generative Gaussian-splatting model for fast image-to-3D synthesis. Produces textured meshes in about two minutes via differentiable rasterization.

€0.09

Ecommerce Virtual Try-On

Try-on pipeline aimed at ecommerce listings. You give it a photo containing clothing on a body pose plus a separate face image; it composes a person wearing that clothing with the supplied face, controllable by prompt, CFG, and output size. Useful for generating on-model product shots from a flat garment image.

product · virtual-try-on · vton

ESRGAN Classic

Enhanced Super-Resolution GAN, the original 2018 architecture. Produces sharp 4x upscales with strong perceptual quality on natural images.

replicate · upscaling · esrgan

Face to Many

fofr's face stylizer converts a face photo into 3D render, emoji, pixel art, video-game character, claymation or toy styles. Uses InstantID plus style LoRAs on SDXL to keep the likeness while applying a chosen art style. Popular for fun avatars.

avatar · portrait · stylize

Face to Sticker

fofr's model turns a face photo into a die-cut sticker with a white border and transparent background. Uses InstantID to hold the likeness and outputs a clean PNG suitable for chat stickers or print. Simple single-image input.

avatar · portrait · sticker

FLUX PuLID

PuLID identity customization running on FLUX.1-dev. Inserts a face from one reference photo into prompt-driven scenes using contrastive alignment, giving higher likeness and detail than SDXL-era ID adapters. Good for realistic avatars and character portraits.

avatar · portrait · pulid

Flux Schnell

The fastest Flux model. Generate images in under 2 seconds. Great for prototyping.

€0.03 · 2.0s

fast · affordable

FLUX.1 [dev]

The open-weight 12B rectified-flow transformer from Black Forest Labs. Close to FLUX Pro quality with a guidance-distilled checkpoint released under a non-commercial license. The most widely fine-tuned base in the FLUX family.

FLUX.1 [schnell]

The fastest FLUX model from Black Forest Labs, distilled to produce images in 1 to 4 steps. Apache 2.0 licensed for commercial use. Built for high-volume generation and real-time previews.

FLUX.1 [Schnell]

flux · black-forest-labs · open-weights

Black Forest Labs' fastest open-weights image model. Apache-2.0 licensed, ~1-4 step inference.

€0.003

FLUX.1 Canny

FLUX structural control via Canny edge maps. Preserve composition while restyling.

FLUX.1 Canny [dev]

Open-weight edge-guided FLUX model from Black Forest Labs. Extracts Canny edges from a control image and regenerates it from your prompt while holding the original composition and outlines, so you can restyle a scene without changing its structure.

FLUX.1 Depth

FLUX structural control via depth maps. Keep 3D scene layout while changing style/content.

FLUX.1 Depth [dev]

Open-weight depth-guided FLUX model from Black Forest Labs. Derives a depth map from the control image and regenerates from your prompt while preserving 3D spatial layout, useful for re-texturing rooms, products, or scenes without moving objects.

FLUX.1 Fill

Black Forest Labs' inpainting/outpainting model for FLUX. Fill masked regions with prompt-guided content.

FLUX.1 Fill [dev]

Black Forest Labs' open-weight inpainting and outpainting model, guidance-distilled from FLUX.1 Fill [pro]. You supply an image plus a mask and a prompt; it fills the masked region or extends the canvas with prompt-guided content that matches lighting and texture.
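For the canvas-extension case, the caller typically has to build the larger canvas and a matching mask before calling the model. The sketch below computes both for a given amount of padding. `outpaint_canvas` is a hypothetical helper, and the mask polarity (255 means generate, 0 means keep) is an assumption that should be checked against the specific deployment.

```python
def outpaint_canvas(width, height, left=0, top=0, right=0, bottom=0):
    """Return (canvas_size, paste_offset, mask) for extending an image.

    The mask is a list of rows where 255 marks pixels to generate and 0 marks
    pixels to keep. That polarity is an assumption, not a documented contract.
    """
    new_w, new_h = width + left + right, height + top + bottom
    mask = [
        [0 if left <= x < left + width and top <= y < top + height else 255
         for x in range(new_w)]
        for y in range(new_h)
    ]
    return (new_w, new_h), (left, top), mask

# Extend a 4x2 image by 1 pixel on the left and 2 on the right.
size, offset, mask = outpaint_canvas(4, 2, left=1, right=2)
print(size, offset)
for row in mask:
    print(row)
```

Plain inpainting is the same idea with no padding: the canvas keeps its size and the mask marks an interior region instead of the border.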

FLUX.1 Kontext [dev]

Open-weight version of FLUX.1 Kontext by Black Forest Labs. Instruction-based editing: pass an input image and a plain text edit ('change the jacket to red', 'remove the person on the left') and it applies the change while keeping the rest of the scene and identity consistent.

flux · kontext · black-forest-labs

FLUX.1 Redux

FLUX image-variation adapter. Generate variations and remixes from a reference image.

FLUX.1-dev Inpainting

FLUX.1-dev inpainting wrapper that fills masked parts of an image from a prompt. Useful when you want FLUX-quality fills with a simple image plus mask plus prompt interface and adjustable mask strength.

flux · image-edit · inpainting

Get3D (NVIDIA)

NVIDIA GET3D generative model for textured 3D shapes. Trained on category-specific datasets, it produces meshes with high-quality textures.

nvidia · 3d-generation · open-weights

GFPGAN v1.4

Image · Tencent ARC

Tencent ARC face-restoration GAN. Reconstructs realistic facial detail in low-quality or compressed photos using a pretrained StyleGAN2 prior.

€0.002

replicate · face-restore · upscaling

Hunyuan3D 2.0

Image · Tencent

Tencent's Hunyuan3D 2.0 image-to-3D pipeline. Two-stage shape and texture generation producing high-resolution textured meshes.

€0.21

Hunyuan3D 2.1

New

Refreshed Hunyuan3D 2.1 with improved texture fidelity and PBR-material support. Image-to-3D with textured GLB output.

€0.24

IC-Light (Product Relighting)

Lvmin Zhang's IC-Light packaged by zsxkib. Relights a product or portrait from a text prompt or a chosen light direction while keeping the subject's shape and detail intact, so a flat product photo can be given studio, window, or dramatic side lighting without re-shooting.

product · ic-light · relight

Ideogram 2.0 Turbo

Ideogram's fast text-to-image variant. Strong typography and logo rendering at low latency.

ideogram · text-to-image · typography

Ideogram v2

Ideogram's text-to-image model known for accurate in-image text and typography. Handles posters, logos, and signage where other models garble lettering. Supports magic prompt expansion and multiple aspect ratios.

replicate · ideogram · text-to-image

Ideogram v3 Turbo

Ideogram's fast v3 model, the fastest and cheapest tier of the v3 family. Known for accurate in-image text rendering and reliable typography, which most diffusion models still get wrong. Hosted API only.

replicate · ideogram · text-to-image

IDM-VTON (Virtual Try-On)

IDM-VTON virtual try-on from the CVPR 2024 paper. You give it a photo of a person and a garment image; it dresses the person in that garment while preserving pose, body shape, and the garment's pattern and text. Good for showing a clothing product on a model for an ecommerce listing.

product · virtual-try-on · vton

InstantMesh

Image-to-3D mesh generator from sparse-view diffusion. Produces textured meshes in under one minute on a single A100.

€0.12

InstructPix2Pix

Berkeley InstructPix2Pix. Edits an image from natural-language instructions in a single forward pass. Trained on GPT-3 plus Stable Diffusion synthetic pairs.

€0.01

IP-Adapter FaceID Plus v2

Tencent's face-identity conditioning adapter for SD/SDXL. Face embedding + CLIP for ID-consistent generation.

tencent · image-edit · face-id

Janus Pro 7B

DeepSeek's unified multimodal model. Uses separate visual encoding pathways for understanding and for generation within a single transformer.

deepseek · janus · open-weights

Kuaishou Kolors

Kuaishou's bilingual (CN/EN) latent diffusion text-to-image model with strong text rendering.

kuaishou · text-to-image · open-weights

LogoAI (SDXL Logo Generator)

SDXL fine-tune by mejiabrayan aimed at logo generation. Produces simple, centered mark and wordmark style logos from a text prompt. Useful for quick brand concepts and mockups. Raster PNG output, not vector.

replicate · logo · icon

Magnific-Style Upscaler

replicate · upscaling · creative

Detail-hallucinating upscaler in the Magnific style. Adds plausible high-frequency texture using a Stable Diffusion refiner conditioned on the low-res input.

€0.06

NCT-CRC-HE Tissue Classifier (ResNet50)

ResNet50 fine-tuned on the NCT-CRC-HE-45K colorectal histology dataset. It sorts an H&E tissue patch into nine classes: adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal mucosa (NORM), stroma (STR) and tumor epithelium (TUM). Research use only, not a diagnostic device.

OOTDiffusion (Try-On)

OOTDiffusion virtual try-on. Takes a clear photo of a model and an upper-body garment and renders the garment onto the person using an outfitting-fusion diffusion approach that keeps the garment's texture and the model's pose. A lightweight alternative to IDM-VTON for clothing previews.

product · virtual-try-on · vton

PCam Lymph-Node Tumor Detector (ResNet18)

ResNet18 from the TIA Toolbox zoo, trained on the PatchCamelyon (PCam) dataset of lymph-node histology patches from breast-cancer metastasis screening. It performs binary classification of a 96x96 H&E tile as tumor (metastatic tissue present) or normal. Research use only, not a diagnostic device.
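Because the classifier works on one 96x96 tile at a time, a larger image has to be cut into tiles first. The sketch below lists the top-left corner of every full tile that fits. `tile_origins` is a hypothetical helper, and dropping partial tiles at the right and bottom edges is an assumption rather than the TIA Toolbox's own behavior.

```python
def tile_origins(image_w, image_h, tile=96, stride=None):
    """Yield (x, y) top-left corners of full tiles that fit inside the image.

    Partial tiles at the right and bottom edges are skipped (an assumption).
    """
    stride = stride or tile  # non-overlapping tiles by default
    for y in range(0, image_h - tile + 1, stride):
        for x in range(0, image_w - tile + 1, stride):
            yield x, y

origins = list(tile_origins(300, 200))
print(len(origins), origins[:4])

# A stride smaller than the tile size gives overlapping tiles.
overlapping = list(tile_origins(300, 200, stride=48))
print(len(overlapping))
```

Each origin would then be cropped to a 96x96 patch and scored independently; how the per-tile labels are combined into a region-level result is left to the caller.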

PhotoMaker

Image · Tencent ARC

Tencent ARC PhotoMaker. Identity-preserving stylized photo generation from a stacked-ID embedding. Realistic re-styling of a subject in seconds.

Playground v2.5 (1024px Aesthetic)

Image · Playground AI

Playground AI's diffusion model tuned for aesthetics. SDXL-based architecture trained on the EDM formulation, rated by users as more visually pleasing than SDXL in their study. Strong on vivid color and contrast.

replicate · playground · text-to-image

Playground v3 (Design)

Image · Playground AI

Playground's text-to-image model focused on graphic design aesthetics and embedded typography.

playground · text-to-image · design

Point-E

OpenAI Point-E text-to-point-cloud system. Fast 3D point-cloud generation from text, optionally lifted to a mesh via marching cubes.

replicate · 3d-generation · openai

Qwen-Image-Edit

Image · Alibaba / Qwen

Alibaba Qwen's instruction-driven image editor. Extends Qwen-Image's text-rendering ability to editing, so it handles both semantic edits (swap objects, change style) and precise text edits inside the image while preserving the original layout and unedited regions.

qwen · alibaba · image-edit

Real-ESRGAN 4x

AI upscaler that increases image resolution up to 4x while preserving texture and detail. Trained on synthetic and real data to reduce common ESRGAN artifacts.

replicate · upscaling · image-restore

Real-ESRGAN Anime 4x

Real-ESRGAN variant fine-tuned for anime, manga, and illustrated artwork. 4x upscaling with cartoon-aware artifact suppression.

replicate · upscaling · anime

Recraft V3

New

State-of-the-art image generation optimized for design and branding. SVG vector output support.

€0.60 · 12.0s

design · vector · branding

Recraft V3 Realistic

Image · Recraft

Recraft's high-prompt-adherence raster image model. Strong layout control and brand-style consistency.

recraft · text-to-image · design

Recraft v3 SVG

Recraft's v3 variant that outputs vector SVG instead of raster pixels. Generates clean, editable logos, icons and illustrations that scale without quality loss, which is unusual among image models. Hosted API only.

replicate · recraft · text-to-image

Recraft V3 SVG

Image · Recraft

Recraft's vector/SVG generation model. Editable illustrations and icons from text.

€0.08

recraft · text-to-svg · vector

Recraft V4 SVG

Recraft V4 SVG turns a text prompt into production-ready SVG vector art with clean geometry and structured, editable layers. Newer generation than V3 with improved design quality on logos, icons and flat illustration. Returns true vector paths, not a traced bitmap.

svg · vector · recraft

Rembg

Open-source background-removal tool wrapping U2Net. Produces alpha mattes for photos, products and people with no manual masking.

replicate · background-removal · matting

Remove Background (lucataco)

Lucataco's remove-bg, a rembg-based background removal model that returns the foreground subject on a transparent background. A popular, low-cost option for quick product and portrait cutouts.

background-removal · lucataco · rembg

Remove Object (LaMa)

Object removal and cleanup using LaMa inpainting. Paint a mask over an unwanted object, logo or person and the model fills the area with plausible background, erasing it from the photo.

background-removal · object-removal · lama

SDXL Emoji

SDXL fine-tune by fofr trained on Apple emoji art. Generates rounded, glossy emoji and icon style graphics from a text prompt, useful for custom reactions, app glyphs and playful icon sets. Raster output.

replicate · emoji · icon

SDXL Inpainting

SDXL inpainting built on the Hugging Face Diffusers inpaint pipeline. Replace or remove masked regions of an image with prompt-conditioned content at SDXL resolution. A cheap, well-understood baseline for object removal and local edits.

sdxl · stability-ai · image-edit

Shap-E (OpenAI)

OpenAI Shap-E text/image to 3D. Generates implicit neural representations renderable as textured meshes or NeRFs.

replicate · 3d-generation · openai

Skin Cancer Classifier (Swin, ISIC)

Swin Transformer skin-lesion classifier trained on an ISIC-style skin cancer dataset. Predicts eight lesion classes: melanoma, basal cell carcinoma, squamous cell carcinoma, actinic keratosis, nevus, dermatofibroma, benign keratosis and vascular lesion. Research and educational use only, not for diagnosis.

Skin Cancer Image Classification (ViT, HAM10000)

Vision Transformer fine-tuned on the HAM10000 dermatoscopy dataset. Classifies a skin-lesion image into seven categories: melanoma, melanocytic nevi, basal cell carcinoma, actinic keratoses, benign keratosis-like lesions, dermatofibroma and vascular lesions. Research and educational use only, not a diagnostic tool.

Skin Type Image Detection (ViT)

ViT image classifier by dima806 that labels a facial or skin photo as dry, normal or oily skin type. Aimed at skincare and cosmetics research rather than disease detection, and not a medical diagnostic. Research and educational use only.

SPIDER Colorectal Pathology Classifier

Patch-level colorectal pathology classifier from HistAI, built on the Hibou-L foundation model and trained on the SPIDER colorectal dataset with expert-annotated labels. It classifies a 1120x1120 H&E patch into pathology classes such as high- and low-grade adenocarcinoma, normal mucosa and other tissue types. Research use only, not a diagnostic device.

Stable Diffusion 3.5 Large

Stability AI's 8B MMDiT-based flagship. Open weights at 1MP with improved typography and prompt adherence over SDXL. The largest model in the SD 3.5 release line.

replicate · stability-ai · stable-diffusion

Stable Diffusion 3.5 Large (Stability)

stability · text-to-image · open-weights

Stability AI's 8B-parameter flagship SD3.5 model. Strong prompt adherence and aesthetic quality.

€0.07

Stable Diffusion 3.5 Large Turbo

Distilled, 4-step version of SD 3.5 Large from Stability AI. Keeps most of the large model's quality and text rendering at a fraction of the inference time. Open weights under the Stability Community License.

replicate · stability-ai · stable-diffusion

Stable Diffusion 3.5 Large Turbo

Distilled 4-step variant of SD3.5 Large. 8B params, ~4x faster inference at competitive quality.

stability · text-to-image · open-weights

Stable Diffusion 3.5 Medium

Stability AI's 2.5B-parameter SD3.5 with strong quality/speed trade-off. Consumer-GPU friendly.

stability · text-to-image · open-weights

Stable Diffusion XL

Stability AI's SDXL model via Replicate. High-quality image generation with extensive customization.

€0.20 · 8.0s

open-source · customizable

StarVector 8B (image-to-SVG)

StarVector 8B is a multimodal model that generates SVG code directly from an input image. Rather than tracing pixels, it predicts the SVG markup token by token, which can produce compact, semantically structured paths for icons and simple graphics. Research model from the StarVector project.

svg · vector · starvector

SUPIR

SUPIR by Fanghua Yu et al. is a large diffusion-based restoration model that recovers photorealistic detail from heavily degraded images and can be steered with a text prompt describing the scene.

restore · super-resolution · supir

SUPIR Upscaler

replicate · upscaling · image-restore

SUPIR (Scaling-Up Image Restoration) photo-real restoration model. Combines SDXL prior with language-guided controls for severely degraded inputs.

€0.06

Swin2SR

replicate · upscaling · transformer

Transformer-based image super-resolution using Swin-V2 attention. Handles classical, lightweight, real-world, and compressed-input variants with 2x/4x upscaling.

€0.002

TRELLIS (3D)

Microsoft TRELLIS image-to-3D model. Generates textured 3D assets in GLB or Gaussian-splat format from a single reference image.

€0.18

TripoSR

Stability AI and Tripo single-image 3D reconstruction model. Generates 3D meshes from a single image in roughly half a second.

Vectorizer (VTracer)

PNG/JPG to SVG vectorizer built on VTracer, the open-source raster-to-vector engine. Traces a bitmap into layered color regions and clean paths with controls for color count, area threshold and path simplification. Fast, deterministic alternative to model-based vectorizers.

svg · vector · vtracer

ViT Brain Tumor MRI Classifier

ViT base fine-tuned on brain MRI slices to classify tumor type. Given an MRI image it returns scores for glioma, meningioma, pituitary tumor or no tumor. Most-downloaded brain-tumor classifier in this search. Research and education only, not a diagnostic tool.

ViT Chest X-ray Pneumonia

Vision Transformer fine-tuned on the Kaggle chest x-ray pneumonia dataset. Given a frontal chest radiograph it predicts NORMAL versus PNEUMONIA with class scores. A widely used baseline for pneumonia screening experiments. Research and education only, not a diagnostic tool.

ViT COVID-19 CT Scan Classifier

ViT base (patch16-224, ImageNet-21k pretrained) fine-tuned on lung CT scans to flag COVID-19 findings. Takes a CT slice image and returns COVID versus non-COVID class scores. Built for research on CT-based COVID screening. Research and education only, not a diagnostic tool.

ViT Diabetic Retinopathy Grading

Vision Transformer fine-tuned on retinal fundus photographs to grade diabetic retinopathy severity. Given a fundus image it returns scores across the five-level scale (0 no DR through 4 proliferative DR). Most-downloaded retinopathy classifier in this search. Research and education only, not a diagnostic tool.

ViT HAM10000 Sharpened Skin Lesion Classifier

ViT-base classifier fine-tuned on HAM10000 dermatoscopy images with a sharpening preprocessing step. Predicts the seven standard HAM10000 lesion classes (akiec, bcc, bkl, df, mel, nv, vasc) for a single skin-lesion image. Research and educational use only, not a medical diagnostic.
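Classifiers like this one usually emit one raw score (logit) per class, which the caller converts to probabilities with a softmax. The sketch below shows that step for the seven HAM10000 labels. The logits are made up, and the label order is an assumption: the real index-to-label mapping comes from the model's own configuration.

```python
import math

# Label order is an assumption; read the real mapping from the model config.
HAM10000_CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits, labels=HAM10000_CLASSES):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical logits, not output from the real model.
logits = [0.1, 0.3, 0.2, -1.0, 1.4, 2.9, -0.5]
label, prob = top_prediction(logits)
print(label, round(prob, 3))
```

With these made-up scores the top label is `nv`. A softmax probability is a relative score across the listed classes, not a calibrated clinical risk.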

White Blood Cell Classifier (ViT)

Vision-transformer classifier for peripheral-blood smear images. It labels a single white-blood-cell crop as one of four leukocyte types: eosinophil, lymphocyte, monocyte or neutrophil. Trained on a public blood-cell image dataset and meant for research and teaching, not clinical hematology.