Image Generation
Create stunning visuals with state-of-the-art AI models
Image generation models for product, marketing, and design
Image models turn a text prompt (and, optionally, a reference image or a mask) into a final raster. The category spans everything from photorealistic product photography to vector-style illustration and controllable inpainting/outpainting. Reach for an image model when you need on-brand visuals at scale, when a designer's queue is the bottleneck, or when you want to ship a generative feature inside your own product.
Unlike text models, image generation is billed per call rather than per token. A 1024×1024 image costs between half a cent (open-weights SDXL Turbo) and fifteen cents (top-tier Imagen or FLUX Pro). Higher resolutions and more diffusion steps cost proportionally more. Some providers expose a separate edit endpoint with a different rate; check the model card before integrating.
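As a rough illustration of per-call billing, the sketch below estimates spend for a batch of images. It assumes price scales linearly with pixel count and with diffusion steps relative to a 1024×1024 baseline; real providers may use tiers or flat rates instead. The function names and the `base_steps` parameter are hypothetical and do not correspond to any provider's API.

```python
def estimate_image_cost(base_price, width=1024, height=1024,
                        steps=None, base_steps=None):
    """Estimate the price of one image from a 1024x1024 base price.

    Assumption: cost grows linearly with pixel count and with the
    number of diffusion steps. Check the model card for real pricing.
    """
    pixel_factor = (width * height) / (1024 * 1024)
    step_factor = 1.0
    if steps is not None and base_steps:
        step_factor = steps / base_steps
    return base_price * pixel_factor * step_factor


def estimate_batch_cost(base_price, n_images, **kwargs):
    """Total estimated price for n_images generated with the same settings."""
    return n_images * estimate_image_cost(base_price, **kwargs)


if __name__ == "__main__":
    # 1,000 images at the low and high ends of the range quoted above.
    print(estimate_batch_cost(0.005, 1000))
    print(estimate_batch_cost(0.15, 1000))
```

Keeping the scaling factors separate makes it easy to swap in a provider's actual pricing rule once it is known.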
The central trade-off is photorealism versus control. The diffusion flagships (FLUX 1.1 Pro, Imagen 3, Recraft V3) produce magazine-quality output but ignore detailed compositional instructions about half the time. The smaller models (SDXL, Playground V3, Stable Diffusion 3.5) cost ten times less, render in under two seconds, and let you steer the result with ControlNet, IP-Adapter, or LoRA. For batch production, the smaller, steerable pipeline almost always wins; for one-off hero shots, reach for the flagship.
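The rule of thumb above can be written as a small routing function. This is only a sketch: the `Job` fields, the tier labels, and the `batch_threshold` default are hypothetical, chosen for illustration rather than taken from any catalog or API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    n_images: int
    # True when the job depends on ControlNet, IP-Adapter, or LoRA guidance.
    needs_structural_control: bool = False


def pick_tier(job: Job, batch_threshold: int = 10) -> str:
    """Return "steerable" for batch or controlled jobs, else "flagship".

    Assumption: batch_threshold is an arbitrary cut-off between
    one-off hero shots and batch production.
    """
    if job.needs_structural_control or job.n_images >= batch_threshold:
        return "steerable"
    return "flagship"


if __name__ == "__main__":
    print(pick_tier(Job(n_images=1)))
    print(pick_tier(Job(n_images=500)))
```

In practice the threshold would be tuned against budget and latency targets instead of being fixed.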
Watch for context dilution in image prompts: most diffusion models cap the useful prompt length at about 75 tokens, so cramming in twelve adjectives and three style references usually averages everything together instead of stacking. Write the subject, the action, and the lighting first; anything after the third clause has diminishing influence on the result.
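One way to apply this ordering is to assemble the prompt from its core clauses first and append extras only while a token budget allows. The sketch below uses a whitespace word count as a crude stand-in for tokens, which is an assumption: real tokenizers split text differently, so treat the budget as approximate. The helper names are hypothetical.

```python
def rough_token_count(text):
    """Crude proxy for token count: whitespace-separated words.

    Assumption: a real tokenizer will usually report a higher count.
    """
    return len(text.split())


def build_prompt(subject, action, lighting, extras=(), max_tokens=75):
    """Put subject, action, and lighting first, then add extras within budget."""
    core = [subject, action, lighting]
    prompt = ", ".join(c.strip() for c in core if c.strip())
    for extra in extras:
        candidate = f"{prompt}, {extra.strip()}"
        if rough_token_count(candidate) > max_tokens:
            break  # later clauses matter least, so stop at the first overflow
        prompt = candidate
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        "a ceramic mug",
        "steaming on a desk",
        "soft morning light",
        extras=["35mm photo", "shallow depth of field"],
    ))
```

Stopping at the first extra that overflows, rather than skipping it and trying shorter ones, keeps the extras in the priority order the caller supplied.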
Licenses matter: most providers grant a perpetual commercial license on generated images, but some (the FLUX Schnell free tier, certain open checkpoints) restrict output to non-commercial use. The model card spells this out; read it before putting the output on a billboard.
The top picks below cover the photorealism flagship, the cheapest workhorse, the model with the longest prompt, and the fastest real-time option in the category.
55 models available
Flux 1.1 Pro Ultra
FLUX 1.1 Pro in ultra mode. Up to 4 megapixel images with raw mode for photorealism.
Flux Dev
Black Forest Labs' development model. Fast, high-quality image generation with LoRA support.
Google Imagen 4
Google's Imagen 4. Text-to-image with strong photorealism and improved typography support.
Google Imagen 4 Ultra
Premium Imagen 4 tier. Highest fidelity, prompt adherence and typography quality from Google.
Ideogram 3.0
Ideogram's flagship text-to-image model with industry-leading text rendering and prompt adherence.
Midjourney V7
The latest Midjourney model. Industry-leading aesthetic quality and prompt adherence for image generation.
AuraFlow v0.3
fal.ai's fully open-source 6.8B flow-based text-to-image model. Up to 1536x1536 resolution.
BRIA RMBG-1.4
BRIA's first commercial-safe background-removal model. Trained on fully-licensed data, suitable for production e-commerce and design pipelines.
BRIA RMBG-2.0
BRIA's professional background-removal model trained on fully-licensed data. Commercial-safe.
CCSR (Content-Consistent SR)
Content-Consistent Super-Resolution model. Reduces hallucination compared to typical diffusion-based upscalers while keeping perceptual quality high.
Clarity Upscaler
High-resolution image upscaler with creative detail re-imagination via SD-based hallucination. Strong for photography and product shots.
CodeFormer
Robust face-restoration model using a transformer-based codebook prior. Handles severe degradation, occlusion, and old-photo restoration with adjustable fidelity-quality tradeoff.
ControlNet Canny
ControlNet conditioned on Canny edge maps. Preserves composition and outlines while restyling with Stable Diffusion 1.5 or SDXL backbones.
ControlNet Depth
ControlNet conditioned on depth maps. Preserves the 3D scene layout while letting the prompt change style, lighting and content.
DALL-E 3
OpenAI's latest image generation model. Excellent at following complex prompts with high fidelity.
DreamGaussian
Generative Gaussian-splatting model for fast image-to-3D synthesis. Produces textured meshes in two minutes via differentiable rasterization.
ESRGAN Classic
Enhanced Super-Resolution GAN, the original 2018 architecture. Produces sharp 4x upscales with strong perceptual quality on natural images.
Flux Schnell
The fastest Flux model. Generate images in under 2 seconds. Great for prototyping.
FLUX.1 [Schnell]
Black Forest Labs' fastest open-weights image model. Apache-2.0 licensed, ~1-4 step inference.
FLUX.1 Canny
FLUX structural control via Canny edge maps. Preserve composition while restyling.
FLUX.1 Depth
FLUX structural control via depth maps. Keep 3D scene layout while changing style/content.
FLUX.1 Fill
Black Forest Labs' inpainting/outpainting model for FLUX. Fill masked regions with prompt-guided content.
FLUX.1 Redux
FLUX image-variation adapter. Generate variations and remixes from a reference image.
Get3D (NVIDIA)
NVIDIA GET3D generative model for textured 3D shapes. Trained on category-specific datasets producing meshes with high-quality textures.
GFPGAN v1.4
Tencent ARC face-restoration GAN. Reconstructs realistic facial detail in low-quality or compressed photos using a pretrained StyleGAN2 prior.
Hunyuan3D 2.0
Tencent's Hunyuan3D 2.0 image-to-3D pipeline. Two-stage shape and texture generation producing high-resolution textured meshes.
Hunyuan3D 2.1
Refreshed Hunyuan3D 2.1 with improved texture fidelity and PBR-material support. Image-to-3D with textured GLB output.
Ideogram 2.0 Turbo
Ideogram's fast text-to-image variant. Strong typography and logo rendering at low latency.
InstantMesh
Image-to-3D mesh generator from sparse-view diffusion. Produces textured meshes in under one minute on a single A100.
InstructPix2Pix
Berkeley InstructPix2Pix. Edits an image from natural-language instructions in a single forward pass. Trained on GPT-3 plus Stable Diffusion synthetic pairs.
IP-Adapter FaceID Plus v2
Tencent's face-identity conditioning adapter for SD/SDXL. Face embedding + CLIP for ID-consistent generation.
Janus Pro 7B
DeepSeek's unified multimodal model. Decouples vision encoding for both understanding and generation tasks.
Kuaishou Kolors
Kuaishou's bilingual (CN/EN) latent diffusion text-to-image model with strong text rendering.
Magnific-Style Upscaler
Detail-hallucinating upscaler in the Magnific style. Adds plausible high-frequency texture using a Stable Diffusion refiner conditioned on the low-res input.
PhotoMaker
Tencent ARC PhotoMaker. Identity-preserving stylized photo generation from a stacked-ID embedding. Realistic re-styling of a subject in seconds.
Playground v3 (Design)
Playground's text-to-image model focused on graphic design aesthetics and embedded typography.
Point-E
OpenAI Point-E text-to-point-cloud system. Fast 3D point-cloud generation from text, optionally lifted to a mesh via marching cubes.
Real-ESRGAN 4x
AI upscaler that increases image resolution up to 4x while preserving texture and detail. Trained on synthetic and real data to reduce common ESRGAN artifacts.
Real-ESRGAN Anime 4x
Real-ESRGAN variant fine-tuned for anime, manga, and illustrated artwork. 4x upscaling with cartoon-aware artifact suppression.
Recraft V3
State-of-the-art image generation optimized for design and branding. SVG vector output support.
Recraft V3 Realistic
Recraft's high-prompt-adherence raster image model. Strong layout control and brand-style consistency.
Recraft V3 SVG
Recraft's vector/SVG generation model. Editable illustrations and icons from text.
Rembg
Open-source background-removal tool wrapping U2Net. Produces alpha mattes for photos, products and people with no manual masking.
Shap-E (OpenAI)
OpenAI Shap-E text/image to 3D. Generates implicit neural representations renderable as textured meshes or NeRFs.
Stable Diffusion 3.5 Large (Stability)
Stability AI's 8B-parameter flagship SD3.5 model. Strong prompt adherence and aesthetic quality.
Stable Diffusion 3.5 Large Turbo
Distilled 4-step variant of SD3.5 Large. 8B params, ~4x faster inference at competitive quality.
Stable Diffusion 3.5 Medium
Stability AI's 2.5B-parameter SD3.5 with strong quality/speed trade-off. Consumer-GPU friendly.
Stable Diffusion XL
Stability AI's SDXL model via Replicate. High-quality image generation with extensive customization.
SUPIR Upscaler
SUPIR (Scaling-Up Image Restoration) photo-real restoration model. Combines SDXL prior with language-guided controls for severely degraded inputs.
Swin2SR
Transformer-based image super-resolution using Swin-V2 attention. Handles classical, lightweight, real-world, and compressed-input variants with 2x/4x upscaling.
T2I-Adapter Color
Tencent T2I-Adapter color-guided generation for SDXL. Lightweight adapter that conditions image generation on a color reference image.
Transparent Background
PyTorch background-removal tool supporting multiple modes: base, fast and high-quality. Produces RGBA outputs and is suitable for batch processing.
TRELLIS (3D)
Microsoft TRELLIS image-to-3D model. Generates textured 3D assets in GLB or Gaussian-splat format from a single reference image.
TripoSR
Stability AI and Tripo single-image 3D reconstruction model. Generates 3D meshes from a single image in roughly half a second.
U2Net Saliency
Salient-object detection network used for background removal and matting. Nested U-Net architecture trained on DUTS-TR for general scenes.
Top image generation picks
Hand-picked across four common criteria — resolved against the live catalog so the picks track price and performance changes.
Flux 1.1 Pro Ultra
FLUX 1.1 Pro in ultra mode. Up to 4 megapixel images with raw mode for photorealism.
AuraFlow v0.3
fal.ai's fully open-source 6.8B flow-based text-to-image model. Up to 1536x1536 resolution.
Flux 1.1 Pro Ultra
FLUX 1.1 Pro in ultra mode. Up to 4 megapixel images with raw mode for photorealism.
Flux Schnell
The fastest Flux model. Generate images in under 2 seconds. Great for prototyping.
Popular use cases
Common patterns built with image generation on Railwail.
Related comparisons
Side-by-side reviews of the most-compared models in this category.
Frequently asked questions