AI Models
Browse and explore all available AI models.
184 models available
Claude Code
Anthropic's specialized coding agent. Autonomous code writing, debugging, and refactoring with deep codebase understanding.
Claude Opus 4
Anthropic's most powerful model. Exceptional at complex analysis, nuanced writing, math, and coding. Sets new benchmarks across evaluation suites.
Claude Sonnet 4
Anthropic's balanced model offering excellent performance at a lower cost than Opus. Great all-rounder for production workloads.
Cursor (GPT-4o)
AI-powered code editor backed by GPT-4o. Inline code completion, chat-based editing, and codebase-aware suggestions.
DALL-E 3
OpenAI's latest image generation model. Creates highly detailed, accurate images from text descriptions with excellent prompt adherence.
DeepSeek R1
DeepSeek's reasoning model trained with reinforcement learning. Performs chain-of-thought reasoning, rivaling OpenAI's o1 on math and science benchmarks.
DeepSeek V3
DeepSeek's flagship 671B MoE model. Competitive with GPT-4o on many benchmarks. Exceptional at coding, math, and Chinese language tasks.
ElevenLabs Multilingual v2
ElevenLabs' most capable TTS model. Natural-sounding speech in 29 languages with emotion control and voice cloning.
Flux 1.1 Pro
Black Forest Labs' most capable image model. Photorealistic outputs with exceptional text rendering and prompt following.
Flux Pro 1.1 Ultra
Black Forest Labs' highest resolution Flux model. Generates up to 4MP images with exceptional detail and prompt adherence.
Gemini 2.0 Flash (Multimodal)
Google's multimodal model accepting text, images, audio, and video. Native multimodal understanding across input types.
Gemini 2.5 Pro
Google's most advanced thinking model with built-in reasoning capabilities. Excels at complex tasks requiring multi-step reasoning.
GitHub Copilot
GitHub's AI pair programmer. Real-time code suggestions, chat assistance, and PR reviews powered by OpenAI models.
GPT-4.5 Preview
OpenAI's latest frontier model with improved reasoning, creativity, and instruction following. Significant improvements over GPT-4o.
GPT-4o
OpenAI's most capable multimodal model. Accepts text and image inputs, produces text outputs. Excellent for complex reasoning, creative writing, and analysis.
GPT-4o (Vision)
GPT-4o's vision capabilities. Analyze images, charts, documents, and screenshots with detailed understanding and reasoning.
GR00T N1
NVIDIA's foundation model for humanoid robots. World-model-based VLA enabling whole-body control and human-like manipulation.
Grok 3
xAI's most powerful model. Trained on massive compute with strong reasoning, humor, and real-time knowledge from X (Twitter).
Llama 3.3 70B
Meta's latest 70B model delivering performance comparable to Llama 3.1 405B at a fraction of the cost. Excellent open-source option.
Llama 4 Maverick
Meta's powerful Llama 4 Maverick model. A larger, more capable variant with strong reasoning, creative writing, and multilingual abilities.
Luma Ray2
Luma AI's latest video generation model. High-quality, realistic video creation with excellent motion dynamics.
Midjourney v6.1
Midjourney's latest model known for stunning artistic quality. Excels at creative, aesthetic images with a distinctive artistic style.
o1
OpenAI's reasoning model that thinks before answering. Uses chain-of-thought to solve complex math, science, and coding problems.
OpenVLA
Open-source 7B Vision-Language-Action model built on Prismatic VLM and Llama 2. Converts visual observations and language goals into robot actions.
Pi0
Physical Intelligence's foundation model for robot control. Combines vision-language understanding with dexterous manipulation across diverse tasks.
RT-2
Google DeepMind's Robotic Transformer 2. Vision-Language-Action model that translates visual observations and language instructions directly into robot actions.
Runway Gen-3 Alpha
Runway's latest video generation model. Professional-quality video creation with fine-grained control over motion and style.
Sora
OpenAI's video generation model. Creates realistic and imaginative videos from text prompts with impressive temporal coherence.
Suno v4
Suno's latest music AI. Generates complete songs with lyrics, vocals, and instrumentation. Supports many genres and custom lyrics.
text-embedding-3-large
OpenAI's most capable embedding model. 3072 dimensions with excellent retrieval performance for RAG and semantic search.
Udio
AI music generation platform creating full songs with vocals and lyrics from text descriptions. Wide range of genres and styles.
Veo 2
Google DeepMind's video generation model. Creates high-fidelity, 1080p videos with strong understanding of physics and motion.
Wan 2.1 14B
High-quality video generation from Wan AI. 14B parameter model creating detailed, coherent videos from text.
Whisper Large v3
OpenAI's state-of-the-art speech recognition model. Supports 100+ languages with exceptional accuracy for transcription and translation.
AnimagineXL 3.1
Anime-focused SDXL fine-tune. High-quality anime and manga-style illustration generation.
AnimateDiff
Animate images and create short video animations. Works with existing SD models for animated content.
AssemblyAI Universal-2
AssemblyAI's latest speech model. Excellent accuracy across accents and noisy environments with built-in speaker diarization.
AudioLDM 2
Audio generation from text descriptions. Create sound effects, music, and ambient audio from natural language.
AuraFlow
Open-source flow-based image generation model. Lightweight with fast generation and good quality output.
Bark
Suno's open-source TTS model. Generate realistic speech with laughter, music, and sound effects from text.
BGE-M3
BAAI's versatile embedding model supporting dense, sparse, and multi-vector retrieval. Open-source and highly effective.
BLIP-2
Salesforce's image captioning model. Generate detailed descriptions of images with natural language.
Clarity Upscaler
AI image upscaler with creative enhancement. Adds detail and clarity while upscaling images up to 4x.
Claude 3.5 Haiku
Anthropic's fastest and most affordable model. Ideal for high-volume tasks, customer support, and quick responses.
Claude 3.5 Sonnet
Previous generation balanced model from Anthropic. Still excellent for many tasks including coding, analysis, and creative writing.
Claude 3.5 Sonnet (Vision)
Claude's vision capabilities. Excellent at analyzing images, documents, and code screenshots with detailed, accurate descriptions.
CLIP Interrogator
Generate text prompts from images. Reverse-engineer prompts that could reproduce a given image.
CodeFormer
AI face restoration model. Restore severely degraded face photos with high fidelity.
CodeLlama 70B
Meta's largest code-specialized Llama model. Trained on code-heavy data with strong performance on code generation and infilling.
Codestral
Mistral's dedicated code model. Trained specifically for code generation, completion, and understanding across 80+ programming languages.
CogVideoX
Open-source video generation model from Tsinghua University. Generates coherent videos from text with strong temporal consistency.
CogVideoX-5B
Tsinghua's 5B parameter video model. Creates coherent text-to-video with strong temporal consistency.
CogVLM
Powerful visual language model from Tsinghua. Deep image understanding with detailed visual reasoning.
Cohere Embed v3
Cohere's multilingual embedding model. Supports 100+ languages with separate search and classification modes.
Command R
Cohere's efficient model optimized for RAG and tool use. Great balance of quality and cost for production deployments.
Command R+
Cohere's flagship model for enterprise RAG applications. Excellent at retrieval-augmented generation, summarization, and multi-step tasks.
Consistent Character
Generate consistent character images across different poses and scenes from a single reference.
ControlNet SDXL
Controlled image generation with SDXL. Use depth, pose, canny edge, and other control signals.
DBRX Instruct
Databricks' open-source MoE model with 132B total parameters. Strong at enterprise tasks, SQL, and data-related queries.
Deepgram Nova 2
Deepgram's most accurate ASR model. Optimized for real-time transcription with industry-leading word error rates.
DeepSeek Coder V2
DeepSeek's dedicated coding model. Specialized for code generation, completion, and debugging across many programming languages.
Depth Anything V2
State-of-the-art monocular depth estimation. Generate accurate depth maps from single images.
Dolphin 2.5 Mixtral
Uncensored Mixtral fine-tune. Open-ended assistant without content restrictions for research purposes.
DreamShaper XL
Versatile fine-tuned SDXL model. Excels at both realistic and stylized image generation with rich details.
ElevenLabs Turbo v2.5
Low-latency TTS model from ElevenLabs. Optimized for real-time applications with natural-sounding output.
Face to Sticker
Convert face photos to cartoon stickers. Fun, expressive sticker generation from selfies.
Florence 2
Microsoft's foundation vision model. Object detection, captioning, segmentation, and OCR in one model.
Flux Canny
Edge-conditioned image generation. Use canny edge detection maps to control the structure of generated images.
Flux Depth
Depth-conditioned image generation from Black Forest Labs. Generate images with precise 3D structural control.
Flux Dev
Development version of Flux with high quality generation. Open-weight model suitable for fine-tuning and customization.
Flux Fill
Inpainting and outpainting model from Black Forest Labs. Edit and extend images seamlessly with text guidance.
Flux Redux
Image variation and remix model. Create new images inspired by reference images with text-guided modifications.
Flux Schnell
Ultra-fast image generation from Black Forest Labs. Generates images in under 2 seconds while maintaining good quality.
Frame Interpolation (FILM)
Google's frame interpolation model. Create smooth slow-motion by generating intermediate frames.
Gemini 2.0 Flash
Google's fast, versatile multimodal model. Supports text, images, audio, and video inputs. Great balance of speed and capability.
Gemini 2.0 Flash Lite
Google's most cost-efficient model. Optimized for high-volume, lower-complexity tasks with excellent throughput.
Gemini Robotics
Google DeepMind's Gemini model adapted for robotics. Leverages Gemini's multimodal understanding for zero-shot robot task planning and execution.
Gemma 2 27B
Google's open-source 27B model. Strong performance in reasoning and text generation, built with Google's research expertise.
Gemma 2 9B
Compact open-source model from Google. Excellent for on-device deployment and resource-constrained environments.
Gemma 2 9B (Replicate)
Google's compact open model on Replicate. Efficient 9B model with strong general capabilities.
GFPGAN
Face restoration model. Fix and enhance degraded face photos with realistic detail recovery.
GPT-4o Mini
Small, fast, and affordable model from OpenAI. Great for lightweight tasks like classification, summarization, and simple Q&A.
Grok 2
xAI's previous flagship model. Known for its witty personality, strong reasoning, and ability to handle nuanced questions.
Grok 3 Mini
Smaller, faster version of Grok 3. Excellent for quick responses and lower-cost applications while maintaining strong capabilities.
Grounding DINO
Open-set object detection with text prompts. Detect any object by describing it in natural language.
Helix
Figure AI's VLA model powering their humanoid robots. Combines language understanding with full-body motion planning for household and industrial tasks.
Hunyuan Video
Tencent's open-source video generation model. Creates high-quality videos with strong motion coherence.
Ideogram 2.0
Ideogram's latest model excelling at typography and text in images. Best-in-class text rendering in generated images.
Ideogram V2 Turbo
Fast version of Ideogram's text-rendering model. Quick generation with excellent typography capabilities.
Illusion Diffusion
Create optical illusion images. Generate images that contain hidden patterns and visual tricks.
Img2Img SDXL
Image-to-image translation with SDXL. Transform and modify existing images with text guidance.
Incredibly Fast Whisper
Optimized Whisper model for ultra-fast transcription. 10x faster than standard Whisper with comparable accuracy.
InstantID
Zero-shot identity-preserving generation. Create images of a person in any style using just one reference photo.
InternVL 2
Open-source vision-language model rivaling GPT-4V. Strong visual understanding across diverse domains.
IP-Adapter FaceID
Face-preserving image generation using IP-Adapter. Generate images that maintain facial identity from reference photos.
Jina Embeddings v3
Jina AI's latest embedding model with task-specific adapters. Supports flexible dimensions and multiple retrieval tasks.
Juggernaut XL
Popular fine-tuned SDXL model. Known for photorealistic outputs, especially portraits and landscapes.
Kandinsky 2.2
Open-source multilingual text-to-image model. Supports prompts in multiple languages with good creative output.
Kling 1.5
Kuaishou's video generation model. Creates high-quality videos with good motion consistency and diverse styles.
Kolors
Kwai's photorealistic image generation model. Strong at generating realistic human portraits and scenes.
Llama 3.1 405B
Meta's largest open-source model. 405 billion parameters delivering frontier-class performance on reasoning, coding, and multilingual tasks.
Llama 3.1 70B
Meta's highly capable 70B open-source model. Great balance of performance and efficiency for a wide range of tasks.
Llama 3.1 70B (Replicate)
Meta's popular 70B model on Replicate. Strong all-around performance for chat, coding, and reasoning.
Llama 3.1 8B
Meta's compact 8B model. Surprisingly capable for its size, perfect for fast inference, edge deployment, and cost-sensitive applications.
Llama 3.1 8B (Replicate)
Efficient 8B Llama model on Replicate. Fast and affordable for straightforward tasks.
Llama 3.2 11B Vision
Compact multimodal Llama 3.2. Vision-language model for efficient image understanding and text generation.
Llama 3.2 1B
Smallest Llama model for on-device inference. 1B parameters, ideal for mobile and IoT applications.
Llama 3.2 3B
Ultra-compact Llama model for edge deployment. 3B parameters with surprising capability for its size.
Llama 3.2 90B Vision
Meta's multimodal Llama 3.2. 90B parameter model with native image understanding and text generation.
Llama 4 Scout
Meta's next-generation Llama 4 model optimized for efficiency. Built on a new architecture with improved reasoning and instruction following.
LLARVA
Vision-Language-Action model using LLM backbones for structured robot action prediction. Bridges language models and low-level robot control.
LLaVA 1.6 34B
Open-source multimodal model combining language and vision. Strong visual understanding with conversational capabilities.
LLaVA v1.6 13B
Open-source multimodal model. Analyze and describe images with natural language understanding.
Logo Generator SDXL
Logo and icon generation using fine-tuned SDXL. Create professional logos and brand assets.
LTX Video
Lightweight text-to-video model. Fast generation with reasonable quality for prototyping and previews.
Luma Dream Machine
Luma AI's video generation model. Creates dreamy, cinematic videos with excellent visual quality and creative flexibility.
Material Transfer
Transfer materials and textures between images. Apply the material of one image onto objects in another.
Minimax Image
Minimax's text-to-image model. High-quality image generation with strong prompt understanding.
Minimax Video-01
Minimax's video generation model supporting up to 720p resolution. Good for short-form video content creation.
MiniMax Video-01 (Replicate)
Minimax's video model on Replicate. Generate short videos from text descriptions.
Mistral Large 2
Mistral's most capable model. 123B parameters with strong reasoning, multilingual support, and function calling. Great for complex enterprise tasks.
Mistral Medium
Mid-range model from Mistral AI offering a good balance of performance and cost for most business applications.
Mistral Nemo
12B open-weight model by Mistral and NVIDIA. Compact but capable, ideal for on-device or self-hosted deployments.
Mistral Small
Mistral's efficient small model. Fast and cost-effective for straightforward tasks like classification, text generation, and RAG.
Mixtral 8x7B
Mistral's MoE model with 8 experts. Strong performance with efficient inference using sparse architecture.
Mochi 1
Genmo's video generation model. Creative video generation with artistic style flexibility.
Moondream 2
Tiny but capable vision-language model. Only 1.8B params yet surprisingly good at image understanding.
MusicGen Large
Meta's open-source music generation model. Creates high-quality music from text descriptions with control over style, tempo, and instruments.
MusicGen Melody
Meta's music generation with melody conditioning. Create music that follows a reference melody while matching text descriptions.
MusicGen Stereo Large
Stereo music generation from Meta. Creates high-quality stereo music tracks from text descriptions.
Nous Hermes 2 Mixtral
Nous Research fine-tune of Mixtral. Enhanced instruction following and conversational quality.
o1 Mini
Smaller, faster version of OpenAI's o1 reasoning model. Optimized for STEM tasks with lower latency and cost.
o3 Mini
OpenAI's latest small reasoning model. Highly efficient chain-of-thought reasoning with excellent cost-performance ratio.
OCR with GPT-4o
Accurate text extraction from images using GPT-4o vision. Extract text, tables, and structured data.
Octo
Open-source generalist robot policy from UC Berkeley. Supports multiple robot embodiments and can be fine-tuned for new tasks with minimal data.
OpenAI TTS-1
OpenAI's standard TTS model. Fast and affordable text-to-speech synthesis with good quality for most applications.
OpenAI TTS-1 HD
OpenAI's high-definition text-to-speech model. Natural, human-like voice synthesis with 6 preset voices.
Outpainting SDXL
Extend images beyond their borders. Seamlessly expand the canvas of any image with AI-generated content.
Phi-4
Microsoft's small but mighty 14B model. Punches well above its weight class on reasoning, math, and coding benchmarks.
PhotoMaker v2
Customizable realistic photo generation. Create photos of specific people in different scenes and styles.
Pi0.5
Physical Intelligence's latest VLA model with improved generalization. Handles complex multi-step manipulation tasks with fewer demonstrations.
Pika 2.0
Pika's latest video model with improved motion quality and generation speed. User-friendly interface for video creation.
PixArt Sigma
Efficient transformer-based image generation. High-quality 4K images with excellent text rendering capabilities.
Pixtral Large
Mistral's vision-language model. 124B parameters with native image understanding, document analysis, and visual reasoning.
Playground v2.5
Playground AI's aesthetic-focused model. Trained for beautiful, photorealistic images with excellent color and composition.
Playground v3
Playground AI's latest model focused on photorealistic image generation with strong aesthetic quality and prompt adherence.
QR Code Generator
AI-powered artistic QR code generator. Create beautiful, functional QR codes with custom designs.
Qwen 2.5 72B
Alibaba's flagship 72B model. Exceptional at Chinese and English tasks, strong coding abilities, and competitive with leading closed-source models.
Qwen 2.5 7B
Compact 7B model from Alibaba's Qwen series. Fast and efficient while maintaining strong multilingual and coding capabilities.
Qwen VL Plus
Alibaba's vision-language model. Strong at document understanding, charts, and multilingual visual QA.
Qwen2.5-Coder 32B
Alibaba's specialized coding model. Strong performance on code benchmarks with support for many programming languages.
QwQ 32B
Alibaba's reasoning model. Uses chain-of-thought to solve complex math, logic, and coding problems. Open-weight alternative to o1.
Real-ESRGAN
Powerful image upscaler using enhanced ESRGAN. Upscale images 2-4x with excellent detail preservation and artifact removal.
Realistic Vision XL
SDXL fine-tune optimized for photorealism. Creates stunning realistic photos from text descriptions.
Recraft V3
Recraft's SVG and design-focused generation model. Creates vector graphics, icons, and design assets from text descriptions.
Remove Background
Automatic background removal. Remove backgrounds from images with high accuracy and clean edges.
Riffusion
Real-time music generation through spectrograms. Create music by interpolating between text prompts.
RoboFlamingo
Robotics adaptation of the Flamingo vision-language model. Few-shot learning for robot tasks using language-conditioned visuomotor policies.
RT-X
Cross-embodiment robotic foundation model from the Open X-Embodiment collaboration. Trained on data from 22 robot types for generalized manipulation.
SDXL Lightning
Distilled SDXL model for ultra-fast generation. Creates high-quality images in 1-4 steps.
Segment Anything (SAM)
Meta's universal image segmentation model. Automatically detect and segment any object in images.
Snowflake Arctic
Snowflake's enterprise-focused LLM. Optimized for SQL generation, data analysis, and enterprise coding tasks.
SpatialVLA
VLA model with explicit 3D spatial reasoning. Uses depth perception and spatial understanding for more precise robotic manipulation.
Stable Audio
Stability AI's audio generation model. Creates music and sound effects from text prompts with customizable duration.
Stable Audio Open
Stability AI's open-source audio model. Generate music and sound effects up to 47 seconds.
Stable Diffusion 3 Medium
Stability AI's efficient SD3 model. Good balance of quality and speed for general-purpose image generation.
Stable Diffusion 3.5 Large
Stability AI's latest open-source image model. 8B parameter model with improved prompt adherence, typography, and photorealism.
Stable Diffusion 3.5 Large Turbo
Accelerated version of SD3.5 Large. Few-step generation for near real-time image creation.
Stable Diffusion 3.5 Medium
Mid-size variant of SD3.5. Faster generation while maintaining strong visual quality and prompt adherence.
Stable Diffusion XL
Stability AI's popular SDXL model. Widely adopted, extensive community support, and thousands of fine-tuned variants available.
Stable Video Diffusion
Stability AI's video generation model. Create short video clips from text or image prompts.
StarCoder2 15B
BigCode's open-source code model trained on The Stack v2. Supports 600+ programming languages with strong completion quality.
Style Transfer
Apply artistic styles from one image to another. Neural style transfer for creative image transformation.
SUPIR Upscaler
State-of-the-art AI image upscaler. Practice image restoration and upscaling with incredible detail generation.
SwinIR
Swin Transformer-based image restoration. Denoising, super-resolution, and JPEG artifact removal.
text-embedding-3-small
OpenAI's efficient embedding model. 1536 dimensions with strong performance at lower cost, ideal for most use cases.
Tortoise TTS
High-quality multi-speaker TTS. Generates natural speech with voice cloning capabilities from short reference clips.
Video Upscaler
AI-powered video upscaling. Enhance video resolution up to 4x with detail preservation.
Wan 2.1 1.3B
Lightweight video generation model. Fast text-to-video generation suitable for quick previews and prototyping.
Wan 2.1 Image-to-Video
Animate still images into videos. Transform a single image into a dynamic video sequence.
Whisper (Replicate)
OpenAI's Whisper model on Replicate. Transcribe audio in 100+ languages with word-level timestamps.
Whisper Diarize
Whisper with speaker diarization. Transcribe conversations and identify individual speakers.
XTTS-v2
Coqui's cross-lingual TTS model. Generate speech in 17 languages using voice cloning from a short reference clip.
Yi Lightning
01.AI's fast inference model. Optimized for speed with competitive quality, ideal for real-time applications.
Start Building with AI
Access all models through a single API. OpenAI-compatible, no vendor lock-in.