IP-Adapter FaceID Plus v2
Tencent's face-identity conditioning adapter for SD/SDXL. Face embedding + CLIP for ID-consistent generation.
IP-Adapter FaceID Plus v2 is an image-generation AI model from Replicate, priced at €0.000 per 1M input tokens with an unknown context window.
API Integration
Use our OpenAI-compatible API to integrate IP-Adapter FaceID Plus v2 into your application.
npm install railwail

import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
const images = await rw.run("ip-adapter-faceid-plus-v2", "A beautiful sunset over Tokyo");
console.log(images[0].url);
// Or use the image() method for full control
const res = await rw.image("ip-adapter-faceid-plus-v2", "A cat in space", {
size: "1024x1024",
n: 1,
});
console.log(res.data[0].url);

Deep dive: Tencent (ARC Lab)'s IP-Adapter FaceID Plus v2
IP-Adapter is a family of open-source image-prompt adapters developed at Tencent AI Lab. The original IP-Adapter paper (Ye et al. 2023, arXiv 2308.06721) introduced a decoupled cross-attention scheme that lets a pretrained text-to-image diffusion model accept image prompts as a parallel conditioning channel without retraining the backbone. The FaceID family extends this idea by conditioning on face-identity embeddings (extracted by InsightFace's ArcFace recogniser) instead of CLIP image embeddings, which improves identity preservation. IP-Adapter FaceID Plus v2 is a refined version released by author Hu Ye in late 2023 / early 2024; it combines ArcFace identity vectors with CLIP image features for better likeness and prompt control. The code and weights are released as open source via GitHub (tencent-ailab/IP-Adapter).
IP-Adapter FaceID Plus v2 is an adapter module attached to a frozen Stable Diffusion XL (or SD 1.5) backbone. The architecture is based on the IP-Adapter design (Ye et al. 2023): a small MLP projector maps an external image embedding into the cross-attention space of the diffusion U-Net, and a parallel set of cross-attention layers is added so that text and image conditioning operate on decoupled keys/values. The 'FaceID' line replaces the CLIP image encoder with InsightFace's ArcFace face recogniser, which produces a 512-dim identity embedding focused on facial geometry rather than appearance. The 'Plus v2' variant combines the ArcFace embedding with a CLIP image embedding of the face crop, giving the model both identity (from ArcFace) and pose/lighting cues (from CLIP). At inference the user supplies one or more reference face images plus a text prompt, and the adapter steers the diffusion process toward a portrait that matches the identity while following the prompt. Training used a large face dataset with paired identity vectors and aligned face crops.
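To make the decoupled cross-attention idea more concrete, here is a minimal sketch in plain JavaScript. It is a toy, not the IP-Adapter implementation: it uses a single attention head on small arrays, omits the learned key/value projections and the MLP projector, and every name in it (attention, decoupledCrossAttention, imageScale) is an assumption made for illustration. What it does show is the core structure described above: the same queries attend to text tokens and image tokens through separate key/value sets, and the image branch is added to the text branch with an adjustable scale.

```javascript
// Toy decoupled cross-attention on plain arrays (single head, no learned weights).
// All function and parameter names are illustrative assumptions.

// Numerically stable softmax over a 1-D array.
function softmax(xs) {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Scaled dot-product attention.
// queries: [n][d], keys: [m][d], values: [m][dv] -> returns [n][dv].
function attention(queries, keys, values) {
  const d = keys[0].length;
  return queries.map((q) => {
    const scores = keys.map(
      (k) => k.reduce((acc, kj, j) => acc + kj * q[j], 0) / Math.sqrt(d)
    );
    const weights = softmax(scores);
    return values[0].map((_, c) =>
      weights.reduce((acc, w, i) => acc + w * values[i][c], 0)
    );
  });
}

// Text and image conditioning use separate keys/values; the image branch
// is added to the text branch, weighted by imageScale.
function decoupledCrossAttention(queries, text, image, imageScale = 1.0) {
  const textOut = attention(queries, text.keys, text.values);
  const imageOut = attention(queries, image.keys, image.values);
  return textOut.map((row, i) =>
    row.map((v, c) => v + imageScale * imageOut[i][c])
  );
}

// Tiny example: two latent queries, two text tokens, one image (face) token.
const queries = [[1, 0], [0, 1]];
const text = { keys: [[1, 0], [0, 1]], values: [[1, 0], [0, 1]] };
const image = { keys: [[1, 1]], values: [[0.5, 0.5]] };
console.log(decoupledCrossAttention(queries, text, image, 0.8));
```

Setting imageScale to 0 reduces the result to ordinary text-only cross-attention, which is why the backbone can stay frozen: the adapter only contributes the extra image term.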
- Parameters: ~100M adapter parameters on top of a frozen SDXL or SD 1.5 backbone
- Context: 75 tokens
- High-quality face-identity preservation from one reference image
- Works on top of SDXL and SD 1.5 with frozen weights
- Compatible with LoRAs, ControlNets and other adapters
- Open-source under non-commercial / academic terms
- Strong prompt control for clothing, scene and style while preserving face
- Used as the backbone for many consumer 'AI photo' apps in 2024
- Best for: avatar generation, personalised marketing, headshot apps, character consistency.
Trained on large public face datasets (LAION subsets, FFHQ-style data) with InsightFace embeddings extracted per face. Exact composition is not disclosed.
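License: Weights are released for research and non-commercial use only. The restriction follows from the InsightFace face-recognition models the adapter depends on, which are themselves licensed for non-commercial research. Commercial use generally requires separate licensing arrangements with the rights holders.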
Known limitations
- Strong deepfake risk — needs consent and abuse controls
- Identity preservation degrades for unusual poses or low-quality refs
- Requires the InsightFace recognition stack at inference
- Less effective on children's faces (training data is mostly adult)
- Newer models (PuLID, InstantID) often outperform on identity fidelity
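Identity fidelity of the kind discussed in these limitations is commonly quantified as the cosine similarity between face-recognition embeddings of the reference image and the generated image. The sketch below assumes you already have two such embeddings as numeric arrays (for example 512-dim ArcFace vectors extracted elsewhere). The function names and the 0.5 threshold are hypothetical and would need tuning on your own data.

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("embedding lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero-norm embedding");
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical acceptance threshold; not a value from the IP-Adapter project.
const SAME_IDENTITY_THRESHOLD = 0.5;

// Returns true when the generated face embedding is close enough to the reference.
function looksLikeSamePerson(refEmbedding, genEmbedding, threshold = SAME_IDENTITY_THRESHOLD) {
  return cosineSimilarity(refEmbedding, genEmbedding) >= threshold;
}

// Toy 4-dim vectors standing in for real face embeddings.
const reference = [0.9, 0.1, 0.3, 0.2];
const generated = [0.8, 0.2, 0.35, 0.1];
console.log(cosineSimilarity(reference, generated), looksLikeSamePerson(reference, generated));
```

A check like this can be run over a batch of generations to flag outputs where an unusual pose or a low-quality reference has caused the likeness to drift.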
Related Models
Flux 1.1 Pro Ultra
FLUX 1.1 Pro in ultra mode. Up to 4 megapixel images with raw mode for photorealism.
Flux Dev
Black Forest Labs' development model. Fast, high-quality image generation with LoRA support.
Google Imagen 4
Google's Imagen 4. Text-to-image with strong photorealism and improved typography support.
Google Imagen 4 Ultra
Premium Imagen 4 tier. Highest fidelity, prompt adherence and typography quality from Google.
Start using IP-Adapter FaceID Plus v2 today
Get started with free credits. No credit card required. Access IP-Adapter FaceID Plus v2 and 100+ other models through a single API.