Google Veo 3.1

New · Popular · Google DeepMind · Video Generation

Latest Veo with image-to-video and context-aware audio

Queue video with Google Veo 3.1
Video generation runs asynchronously — we'll queue a job and you can track it in your history.
Generates as an async job — typically 30 s to 2 min.
TL;DR · Last updated March 25, 2026

Google Veo 3.1 is a video generation AI model from Google DeepMind, priced at €6.00 per generation. As a video model it is not billed per token (the listing shows €0.000 per 1M input tokens) and has no stated context window.


Examples

See what Google Veo 3.1 can generate

Aerial (0:08): "Sweeping aerial view of coastal city at golden hour"

Pricing

Price per Generation
Per generation: €6.00
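
Because pricing is flat per generation, budgeting is simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes every generation is billed at the listed €6.00 with no discounts or failed-job refunds.

```typescript
// Listed price per generation, taken from the pricing section of this page.
const PRICE_PER_GENERATION_EUR = 6.0;

// Total cost of a batch of generations (hypothetical helper).
function batchCostEur(generations: number): number {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  return generations * PRICE_PER_GENERATION_EUR;
}

// Number of full generations a given budget covers (hypothetical helper).
function generationsForBudget(budgetEur: number): number {
  return Math.floor(budgetEur / PRICE_PER_GENERATION_EUR);
}

console.log(batchCostEur(5)); // 30
console.log(generationsForBudget(50)); // 8
```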

API Integration

Use our OpenAI-compatible API to integrate Google Veo 3.1 into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: pass the video prompt as a string
const reply = await rw.run("veo-3-1", "Sweeping aerial view of coastal city at golden hour");
console.log(reply);

// With message history
const reply2 = await rw.run("veo-3-1", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("veo-3-1", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
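
Since video generation runs as an asynchronous job (typically 30 s to 2 min), a client will usually need to wait for the job to finish. The following is a minimal polling sketch, not the SDK's documented behaviour: the `JobStatus` shape, the state names and the `fetchStatus` callback are all hypothetical stand-ins for whatever status call the API actually provides.

```typescript
// Hypothetical job states; the real API may use different names and fields.
type JobStatus =
  | { state: "queued" | "running" }
  | { state: "succeeded"; videoUrl: string }
  | { state: "failed"; error: string };

// Calls `fetchStatus` repeatedly until the job succeeds, fails,
// or the next wait would run past the timeout.
async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  intervalMs = 5_000,
  timeoutMs = 180_000,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await fetchStatus();
    if (status.state === "succeeded") return status.videoUrl;
    if (status.state === "failed") throw new Error(status.error);
    if (Date.now() + intervalMs > deadline) {
      throw new Error("Timed out waiting for video job");
    }
    // Sleep before the next status check.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The default timeout of three minutes is chosen to sit above the 2 min upper estimate quoted on this page; adjust it to match observed latency.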
Specifications
Price: €6.00
Avg. latency: 92.0 s
Est. duration: 2 min
Developer: Google DeepMind
Category: Video Generation
Supported Formats: mp4
Tags: popular, audio, i2v

Deep dive — Google DeepMind's Google Veo 3.1

About Google DeepMind
Founded 2010 · London, United Kingdom

Google DeepMind, formed in 2023 from the merger of Google Brain and DeepMind under Demis Hassabis, runs the Veo video-generation programme. After the headline launch of Veo 3 (May 2025, Google I/O) with native audio, the team shipped Veo 3.1 in late 2025 as an incremental quality and capability upgrade. Veo 3.1 improves prompt adherence, motion physics, character consistency across extended sequences and audio fidelity (including richer ambient soundscapes and more controllable dialogue). The model is exposed via Vertex AI, the Gemini API, Google Labs (VideoFX, Whisk) and the Flow filmmaking surface aimed at professional creators.

Architecture
Latent video diffusion / DiT with joint audio-video diffusion and reference-frame conditioning

Veo 3.1 retains the joint audio-video diffusion architecture introduced in Veo 3: video is encoded into a spatio-temporal latent space via a 3D causal VAE and denoised by a transformer-based diffusion model, while a coupled audio diffusion module generates synchronized music, ambient sound and dialogue. Improvements in 3.1 are reported across motion physics (water, fabric, crowds), audio quality and the ability to stay consistent across extended sequences via reference-frame and subject-reference conditioning. Native clips run up to 8 seconds at 1080p with extensions for longer sequences and a separate cascaded super-resolution stage for 4K. Text conditioning uses Gemini-family encoders; image and reference-frame conditioning add identity and layout control. Training expands on the Veo 3 corpus with additional curated multilingual audio-video data and refined recaptioning.
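
To give a rough sense of why denoising happens in a latent space rather than on raw pixels, the sketch below computes the latent grid for an 8-second 1080p clip. Every number other than the clip length and resolution is an assumption: the 24 fps frame rate and the 8x spatial and 4x temporal compression factors are illustrative values of the kind seen in open video VAEs, and Google has not disclosed Veo's actual figures. The "first frame encoded on its own" rule is a common convention for causal 3D VAEs, likewise assumed here.

```typescript
interface VideoShape {
  frames: number;
  height: number;
  width: number;
}

// Latent grid size under an assumed causal 3D VAE: the first frame is
// encoded alone, and the remaining frames are grouped by `temporalFactor`.
function latentShape(
  video: VideoShape,
  spatialFactor: number,
  temporalFactor: number,
): VideoShape {
  return {
    frames: 1 + Math.ceil((video.frames - 1) / temporalFactor),
    height: Math.ceil(video.height / spatialFactor),
    width: Math.ceil(video.width / spatialFactor),
  };
}

// 8 s at an assumed 24 fps, 1080p.
const clip: VideoShape = { frames: 8 * 24, height: 1080, width: 1920 };

// Hypothetical compression factors (8x spatial, 4x temporal).
const latent = latentShape(clip, 8, 4);
console.log(latent); // { frames: 49, height: 135, width: 240 }
```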

Parameters: Undisclosed
Context: unknown
What it can do
  • All Veo 3 features plus improved physics, audio fidelity and consistency
  • Up to 8-second 1080p clips natively, 4K via cascaded super-resolution
  • Reference-frame and subject-reference conditioning
  • Joint audio-video diffusion with music, ambient sound and dialogue lip-sync
  • Rich cinematographic prompt vocabulary
  • Multilingual prompts via Gemini text encoders
  • Available via Vertex AI, Gemini API, VideoFX, Whisk and Flow
  • SynthID audio + visual watermarking
  • Best for: high-end commercial creative, longer sequences, branded campaigns with sound.
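
Prompts that use cinematographic vocabulary can be assembled from structured fields. The helper below is a hypothetical sketch, not part of any SDK: the `ShotSpec` fields and the "Audio:" suffix are conventions invented for this example, and the model does not require any particular prompt layout.

```typescript
interface ShotSpec {
  shot: string; // camera framing or movement, e.g. "Sweeping aerial view"
  subject: string; // what the shot shows
  lighting?: string; // time of day or lighting mood
  audio?: string; // desired sound, since the model also generates audio
}

// Joins the fields into a single plain-text prompt.
function buildPrompt(spec: ShotSpec): string {
  const parts = [`${spec.shot} of ${spec.subject}`];
  if (spec.lighting) parts.push(`at ${spec.lighting}`);
  let prompt = parts.join(" ");
  if (spec.audio) prompt += `. Audio: ${spec.audio}`;
  return prompt;
}

console.log(
  buildPrompt({
    shot: "Sweeping aerial view",
    subject: "coastal city",
    lighting: "golden hour",
  }),
); // Sweeping aerial view of coastal city at golden hour
```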
Training & License

Expanded curated multilingual audio-video corpus including licensed footage, public web video (including YouTube under Google's terms) and synthetic data, with multi-granularity captions from Gemini vision-language models.

License: Proprietary commercial licence via Google Cloud / Vertex AI and Gemini API; commercial use under Google's generative-AI terms; mandatory SynthID watermarking.

Known limitations
  • 8-second native clip limit
  • Audio is short-form and English-leaning
  • Strict moderation on people, brands and political content
  • Closed model with no peer-reviewed paper
  • Per-clip cost higher than Veo 3 Fast or Veo 3.1 Fast tiers
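
Given the 8-second native clip limit, a longer sequence has to be planned as several segments. The sketch below only does the arithmetic; `planSegments` is a hypothetical helper, and how segments are actually chained or extended depends on the API surface used.

```typescript
// Native clip limit stated on this page.
const NATIVE_CLIP_SECONDS = 8;

// Splits a target duration into segment lengths of at most 8 seconds each.
function planSegments(targetSeconds: number): number[] {
  if (!(targetSeconds > 0)) {
    throw new RangeError("targetSeconds must be positive");
  }
  const segments: number[] = [];
  let remaining = targetSeconds;
  while (remaining > 0) {
    const length = Math.min(NATIVE_CLIP_SECONDS, remaining);
    segments.push(length);
    remaining -= length;
  }
  return segments;
}

console.log(planSegments(20)); // [ 8, 8, 4 ]
```

If each segment were billed as one generation (an assumption, not something this page states), a 20-second sequence would come to three generations.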
