What is Google Veo 3.1?

Google Veo 3.1 is a video generation AI model developed by Google DeepMind. It is the latest Veo release, with image-to-video generation and context-aware audio. Access it through Railwail's unified, OpenAI-compatible API at €6.00 per generation.

How much does Google Veo 3.1 cost via Railwail?

Per-call: €6.00. No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of Google Veo 3.1?

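Railwail does not publish a context window for Google Veo 3.1. As a video generation model, it is billed per generation rather than per token, and each request takes a text prompt plus up to three reference images.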

How fast is Google Veo 3.1?

Median (p50) response latency: 92.0 s across recent Railwail traffic. See live p50/p95 metrics on /rankings.
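Railwail does not say exactly how it computes p50 and p95. The sketch below uses the nearest-rank definition, one common choice, on made-up latency samples (not real Railwail data), to show what the two figures mean.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical latency samples in seconds.
const latencies = [48, 61, 75, 88, 92, 97, 110, 118, 135, 170];
console.log(`p50=${percentile(latencies, 50)}s p95=${percentile(latencies, 95)}s`);
// prints "p50=92s p95=170s"
```

The p95 shows the slow tail: half of these jobs finish within 92 s, but one in twenty can take close to three minutes.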

Is Google Veo 3.1 better than Google Veo 2?

It depends on your use case. Both come from Google DeepMind and are strong choices for video generation. Veo 3.1 builds on Veo 3, which introduced native audio generation, and costs €6.00 per generation on Railwail versus €5.00 for Veo 2. Compare them side by side at /compare/veo-3-1-vs-google-veo-2.

Google Veo 3.1

Name: Google Veo 3.1
Brand: Replicate
SKU: veo-3-1
Price: 6 EUR

Google DeepMind

Video Generation

Latest Veo with image-to-video and context-aware audio

Queue video with Google Veo 3.1

Video generation runs asynchronously: Railwail queues a job, which typically finishes in 30 s to 2 min, and you can track it in your history.
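Because the job is asynchronous, a client has to poll until it finishes. Railwail's job-status call is not documented on this page, so the sketch below takes it as an injected `checkStatus` function. The `JobStatus` shape and the delay and timeout defaults are assumptions; the timeout is set above the typical 2-minute upper bound.

```typescript
// Hypothetical job states; Railwail's actual status values are not documented here.
type JobStatus =
  | { state: "queued" | "running" }
  | { state: "succeeded"; videoUrl: string }
  | { state: "failed"; error: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `checkStatus` until the job finishes, doubling the wait between polls
// (capped at maxDelayMs) and giving up after timeoutMs.
async function pollUntilDone(
  checkStatus: () => Promise<JobStatus>,
  { initialDelayMs = 2_000, maxDelayMs = 15_000, timeoutMs = 180_000 } = {},
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (Date.now() < deadline) {
    const status = await checkStatus();
    if (status.state === "succeeded") return status.videoUrl;
    if (status.state === "failed") throw new Error(`job failed: ${status.error}`);
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs);
  }
  throw new Error(`job did not finish within ${timeoutMs} ms`);
}

// Usage with a stub that succeeds on the third poll (stands in for a real status call).
let calls = 0;
const fakeCheck = async (): Promise<JobStatus> =>
  ++calls < 3 ? { state: "running" } : { state: "succeeded", videoUrl: "https://example.com/clip.mp4" };

pollUntilDone(fakeCheck, { initialDelayMs: 10 }).then((url) => console.log(url));
```

Exponential backoff keeps early polls responsive for fast jobs while avoiding dozens of requests during a 90-second render.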

TL;DR · Last updated March 25, 2026

Google Veo 3.1 is a video generation AI model from Google DeepMind, priced at €6.00 per generation on Railwail.

Try Google Veo 3.1

The playground takes a prompt, up to three reference images, a duration and an aspect ratio.
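A client-side check before queuing a paid job can catch obvious mistakes. This sketch assumes a hypothetical `VeoRequest` shape; only the three-image limit and the 8-second native clip length come from this page, and the aspect-ratio set (16:9, 9:16) is an assumption to confirm against the live form.

```typescript
// Hypothetical request shape mirroring the playground fields; not Railwail's documented schema.
interface VeoRequest {
  prompt: string;
  referenceImages?: string[]; // image URLs; the form allows at most 3
  durationSeconds?: number;   // native clips are capped at 8 s
  aspectRatio?: string;       // assumed allowed values, see ASPECT_RATIOS
}

const MAX_REFERENCE_IMAGES = 3;
const MAX_DURATION_SECONDS = 8;
const ASPECT_RATIOS = new Set(["16:9", "9:16"]); // assumption: check the live form

// Returns a list of problems; an empty list means the request passes these checks.
function validateRequest(req: VeoRequest): string[] {
  const errors: string[] = [];
  if (req.prompt.trim() === "") errors.push("prompt must not be empty");
  if ((req.referenceImages?.length ?? 0) > MAX_REFERENCE_IMAGES)
    errors.push(`at most ${MAX_REFERENCE_IMAGES} reference images`);
  if (req.durationSeconds !== undefined &&
      (req.durationSeconds <= 0 || req.durationSeconds > MAX_DURATION_SECONDS))
    errors.push(`duration must be in (0, ${MAX_DURATION_SECONDS}] seconds`);
  if (req.aspectRatio !== undefined && !ASPECT_RATIOS.has(req.aspectRatio))
    errors.push(`unsupported aspect ratio ${req.aspectRatio}`);
  return errors;
}

console.log(validateRequest({
  prompt: "Sweeping aerial view of coastal city at golden hour",
  durationSeconds: 8,
  aspectRatio: "16:9",
})); // prints []
```

Returning every problem at once, rather than throwing on the first, lets a form show all errors together.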

Examples

See what Google Veo 3.1 can generate

Aerial (0:08): "Sweeping aerial view of coastal city at golden hour"

Pricing

Price per Generation

Per generation: €6.00
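Because billing is a flat €6.00 per generation, budgeting is simple division. The sketch below works in integer cents and assumes credits are spent in whole generations; the €50 balance is a hypothetical example. On this arithmetic, the €5 starting credit alone does not cover a single €6.00 generation.

```typescript
// Prices in integer euro cents to avoid floating-point rounding.
const PRICE_PER_GENERATION_CENTS = 600; // €6.00 per generation, from this page

// How many whole generations a balance covers, and what is left over.
function generationsAffordable(balanceCents: number): { count: number; remainderCents: number } {
  const count = Math.floor(balanceCents / PRICE_PER_GENERATION_CENTS);
  return { count, remainderCents: balanceCents - count * PRICE_PER_GENERATION_CENTS };
}

const fmt = (cents: number) => `€${(cents / 100).toFixed(2)}`;

for (const balance of [500, 5_000]) { // €5 free credits, then a hypothetical €50 balance
  const { count, remainderCents } = generationsAffordable(balance);
  console.log(`${fmt(balance)} -> ${count} generation(s), ${fmt(remainderCents)} left`);
}
// prints "€5.00 -> 0 generation(s), €5.00 left"
// prints "€50.00 -> 8 generation(s), €2.00 left"
```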

API Integration

Use our OpenAI-compatible API to integrate Google Veo 3.1 into your application.

Install

npm install railwail

JavaScript / TypeScript

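import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: pass the video prompt as a string.
// Veo 3.1 runs as an async job (typically 30 s to 2 min), so this call can take a while.
const video = await rw.run("veo-3-1", "Sweeping aerial view of coastal city at golden hour");
console.log(video);

// Message-array form, the same shape used for other models on this API
const video2 = await rw.run("veo-3-1", [
  { role: "user", content: "Slow tracking shot of a rainy neon-lit street at night, with ambient city sound" },
]);
console.log(video2);

// Full OpenAI-style response with usage info.
// Text sampling options such as temperature and max_tokens do not apply to video, so they are omitted.
// Where the clip URL appears in the payload is not documented on this page, so log the whole response.
const res = await rw.chat("veo-3-1", [
  { role: "user", content: "Sweeping aerial view of coastal city at golden hour" },
]);
console.log(JSON.stringify(res, null, 2));
console.log(res.usage);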

Specifications

Price: €6.00
Latency (p50): 92.0 s
Est. duration: 2 min
Developer: Google DeepMind

Deep dive: Google DeepMind's Google Veo 3.1

About Google DeepMind

Founded 2010 · London, United Kingdom

Google DeepMind, formed in 2023 from the merger of Google Brain and DeepMind under Demis Hassabis, runs the Veo video-generation programme. After the headline launch of Veo 3 (May 2025, Google I/O) with native audio, the team shipped Veo 3.1 in late 2025 as an incremental quality and capability upgrade. Veo 3.1 improves prompt adherence, motion physics, character consistency across extended sequences and audio fidelity (including richer ambient soundscapes and more controllable dialogue). The model is exposed via Vertex AI, the Gemini API, Google Labs (VideoFX, Whisk) and the Flow filmmaking surface aimed at professional creators.


Architecture

Latent video diffusion / DiT with joint audio-video diffusion and reference-frame conditioning

Veo 3.1 retains the joint audio-video diffusion architecture introduced in Veo 3. Video is encoded into a spatio-temporal latent space by a 3D causal VAE and denoised by a transformer-based diffusion model, while a coupled audio diffusion module generates synchronized music, ambient sound and dialogue. Reported improvements in 3.1 cover motion physics (water, fabric, crowds), audio quality and consistency across extended sequences via reference-frame and subject-reference conditioning. Native clips run up to 8 seconds at 1080p; longer sequences are built with extensions, and a separate cascaded super-resolution stage produces 4K. Text conditioning uses Gemini-family encoders; image and reference-frame conditioning add identity and layout control. Training expands on the Veo 3 corpus with additional curated multilingual audio-video data and refined recaptioning.
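To make the "spatio-temporal latent space" concrete, the sketch below computes the latent tensor shape produced by a causal 3D VAE. Veo's real configuration is undisclosed: the 4x temporal, 8x spatial and 16-channel factors are stand-ins borrowed from published open video VAEs such as HunyuanVideo's, and 24 fps is assumed.

```typescript
// Hypothetical compression factors; Veo's real VAE configuration is undisclosed.
interface VaeConfig {
  temporalFactor: number; // frames merged per latent step (after the first frame)
  spatialFactor: number;  // pixels per latent cell along each spatial axis
  latentChannels: number;
}

// Causal 3D VAE convention: the first frame is encoded alone, then every
// `temporalFactor` frames become one latent step, so T frames -> 1 + (T - 1) / k steps.
function latentShape(
  frames: number, height: number, width: number, cfg: VaeConfig,
): [number, number, number, number] {
  if ((frames - 1) % cfg.temporalFactor !== 0) {
    throw new Error("frames must equal 1 + n * temporalFactor");
  }
  return [
    1 + (frames - 1) / cfg.temporalFactor, // latent time steps
    cfg.latentChannels,
    Math.ceil(height / cfg.spatialFactor),
    Math.ceil(width / cfg.spatialFactor),
  ];
}

// 8 s at an assumed 24 fps is 192 frames; the causal layout needs 1 + 4n, so use 193.
const cfg: VaeConfig = { temporalFactor: 4, spatialFactor: 8, latentChannels: 16 };
const shape = latentShape(193, 1080, 1920, cfg);
const pixelValues = 193 * 1080 * 1920 * 3; // RGB frames
const latentValues = shape.reduce((a, b) => a * b, 1);
console.log(shape, `~${Math.round(pixelValues / latentValues)}x fewer values`);
```

Under these stand-in factors, a 1080p 8-second clip becomes a 49 x 16 x 135 x 240 latent, about 47x fewer values than the RGB frames, which is why the diffusion transformer works in latent space rather than on pixels.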

Parameters: Undisclosed
Context: Undisclosed

What it can do

All Veo 3 features plus improved physics, audio fidelity and consistency
Up to 8-second 1080p clips natively, 4K via cascaded super-resolution
Reference-frame and subject-reference conditioning
Joint audio-video diffusion with music, ambient sound and dialogue lip-sync
Rich cinematographic prompt vocabulary
Multilingual prompts via Gemini text encoders
Available via Vertex AI, Gemini API, VideoFX, Whisk and Flow
SynthID audio + visual watermarking
Best for: high-end commercial creative, longer sequences, branded campaigns with sound.
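Synchronized dialogue ultimately means each output video frame lines up with a fixed span of audio samples. The sketch below shows that bookkeeping for an assumed 24 fps and 48 kHz; neither rate is published here, and the code illustrates alignment in the output, not how Veo's audio diffusion module works internally.

```typescript
// Assumed rates for illustration; Veo 3.1's output frame and audio rates are not stated here.
const FPS = 24;
const SAMPLE_RATE_HZ = 48_000;

// Audio sample span [start, end) covered by video frame `i` (0-based).
function audioSpanForFrame(i: number, fps = FPS, sampleRate = SAMPLE_RATE_HZ): [number, number] {
  const start = Math.round((i * sampleRate) / fps);
  const end = Math.round(((i + 1) * sampleRate) / fps);
  return [start, end];
}

// Which frame should show the mouth shape for a sound starting at `seconds`?
function frameAt(seconds: number, fps = FPS): number {
  return Math.floor(seconds * fps);
}

// First and last frame of an 8 s clip, then the frame for a syllable at 2.5 s.
console.log(audioSpanForFrame(0), audioSpanForFrame(191));
console.log(frameAt(2.5));
```

At these rates each frame spans 2,000 audio samples, so a lip-sync error of even one frame shifts speech by about 42 ms.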

Training & License

Expanded curated multilingual audio-video corpus including licensed footage, public web video (including YouTube under Google's terms) and synthetic data, with multi-granularity captions from Gemini vision-language models.

License: Proprietary commercial licence via Google Cloud / Vertex AI and Gemini API; commercial use under Google's generative-AI terms; mandatory SynthID watermarking.

Known limitations

8-second native clip limit
Audio short-form and English-leaning
Strict moderation on people, brands and political content
Closed model with no peer-reviewed paper
Per-clip cost higher than Veo 3 Fast or Veo 3.1 Fast tiers
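With an 8-second native limit, a longer piece has to be planned as several clips. This sketch splits a whole-number target duration into balanced clips no longer than 8 s. Whether each clip or extension on Railwail is billed as one €6.00 generation, and how seamlessly clips join, are assumptions not covered here.

```typescript
const MAX_NATIVE_CLIP_SECONDS = 8; // native clip limit stated on this page

// Split a target duration into the fewest clips that respect the limit,
// spreading seconds evenly so the last clip is not much shorter than the rest.
function planClips(targetSeconds: number, maxClip = MAX_NATIVE_CLIP_SECONDS): number[] {
  if (!Number.isInteger(targetSeconds) || targetSeconds <= 0) {
    throw new RangeError("target must be a positive whole number of seconds");
  }
  const n = Math.ceil(targetSeconds / maxClip); // fewest clips that fit
  const base = Math.floor(targetSeconds / n);
  const extra = targetSeconds % n; // the first `extra` clips get one more second
  return Array.from({ length: n }, (_, i) => base + (i < extra ? 1 : 0));
}

console.log(planClips(30)); // returns [8, 8, 7, 7]
console.log(planClips(8));  // returns [8]
```

Balancing lengths (8, 8, 7, 7 rather than 8, 8, 8, 6) avoids an unusually short final clip; at four clips, the run would cost €24.00 if each clip is billed as one generation.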


Related Models


Google Veo 2

Google DeepMind

Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.

€5.00

Google Veo 3

Google DeepMind

Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.

€0.75

Google Veo 3 (Replicate)

Google DeepMind

Google's Veo 3 served via Replicate. Text-to-video with native synchronized audio generation. High-fidelity motion and scene coherence in short clips.

€8.00

HunyuanVideo

Tencent

Tencent's HunyuanVideo, a 13B open-weights text-to-video diffusion transformer. Produces high-motion, photorealistic clips with smooth temporal consistency and was one of the first open models to rival closed systems on motion quality.

€5.00

Start using Google Veo 3.1 today

Get started with free credits. No credit card required. Access Google Veo 3.1 and 100+ other models through a single API.
