Sora
OpenAI video generation model. Create realistic and imaginative videos from text prompts up to 20 seconds.
Sora is video generation AI model from OpenAI, priced at €0.000 per 1M input tokens with a unknown context window.
About this model
Image References
Examples
See what Sora can generate
Cinematic Scene
"A cinematic aerial shot of a lighthouse on a rocky coast at sunset, golden light reflecting on crashing waves"
Nature Close-up
"Close-up of rain drops falling on a green leaf in slow motion, crystal clear water beads"
Pricing
API Integration
Use our OpenAI-compatible API to integrate Sora into your application.
npm install railwailimport railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple — just pass a string
const reply = await rw.run("sora", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("sora", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("sora", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);Deep dive — OpenAI's Sora
OpenAI was founded in December 2015 by Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba, John Schulman, Andrej Karpathy and others, originally as a non-profit AI research lab. It restructured in 2019 into a capped-profit company and built the GPT, DALL-E, Whisper and Codex families. Sora was unveiled by OpenAI on 15 February 2024 as a research preview, demonstrating one-minute high-fidelity video generation from text prompts. After a months-long red-team and creator-partner phase, Sora launched publicly in December 2024 as 'Sora 1' inside the ChatGPT Plus and Pro plans. OpenAI's headline framing for Sora was that scaling a diffusion-transformer trained on long video clips and 'spacetime patches' produced emergent world-model-like behaviour: object permanence, physical plausibility and persistent identity.
Visit OpenAI →OpenAI describe Sora as a diffusion transformer that operates on 'spacetime patches' of a learned video latent representation. A video is first encoded by a learned 3D causal VAE into a spatio-temporal latent tensor, then split into a sequence of cube-shaped patches that play the role of tokens for a transformer. The DiT denoiser is conditioned on rich text embeddings produced by a GPT-family captioner trained to generate dense, highly descriptive captions for training videos -- a recaptioning trick borrowed from DALL-E 3. The model supports text-to-video, image-to-video, video-to-video, in-painting and extension. Native generations at launch were up to 1 minute at 1080p in research demos, with public Sora capped lower for compute reasons (typically up to 20 seconds at 720p / 1080p). Training uses a massive curated multilingual video corpus with synthetic captions; OpenAI have not published a formal paper, only a technical report 'Video generation models as world simulators'.
- Parameters
- Undisclosed
- Context
- unknown
- Text-to-video, image-to-video, video-to-video, in-painting and extension
- Up to ~1 minute high-fidelity video in research demos
- Strong object permanence and emergent physical plausibility
- Rich cinematographic prompt vocabulary
- Spacetime-patch DiT architecture pioneered at this scale
- Storyboard mode and remix in the Sora consumer app
- Available via ChatGPT Plus / Pro and Sora app
- Dense recaptioning pipeline (DALL-E 3 style) for strong prompt adherence
- Best for: cinematic shorts, creative ideation, complex multi-shot scenes.
Massive curated multilingual video corpus including licensed footage, public web data and partner sources, with dense synthetic captions produced by a GPT-family captioner. Exact size and sources undisclosed.
License: Proprietary commercial licence via OpenAI terms; commercial use on Pro plans subject to content policy and provenance requirements (C2PA metadata, visible watermark).
Known limitations
- Public Sora 1 capped well below research-demo 1-minute / 1080p capacity
- No native audio in Sora 1
- Strict moderation, including on people, brands and political content
- Closed model without a peer-reviewed paper
- Per-clip cost and queue times can be high
Frequently asked questions
Related Models
View all Video GenerationGoogle Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Google Veo 3
Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.
Google Veo 3.1
Latest Veo with image-to-video and context-aware audio
Kling v3
Cinematic video up to 15s with multi-shot and native audio
Start using Sora today
Get started with free credits. No credit card required. Access Sora and 100+ other models through a single API.