Reka Flash
Reka's 21B dense multimodal model balancing speed and quality. Up to 128k context.
Reka Flash is a multimodal AI model from Reka AI, priced at €0.200 per 1M input tokens, with a 128K-token context window.
Pricing
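As a rough illustration of what the listed input price means in practice, the sketch below estimates the input cost of a prompt and checks it against the context window. The only figures taken from this page are €0.200 per 1M input tokens and the 128K-token window. The function names, the reading of 128K as 128,000 tokens, and the 4-characters-per-token heuristic are assumptions made for illustration; output-token pricing is not listed here and is not modelled.

```javascript
// Listed on this page: EUR 0.200 per 1M input tokens.
const INPUT_PRICE_EUR_PER_MILLION = 0.2;
// Listed on this page: 128K-token context window (read here as 128,000; an assumption).
const CONTEXT_WINDOW_TOKENS = 128_000;

// Very rough token estimate (assumption: about 4 characters per token for English text).
// A real tokenizer will give different counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Input-side cost only, in euros.
function estimateInputCostEur(inputTokens) {
  return (inputTokens / 1_000_000) * INPUT_PRICE_EUR_PER_MILLION;
}

// True if the prompt plus a reserved output budget fits in the context window.
function fitsContext(inputTokens, reservedOutputTokens = 0) {
  return inputTokens + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS;
}

const prompt = "Summarise the attached meeting notes in three bullet points.";
const tokens = estimateTokens(prompt);
console.log(tokens, estimateInputCostEur(tokens), fitsContext(tokens, 500));
```

Treat the result as an order-of-magnitude estimate; the `usage` field returned by the API is the authoritative token count.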
API Integration
Use our OpenAI-compatible API to integrate Reka Flash into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple — just pass a string
const reply = await rw.run("reka-flash", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("reka-flash", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("reka-flash", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Reka AI's Reka Flash
Reka AI was founded in mid-2022 by Dani Yogatama, Yi Tay, Donovan Ong and Qi Liu, senior researchers from Google DeepMind, Google Brain, Apple and Facebook AI. The company is headquartered in San Francisco with engineering hubs in London and Singapore. Reka raised a $58M Series A in 2023 led by DST Global Partners and a Series B in 2024 at a reported near-unicorn valuation, with backers including Snowflake. Reka's multimodal-by-design family contains three tiers: Edge (~7B), Flash (~21B) and Core (flagship). Reka Flash launched in February 2024 as the cost-performance sweet spot and was the first publicly available Reka model. The Flash technical report described the multimodal training approach later expanded in the April 2024 Reka Core paper. Flash is offered exclusively as a hosted model on the Reka API, Reka Playground and Snowflake Cortex.
Reka Flash is a ~21B parameter decoder-only Transformer trained on Reka's multimodal corpus of text, code, images, video frames and audio. As described in the Reka Core technical report, all three Reka sizes share the same architecture (modality encoders projecting into a shared token embedding space) but differ in scale and the proportion of training compute spent on each modality. Flash supports a 128K-token context window, accepts text, image, video (up to several minutes via uniform frame sampling) and audio (via a learned audio encoder), and produces JSON output and tool calls. Public Reka benchmarks place Flash between GPT-3.5 and GPT-4 on MMLU and BIG-Bench Hard, while matching GPT-4V on Perception Test and VideoMME at a fraction of the price. Flash is the recommended default model in Snowflake Cortex's multimodal API and is positioned as the workhorse Reka model for production AI features.
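To make "uniform frame sampling" concrete, here is a minimal sketch of one common way to pick evenly spaced frames from a clip: split the clip into equal segments and take the middle frame of each. This is an illustration of the general technique only. How Reka actually samples frames, and how many it takes, is not described on this page; the function name and parameters are hypothetical.

```javascript
// Pick up to `numFrames` frame indices spread evenly across a clip of `totalFrames`.
// The clip is split into equal-length segments and the midpoint of each is chosen.
// Illustrative only; not Reka's documented implementation.
function uniformFrameIndices(totalFrames, numFrames) {
  if (!Number.isInteger(totalFrames) || totalFrames <= 0) {
    throw new RangeError("totalFrames must be a positive integer");
  }
  if (!Number.isInteger(numFrames) || numFrames <= 0) {
    throw new RangeError("numFrames must be a positive integer");
  }
  // Never ask for more frames than the clip contains.
  const count = Math.min(numFrames, totalFrames);
  const segmentLength = totalFrames / count;
  const indices = [];
  for (let i = 0; i < count; i++) {
    indices.push(Math.floor(segmentLength * (i + 0.5)));
  }
  return indices;
}

console.log(uniformFrameIndices(100, 4)); // [ 12, 37, 62, 87 ]
```

Because the number of sampled frames stays fixed, a longer video simply means wider gaps between frames, which is one reason short events in long clips can be missed.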
- Parameters: 21B
- Context: 128K tokens
- Multimodal input: text, image, video up to several minutes, audio
- 128K-token context window
- JSON output and function calling
- Multilingual coverage across 32+ languages
- Available on Reka API, Reka Playground and Snowflake Cortex
- Strong cost-performance ratio: between GPT-3.5 and GPT-4 quality at a much lower price
- 21B parameters give faster serving than Core
- Best for: production multimodal workloads, RAG, audio-grounded chat, video QA
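The "JSON output and function calling" item can be illustrated with a small sketch. Since the page describes the API as OpenAI-compatible, the example assumes tool definitions and tool calls follow the OpenAI chat-completions shape (`tools`, `message.tool_calls`, JSON-encoded `arguments`). Whether Reka Flash or the railwail client accepts a `tools` option in exactly this form is an assumption; the `get_weather` tool and the mock response are invented for illustration, and no request is sent.

```javascript
// Hypothetical tool definition in the OpenAI chat-completions "tools" format.
const getWeatherTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city.",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

// Extract tool calls from an OpenAI-style chat completion response.
// Arguments arrive as a JSON string; `args` is null if that string is not valid JSON.
function extractToolCalls(response) {
  const message = response?.choices?.[0]?.message;
  const calls = message?.tool_calls ?? [];
  return calls.map((call) => {
    let args = null;
    try {
      args = JSON.parse(call.function.arguments);
    } catch {
      args = null;
    }
    return { name: call.function.name, args };
  });
}

// Hand-written mock response (not real model output), shaped like an OpenAI reply.
const mockResponse = {
  choices: [
    {
      message: {
        role: "assistant",
        content: null,
        tool_calls: [
          {
            id: "call_1",
            type: "function",
            function: { name: "get_weather", arguments: '{"city":"Paris"}' },
          },
        ],
      },
    },
  ],
};

console.log(getWeatherTool.function.name, extractToolCalls(mockResponse));
```

Parsing defensively matters here because the arguments are model-generated text, so malformed JSON should be handled rather than allowed to crash the caller.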
Training: multimodal pretraining over a curated corpus of text, code, images, video frames and audio with a progressive curriculum. Flash uses the same data mix as Reka Core, at a compute budget smaller than Core's but larger than Edge's.
License: Proprietary commercial API. Generated outputs may be used commercially under the Reka terms.
Known limitations
- Closed weights, hosted only
- Quality below Core on hardest reasoning and video tasks
- Smaller ecosystem and tooling than OpenAI / Anthropic
- Audio understanding lighter than dedicated ASR models
- No external fine-tuning
Frequently asked questions
Related Models
Claude Opus 4.7
Anthropic's April 2026 flagship. 87.6% on SWE-bench Verified, 3x higher image resolution, output self-verification, vision + reasoning.
Claude Sonnet 4.6
Anthropic's balanced mid-tier model from February 2026. Best price/performance for production workloads: 5x cheaper than Opus, near-flagship quality.
Depth Anything v2
Monocular depth-estimation model trained on 595k labeled and 62M unlabeled images. Strong zero-shot generalization in indoor and outdoor scenes.
GPT-5.4
OpenAI's unified flagship combining GPT and o-series reasoning into one model. 1M context, multimodal, top SWE-Bench Pro and OSWorld scores.
Start using Reka Flash today
Get started with free credits. No credit card required. Access Reka Flash and 100+ other models through a single API.