Reka Core is a multimodal AI model developed by Reka AI. It is Reka's frontier multimodal model, supporting text, image, video and audio inputs. Access it through Railwail's unified, OpenAI-compatible API at €10.00 per 1M input tokens.

How much does Reka Core cost via Railwail?

Input: €10.00 per 1M tokens. Output: €25.00 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
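
To make the per-token rates concrete, here is a minimal sketch of the arithmetic for a single request. The rates are the ones quoted above. `estimateCostEur` is a hypothetical helper written for this illustration; it is not part of the railwail package and ignores free credits.

```typescript
// Rates from the pricing answer above, in EUR per 1M tokens.
const INPUT_EUR_PER_MILLION = 10.0;
const OUTPUT_EUR_PER_MILLION = 25.0;

// Hypothetical helper (not part of the railwail package):
// estimates the cost of one request from its token counts.
function estimateCostEur(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const inputCost = (inputTokens / 1_000_000) * INPUT_EUR_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_EUR_PER_MILLION;
  return inputCost + outputCost;
}

// 2,000 prompt tokens and 500 completion tokens:
// 0.002 * 10 + 0.0005 * 25 = 0.02 + 0.0125 = 0.0325 EUR
console.log(estimateCostEur(2_000, 500).toFixed(4)); // prints "0.0325"
```

At these rates, output tokens cost 2.5 times as much as input tokens, so capping `max_tokens` usually does more to limit spend than shortening the prompt.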

What is the context window of Reka Core?

Reka Core supports a 128K-token context window, enough for long books, technical manuals, and extended analysis.

How fast is Reka Core?

Latency depends on prompt length and load; short prompts typically take 200 ms to 2 s. We measure p50/p95 latency in real time and publish it on /rankings.
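
For readers unfamiliar with p50/p95, the sketch below computes both with the nearest-rank method over a list of latency samples. The sample values are made up for illustration, and nearest-rank is an assumption: the page does not say how the /rankings figures are calculated.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
// Assumption: the method behind the /rankings numbers is not documented;
// this is one common definition, not necessarily the one used there.
function percentileMs(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new RangeError("need at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in (0, 100]");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Illustrative (made-up) samples, not measurements of Reka Core.
const latencySamplesMs = [210, 250, 300, 320, 400, 450, 600, 900, 1500, 2000];

console.log(percentileMs(latencySamplesMs, 50)); // prints 400
console.log(percentileMs(latencySamplesMs, 95)); // prints 2000
```

p50 is the median request; p95 shows the slow tail, which is usually the number that matters for user-facing timeouts.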

Is Reka Core better than BLIP?

It depends on your use case. Reka Core (Reka AI) and BLIP (Salesforce) are both strong choices for multimodal work. Compare them side by side at /compare/reka-core-vs-blip-captioning.

Does Reka Core support image input (vision)?

Yes. Reka Core accepts image inputs in addition to text. Send images via the standard OpenAI-compatible `messages` array with `image_url` content blocks. Supported input types: text, image, video, audio.
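
As a rough illustration of the `image_url` content block, the sketch below builds a user message in the OpenAI chat-completions content-part shape. `buildImageQuestion` and the example URL are hypothetical. Whether the railwail client's `chat` method accepts array-valued `content` is an assumption that this page does not confirm.

```typescript
// Content-part shapes as used by OpenAI-compatible chat APIs.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical helper: pairs a text question with one image URL.
function buildImageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Placeholder URL for illustration only.
const imageMessages: UserMessage[] = [
  buildImageQuestion("What is shown in this picture?", "https://example.com/cat.png"),
];

console.log(JSON.stringify(imageMessages, null, 2));
```

If the client forwards messages unchanged, the resulting array would be passed wherever a plain `messages` array is accepted.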

Reka Core

TL;DR · Last updated June 24, 2026

Reka Core is a multimodal AI model from Reka AI, priced at €10.00 per 1M input tokens and €25.00 per 1M output tokens, with a 128K-token context window.

Direct API access coming soon

Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Reka Core into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("reka-core", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("reka-core", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("reka-core", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Context window

128,000 tokens

Max output

4,096 tokens
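
The two limits above interact when choosing `max_tokens`. A minimal sketch, assuming the prompt and the completion share the 128,000-token window (the page does not state this explicitly); `clampMaxTokens` is a hypothetical helper, not part of the railwail package.

```typescript
// Limits from the specifications above.
const CONTEXT_WINDOW_TOKENS = 128_000;
const MAX_OUTPUT_TOKENS = 4_096;

// Hypothetical helper: returns the largest usable max_tokens value.
// Assumption: prompt and completion tokens share one context window.
function clampMaxTokens(promptTokens: number, requestedMaxTokens: number): number {
  const remaining = CONTEXT_WINDOW_TOKENS - promptTokens;
  if (remaining <= 0) {
    throw new RangeError("prompt does not leave room for any output tokens");
  }
  return Math.min(requestedMaxTokens, MAX_OUTPUT_TOKENS, remaining);
}

console.log(clampMaxTokens(1_000, 500));      // prints 500
console.log(clampMaxTokens(1_000, 10_000));   // prints 4096
console.log(clampMaxTokens(127_000, 4_096));  // prints 1000
```

Counting prompt tokens requires a tokenizer for the model; the counts here are taken as given.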

Developer

Reka AI

Deep dive — Reka AI's Reka Core

About Reka AI

Founded 2022 · San Francisco, California, USA

Reka AI was founded in mid-2022 by Dani Yogatama (CEO, ex-Google DeepMind, ex-Facebook), Yi Tay (Chief Scientist, ex-Google Brain), Donovan Ong (ex-Apple) and Qi Liu, with engineering teams in San Francisco, London and Singapore. The founders had previously worked on large-scale language models, mixture-of-experts and efficient long-context architectures inside DeepMind and Brain. Reka raised a $58M Series A in 2023 led by DST Global Partners and reportedly closed a Series B at a near-unicorn valuation in 2024. The Reka model line is intentionally multimodal-first and includes Edge (small), Flash (mid) and Core (flagship), all trained jointly on text, image, video and audio. Reka Core launched in April 2024 with the public Reka Core technical report, positioning it as the first multimodal-from-the-ground-up frontier model from a startup outside the FAANG/DeepMind orbit.

Architecture

Decoder-only Transformer trained multimodally on text, image, video and audio

Reka Core is the flagship of Reka's multimodal frontier family. According to the publicly released Reka Core technical report, the model is a decoder-only Transformer trained from scratch on a multimodal corpus comprising text, code, images, video frames and audio clips, with separate modality encoders that project into a shared token embedding space. Training was carried out on a custom cluster using Pathways-style infrastructure and a curriculum that first pretrains on text and code, then progressively introduces images, audio and video. The context window is 128K tokens, video input is supported up to several minutes, and audio is processed via a learned audio encoder (described as ImageBind-style on the audio side). The training compute is reported as roughly an order of magnitude less than GPT-4 while reaching competitive scores on MMMU, Perception Test and VideoMME. Core is offered exclusively through the Reka API, Reka Playground and Snowflake Cortex; weights are not released.

Parameters: Undisclosed (described as 'flagship', estimated tens of billions)
Context: 128K tokens

What it can do

True multimodal input: text, image, video and audio in the same context
128K-token context window
Video understanding up to several minutes
Audio understanding for transcription and sound classification
Function calling, JSON output and tool use
Available on Reka API, Reka Playground and Snowflake Cortex
Multilingual coverage across 32+ languages
Best for: multimodal assistants, video QA, audio-grounded chat, frontier features at startup pricing

Training & License

Reka Core was trained as a decoder-only multimodal model over a curated corpus of text, code, images, video frames and audio, using a progressive curriculum. Training compute is reported as roughly an order of magnitude less than GPT-4's.

License: Proprietary commercial API. Generated outputs may be used commercially under the Reka terms.

Known limitations

Closed weights, hosted only
Quality below GPT-4o and Claude 3.5 Sonnet on the hardest text-reasoning tasks
Smaller ecosystem and tooling than OpenAI / Anthropic
Video processing latency higher than competitors
No fine-tuning available to external customers

Related Models

BLIP

Salesforce

Salesforce BLIP. Vision-language model for image captioning and visual question answering. Given an image it writes a short natural-language caption, or answers a question about the image when one is supplied. A widely used baseline for automatic captioning.

€1.00

CLIP Interrogator

Community

pharmapsychotic's CLIP Interrogator. Takes an image and produces a Stable-Diffusion-style text prompt by combining BLIP captioning with CLIP to rank likely subjects, artists, mediums and styles. Commonly used to reverse-engineer a prompt from an existing picture.

€1.00

Claude 3.5 Sonnet (vision)

Anthropic

Anthropic Claude 3.5 Sonnet with image input. 200k context, strong on dense documents, tables, charts and handwriting. Reliable structured extraction from screenshots and scans.

Free

Claude Opus 4.7