Reka Edge is a multimodal AI model from Reka AI (listed on Railwail under the developer label "Custom"). It is Reka's small, on-device-friendly multimodal model, with ~7B parameters and a 16k context. Access it through Railwail's unified, OpenAI-compatible API at €0.100 per 1M input tokens.

How much does Reka Edge cost via Railwail?

Input: €0.100 per 1M tokens. Output: €0.100 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
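
As a rough illustration of the per-token pricing above, the following sketch estimates the cost of a single request. The helper name `estimateCostEur` is hypothetical and not part of the Railwail SDK; only the two rates come from this page.

```javascript
// Rates quoted on this page, in EUR per 1M tokens.
const INPUT_EUR_PER_MILLION = 0.1;
const OUTPUT_EUR_PER_MILLION = 0.1;

// Hypothetical helper (not part of the railwail package):
// estimate the cost of one request in EUR from its token counts.
function estimateCostEur(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_EUR_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_EUR_PER_MILLION;
  return inputCost + outputCost;
}

// Example: a 12,000-token prompt with a 1,000-token reply.
console.log(estimateCostEur(12_000, 1_000).toFixed(6)); // "0.001300"
```

At these rates, €5 of credit covers roughly 50 million tokens in total, since input and output are billed at the same price.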

What is the context window of Reka Edge?

Reka Edge supports a 16,384-token (16.4K) context window, enough for documents of roughly 12,000 English words.
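
To get a feel for what fits in the window, here is a minimal sketch that estimates token counts from word counts. It assumes about 0.75 English words per token, which is a common rule of thumb and not a figure published for Reka's tokenizer; the helper names are hypothetical.

```javascript
const CONTEXT_WINDOW_TOKENS = 16_384;
// Assumption: ~0.75 English words per token (rule of thumb, not a Reka figure).
const WORDS_PER_TOKEN = 0.75;

// Rough token estimate based on whitespace-separated words.
function estimateTokens(text) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / WORDS_PER_TOKEN);
}

// True if the text, plus any tokens reserved for the reply, fits the window.
function fitsContext(text, reservedForOutput = 0) {
  return estimateTokens(text) + reservedForOutput <= CONTEXT_WINDOW_TOKENS;
}

// Approximate word capacity of the full window under the assumption above.
console.log(Math.floor(CONTEXT_WINDOW_TOKENS * WORDS_PER_TOKEN)); // 12288
```

Real token counts vary by language and content (code and non-English text usually need more tokens per word), so treat the result as an upper-bound guide only.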

How fast is Reka Edge?

Latency depends on prompt length and load, typically 200 ms to 2 s for short prompts. We measure p50/p95 latency in real time on /rankings.
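
For readers unfamiliar with p50/p95, the sketch below computes both from a list of latency samples using the nearest-rank method. How Railwail computes its published figures is not stated here, and the sample values are made up for illustration.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up latency samples in milliseconds.
const latenciesMs = [220, 310, 450, 1900, 260, 380, 240, 700, 290, 330];

console.log(percentile(latenciesMs, 50)); // 310
console.log(percentile(latenciesMs, 95)); // 1900
```

The gap between the two numbers shows why p95 matters: half of these requests finish in about 0.3 s, but the slowest ones take close to 2 s.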

Is Reka Edge better than BLIP?

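It depends on your use case. BLIP (Salesforce) is a vision-language model for image captioning and visual question answering, while Reka Edge is a general-purpose multimodal model that accepts text and image input. Compare them side-by-side at /compare/reka-edge-vs-blip-captioning.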

Does Reka Edge support image input (vision)?

Yes — Reka Edge accepts image inputs in addition to text. Send images via the standard OpenAI-compatible `messages` array with `image_url` content blocks. Supported formats: text, image.
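
A minimal sketch of such a message, following the OpenAI chat format for mixed text and image content. The helper `buildImageMessage` and the example URL are hypothetical; whether the `railwail` SDK forwards array-valued `content` unchanged is an assumption and should be checked against the SDK documentation.

```javascript
// Hypothetical helper: build one user message that pairs a text prompt
// with an image, using OpenAI-style content blocks.
function buildImageMessage(prompt, imageUrl) {
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Placeholder image URL for illustration only.
const imageMessages = [
  buildImageMessage("Describe this image in one sentence.", "https://example.com/cat.png"),
];

console.log(JSON.stringify(imageMessages, null, 2));
```

If the SDK accepts this shape, the array could be passed wherever a message history is expected, for example as the second argument to `rw.chat("reka-edge", ...)`.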

TL;DR · Last updated June 24, 2026

Reka Edge is a multimodal AI model from Reka AI (listed on Railwail as "Custom"), priced at €0.100 per 1M input tokens with a 16.4K-token context window.

Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Reka Edge into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("reka-edge", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("reka-edge", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("reka-edge", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
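
Because the API is described as OpenAI-compatible, the same call can in principle be made without the SDK. The sketch below only builds the request and does not send it. The base URL is a placeholder, and the `/chat/completions` path and Bearer authentication follow the OpenAI convention; all three are assumptions that are not confirmed on this page.

```javascript
// Placeholder base URL: replace with the real one from the Railwail docs.
const BASE_URL = "https://api.example.invalid/v1";

// Hypothetical helper: build the arguments for fetch() for an
// OpenAI-style chat completion request.
function buildChatRequest(apiKey, model, messages, options = {}) {
  return {
    url: `${BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // Options such as temperature and max_tokens sit next to model and messages.
      body: JSON.stringify({ model, messages, ...options }),
    },
  };
}

const chatRequest = buildChatRequest(
  "YOUR_API_KEY",
  "reka-edge",
  [{ role: "user", content: "Hello!" }],
  { temperature: 0.7, max_tokens: 500 },
);
console.log(chatRequest.url);
console.log(chatRequest.init.body);

// To send it: const res = await fetch(chatRequest.url, chatRequest.init);
```

Keeping request construction separate from sending makes the payload easy to log and inspect before any credits are spent.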

Specifications

Context window

16,384 tokens

Max output

4,096 tokens

Developer

Custom
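
The two limits above interact when choosing `max_tokens`. The sketch below clamps a requested output length, assuming that prompt and output tokens share the 16,384-token window (the usual arrangement, though not stated on this page). The helper name is hypothetical.

```javascript
const CONTEXT_WINDOW = 16_384; // tokens, from the specifications above
const MAX_OUTPUT = 4_096; // tokens, from the specifications above

// Hypothetical helper: pick a max_tokens value that respects both limits.
// Assumes prompt and output tokens count against the same window.
function clampMaxTokens(promptTokens, requested) {
  const room = CONTEXT_WINDOW - promptTokens;
  if (room <= 0) {
    throw new RangeError("Prompt alone fills or exceeds the context window");
  }
  return Math.min(requested, MAX_OUTPUT, room);
}

console.log(clampMaxTokens(1_000, 500)); // 500
console.log(clampMaxTokens(1_000, 10_000)); // 4096
console.log(clampMaxTokens(15_000, 4_096)); // 1384
```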

Deep dive — Reka AI's Reka Edge

About Reka AI

Founded 2022 · San Francisco, California, USA

Reka AI was founded in mid-2022 by Dani Yogatama, Yi Tay, Donovan Ong and Qi Liu, all senior researchers from Google DeepMind, Google Brain, Apple and Facebook AI. The company is headquartered in San Francisco with engineering centres in London and Singapore. Reka has raised $58M in Series A funding led by DST Global Partners in 2023 and closed an additional Series B in 2024 at a reported near-unicorn valuation. The Reka model family is multimodal-by-design (text, image, video, audio) and is sized into three tiers: Edge (smallest), Flash (mid), Core (flagship). Reka Edge (7B parameters) was released alongside Flash and Core in early 2024 as the on-device / cost-sensitive sibling, designed to run on a single consumer GPU while preserving the multimodal capabilities that distinguish Reka from text-only competitors.

Architecture

Decoder-only Transformer trained multimodally on text, image, video and audio (small variant)

Reka Edge is a ~7B-parameter decoder-only Transformer trained on the same multimodal corpus as the larger Reka Flash and Core, comprising text, code, images, video frames and audio clips. According to the public Reka Core technical report, all three sizes share the same architecture (modality encoders projecting into a shared token embedding space) but differ in scale and in the proportion of training compute spent on each modality. Edge is optimised for on-device and cost-sensitive deployments and fits on a single 24 GB consumer GPU in FP16. This profile lists a 128K-token context window, while Railwail's specifications for the hosted endpoint list 16,384 tokens, so budget prompts against the smaller figure when calling through Railwail. Edge accepts text, image, short video clips and audio (through a learned audio encoder). On MMMU, Perception Test and VideoMME, Edge scored below Flash and Core but ahead of similarly sized open-weights multimodal baselines such as LLaVA-Next 7B and InternVL 7B at launch. Edge is offered through the Reka API and Reka Playground; weights are not publicly released.

Parameters: 7B
Context: 128K tokens
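
The single-GPU claim can be sanity-checked with simple arithmetic: FP16 stores each parameter in 2 bytes, so the weights alone take about 14 GB. The sketch below works this out, treating the 24 GB card as 24 × 10^9 bytes (an assumption) and ignoring runtime overhead.

```javascript
// Memory needed for model weights alone, in decimal gigabytes.
function weightMemoryGB(parameterCount, bytesPerParameter) {
  return (parameterCount * bytesPerParameter) / 1e9;
}

const fp16WeightsGB = weightMemoryGB(7e9, 2); // FP16 = 2 bytes per parameter
const headroomGB = 24 - fp16WeightsGB; // assumes a 24 GB (decimal) card

console.log(fp16WeightsGB); // 14
console.log(headroomGB); // 10
```

The remaining headroom has to cover the KV cache, activations, the modality encoders' working memory and framework overhead, so usable context length on such a card depends on the serving setup.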

What it can do

Multimodal input: text, image, short video and audio
128K-token context window
Runs on a single 24 GB consumer GPU in FP16
JSON output and tool use
Multilingual coverage across 32+ languages
Hosted through Reka API and Reka Playground
Cost-efficient pricing tier in the Reka family
Best for: cost-sensitive multimodal apps, on-device experiments, edge AI
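
When relying on JSON output, it is prudent to parse replies defensively, since small models sometimes wrap JSON in a Markdown code fence or add stray text. The sketch below is one hypothetical way to handle that on the client side; it is not part of the Reka or Railwail APIs.

```javascript
// Three backticks, built at runtime so this example stays easy to embed.
const FENCE = "`".repeat(3);
const FENCED_JSON = new RegExp(
  "^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$",
);

// Hypothetical helper: parse a model reply that is expected to be JSON.
// Returns { ok: true, value } on success or { ok: false, error } on failure.
function parseJsonReply(reply) {
  const trimmed = reply.trim();
  // Unwrap a surrounding Markdown code fence if the model added one.
  const fenced = trimmed.match(FENCED_JSON);
  const candidate = fenced ? fenced[1] : trimmed;
  try {
    return { ok: true, value: JSON.parse(candidate) };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}

console.log(parseJsonReply('{"label": "cat"}').ok); // true
console.log(parseJsonReply("Sure! Here is the answer.").ok); // false
```

Returning a result object instead of throwing lets the caller decide whether to retry the request or fall back to plain text.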

Training & License

Multimodal pretraining over a curated corpus of text, code, images, video frames and audio with progressive curriculum, identical mix to Reka Core but smaller compute budget.

License: Proprietary commercial API. Generated outputs may be used commercially under the Reka terms.

Known limitations

Closed weights, hosted only
Lower quality on hard reasoning tasks than Flash and Core, owing to its smaller capacity
Video clips supported only up to short durations
Smaller ecosystem and tooling than OpenAI / Anthropic
No public fine-tuning for external customers

Related Models

BLIP

Salesforce

Salesforce BLIP. Vision-language model for image captioning and visual question answering. Given an image it writes a short natural-language caption, or answers a question about the image when one is supplied. A widely used baseline for automatic captioning.

€1.00

CLIP Interrogator

Community

pharmapsychotic's CLIP Interrogator. Takes an image and produces a Stable-Diffusion-style text prompt by combining BLIP captioning with CLIP to rank likely subjects, artists, mediums and styles. Commonly used to reverse-engineer a prompt from an existing picture.

€1.00

Claude 3.5 Sonnet (vision)

Anthropic

Anthropic Claude 3.5 Sonnet with image input. 200k context, strong on dense documents, tables, charts and handwriting. Reliable structured extraction from screenshots and scans.

Free

Claude Opus 4.7