How much does AI21 Jamba 1.5 Large cost via Railwail?

Input: €2.00 per 1M tokens. Output: €8.00 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
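
As a rough illustration of how the per-token prices above add up for a single request, here is a minimal sketch. The helper name `estimateCostEur` is hypothetical and not part of any SDK; only the two prices come from this page.

```javascript
// Listed prices from this page, in EUR per 1M tokens.
const INPUT_EUR_PER_MILLION = 2.0;
const OUTPUT_EUR_PER_MILLION = 8.0;

// Hypothetical helper: estimates the cost of one request from token counts.
function estimateCostEur(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_EUR_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_EUR_PER_MILLION;
  return inputCost + outputCost;
}

// Example: a 200,000-token document summarised into 2,000 tokens.
const cost = estimateCostEur(200_000, 2_000);
console.log(cost.toFixed(4)); // 0.4160
```

In this example the input side dominates: €0.40 of the €0.416 total comes from the long prompt, which is typical for long-document workloads.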

What is the context window of AI21 Jamba 1.5 Large?

AI21 Jamba 1.5 Large supports a 256K-token context window, enough for entire codebases or research papers in one prompt.

How fast is AI21 Jamba 1.5 Large?

Latency depends on prompt length and load; it is typically 200 ms to 2 s for short prompts. We track p50 and p95 latency in real time on /rankings.
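
For readers unfamiliar with p50/p95, the sketch below computes them with the nearest-rank method. This page does not say how /rankings calculates its figures, so the method and the sample latencies are assumptions for illustration only.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical request latencies in milliseconds.
const latenciesMs = [210, 250, 300, 320, 400, 450, 600, 800, 1200, 1900];
console.log(percentile(latenciesMs, 50)); // 400
console.log(percentile(latenciesMs, 95)); // 1900
```

The gap between the two values is the point of reporting both: the median describes a typical request, while p95 shows the slow tail that users notice under load.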

Is AI21 Jamba 1.5 Large better than Bio_ClinicalBERT?

It depends on your use case. AI21 Jamba 1.5 Large (Custom) and Bio_ClinicalBERT (huggingface) are both strong choices in text & chat. Compare them side-by-side at /compare/jamba-1-5-large-vs-bio-clinicalbert.

AI21 Jamba 1.5 Large

Name: AI21 Jamba 1.5 Large
Brand: Custom
SKU: jamba-1-5-large
Price: 0.002 EUR
Availability: InStock

Custom

Text & Chat

AI21's flagship hybrid Mamba-Transformer model with a 256k context window for long-document tasks.

TL;DR·Last updated June 24, 2026

AI21 Jamba 1.5 Large is a text & chat AI model from AI21 Labs, priced at €2.00 per 1M input tokens and €8.00 per 1M output tokens, with a 256K-token context window.

Pricing

Price per generation: Free

API Integration

Use our OpenAI-compatible API to integrate AI21 Jamba 1.5 Large into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("jamba-1-5-large", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("jamba-1-5-large", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("jamba-1-5-large", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Context window

256,000 tokens

Max output

4,096 tokens

Developer

Custom
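
A pre-flight check against these limits might look like the sketch below. It assumes roughly 4 characters per token, which is only a heuristic for English text, and it assumes prompt and output tokens share the same window. The helper names are hypothetical; an exact count requires the model's own tokeniser.

```javascript
const CONTEXT_WINDOW_TOKENS = 256_000; // from the specifications above
const MAX_OUTPUT_TOKENS = 4_096;       // from the specifications above

// Assumption: about 4 characters per token. Real counts depend on the tokeniser.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Hypothetical helper: crude token estimate from character length.
function estimateTokens(text) {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// Assumption: the prompt and the generated tokens share one context window,
// so room for the output is reserved before checking the prompt.
function fitsInContext(promptText, reservedOutputTokens = MAX_OUTPUT_TOKENS) {
  const budget = CONTEXT_WINDOW_TOKENS - reservedOutputTokens;
  return estimateTokens(promptText) <= budget;
}

console.log(fitsInContext("a".repeat(400_000)));   // true  (about 100,000 tokens)
console.log(fitsInContext("a".repeat(1_200_000))); // false (about 300,000 tokens)
```

Because the estimate is approximate, leaving a safety margin below the budget is sensible for prompts that come close to the limit.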

Deep dive — AI21 Labs's AI21 Jamba 1.5 Large

About AI21 Labs

Founded 2017 · Tel Aviv, Israel

AI21 Labs is one of the earliest commercial LLM companies, founded in 2017 in Tel Aviv by Yoav Shoham (Stanford emeritus, AI pioneer), Ori Goshen and Amnon Shashua (Mobileye founder, ex-Intel SVP). AI21 built the Jurassic-1 (2021) and Jurassic-2 (2023) families and pioneered hybrid State-Space + Transformer architectures with Jamba in March 2024 — the first production-scale Mamba-Transformer hybrid LLM. Jamba 1.5 followed in August 2024 at two scales: Mini (52B total / 12B active) and Large (398B total / 94B active). AI21 has raised over $336M from investors including Google, Nvidia, Walden Catalyst and Pitango, and serves enterprise customers through AI21 Studio, AWS Bedrock, Azure AI Studio, and Snowflake Cortex.

Architecture

Hybrid Mamba-Transformer Mixture-of-Experts

Jamba 1.5 Large is a hybrid State-Space + Transformer Mixture-of-Experts model. The architecture interleaves Mamba (selective state-space) layers with standard self-attention layers in a 7:1 Mamba-to-attention ratio across 72 layers, organised as 9 blocks of 8 layers. Mixture-of-Experts replaces the MLP module in every second layer, with 16 experts and top-2 routing, giving 94B active parameters out of 398B total. The Mamba layers carry long-range dependencies in a fixed-size recurrent state, so their memory does not grow with context length, while the attention layers preserve in-context retrieval quality. Together they enable a 256,000-token effective context, validated empirically on the RULER long-context benchmark, where pure-transformer 128K models degrade noticeably. The model uses a BPE tokeniser with a vocabulary of roughly 64K tokens and supports nine languages (English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, Hebrew). It was released in August 2024 under the Jamba Open Model License, with hosted access via AI21 Studio, AWS Bedrock, Azure AI Studio and Snowflake Cortex.

Parameters: 398B total, 94B active per token (16 experts, top-2 routing)
Context: 256K tokens
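
To make the interleaving concrete, the sketch below lays out 72 layers with one attention layer per group of eight, which gives the 7:1 ratio. The grouping into 9 blocks of 8 layers, the position of the attention layer inside each block, and the placement of MoE on every second layer are assumptions for illustration, not details stated on this page.

```javascript
const NUM_BLOCKS = 9;               // assumption: 72 layers grouped as 9 blocks
const LAYERS_PER_BLOCK = 8;         // 1 attention + 7 Mamba layers per block
const ATTENTION_INDEX_IN_BLOCK = 4; // assumption: where attention sits in a block
const MOE_EVERY = 2;                // assumption: MoE replaces the MLP every 2nd layer

// Builds an illustrative per-layer plan of mixer type and feed-forward type.
function buildLayerPlan() {
  const plan = [];
  for (let layer = 0; layer < NUM_BLOCKS * LAYERS_PER_BLOCK; layer++) {
    const indexInBlock = layer % LAYERS_PER_BLOCK;
    plan.push({
      layer,
      mixer: indexInBlock === ATTENTION_INDEX_IN_BLOCK ? "attention" : "mamba",
      feedForward: layer % MOE_EVERY === 1 ? "moe" : "mlp",
    });
  }
  return plan;
}

const plan = buildLayerPlan();
const count = (key, value) => plan.filter((l) => l[key] === value).length;

console.log(count("mixer", "mamba"), count("mixer", "attention")); // 63 9
console.log(count("feedForward", "moe")); // 36
console.log(((94 / 398) * 100).toFixed(1) + "% of parameters active per token"); // 23.6% ...
```

The last line restates the active-parameter figures above: with top-2 routing over 16 experts, fewer than a quarter of the 398B parameters are used for any given token.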

What it can do

Hybrid Mamba+Transformer+MoE architecture
398B total / 94B active parameters
256K effective context — best-in-class on RULER long-context benchmark
Fixed-size Mamba state that does not grow with context length, which makes long-context inference cheaper
Native function calling and JSON-mode structured output
Multilingual: English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, Hebrew
Open weights on Hugging Face under Jamba Open Model License
Best for: long-document analysis, many-document RAG, long-trace agents, cost-efficient enterprise long-context inference.
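
The memory benefit in the list above can be estimated with a back-of-the-envelope key-value cache calculation. Only attention layers keep a per-token cache, so with a 7:1 ratio across 72 layers, 9 layers contribute. The values of 8 KV heads, a head dimension of 128, and 2 bytes per value are assumptions not stated on this page; treat the results as order-of-magnitude figures.

```javascript
// KV cache size in bytes for the attention layers of a model.
// Assumed defaults (not from this page): 8 KV heads, head dimension 128,
// 2 bytes per stored value (16-bit precision).
function kvCacheBytes(attentionLayers, contextTokens, kvHeads = 8, headDim = 128, bytesPerValue = 2) {
  const keysAndValues = 2; // one key vector and one value vector per token
  return keysAndValues * attentionLayers * kvHeads * headDim * contextTokens * bytesPerValue;
}

const CONTEXT_TOKENS = 256_000;
const hybrid = kvCacheBytes(9, CONTEXT_TOKENS);        // 9 attention layers out of 72
const allAttention = kvCacheBytes(72, CONTEXT_TOKENS); // same depth, attention everywhere

console.log((hybrid / 1e9).toFixed(2) + " GB");       // 9.44 GB
console.log((allAttention / 1e9).toFixed(2) + " GB"); // 75.50 GB
```

The Mamba layers' recurrent state is not counted here because it has a fixed size regardless of context length. Under these assumptions the hybrid layout needs one eighth of the cache of an all-attention model of the same depth.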

Training & License

Pretrained on trillions of tokens of web data, code, math, books and multilingual sources (exact figure not disclosed). Knowledge cutoff March 2024. Post-training is supervised fine-tuning plus preference optimisation; the SSAM (state-space attention mix) post-training adapts Mamba-state regularisation.

License: Jamba Open Model License. It permits research and commercial use subject to attribution and acceptable-use policy (AUP) compliance, which makes it more restrictive than Apache 2.0 but more open than research-only licenses. Hosted commercial access is available via AI21 Studio, AWS Bedrock, Azure AI Studio and Snowflake Cortex.

Known limitations

398B total parameters are about 796 GB of weights in FP16, more than the 640 GB on an 8x H100 (80 GB) node, so single-node serving needs 8-bit weights
No vision modality
Hybrid architecture has less community tooling — some inference engines unsupported
Behind GPT-4o / Claude 3.5 Sonnet on hardest reasoning, code and math
Jamba Open Model License has acceptable-use restrictions and attribution requirements
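
A weights-only estimate is a useful sanity check on the hardware requirement in the list above. The sketch below ignores activations, the KV cache and framework overhead, and it assumes the 80 GB H100 variant, so real requirements will be higher than these figures.

```javascript
const TOTAL_PARAMS = 398e9; // total parameters, from this page
const GPU_MEMORY_GB = 80;   // assumption: 80 GB H100 variant
const GPUS = 8;

// Memory needed for the weights alone at a given precision, in GB.
function weightsGb(bytesPerParam) {
  return (TOTAL_PARAMS * bytesPerParam) / 1e9;
}

const nodeGb = GPUS * GPU_MEMORY_GB; // 640 GB across the node
for (const [name, bytes] of [["FP16/BF16", 2], ["INT8", 1]]) {
  const gb = weightsGb(bytes);
  console.log(`${name}: ${gb} GB of weights, fits in ${nodeGb} GB: ${gb <= nodeGb}`);
}
// FP16/BF16: 796 GB of weights, fits in 640 GB: false
// INT8: 398 GB of weights, fits in 640 GB: true
```

Under these assumptions, 16-bit weights alone exceed one 8-GPU node, while 8-bit weights leave headroom for the cache and activations.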

Related Models

Bio_ClinicalBERT

huggingface

The original Bio_ClinicalBERT from Alsentzer et al., a BERT model initialized from BioBERT and further pretrained on all MIMIC-III clinical notes. Served as a fill-mask endpoint it predicts masked tokens in clinical text and produces clinical embeddings. It is the standard encoder backbone behind many downstream clinical NLP fine-tunes.

€1.00

Biomedical NER (all entities)

huggingface

Token-classification model from d4data that tags 84 biomedical entity types in clinical and medical text, including disease, sign, symptom, medication, dosage, lab value, body part and procedure. Trained on the Maccrobat clinical case corpus on a DistilBERT base, so it runs cheaply for high-volume tagging.

€1.00

Claude Opus 4

Anthropic

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free

Claude Opus 4.8