How much does Cohere Command R+ (08-2024) cost via Railwail?

Input: €2.50 per 1M tokens. Output: €10.00 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
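As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The helper name `estimateCostEur` is hypothetical and not part of the railwail package; only the two prices come from this page.

```typescript
// Prices quoted above, expressed in micro-euros per token
// (EUR 2.50 per 1M tokens = 2.5 micro-euros per token).
const INPUT_MICRO_EUR_PER_TOKEN = 2.5;
const OUTPUT_MICRO_EUR_PER_TOKEN = 10;

// Hypothetical helper: estimate the cost of one request in EUR.
function estimateCostEur(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const microEur =
    inputTokens * INPUT_MICRO_EUR_PER_TOKEN +
    outputTokens * OUTPUT_MICRO_EUR_PER_TOKEN;
  return microEur / 1_000_000;
}

// A 2,000-token prompt with a 500-token reply.
console.log(estimateCostEur(2_000, 500).toFixed(4)); // prints "0.0100"
```

By the same arithmetic, €5 of credit covers about 2M input tokens or 500K output tokens.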

What is the context window of Cohere Command R+ (08-2024)?

Cohere Command R+ (08-2024) supports a 128K-token context window, enough for long books, technical manuals, and extended analysis.

How fast is Cohere Command R+ (08-2024)?

Latency depends on prompt length and load, and is typically 200 ms to 2 s for short prompts. We measure p50 and p95 latency in real time on /rankings.
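For readers unfamiliar with p50/p95, the sketch below shows one common way to compute them from raw latency samples (the nearest-rank method). How /rankings actually computes its figures is not stated here, and the sample values are made up for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("need at least one sample");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Made-up latency samples in milliseconds, for illustration only.
const latenciesMs = [220, 310, 250, 1900, 400, 280, 350, 600, 240, 300];
console.log(percentile(latenciesMs, 50)); // prints 300
console.log(percentile(latenciesMs, 95)); // prints 1900
```

The gap between the two numbers is the point of reporting both: p50 describes a typical request, while p95 exposes the slow tail that a single average would hide.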

Is Cohere Command R+ (08-2024) better than Bio_ClinicalBERT?

It depends on your use case. Cohere Command R+ (08-2024) (Cohere) and Bio_ClinicalBERT (huggingface) are both strong choices in text & chat. Compare them side-by-side at /compare/command-r-plus-08-2024-vs-bio-clinicalbert.

Cohere Command R+ (08-2024)

Cohere

Text & Chat

Cohere's flagship RAG- and tool-optimized chat model. 128k context, refreshed August 2024.

TL;DR · Last updated June 24, 2026

Cohere Command R+ (08-2024) is a text & chat AI model from Cohere, priced at €2.50 per 1M input tokens, with a 128K-token context window.

Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Cohere Command R+ (08-2024) into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("command-r-plus-08-2024", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("command-r-plus-08-2024", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("command-r-plus-08-2024", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Context window: 128,000 tokens
Max output: 4,096 tokens
Developer: Cohere
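The two limits in the specifications interact when sizing a request. The sketch below assumes that prompt and completion tokens share the same 128,000-token window, which is common for chat models but is not confirmed on this page. The helper name `maxPromptTokens` is hypothetical.

```typescript
const CONTEXT_WINDOW_TOKENS = 128_000; // from the specifications above
const MAX_OUTPUT_TOKENS = 4_096; // from the specifications above

// Hypothetical helper: how many prompt tokens remain once space for the
// reply has been reserved. Assumes prompt and reply share one window.
function maxPromptTokens(requestedOutputTokens: number): number {
  if (requestedOutputTokens < 1 || requestedOutputTokens > MAX_OUTPUT_TOKENS) {
    throw new RangeError(`max_tokens must be between 1 and ${MAX_OUTPUT_TOKENS}`);
  }
  return CONTEXT_WINDOW_TOKENS - requestedOutputTokens;
}

console.log(maxPromptTokens(500)); // prints 127500
console.log(maxPromptTokens(4_096)); // prints 123904
```

Under that assumption, a request with `max_tokens: 500` leaves 127,500 tokens for the system prompt, message history, and any retrieved documents.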

Deep dive: Cohere Command R+ (08-2024)

About Cohere

Founded 2019 · Toronto, Canada

Cohere is an enterprise-focused LLM lab founded in 2019 by Aidan Gomez (co-author of 'Attention Is All You Need'), Ivan Zhang and Nick Frosst. Cohere targets enterprise customers with deployment options spanning Cohere's hosted API, AWS Bedrock, Azure, Oracle Cloud and on-prem (Cohere North). Command R+ is Cohere's flagship enterprise LLM, scaling the Command R recipe up to 104B parameters and positioned against GPT-4, Claude 3 Opus and Mistral Large. The 08-2024 refresh shipped August 2024 with major gains on math, code, reasoning, citation accuracy and reduced refusals. Investors include Nvidia, Cisco, Salesforce and Oracle, with total funding above $1B and a 2024 valuation around $5.5B.

Visit Cohere →

Architecture

Decoder-only Transformer optimised for RAG and tool use

Command R+ 08-2024 is a 104B parameter dense decoder-only transformer — the larger sibling of Command R, scaled up with the same architectural recipe. It uses grouped-query attention (8 KV heads), RoPE positional embeddings (extended for 128K context), SwiGLU activations, RMSNorm and no biases, with the same 256,000-token multilingual BPE tokeniser as Command R and the Aya family. The 08-2024 refresh kept the architecture identical to the April 2024 release but delivered substantial improvements in math, code, reasoning, instruction following and structured outputs while reducing spurious refusals at unchanged pricing. Post-training emphasises citation-grounded RAG (structured `<co: 0>` citation tokens), parallel multi-step tool calling with chain-of-thought planning, and multilingual instruction following across 10+ business languages. Open weights released on Hugging Face under CC-BY-NC 4.0 for research; commercial production via Cohere API or marketplaces.
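To make the citation markers mentioned above concrete, here is a minimal sketch that pulls cited spans out of a grounded answer. It assumes each span is wrapped as `<co: 0>cited text</co: 0>`, optionally with several comma-separated document indices. Only the opening `<co: 0>` form appears on this page, so the closing-tag form and the multi-index form are assumptions, and `extractCitations` is a hypothetical helper rather than part of any SDK.

```typescript
interface Citation {
  docIds: number[]; // document indices named in the tag
  text: string; // the span of answer text the citation covers
}

// Assumed format: <co: 0>cited text</co: 0>, optionally with several
// comma-separated indices such as <co: 0,2>. Adjust if real output differs.
const CITATION_RE = /<co: (\d+(?:,\s*\d+)*)>([\s\S]*?)<\/co: \1>/g;

function extractCitations(answer: string): { plain: string; citations: Citation[] } {
  const citations: Citation[] = [];
  const plain = answer.replace(CITATION_RE, (_match: string, ids: string, text: string) => {
    citations.push({ docIds: ids.split(",").map((id) => Number(id.trim())), text });
    return text; // keep the cited text, drop the markers
  });
  return { plain, citations };
}

const demo = extractCitations("Revenue grew <co: 0>12% year over year</co: 0>.");
console.log(demo.plain); // prints "Revenue grew 12% year over year."
console.log(demo.citations[0].docIds); // prints [ 0 ]
```

Separating the plain text from the citation list lets an application render a clean answer while still linking each claim back to its source document.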

Parameters: 104B (dense)
Context: 128K tokens

What it can do

104B dense parameters — Cohere's frontier enterprise model
Top-tier RAG quality with structured inline citations
Highly capable multi-step tool use with parallel calls and error recovery
128K context window
Strong multilingual: 10+ business languages including Arabic, Japanese, Korean
Open weights (CC-BY-NC) at 104B — largest open RAG-tuned model at release
Hosted on Cohere API, AWS Bedrock, Azure, Oracle Cloud
Best for: production enterprise RAG, complex tool-using agents, multilingual content ops, regulated-industry deployments.

Training & License

Trained on trillions of tokens of multilingual web data, code, math, books, and Cohere-curated synthetic RAG and tool-use traces. Knowledge cutoff approximately early 2024. Post-training combines supervised fine-tuning with preference optimisation; specific methods and data are proprietary.

License: Cohere Terms of Service for the hosted API. Open weights on Hugging Face under CC-BY-NC 4.0 (research only). Commercial production via Cohere API, AWS Bedrock, Azure, Oracle Cloud, or licensed on-prem deployment under Cohere North.

Known limitations

No vision modality
Pricier and slower than GPT-4o-mini class models for simple tasks
Open weights non-commercial — production requires hosted/licensed access
Behind GPT-4o / Claude 3.5 Sonnet on the hardest reasoning and coding benchmarks
Smaller community and tooling ecosystem than OpenAI/Anthropic

Related Models

Bio_ClinicalBERT

huggingface

The original Bio_ClinicalBERT from Alsentzer et al., a BERT model initialized from BioBERT and further pretrained on all MIMIC-III clinical notes. Served as a fill-mask endpoint, it predicts masked tokens in clinical text and produces clinical embeddings. It is the standard encoder backbone behind many downstream clinical NLP fine-tunes.

€1.00

Biomedical NER (all entities)

huggingface

Token-classification model from d4data that tags 84 biomedical entity types in clinical and medical text, including disease, sign, symptom, medication, dosage, lab value, body part and procedure. Trained on the MACCROBAT clinical case corpus on a DistilBERT base, so it runs cheaply for high-volume tagging.

€1.00

Claude Opus 4

Anthropic

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free

Claude Opus 4.8