How much does Kimi K2 (Moonshot) cost via Railwail?

Input: €0.600 per 1M tokens. Output: €2.50 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
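As a rough illustration of how these per-token prices translate into a per-request cost, the sketch below applies the two rates quoted above. The helper name `estimateCostEur` is hypothetical and not part of the Railwail SDK, and actual billing may round differently.

```typescript
// Prices quoted on this page, in EUR per 1M tokens.
const INPUT_EUR_PER_MILLION = 0.6;
const OUTPUT_EUR_PER_MILLION = 2.5;

// Hypothetical helper (not part of the Railwail SDK):
// estimates the cost of a single request in EUR.
function estimateCostEur(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens / 1_000_000) * INPUT_EUR_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_EUR_PER_MILLION
  );
}

// Example: 10,000 input tokens and 2,000 output tokens.
// 0.006 EUR (input) + 0.005 EUR (output)
console.log(estimateCostEur(10_000, 2_000).toFixed(4)); // "0.0110"
```

At these rates, the €5 of free credits would cover roughly 450 requests of that size.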

What is the context window of Kimi K2 (Moonshot)?

Kimi K2 (Moonshot) supports a 131,072-token (128K) context window, enough for long books, technical manuals, and extended analysis.

How fast is Kimi K2 (Moonshot)?

Latency depends on prompt length and load — typically 200ms to 2s for short prompts. We measure p50/p95 in real-time on /rankings.
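For readers unfamiliar with p50/p95, the sketch below shows one common way to compute them from a set of latency samples, using the nearest-rank method. This is only an illustration: the sample values are made up, and the method used on /rankings is not documented here.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Illustrative (made-up) latencies in milliseconds.
const latenciesMs = [210, 250, 300, 320, 400, 450, 600, 900, 1500, 2000];

console.log(percentile(latenciesMs, 50)); // 400
console.log(percentile(latenciesMs, 95)); // 2000
```

The gap between the two numbers is the useful part: p50 describes a typical request, while p95 shows how slow the tail gets under load.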

Is Kimi K2 (Moonshot) better than Bio_ClinicalBERT?

It depends on your use case. Kimi K2 (Moonshot) (Custom) and Bio_ClinicalBERT (huggingface) are both strong choices in text & chat. Compare them side-by-side at /compare/kimi-k2-vs-bio-clinicalbert.

Kimi K2 (Moonshot)

Name: Kimi K2 (Moonshot)
Brand: Custom
SKU: kimi-k2
Price: 0.0006 EUR
Availability: InStock

Popular

Custom

Text & Chat

Moonshot AI's 1T-parameter MoE model. Industry-leading agentic coding and tool-use benchmarks.


TL;DR · Last updated June 24, 2026

Kimi K2 (Moonshot) is a text & chat AI model from Custom, priced at €0.600 per 1M input tokens, with a 131,072-token (128K) context window.


Direct API access coming soon

Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Kimi K2 (Moonshot) into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("kimi-k2", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("kimi-k2", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("kimi-k2", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Context window

131,072 tokens

Max output

16,384 tokens
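The two limits above interact when choosing `max_tokens` for a request. The sketch below is a minimal pre-flight check under two assumptions that this page does not confirm: that prompt and output tokens share the same context window, and that English text averages about 4 characters per token. The helper names are hypothetical.

```typescript
// Limits listed in the specifications above.
const CONTEXT_WINDOW_TOKENS = 131_072;
const MAX_OUTPUT_TOKENS = 16_384;

// Assumption: a rough heuristic of ~4 characters per token.
// A real tokenizer will give different counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface RequestBudget {
  promptTokens: number; // estimated prompt size
  maxTokens: number;    // value that is safe to send as max_tokens
  fits: boolean;        // false if there is no room left for output
}

// Hypothetical helper: clamps the requested output length to both limits.
function planRequest(prompt: string, requestedMaxTokens: number): RequestBudget {
  const promptTokens = estimateTokens(prompt);
  const remaining = CONTEXT_WINDOW_TOKENS - promptTokens;
  const maxTokens = Math.max(
    0,
    Math.min(requestedMaxTokens, MAX_OUTPUT_TOKENS, remaining),
  );
  return { promptTokens, maxTokens, fits: maxTokens > 0 };
}

// A ~125,000-token prompt leaves only 6,072 tokens for the reply.
console.log(planRequest("a".repeat(500_000), 20_000));
```

The resulting `maxTokens` can be passed as `max_tokens` in the options object shown in the API example.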

Developer

Custom

Deep dive — Moonshot AI's Kimi K2 (Moonshot)

About Moonshot AI

Founded 2023 · Beijing, China

Moonshot AI (月之暗面, literally 'dark side of the moon') was founded in March 2023 in Beijing by Yang Zhilin (CEO, Tsinghua and CMU alumnus, co-author of XLNet and Transformer-XL), Zhou Xinyu and Wu Yuxin. The company quickly emerged as one of the four 'AI tigers' of China alongside Zhipu AI, MiniMax and 01.AI. Moonshot's flagship product is Kimi (reportedly named after Yang Zhilin's English name), a consumer-facing chatbot launched in October 2023 that became known in China for very long context handling. Kimi initially launched with a 200K Chinese-character context, later expanded to 1M and then 2M tokens via long-context architectural research. Moonshot has raised over $1.3B across multiple rounds, including a $1B round in early 2024 led by Alibaba at a $2.5B valuation, and a subsequent round in late 2024 reportedly valuing the company near $3.3B. Investors include Alibaba, Tencent and HongShan (HSG, formerly Sequoia China). Kimi K2 was released in July 2025 as the lab's open-weight Mixture-of-Experts flagship, marking Moonshot's move to a more open model strategy alongside its consumer app.


Architecture

Sparse Mixture-of-Experts Transformer (Muon optimizer, agentic post-training)

Kimi K2 was released by Moonshot AI in July 2025 as an open-weight Mixture-of-Experts model with 1 trillion total parameters and 32 billion active per token across 384 experts (8 selected per token). The architecture follows DeepSeek-style fine-grained MoE with Multi-head Latent Attention for efficient inference. Kimi K2 was pretrained on 15.5 trillion tokens of multilingual web text, code, books and scientific papers, with a heavy emphasis on Chinese and English. The training run used the Muon optimizer (Momentum Orthogonalized by Newton-Schulz) at unprecedented scale and reportedly improved sample efficiency over AdamW. Moonshot published the MuonClip variant, which adds QK-Clip (rescaling the query and key projection weights to cap attention logits) to stabilise Muon at trillion-parameter scale. Post-training emphasised agentic capabilities: Kimi K2 was trained with synthetic and real tool-use trajectories covering multi-step web search, coding, file manipulation and structured tool calling, positioning it as one of the strongest open-weight agentic models. The model supports a 128K context window. Two variants were released: Kimi K2-Base for fine-tuning and Kimi K2-Instruct for general chat and agentic use. Weights ship under a Modified MIT License that requires attribution for very-large-scale commercial deployments.

Parameters: 1T total, 32B active per token
Context: 128K tokens
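To make the "384 experts, 8 selected per token" figure concrete, the sketch below shows generic top-k expert routing: score every expert for a token, keep the k best, and normalise their weights. It is a simplified illustration using softmax weights and deterministic fake scores, not Moonshot's actual gating function, which may differ (for example in its scoring function or use of shared experts).

```typescript
interface RoutedExpert {
  expert: number; // index of the selected expert
  weight: number; // normalised mixing weight; weights sum to 1
}

// Generic top-k routing: keep the k highest-scoring experts and
// apply a softmax over just those k scores.
function routeTopK(logits: number[], k: number): RoutedExpert[] {
  if (k <= 0 || k > logits.length) throw new RangeError("k out of range");
  const ranked = logits
    .map((logit, expert) => ({ expert, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);
  const maxLogit = ranked[0].logit; // subtract the max for numerical stability
  const exps = ranked.map((r) => Math.exp(r.logit - maxLogit));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return ranked.map((r, i) => ({ expert: r.expert, weight: exps[i] / total }));
}

// Figures from the description above.
const NUM_EXPERTS = 384;
const EXPERTS_PER_TOKEN = 8;

// Deterministic stand-in for the router scores of one token.
const routerLogits = Array.from({ length: NUM_EXPERTS }, (_, i) => Math.sin(i));
const routed = routeTopK(routerLogits, EXPERTS_PER_TOKEN);

console.log(routed.map((r) => r.expert)); // 8 expert indices for this token
console.log(((32e9 / 1e12) * 100).toFixed(1) + "% of parameters active per token");
```

Because only the selected experts run for each token, compute per token scales with the 32B active parameters rather than the 1T total, although all weights must still be held in memory.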

What it can do

1T-parameter Mixture-of-Experts with 32B active per token
Pretrained on 15.5T tokens using the Muon optimizer (MuonClip variant)
Strong agentic capability: SWE-bench Verified, Terminal-Bench, ToolBench leadership
128K context window
Function calling and parallel tool calls
Excellent Chinese-English bilingual performance
Code generation and editing across major languages
Open weights under Modified MIT License
Compatible with vLLM, SGLang, llama.cpp, HuggingFace
Available via Kimi consumer app and Moonshot API
Best for: agentic workloads, bilingual chat, coding agents, on-prem deployment.
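The "function calling and parallel tool calls" capability means a single model reply can request several tool invocations at once. The sketch below shows one way to execute such a batch concurrently and build the follow-up messages. It assumes the OpenAI chat-completions tool-call shapes; whether the Railwail SDK exposes tool calls in this form is not confirmed by this page, and the tools `get_weather` and `add` are hypothetical.

```typescript
// Shapes follow the OpenAI chat-completions tool-calling format (assumption).
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical local tools, for illustration only.
const toolHandlers: Record<string, ToolHandler> = {
  get_weather: async (args) => ({ city: args.city, tempC: 21 }),
  add: async (args) => Number(args.a) + Number(args.b),
};

// Runs every requested tool call concurrently. Failures are reported back
// to the model as an error payload instead of aborting the whole batch.
async function runToolCalls(calls: ToolCall[]): Promise<ToolMessage[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolMessage> => {
      let content: string;
      try {
        const handler = toolHandlers[call.function.name];
        if (!handler) throw new Error(`unknown tool: ${call.function.name}`);
        const result = await handler(JSON.parse(call.function.arguments));
        content = JSON.stringify(result);
      } catch (err) {
        content = JSON.stringify({ error: String(err) });
      }
      return { role: "tool", tool_call_id: call.id, content };
    }),
  );
}
```

Each returned message carries the `tool_call_id` of the call it answers, so the results can be appended to the conversation in any order before asking the model to continue.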

Training & License

Pretrained on 15.5 trillion tokens of multilingual web text (Chinese and English dominant), code, books and scientific papers using the Muon optimizer with MuonClip stability. Knowledge cutoff is approximately early 2025. Post-training emphasises agentic tool-use trajectories and supervised fine-tuning on curated coding and reasoning data.

License: Modified MIT License: open weights, commercial use permitted; very-large-scale deployments require attribution in product UI.

Known limitations

Filters Chinese political topics
Large memory footprint requires multi-GPU inference
No native vision input in base K2 release
Limited third-party safety evaluations
Less integrated tooling ecosystem outside China


Related Models


Bio_ClinicalBERT

huggingface

The original Bio_ClinicalBERT from Alsentzer et al., a BERT model initialized from BioBERT and further pretrained on all MIMIC-III clinical notes. Served as a fill-mask endpoint it predicts masked tokens in clinical text and produces clinical embeddings. It is the standard encoder backbone behind many downstream clinical NLP fine-tunes.

€1.00

Biomedical NER (all entities)

huggingface

Token-classification model from d4data that tags 84 biomedical entity types in clinical and medical text, including disease, sign, symptom, medication, dosage, lab value, body part and procedure. Trained on the Maccrobat clinical case corpus on a DistilBERT base, so it runs cheaply for high-volume tagging.

€1.00

Claude Opus 4

Anthropic

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free

Claude Opus 4.8