How much does Nous Hermes 3 405B cost via Railwail?

Nous Hermes 3 405B is currently listed at €0.000 per 1M input tokens. There is no monthly minimum and no subscription, and you start with €5 in free credits.

What is the context window of Nous Hermes 3 405B?

Nous Hermes 3 405B supports a 131,072-token (128K) context window, enough for long books, technical manuals, and extended analysis.

How fast is Nous Hermes 3 405B?

Latency depends on prompt length and load — typically 200ms to 2s for short prompts. We measure p50/p95 in real-time on /rankings.

Is Nous Hermes 3 405B better than Claude Opus 4?

It depends on your use case. Nous Hermes 3 405B (Together AI) and Claude Opus 4 (Anthropic) are both strong choices in text & chat. Compare them side-by-side at /compare/hermes-3-405b-vs-claude-opus-4.

Nous Hermes 3 405B

Name: Nous Hermes 3 405B
Brand: Together AI
SKU: hermes-3-405b
Availability: In stock


Full-parameter fine-tune of Llama 3.1 405B by Nous Research. Steerable, uncensored, strong tool use.


TL;DR · Last updated May 16, 2026

Nous Hermes 3 405B is a text & chat AI model from Together AI, priced at €0.000 per 1M input tokens with a 131,072-token (128K) context window.


Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Nous Hermes 3 405B into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("hermes-3-405b", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("hermes-3-405b", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("hermes-3-405b", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
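
If the API follows the usual OpenAI chat-completions request shape, a raw HTTP request could be prepared roughly as sketched below. This is an illustration under assumptions, not the documented interface: `BASE_URL` is a placeholder because the real endpoint is not given here, and `buildChatRequest` is a hypothetical helper. The only values taken from this page are the model ID, the default temperature of 0.7, and the 4,096-token output cap.

```typescript
// Placeholder only: the real base URL is not stated on this page.
const BASE_URL = "https://api.example.invalid/v1";
// "Max output" figure from the Specifications section.
const MAX_OUTPUT_TOKENS = 4096;

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatOptions {
  temperature?: number;
  max_tokens?: number;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the URL and fetch() options for an
// OpenAI-style chat completion call, clamping max_tokens to the output cap.
function buildChatRequest(
  apiKey: string,
  messages: ChatMessage[],
  options: ChatOptions = {},
): PreparedRequest {
  const maxTokens = Math.min(options.max_tokens ?? 500, MAX_OUTPUT_TOKENS);
  return {
    url: `${BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "hermes-3-405b",
        messages,
        temperature: options.temperature ?? 0.7,
        max_tokens: maxTokens,
      }),
    },
  };
}

// A request asking for more output than the model allows is clamped.
const exampleRequest = buildChatRequest(
  "YOUR_API_KEY",
  [{ role: "user", content: "Hello!" }],
  { max_tokens: 10000 },
);
console.log(exampleRequest.url);
console.log(exampleRequest.init.body);
```

The prepared `url` and `init` can be passed straight to `fetch(url, init)` once the placeholder base URL is replaced with the real one.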

Specifications

Context window: 131,072 tokens
Max output: 4,096 tokens
Developer: Together AI

Deep dive — Nous Research's Nous Hermes 3 405B

About Nous Research

Founded 2023 · San Francisco, USA (distributed)

Nous Research is a community-driven open-source AI collective founded in 2023, co-led by Karan Malhotra, Teknium, Jeffrey Quesnelle, Bowen Peng and others, with a distributed contributor base of independent researchers. Nous focuses on uncensored, steerable, character-rich fine-tunes of open base models. Its Hermes line (Hermes, OpenHermes, Hermes 2, Hermes 2.5, Hermes 3) is the flagship instruction-following family, with YaRN (context-length extension) and Capybara as influential adjacent projects. Hermes 3 405B was released in August 2024 in partnership with Lambda Labs (compute) and was the first full-parameter fine-tune of Meta's Llama 3.1 405B base model. Nous raised seed funding from Distributed Global and a16z-affiliated angels in 2024.


Architecture

Decoder-only Transformer (Llama 3.1 architecture)

Hermes 3 405B is a full-parameter supervised fine-tune (not LoRA) of Meta's Llama 3.1 405B base model. The architecture is unchanged from Llama 3.1: 126 layers, a hidden size of 16,384, grouped-query attention with 128 query heads and 8 KV heads, RoPE positional embeddings with the Llama 3 scaling that supports 128K context, SwiGLU activations and the Llama 3 BPE tokeniser with its roughly 128K-entry vocabulary.

The fine-tune was carried out by Nous Research on approximately 256 NVIDIA H100 GPUs supplied by Lambda Labs, using a curated dataset of around 390M instruction tokens (~2.5M examples) covering role-play, function calling, code, math, RAG, agentic tool use and uncensored creative writing. Much of that data is Nous-curated synthetic output from larger models.

The model uses a ChatML-style format with native `<tool_call>` JSON-schema tags and `<scratchpad>` chain-of-thought tags. It was released in August 2024 under the Llama 3.1 Community License.

Parameters: 405B (dense)
Context: 128K tokens
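
To give a feel for what the 8 KV heads mean in practice, the sketch below estimates the key/value cache for one sequence at the full 131,072-token context, using only the shape figures quoted above. It assumes FP16 cache entries (2 bytes each) and ignores weights, activations and any serving-framework overhead, so treat the result as a rough lower bound and not a sizing guide. The 128-KV-head comparison is hypothetical and is included only to show the saving from grouped-query attention.

```typescript
// Shape figures quoted in the Architecture paragraph.
const LAYERS = 126;
const HIDDEN_SIZE = 16384;
const QUERY_HEADS = 128;
const KV_HEADS = 8;

// Per-head dimension: hidden size split evenly across the query heads (128).
const headDim = HIDDEN_SIZE / QUERY_HEADS;

// Bytes of KV cache per token: keys and values (x2) for every layer and KV head.
function kvCacheBytesPerToken(bytesPerValue: number, kvHeads: number = KV_HEADS): number {
  return 2 * LAYERS * kvHeads * headDim * bytesPerValue;
}

const BYTES_PER_GIB = 1024 * 1024 * 1024;
const contextTokens = 131072;

// Grouped-query attention as described: 8 KV heads, FP16 cache (assumption).
const gqaGiB = (kvCacheBytesPerToken(2) * contextTokens) / BYTES_PER_GIB;
// Hypothetical comparison: one KV head per query head (plain multi-head attention).
const mhaGiB = (kvCacheBytesPerToken(2, QUERY_HEADS) * contextTokens) / BYTES_PER_GIB;

console.log(`GQA cache at full context: ${gqaGiB} GiB`); // 63 GiB
console.log(`Hypothetical MHA cache:    ${mhaGiB} GiB`); // 1008 GiB
```

Under these assumptions a single full-length sequence needs about 63 GiB of cache, a 16x reduction compared with the hypothetical 128-KV-head layout.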

What it can do

Full-parameter fine-tune of Llama 3.1 405B (not LoRA)
Strong system-prompt steering for persona and rule-set instructions
Native ChatML `<tool_call>` JSON-schema tags and `<scratchpad>` reasoning tags
128K context inherited from Llama 3.1
Reduced RLHF-style refusals — friendlier for research and creative writing
Benchmark scores competitive with Llama 3.1 405B Instruct (MMLU, GPQA, math)
Open weights under Llama 3.1 Community License
Best for: customisable agents, role-play platforms, function-calling assistants, self-hosted Llama 3.1 alternatives.
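
As an illustration of how the `<tool_call>` tags might be consumed, here is a minimal sketch that pulls tool calls out of a raw completion string. It assumes each tag pair wraps a JSON object with a `name` string and an `arguments` object. That payload shape, and the helper names `extractToolCalls` and `isToolCall`, are assumptions made for this example and should be checked against the model card before use.

```typescript
// Assumed payload shape inside each <tool_call> block.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Type guard: accept only objects with a string name and an object of arguments.
function isToolCall(value: unknown): value is ToolCall {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate["name"] === "string" &&
    typeof candidate["arguments"] === "object" &&
    candidate["arguments"] !== null
  );
}

// Find every <tool_call>...</tool_call> block and parse its JSON body.
// Blocks that are not valid JSON, or lack the assumed fields, are skipped.
function extractToolCalls(completion: string): ToolCall[] {
  const calls: ToolCall[] = [];
  const pattern = /<tool_call>([\s\S]*?)<\/tool_call>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(completion)) !== null) {
    try {
      const parsed: unknown = JSON.parse(match[1] ?? "");
      if (isToolCall(parsed)) calls.push(parsed);
    } catch {
      // Malformed JSON inside the tags: ignore this block.
    }
  }
  return calls;
}

// Made-up completion text for illustration only.
const sampleCompletion =
  'Let me check.\n<tool_call>\n{"name": "get_weather", "arguments": {"city": "Berlin"}}\n</tool_call>';
console.log(extractToolCalls(sampleCompletion));
```

Skipping malformed blocks keeps the caller simple. A stricter agent loop might instead report the parse failure back to the model and ask it to retry.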

Training & License

Supervised fine-tuning on ~390M instruction tokens across ~2.5M examples covering role-play, function calling, code, math, RAG, agent traces and creative writing. A large fraction is Nous-curated synthetic data distilled from larger models. The full Hermes 3 dataset card is published alongside the model. The 405B variant uses no RLHF and no DPO. The base model's knowledge cutoff is December 2023.

License: Llama 3.1 Community License. Commercial use permitted, but services with >700M monthly active users require a separate Meta license. Meta's Acceptable Use Policy applies to all derivatives.

Known limitations

Reduced safety guardrails versus Meta's Llama 3.1 405B Instruct
Requires ~810GB GPU memory at FP16 (~200GB at INT4) — expensive to self-host
No vision modality
Slower than smaller open instruct models for low-latency chatbot use
Llama 3.1 license requires a separate Meta license for services with >700M MAU
Knowledge cutoff inherited from Llama 3.1 (December 2023)
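
The self-hosting memory figures in the list above follow from simple arithmetic on the parameter count, sketched below. The calculation covers weights only (no KV cache, activations or runtime overhead), and the 80 GB per-GPU capacity used for the device count is an assumption matching the 80 GB H100 variant.

```typescript
// Parameter count from the spec above: 405B, dense.
const PARAMETERS = 405e9;
// Assumed per-device memory for the GPU count (80 GB H100).
const GPU_MEMORY_GB = 80;

// Weight storage in decimal gigabytes for a given precision.
function weightMemoryGB(bitsPerParameter: number): number {
  return (PARAMETERS * bitsPerParameter) / 8 / 1e9;
}

// Minimum number of GPUs needed just to hold the weights.
function minGpusForWeights(bitsPerParameter: number): number {
  return Math.ceil(weightMemoryGB(bitsPerParameter) / GPU_MEMORY_GB);
}

console.log(`FP16: ${weightMemoryGB(16)} GB, at least ${minGpusForWeights(16)} GPUs`); // 810 GB, 11 GPUs
console.log(`INT4: ${weightMemoryGB(4)} GB, at least ${minGpusForWeights(4)} GPUs`);   // 202.5 GB, 3 GPUs
```

Real deployments need headroom beyond these minimums, since the cache for long contexts and the serving framework both consume additional memory.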


Related Models


Claude Opus 4

Anthropic

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free

Claude Sonnet 4

Anthropic

Anthropic's most capable model. Excellent for complex analysis, coding, math, and creative writing.

Free

DeepSeek V3.1

DeepSeek

DeepSeek's refreshed V3.1 release. 671B MoE / 37B active. Tops open-weights leaderboards on coding and reasoning.

Free

DeepSeek V4 Pro

DeepSeek

DeepSeek's April 2026 flagship. 1.6T MoE / 49B active params, 1M context, rivals top closed-source models on STEM and coding at a fraction of the price.

Free

Start using Nous Hermes 3 405B today

Get started with free credits. No credit card required. Access Nous Hermes 3 405B and 100+ other models through a single API.

