Nous Hermes 3 405B

Together AI
Text & Chat

Full-parameter fine-tune of Llama 3.1 405B by Nous Research. Steerable, uncensored, strong tool use.

Try Nous Hermes 3 405B now
Send a single prompt and stream a response inline. Hit Cmd+Enter to submit.
Sign in to try this model with €5 free credits.
Sign in
Press Cmd+Enter to send
Response appears here.
TL;DR·Last updated May 16, 2026

Nous Hermes 3 405B is text & chat AI model from Together AI, priced at €0.000 per 1M input tokens with a 131.1K tokens context window.

Try Nous Hermes 3 405B

0.7

Sign in to generate — 50 free credits on sign-up

Pricing

Price per Generation
Per generationFree

API Integration

Use our OpenAI-compatible API to integrate Nous Hermes 3 405B into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("hermes-3-405b", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("hermes-3-405b", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("hermes-3-405b", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Specifications
Context window
131,072 tokens
Max output
4,096 tokens
Developer
Together AI
Category
Text & Chat
Supported Formats
text
Tags
nous
open-weights
tools
roleplay
pricing-tbd

Deep dive — Nous Research's Nous Hermes 3 405B

About Nous Research
Founded 2023 · San Francisco, USA (distributed)

Nous Research is a community-driven open-source AI collective founded in 2023, co-led by Karan 'Teknium' Malhotra, Jeffrey Quesnelle, Bowen Peng and others, with a distributed contributor base of independent researchers. Nous focuses on uncensored, steerable, character-rich fine-tunes of open base models — its Hermes line (Hermes, OpenHermes, Hermes 2, Hermes 2.5, Hermes 3) is the flagship instruction-following family, with Yarn (context-length extension) and Capybara as influential adjacent projects. Hermes 3 405B was released August 2024 in partnership with Lambda Labs (compute) and was the first full-parameter fine-tune of Meta's Llama 3.1 405B base model. Nous raised seed funding from Distributed Global and a16z-affiliated angels in 2024.

Visit Nous Research →
Architecture
Decoder-only Transformer (Llama 3.1 architecture)

Hermes 3 405B is a full-parameter supervised fine-tune (not LoRA) of Meta's Llama 3.1 405B base model. The architecture is unchanged from Llama 3.1: 126 layers, 16,384 hidden size, 128-head grouped-query attention with 8 KV heads, RoPE positional embeddings with the Llama 3 scaling that supports 128K context, SwiGLU activations and the 128,000-token Llama 3 BPE tokeniser. The fine-tune was carried out by Nous Research on approximately 256 NVIDIA H100 GPUs supplied by Lambda Labs, using a curated dataset of around 390M instruction tokens (~2.5M examples) covering role-play, function calling, code, math, RAG, agentic tool use and uncensored creative writing — much of it Nous-curated synthetic from larger models. The model uses a ChatML-style format with native `<tool_call>` JSON-schema tags and `<scratchpad>` chain-of-thought tags. Released August 2024 under the Llama 3.1 Community License.

Parameters
405B (dense)
Context
128K tokens
What it can do
  • Full-parameter fine-tune of Llama 3.1 405B (not LoRA)
  • Strong system-prompt steering for persona and rule-set instructions
  • Native ChatML `<tool_call>` JSON-schema tags and `<scratchpad>` reasoning tags
  • 128K context inherited from Llama 3.1
  • Reduced RLHF-style refusals — friendlier for research and creative writing
  • Competitive benchmark scores with Llama 3.1 405B Instruct (MMLU, GPQA, math)
  • Open weights under Llama 3.1 Community License
  • Best for: customisable agents, role-play platforms, function-calling assistants, self-hosted Llama 3.1 alternatives.
Training & License

Supervised fine-tuning on ~390M instruction tokens across ~2.5M examples covering role-play, function calling, code, math, RAG, agent traces and creative writing. Large fraction is Nous-curated synthetic data distilled from larger models. The full Hermes 3 dataset card is published alongside the model. No RLHF / no DPO in the 405B variant. Base model knowledge cutoff December 2023.

License: Llama 3.1 Community License. Commercial use permitted, but services with >700M monthly active users require a separate Meta license. Meta's Acceptable Use Policy applies to all derivatives.

Known limitations
  • Reduced safety guardrails versus Meta's Llama 3.1 405B Instruct
  • Requires ~810GB GPU memory at FP16 (~200GB at INT4) — expensive to self-host
  • No vision modality
  • Slower than smaller open instructs for low-latency chatbot use
  • Llama 3.1 license excludes services with >700M MAU without separate Meta license
  • Knowledge cutoff inherited from Llama 3.1 (December 2023)

Frequently asked questions

Start using Nous Hermes 3 405B today

Get started with free credits. No credit card required. Access Nous Hermes 3 405B and 100+ other models through a single API.