Nous Hermes 3 405B
Full-parameter fine-tune of Llama 3.1 405B by Nous Research. Steerable, uncensored, strong tool use.
Nous Hermes 3 405B is text & chat AI model from Together AI, priced at β¬0.000 per 1M input tokens with a 131.1K tokens context window.
0.7
Pricing
API Integration
Use our OpenAI-compatible API to integrate Nous Hermes 3 405B into your application.
npm install railwailimport railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple β just pass a string
const reply = await rw.run("hermes-3-405b", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("hermes-3-405b", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("hermes-3-405b", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);Deep dive β Nous Research's Nous Hermes 3 405B
Nous Research is a community-driven open-source AI collective founded in 2023, co-led by Karan 'Teknium' Malhotra, Jeffrey Quesnelle, Bowen Peng and others, with a distributed contributor base of independent researchers. Nous focuses on uncensored, steerable, character-rich fine-tunes of open base models β its Hermes line (Hermes, OpenHermes, Hermes 2, Hermes 2.5, Hermes 3) is the flagship instruction-following family, with Yarn (context-length extension) and Capybara as influential adjacent projects. Hermes 3 405B was released August 2024 in partnership with Lambda Labs (compute) and was the first full-parameter fine-tune of Meta's Llama 3.1 405B base model. Nous raised seed funding from Distributed Global and a16z-affiliated angels in 2024.
Visit Nous Research βHermes 3 405B is a full-parameter supervised fine-tune (not LoRA) of Meta's Llama 3.1 405B base model. The architecture is unchanged from Llama 3.1: 126 layers, 16,384 hidden size, 128-head grouped-query attention with 8 KV heads, RoPE positional embeddings with the Llama 3 scaling that supports 128K context, SwiGLU activations and the 128,000-token Llama 3 BPE tokeniser. The fine-tune was carried out by Nous Research on approximately 256 NVIDIA H100 GPUs supplied by Lambda Labs, using a curated dataset of around 390M instruction tokens (~2.5M examples) covering role-play, function calling, code, math, RAG, agentic tool use and uncensored creative writing β much of it Nous-curated synthetic from larger models. The model uses a ChatML-style format with native `<tool_call>` JSON-schema tags and `<scratchpad>` chain-of-thought tags. Released August 2024 under the Llama 3.1 Community License.
- Parameters
- 405B (dense)
- Context
- 128K tokens
- Full-parameter fine-tune of Llama 3.1 405B (not LoRA)
- Strong system-prompt steering for persona and rule-set instructions
- Native ChatML `<tool_call>` JSON-schema tags and `<scratchpad>` reasoning tags
- 128K context inherited from Llama 3.1
- Reduced RLHF-style refusals β friendlier for research and creative writing
- Competitive benchmark scores with Llama 3.1 405B Instruct (MMLU, GPQA, math)
- Open weights under Llama 3.1 Community License
- Best for: customisable agents, role-play platforms, function-calling assistants, self-hosted Llama 3.1 alternatives.
Supervised fine-tuning on ~390M instruction tokens across ~2.5M examples covering role-play, function calling, code, math, RAG, agent traces and creative writing. Large fraction is Nous-curated synthetic data distilled from larger models. The full Hermes 3 dataset card is published alongside the model. No RLHF / no DPO in the 405B variant. Base model knowledge cutoff December 2023.
License: Llama 3.1 Community License. Commercial use permitted, but services with >700M monthly active users require a separate Meta license. Meta's Acceptable Use Policy applies to all derivatives.
Known limitations
- Reduced safety guardrails versus Meta's Llama 3.1 405B Instruct
- Requires ~810GB GPU memory at FP16 (~200GB at INT4) β expensive to self-host
- No vision modality
- Slower than smaller open instructs for low-latency chatbot use
- Llama 3.1 license excludes services with >700M MAU without separate Meta license
- Knowledge cutoff inherited from Llama 3.1 (December 2023)
Frequently asked questions
Related Models
View all Text & ChatClaude Opus 4
Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.
Claude Sonnet 4
Anthropic's most capable model. Excellent for complex analysis, coding, math, and creative writing.
DeepSeek V3.1
DeepSeek's refreshed V3.1 release. 671B MoE / 37B active. Tops open-weights leaderboards on coding and reasoning.
DeepSeek V4 Pro
DeepSeek's April 2026 flagship. 1.6T MoE / 49B active params, 1M context, rivals top closed-source models on STEM and coding at a fraction of the price.
Start using Nous Hermes 3 405B today
Get started with free credits. No credit card required. Access Nous Hermes 3 405B and 100+ other models through a single API.