Llama 3.3 70B

Meta
Text & Chat

Meta's open-source 70B parameter model. Strong all-around performance with multilingual support.

TL;DR · Last updated March 4, 2026

Llama 3.3 70B is a text & chat AI model from Meta, priced at €8.80 per 1M input tokens, with a 131,072-token (128K) context window.


Examples

See what Llama 3.3 70B can generate

Comparison Essay

Compare and contrast REST and GraphQL APIs in terms of flexibility and performance

REST APIs use fixed endpoints that return predetermined data structures, which is simple but can lead to over-fetching (getting more data than needed) or under-fetching (requiring multiple requests). GraphQL lets clients request exactly the fields they need in a single query, offering superior flexibility. However, REST benefits from better HTTP caching, simpler error handling, and wider tooling support. GraphQL shines in complex applications with varied data needs, while REST remains ideal for straightforward CRUD operations and public APIs.

Explain Like I'm 5

Explain how the internet works to a 5-year-old

Imagine you want to send a drawing to your friend who lives far away. You put your drawing in a magic mailbox, and the mailbox breaks it into tiny puzzle pieces. These pieces zoom through special tunnels underground and even under the ocean! When all the pieces reach your friend's magic mailbox, it puts the puzzle back together. That's basically how the internet works—your computer breaks messages into tiny pieces, sends them through wires, and the other computer puts them back together super fast.

Pricing

Price per Generation
Per generation: Free
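The TL;DR on this page lists €8.80 per 1M input tokens. As a rough sketch of what that rate implies for the input side of a request, the helper below multiplies a token count by it. The function name is hypothetical, and output-token pricing is not stated on this page, so it is left out.

```typescript
// Listed rate from this page's TL;DR: EUR 8.80 per 1M input tokens.
const EUR_PER_MILLION_INPUT_TOKENS = 8.8;

// Estimate the input-side cost in EUR for a given number of prompt tokens.
// Output-token pricing is not stated on this page, so it is not modelled.
function estimateInputCostEur(inputTokens: number): number {
  if (!isFinite(inputTokens) || inputTokens < 0) {
    throw new RangeError("inputTokens must be a non-negative number");
  }
  return (inputTokens / 1_000_000) * EUR_PER_MILLION_INPUT_TOKENS;
}

// One million prompt tokens at the listed rate.
console.log(estimateInputCostEur(1_000_000).toFixed(2)); // 8.80
// A prompt that fills the whole 131,072-token context window.
console.log(estimateInputCostEur(131_072).toFixed(4));
```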

API Integration

Use our OpenAI-compatible API to integrate Llama 3.3 70B into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("llama-3-3-70b", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("llama-3-3-70b", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("llama-3-3-70b", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
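Because the API is described as OpenAI-compatible, the same call can presumably be made without the SDK. The sketch below only builds the HTTP request. The base URL is a placeholder and the `/chat/completions` path follows the OpenAI convention; neither is confirmed by this page, so check the provider's documentation before relying on them.

```typescript
// Message and option shapes follow the OpenAI chat-completions convention.
type Role = "system" | "user" | "assistant";
interface ChatMessage { role: Role; content: string; }
interface ChatOptions { temperature?: number; max_tokens?: number; }

// HYPOTHETICAL base URL: replace with the endpoint from the provider's docs.
const BASE_URL = "https://api.example.com/v1";

// Build (but do not send) an OpenAI-style chat completion request.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  options: ChatOptions = {},
) {
  return {
    url: `${BASE_URL}/chat/completions`, // ASSUMED path (OpenAI convention)
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages, ...options }),
    },
  };
}

const req = buildChatRequest(
  "YOUR_API_KEY",
  "llama-3-3-70b",
  [{ role: "user", content: "Hello!" }],
  { temperature: 0.7, max_tokens: 500 },
);
console.log(req.url);
// To send it: const res = await fetch(req.url, req.init);
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any credits are spent.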
Specifications
Context window: 131,072 tokens
Max output: 4,096 tokens
Avg. latency: 2.5s
Developer: Meta
Category: Text & Chat
Tags: open-source, popular
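The specifications above list a 131,072-token context window and a 4,096-token maximum output. A minimal sketch of budgeting a prompt against those limits follows. It assumes that prompt and completion tokens share the same window, which this page does not state explicitly, and the function names are illustrative only.

```typescript
const CONTEXT_WINDOW = 131_072; // tokens, from the specifications
const MAX_OUTPUT = 4_096;       // tokens, from the specifications

// ASSUMPTION: prompt and completion tokens share one context window.
// Returns the largest prompt that still leaves room for the reserved output.
function maxPromptTokens(reservedOutput: number = MAX_OUTPUT): number {
  // Clamp the reservation to the range the model can actually produce.
  const reserved = Math.min(Math.max(reservedOutput, 0), MAX_OUTPUT);
  return CONTEXT_WINDOW - reserved;
}

// True when a prompt of this size fits alongside the reserved output.
function fitsInContext(promptTokens: number, reservedOutput: number = MAX_OUTPUT): boolean {
  return promptTokens >= 0 && promptTokens <= maxPromptTokens(reservedOutput);
}

console.log(maxPromptTokens());      // 126976
console.log(fitsInContext(120_000)); // true
```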

Deep dive — Meta AI's Llama 3.3 70B

About Meta AI
Founded 2013 · Menlo Park, USA

Meta AI (originally Facebook AI Research, FAIR) was founded in December 2013 by Mark Zuckerberg, with Yann LeCun as its first director. The lab is now part of Meta Platforms and houses several thousand researchers across Menlo Park, New York, Paris, Montreal, Seattle and Tel Aviv. Its landmark papers and projects include PyTorch (2017), Detectron, fastText, RoBERTa, the original LLaMA paper (Feb 2023), Llama 2 (Jul 2023), Llama 3 (Apr 2024), Llama 3.1 (Jul 2024, including the 405B flagship), Llama 3.2 (Sep 2024, multimodal and on-device sizes), Llama 3.3 (Dec 2024) and Llama 4 (Apr 2025).

Meta's open-weights strategy under the Llama Community License has made Llama one of the most widely deployed open-weight model families, with over 700M cumulative downloads. Yann LeCun, Joelle Pineau and Ahmad Al-Dahle lead the GenAI organisation that productises the work. Beyond Llama, Meta ships SeamlessM4T for speech, Segment Anything for vision and the Meta AI consumer assistant across WhatsApp, Instagram, Messenger and Meta.ai. Meta is also among the largest industrial users of NVIDIA H100 GPUs, with a reported fleet of more than 350,000 H100-equivalents.

Visit Meta AI →
Architecture
Decoder-only Transformer (dense, Grouped Query Attention)

Llama 3.3 70B Instruct was released by Meta on 6 December 2024 as the final 3.x release before Llama 4. It is a dense decoder-only Transformer with 70 billion parameters, 80 layers, 64 query heads, 8 KV heads (Grouped Query Attention) and a tokenizer with a 128K-token vocabulary. Meta's headline claim is that the post-training recipe lifts the 70B model to match or exceed Llama 3.1 405B on most benchmarks at roughly one-sixth of the inference cost.

Llama 3.3 reuses the Llama 3.1 pretrained base, which was trained on approximately 15 trillion tokens of curated public web data, code, books and licensed datasets, with a December 2023 knowledge cutoff. The improvement comes from an updated post-training pipeline that combines new supervised fine-tuning, rejection sampling, Direct Preference Optimisation (DPO), online reinforcement learning, and synthetic instruction data generated by Llama 3.1 405B and other models.

Llama 3.3 supports a 128K context, function calling, parallel tool calls, JSON output and the official Llama 3 chat template. The model is text-only (vision sits in the Llama 3.2 family) and ships under the Llama 3.3 Community License, which permits commercial use but requires organisations whose products had more than 700 million monthly active users at the time of release to obtain a separate license from Meta.

Parameters: 70B (dense)
Context: 128K tokens
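To see why Grouped Query Attention matters at long context, the sketch below estimates KV-cache size from the layer and head counts given above. The head dimension of 128 and the 2-byte (fp16/bf16) cache entries are assumptions not stated on this page, and the 64-KV-head comparison is a hypothetical baseline rather than a real model variant.

```typescript
// Figures from the architecture description above.
const LAYERS = 80;
const QUERY_HEADS = 64;
const KV_HEADS = 8;

// ASSUMPTIONS (not stated on this page): head dimension and cache precision.
const HEAD_DIM = 128;
const BYTES_PER_VALUE = 2; // fp16 / bf16

// Bytes of KV cache per token: one key and one value vector
// for every layer and every KV head.
function kvBytesPerToken(kvHeads: number): number {
  return 2 * LAYERS * kvHeads * HEAD_DIM * BYTES_PER_VALUE;
}

const gqaBytes = kvBytesPerToken(KV_HEADS);    // grouped-query attention
const mhaBytes = kvBytesPerToken(QUERY_HEADS); // hypothetical one-KV-head-per-query-head baseline
const fullContextTokens = 131_072;

console.log(gqaBytes);                                   // 327680
console.log(mhaBytes / gqaBytes);                        // 8
console.log((gqaBytes * fullContextTokens) / 2 ** 30);   // 40 (GiB at full context)
```

Under these assumptions, sharing each KV head across eight query heads cuts the cache by a factor of eight, which is the difference between roughly 40 GiB and 320 GiB for a single full-length sequence.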
What it can do
  • 70B dense parameters with Grouped Query Attention
  • Post-training lifts 70B to roughly match Llama 3.1 405B on key benchmarks
  • 128K context window
  • Function calling and parallel tool calls (Llama 3.1+ style)
  • JSON output with the chat template tool format
  • Multilingual: 8 officially supported languages (English, German, French, Italian, Portuguese, Hindi, Spanish and Thai)
  • Code generation across major programming languages
  • Open weights under Llama 3.3 Community License
  • Massive ecosystem: vLLM, SGLang, TGI, llama.cpp, Ollama, MLX, HuggingFace
  • GGUF/AWQ/GPTQ quantised variants in the community
  • Best for: open-weight chat, function calling, on-prem enterprise, cost-efficient replacement for 405B.
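Parallel tool calling means a single model turn may request several functions at once. A minimal sketch of handling such a turn is shown below. It assumes the response uses the OpenAI-style `tool_calls` shape, which this page does not confirm, and the `add` and `upper` tools are hypothetical stand-ins for real application functions.

```typescript
// ASSUMED OpenAI-style shapes for tool calls and tool results.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}
interface ToolResultMessage { role: "tool"; tool_call_id: string; content: string; }

type ToolHandler = (args: Record<string, unknown>) => unknown;

// HYPOTHETICAL local tools, for illustration only.
const tools: Record<string, ToolHandler> = {
  add: (args) => Number(args["a"]) + Number(args["b"]),
  upper: (args) => String(args["text"]).toUpperCase(),
};

// Run every tool call the model requested and build the follow-up messages.
function runToolCalls(calls: ToolCall[]): ToolResultMessage[] {
  return calls.map((call): ToolResultMessage => {
    let content: string;
    try {
      const name = call.function.name;
      if (!Object.prototype.hasOwnProperty.call(tools, name)) {
        throw new Error(`unknown tool: ${name}`);
      }
      content = JSON.stringify(tools[name](JSON.parse(call.function.arguments)));
    } catch (err) {
      // Report failures back to the model instead of crashing the loop.
      content = JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
    }
    return { role: "tool", tool_call_id: call.id, content };
  });
}

const results = runToolCalls([
  { id: "call_1", type: "function", function: { name: "add", arguments: '{"a":2,"b":3}' } },
  { id: "call_2", type: "function", function: { name: "upper", arguments: '{"text":"hi"}' } },
]);
console.log(results);
```

The returned messages would be appended to the conversation and sent back so the model can compose its final answer.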
Training & License

Reuses the Llama 3.1 pretrained base (15T tokens of curated public web data, code, books and licensed data, with a December 2023 cutoff). Post-training applies SFT, rejection sampling, DPO, online RL and synthetic data from Llama 3.1 405B.

License: Llama 3.3 Community License. Open weights with commercial use permitted; organisations above the 700M monthly-active-user threshold must obtain a separate license from Meta.

Known limitations
  • Text-only (no native vision; use Llama 3.2 Vision variants)
  • Knowledge cutoff December 2023
  • Tool-calling format is bespoke and requires the official chat template
  • Community License has a >700M MAU restriction
  • Long context recall degrades beyond ~64K on some tasks
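One common way to work around degraded long-context recall is to split large inputs into chunks that stay below the affected range and process them separately. The sketch below does this on paragraph boundaries. The 4-characters-per-token ratio is a rough heuristic for English text rather than the real Llama 3 tokenizer, and the 64,000-token soft limit is simply taken from the approximate threshold listed above.

```typescript
// ASSUMPTION: ~4 characters per token is a rough heuristic for English text.
// Use the real Llama 3 tokenizer when exact counts matter.
const CHARS_PER_TOKEN = 4;
const SOFT_TOKEN_LIMIT = 64_000; // stay under the ~64K threshold noted above

// Split text into chunks of at most tokenLimit (estimated) tokens,
// preferring paragraph boundaries and hard-splitting oversized paragraphs.
function splitIntoChunks(text: string, tokenLimit: number = SOFT_TOKEN_LIMIT): string[] {
  const maxChars = tokenLimit * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of text.split(/\n{2,}/)) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length <= maxChars) {
      current = candidate; // paragraph still fits in the current chunk
      continue;
    }
    if (current) chunks.push(current);
    // A single paragraph longer than the limit is cut at the character limit.
    let rest = paragraph;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(splitIntoChunks("aaaa\n\nbbbb\n\ncccc", 3)); // [ 'aaaa\n\nbbbb', 'cccc' ]
```

Whether chunking helps depends on the task: retrieval and summarisation usually tolerate it, while tasks that need cross-document reasoning may not.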

Start using Llama 3.3 70B today

Get started with free credits. No credit card required. Access Llama 3.3 70B and 100+ other models through a single API.