Llama 3.3 70B
Meta's open-source 70B parameter model. Strong all-around performance with multilingual support.
Llama 3.3 70B is text & chat AI model from Meta, priced at €8.80 per 1M input tokens with a 131.1K tokens context window.
0.7
Examples
See what Llama 3.3 70B can generate
Compare and contrast REST and GraphQL APIs in terms of flexibility and performance
REST APIs use fixed endpoints that return predetermined data structures, which is simple but can lead to over-fetching (getting more data than needed) or under-fetching (requiring multiple requests). GraphQL lets clients request exactly the fields they need in a single query, offering superior flexibility. However, REST benefits from better HTTP caching, simpler error handling, and wider tooling support. GraphQL shines in complex applications with varied data needs, while REST remains ideal for straightforward CRUD operations and public APIs.
Explain how the internet works to a 5-year-old
Imagine you want to send a drawing to your friend who lives far away. You put your drawing in a magic mailbox, and the mailbox breaks it into tiny puzzle pieces. These pieces zoom through special tunnels underground and even under the ocean! When all the pieces reach your friend's magic mailbox, it puts the puzzle back together. That's basically how the internet works—your computer breaks messages into tiny pieces, sends them through wires, and the other computer puts them back together super fast.
Pricing
API Integration
Use our OpenAI-compatible API to integrate Llama 3.3 70B into your application.
npm install railwailimport railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple — just pass a string
const reply = await rw.run("llama-3-3-70b", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("llama-3-3-70b", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("llama-3-3-70b", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);Deep dive — Meta AI's Llama 3.3 70B
Meta AI (originally Facebook AI Research, FAIR) was founded in December 2013 by Mark Zuckerberg with Yann LeCun as its first director. The lab is now part of Meta Platforms and houses several thousand researchers across Menlo Park, New York, Paris, Montreal, Seattle and Tel Aviv. FAIR has authored landmark papers including PyTorch (2017), Detectron, fastText, RoBERTa, the original LLaMA paper (Feb 2023), Llama 2 (Jul 2023), Llama 3 (Apr 2024), Llama 3.1 (Jul 2024, including the 405B flagship), Llama 3.2 (Sep 2024, multimodal and on-device sizes), Llama 3.3 (Dec 2024) and Llama 4 (Apr 2025). Meta's open-weights strategy under the Llama Community License has made Llama by far the most widely deployed open-weight family with over 700M cumulative downloads. Yann LeCun, Joelle Pineau and Ahmad Al-Dahle lead the GenAI organisation that productises the work. Beyond Llama, Meta ships SeamlessM4T for speech, Segment Anything for vision and the Meta AI consumer assistant across WhatsApp, Instagram, Messenger and Meta.ai. Meta is also the largest user of NVIDIA H100 GPUs in industry, with reported cluster sizes above 350,000 H100-equivalents.
Visit Meta AI →Llama 3.3 70B Instruct was released by Meta on 6 December 2024 as the final 3.x release before Llama 4. It is a dense decoder-only Transformer with 70 billion parameters, 80 layers, 64 query heads, 8 KV heads (Grouped Query Attention) and a tokenizer with 128K vocabulary. The notable claim is that the post-training recipe lifts the 70B model to match or exceed Llama 3.1 405B on most benchmarks at roughly 1/6th of the inference cost. Llama 3.3 reuses the Llama 3.1 pretrained base (which was trained on approximately 15 trillion tokens of curated public web data, code, books and licensed datasets, with a December 2023 knowledge cutoff). The improvement comes from an updated post-training pipeline combining new supervised fine-tuning, rejection sampling, Direct Preference Optimisation (DPO), online reinforcement learning, and synthetic instruction data generated by Llama 3.1 405B and other models. Llama 3.3 supports 128K context, function calling, parallel tool calls, JSON output and the official Llama 3 chat template. The model is text-only (vision sits in the Llama 3.2 family) and ships under the Llama 3.3 Community License which permits commercial use except for products with more than 700 million monthly active users at launch.
- Parameters
- 70B (dense)
- Context
- 128K tokens
- 70B dense parameters with Grouped Query Attention
- Post-training lifts 70B to roughly match Llama 3.1 405B on key benchmarks
- 128K context window
- Function calling and parallel tool calls (Llama 3.1+ style)
- JSON output with the chat template tool format
- Multilingual: 8 officially supported languages including English, German, French, Italian, Portuguese, Hindi, Spanish, Thai
- Code generation across major programming languages
- Open weights under Llama 3.3 Community License
- Massive ecosystem: vLLM, SGLang, TGI, llama.cpp, Ollama, MLX, HuggingFace
- GGUF/AWQ/GPTQ quantised variants in the community
- Best for: open-weight chat, function calling, on-prem enterprise, cost-efficient replacement for 405B.
Reuses the Llama 3.1 pretrained base (15T tokens of curated public web data, code, books and licensed data, with a December 2023 cutoff). Post-training applies SFT, rejection sampling, DPO, online RL and synthetic data from Llama 3.1 405B.
License: Llama 3.3 Community License: open weights, commercial use permitted, with a >700M MAU clause that requires a separate license from Meta.
Known limitations
- Text-only (no native vision; use Llama 3.2 Vision variants)
- Knowledge cutoff December 2023
- Tool-calling format is bespoke and requires the official chat template
- Community License has a >700M MAU restriction
- Long context recall degrades beyond ~64K on some tasks
Frequently asked questions
Related Models
View all Text & ChatClaude Opus 4
Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.
Claude Sonnet 4
Anthropic's most capable model. Excellent for complex analysis, coding, math, and creative writing.
DeepSeek V3.1
DeepSeek's refreshed V3.1 release. 671B MoE / 37B active. Tops open-weights leaderboards on coding and reasoning.
DeepSeek V4 Pro
DeepSeek's April 2026 flagship. 1.6T MoE / 49B active params, 1M context, rivals top closed-source models on STEM and coding at a fraction of the price.
Start using Llama 3.3 70B today
Get started with free credits. No credit card required. Access Llama 3.3 70B and 100+ other models through a single API.