TL;DR: Switch in Under 5 Minutes
- Both APIs are OpenAI-compatible: change only the baseURL, the API key, and the model slug
- Llama 3.3, Mixtral, Gemma 2, Whisper available on Railwail
- EU-hosted endpoint, EUR billing
- Plus 270+ other models including Claude, GPT-4o, Gemini
- When you need Groq-level throughput, Railwail routes to LPU providers transparently
Why Move Off Groq Cloud?
Groq's LPU hardware delivers exceptional throughput on open-source LLMs. The constraint is everything around it: Groq Cloud hosts only a small selection of open models, offers no closed-source frontier models, is hosted only in the US, and bills only in USD. Railwail gives you Groq-style throughput for open models plus everything else.
Step 1: Get a Railwail API Key
Sign up at railwail.com and generate a key.
Step 2: Change Base URL
TypeScript / JavaScript
Before (Groq):
import OpenAI from "openai";

const groq = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

const res = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello" }],
});
After (Railwail):
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.RAILWAIL_API_KEY,
  baseURL: "https://api.railwail.com/v1",
});

const res = await client.chat.completions.create({
  model: "llama-3.3-70b-instruct",
  messages: [{ role: "user", content: "Hello" }],
});
Python
import os  # needed for os.environ below

from openai import OpenAI

# Point the standard OpenAI client at the Railwail endpoint
client = OpenAI(
    api_key=os.environ["RAILWAIL_API_KEY"],
    base_url="https://api.railwail.com/v1",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
cURL
curl https://api.railwail.com/v1/chat/completions \
  -H "Authorization: Bearer $RAILWAIL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
API Endpoint Mapping
Groq endpoint → Railwail equivalent
| Groq Cloud | Railwail | Notes |
|---|---|---|
| POST /openai/v1/chat/completions | POST /v1/chat/completions | Identical |
| POST /openai/v1/audio/transcriptions | POST /v1/audio/transcriptions | Whisper-large-v3 |
| POST /openai/v1/audio/translations | POST /v1/audio/translations | Whisper translate |
| GET /openai/v1/models | GET /v1/models | 275+ models |
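If Groq URLs are scattered through configuration files or logs, the table above can be applied mechanically. The following is a minimal sketch, assuming the only differences are the host and the `/openai` path prefix shown in the table; the helper name `to_railwail_url` is hypothetical and not part of either API.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts taken from the before/after examples in this guide.
GROQ_HOST = "api.groq.com"
RAILWAIL_HOST = "api.railwail.com"


def to_railwail_url(groq_url: str) -> str:
    """Rewrite a Groq Cloud URL to the Railwail path listed in the table."""
    parts = urlsplit(groq_url)
    if parts.netloc != GROQ_HOST:
        raise ValueError(f"not a Groq Cloud URL: {groq_url}")
    path = parts.path
    # Groq serves its OpenAI-compatible API under /openai/v1, Railwail under /v1.
    if path.startswith("/openai/"):
        path = path[len("/openai"):]
    return urlunsplit(
        (parts.scheme, RAILWAIL_HOST, path, parts.query, parts.fragment)
    )


print(to_railwail_url("https://api.groq.com/openai/v1/chat/completions"))
```

The function refuses URLs from any other host so that an unrelated endpoint is never rewritten by accident.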
Model Mapping
Groq model → Railwail
| Groq | Railwail | Notes |
|---|---|---|
| llama-3.3-70b-versatile | llama-3.3-70b-instruct | Llama 3.3 |
| llama-3.1-70b-versatile | llama-3.1-70b-instruct | Llama 3.1 |
| llama-3.1-8b-instant | llama-3.1-8b-instruct | Small, fast |
| mixtral-8x7b-32768 | mixtral-8x7b-instruct | MoE |
| gemma2-9b-it | gemma-2-9b-it | Google open |
| whisper-large-v3 | whisper-large-v3 | STT |
| whisper-large-v3-turbo | whisper-large-v3-turbo | Faster STT |
| llama-guard-3-8b | llama-guard-3-8b | Safety classifier |
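In a codebase that passes model names around as strings, the mapping above could live in one lookup table so that call sites do not need to change. This is a sketch only: the slugs are copied from the table, while the names `GROQ_TO_RAILWAIL` and `to_railwail_model` are hypothetical.

```python
# Groq slug -> Railwail slug, copied from the model mapping table.
GROQ_TO_RAILWAIL = {
    "llama-3.3-70b-versatile": "llama-3.3-70b-instruct",
    "llama-3.1-70b-versatile": "llama-3.1-70b-instruct",
    "llama-3.1-8b-instant": "llama-3.1-8b-instruct",
    "mixtral-8x7b-32768": "mixtral-8x7b-instruct",
    "gemma2-9b-it": "gemma-2-9b-it",
    "whisper-large-v3": "whisper-large-v3",
    "whisper-large-v3-turbo": "whisper-large-v3-turbo",
    "llama-guard-3-8b": "llama-guard-3-8b",
}


def to_railwail_model(groq_model: str) -> str:
    """Translate a Groq model slug, failing loudly on anything unlisted."""
    try:
        return GROQ_TO_RAILWAIL[groq_model]
    except KeyError:
        raise ValueError(
            f"no Railwail mapping listed for {groq_model!r}"
        ) from None


print(to_railwail_model("llama-3.3-70b-versatile"))
```

Raising on unknown slugs, rather than passing them through, surfaces any model that was missed during the migration.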
Pricing Comparison (per 1M tokens, May 2026)
Same model, Railwail in EUR
| Model | Groq (USD) | Railwail (EUR) | Notes |
|---|---|---|---|
| llama-3.3-70b-instruct | $0.59 / $0.79 | EUR 0.54 / 0.73 | Input/output |
| llama-3.1-8b-instruct | $0.05 / $0.08 | EUR 0.046 / 0.074 | Identical |
| mixtral-8x7b-instruct | $0.24 / $0.24 | EUR 0.22 / 0.22 | Identical |
| gemma-2-9b-it | $0.20 / $0.20 | EUR 0.18 / 0.18 | Identical |
| whisper-large-v3 (per minute of audio) | $0.0036 | EUR 0.0033 | Identical |
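To turn the per-1M-token prices into a figure for your own workload, multiply each rate by the corresponding token count. The sketch below uses the prices from the table; the function name `cost` and the example token counts are illustrative assumptions. The two results are in different currencies, so comparing them directly requires an exchange rate that this guide does not specify.

```python
from decimal import Decimal

# (input, output) price per 1M tokens, copied from the pricing table.
GROQ_USD = {
    "llama-3.3-70b-instruct": (Decimal("0.59"), Decimal("0.79")),
    "llama-3.1-8b-instruct": (Decimal("0.05"), Decimal("0.08")),
}
RAILWAIL_EUR = {
    "llama-3.3-70b-instruct": (Decimal("0.54"), Decimal("0.73")),
    "llama-3.1-8b-instruct": (Decimal("0.046"), Decimal("0.074")),
}


def cost(prices: dict, model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of a workload in the currency of the given price table."""
    price_in, price_out = prices[model]
    total = price_in * input_tokens + price_out * output_tokens
    return total / Decimal(1_000_000)


# Example workload (assumed): 2M input tokens and 0.5M output tokens.
usd = cost(GROQ_USD, "llama-3.3-70b-instruct", 2_000_000, 500_000)
eur = cost(RAILWAIL_EUR, "llama-3.3-70b-instruct", 2_000_000, 500_000)
print(f"Groq: USD {usd}  Railwail: EUR {eur}")
```

`Decimal` is used instead of `float` so that small per-token prices do not pick up binary rounding errors.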
Why Railwail Over Groq Cloud
- EU billing in EUR with VAT receipts
- EU hosting for GDPR compliance
- Same OpenAI-compatible API
- Adds Claude, GPT-4o, Gemini, Flux; Groq has no closed-source models
- Built-in playground for cross-model A/B testing
- Comparable per-token pricing
FAQ
Do I get Groq's LPU speed on Railwail?
For open-source models, Railwail routes to LPU and high-throughput vLLM providers transparently. Tokens-per-second throughput is typically within 20% of Groq's direct LPU endpoints, while first-token latency for EU customers is usually lower because the EU-hosted endpoint is closer.
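You can check the throughput and first-token claims against your own traffic by timing a streamed response. The helper below is a rough sketch, not a benchmark harness: `measure_stream` is a hypothetical name, it accepts any iterable of text chunks, and it counts chunks rather than true tokens, which is only an approximation.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float]:
    """Return (seconds until the first chunk, chunks per second after it)."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in chunks:
        now = clock()
        if first is None:
            first = now  # arrival time of the first chunk
        last = now
        count += 1
    if first is None:
        return None, 0.0  # the stream produced nothing
    elapsed = last - first
    rate = (count - 1) / elapsed if elapsed > 0 else 0.0
    return first - start, rate


# Offline demonstration with a generator that simulates network delay.
def fake_stream():
    for word in ["Hello", ",", " world"]:
        time.sleep(0.01)
        yield word


ttft, rate = measure_stream(fake_stream())
print(f"first chunk after {ttft:.3f}s, then {rate:.1f} chunks/s")
```

With the OpenAI Python SDK, one way to feed this would be a generator that yields `chunk.choices[0].delta.content` for each chunk of a `stream=True` chat completion; run it against both endpoints from the same machine to get comparable numbers.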
What about Groq's whisper-large-v3-turbo?
Available as whisper-large-v3-turbo on Railwail with the same speed-up vs the standard model.
Are tool calls supported?
Yes. Tool/function calling on Llama 3.3 70B works through the standard OpenAI tools schema.
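As an illustration, a request body using the standard OpenAI tools schema might look like the sketch below. The `get_weather` tool and the helper `build_tool_request` are hypothetical; only the overall shape (`tools`, `type: "function"`, a JSON Schema under `parameters`) follows the OpenAI schema.

```python
import json


def build_tool_request(question: str) -> dict:
    """Build a chat completion body that offers the model one tool."""
    return {
        "model": "llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool for illustration
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


body = build_tool_request("What is the weather in Berlin?")
print(json.dumps(body, indent=2))
```

The same dictionary can be passed to the client from Step 2 as `client.chat.completions.create(**body)`, or sent as the JSON payload of the cURL request.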
What about JSON mode?
Both response_format: { type: 'json_object' } and response_format with a json_schema are supported.
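The two variants differ only in the `response_format` field of the request body. The sketch below shows both shapes as they appear in the OpenAI schema; the helper names and the `city_info` schema are made up for illustration, and the system message asking for JSON is a precaution carried over from OpenAI's JSON mode guidance rather than a documented Railwail requirement.

```python
MODEL = "llama-3.3-70b-instruct"


def json_object_request(prompt: str) -> dict:
    """Plain JSON mode: the reply is a JSON object with no fixed shape."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }


def json_schema_request(prompt: str) -> dict:
    """Structured output: the reply must match the supplied JSON Schema."""
    schema = {  # hypothetical schema for illustration
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "country": {"type": "string"},
        },
        "required": ["city", "country"],
        "additionalProperties": False,
    }
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "city_info", "schema": schema, "strict": True},
        },
    }


print(json_schema_request("Where is the Eiffel Tower?")["response_format"]["type"])
```

Either body can be passed to `client.chat.completions.create(**body)`; the reply content is then parsed with `json.loads`.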
Will my Groq prompts produce identical output?
The same weights and the same temperature give the same output distribution. Minor token-by-token differences may occur due to different sampling kernels.
Next Steps
- Sign up at railwail.com
- Generate an API key
- Update baseURL and model slug
- Read the reference at railwail.com/docs
- Browse models at railwail.com/models
- Compare pricing at railwail.com/pricing