TL;DR: Switch in Under 5 Minutes
- Both APIs are OpenAI-compatible: change only the base URL, API key, and model slug
- All Fireworks-hosted models mirrored: Llama 3.3, Mixtral, DeepSeek, Qwen, FireFunction
- EU-hosted endpoint, EUR billing
- Same low-latency vLLM-based serving for open models
- Plus access to closed-source frontier models (Claude, GPT-4o, Gemini) on the same key
Why Move Off Fireworks AI?
Fireworks AI is well known for fast inference on open-source LLMs. The trade-offs are US-only hosting, USD billing, and the lack of closed-source frontier models (Claude, GPT-4o, Gemini) in the catalog. Railwail keeps the speed and adds EU hosting, EUR billing, and the closed-source models, all behind one API key.
Step 1: Get a Railwail API Key
Sign up at railwail.com and generate a key. Free credits included.
Step 2: Change Base URL and Model Slug
TypeScript / JavaScript
Before (Fireworks):
import OpenAI from "openai";

const fireworks = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});
const res = await fireworks.chat.completions.create({
  // Fireworks uses a long, account-scoped model path.
  model: "accounts/fireworks/models/llama-v3p3-70b-instruct",
  messages: [{ role: "user", content: "Hello" }],
});
After (Railwail):
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.RAILWAIL_API_KEY,
  baseURL: "https://api.railwail.com/v1",
});
const res = await client.chat.completions.create({
  // Railwail uses a short model slug.
  model: "llama-3.3-70b-instruct",
  messages: [{ role: "user", content: "Hello" }],
});
Python
import os

from openai import OpenAI

# Same OpenAI SDK; only the key and base URL point at Railwail.
client = OpenAI(
    api_key=os.environ["RAILWAIL_API_KEY"],
    base_url="https://api.railwail.com/v1",
)
resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
cURL
curl https://api.railwail.com/v1/chat/completions \
  -H "Authorization: Bearer $RAILWAIL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
API Endpoint Mapping
Fireworks endpoint → Railwail equivalent
| Fireworks | Railwail | Notes |
|---|---|---|
| POST /inference/v1/chat/completions | POST /v1/chat/completions | Identical |
| POST /inference/v1/completions | POST /v1/completions | Legacy supported |
| POST /inference/v1/embeddings | POST /v1/embeddings | Identical |
| POST /inference/v1/image_generation/{model} | POST /v1/images/generations | OpenAI-style |
| POST /inference/v1/audio/transcriptions | POST /v1/audio/transcriptions | Whisper |
| GET /inference/v1/models | GET /v1/models | 275+ models |
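For most routes the mapping above is a prefix swap, which can be sketched as a small helper. This is an illustration only, not part of either SDK: `to_railwail_url` is a hypothetical name, and the image route handling assumes the model moves out of the URL path and into the request body, as in OpenAI-style image requests.

```python
from urllib.parse import urlsplit

FIREWORKS_PREFIX = "/inference/v1"
RAILWAIL_BASE = "https://api.railwail.com/v1"
IMAGE_PREFIX = "/image_generation/"


def to_railwail_url(fireworks_url: str) -> str:
    """Translate a Fireworks endpoint URL into its Railwail equivalent."""
    path = urlsplit(fireworks_url).path
    if not path.startswith(FIREWORKS_PREFIX):
        raise ValueError(f"not a Fireworks inference URL: {fireworks_url}")
    suffix = path[len(FIREWORKS_PREFIX):]
    # Fireworks carries the image model in the path; the Railwail route has
    # no model segment, so the model must be sent another way (assumed here:
    # in the JSON body).
    if suffix.startswith(IMAGE_PREFIX):
        suffix = "/images/generations"
    return RAILWAIL_BASE + suffix
```

Every other row in the table keeps its path suffix unchanged, so only the image route needs a special case.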
Model Mapping
Fireworks model → Railwail
| Fireworks | Railwail | Notes |
|---|---|---|
| accounts/fireworks/models/llama-v3p3-70b-instruct | llama-3.3-70b-instruct | Llama 3.3 |
| accounts/fireworks/models/llama-v3p1-405b-instruct | llama-3.1-405b-instruct | Largest |
| accounts/fireworks/models/mixtral-8x7b-instruct | mixtral-8x7b-instruct | MoE |
| accounts/fireworks/models/mixtral-8x22b-instruct | mixtral-8x22b-instruct | Bigger MoE |
| accounts/fireworks/models/deepseek-v3 | deepseek-v3 | Frontier MoE |
| accounts/fireworks/models/deepseek-r1 | deepseek-r1 | Reasoning |
| accounts/fireworks/models/qwen2p5-72b-instruct | qwen2.5-72b-instruct | Alibaba |
| accounts/fireworks/models/firefunction-v2 | firefunction-v2 | Function calling |
| accounts/fireworks/models/firellava-13b | firellava-13b | Vision |
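The renaming is not a uniform rule (`llama-v3p3` becomes `llama-3.3`, while `deepseek-v3` stays as it is), so an explicit lookup is safer than string rewriting. A minimal sketch built from the table above; the function name `to_railwail_slug` is hypothetical, and unknown models raise an error rather than being guessed.

```python
FIREWORKS_PREFIX = "accounts/fireworks/models/"

# Fireworks model name (without the account prefix) -> Railwail slug,
# copied from the mapping table.
FIREWORKS_TO_RAILWAIL = {
    "llama-v3p3-70b-instruct": "llama-3.3-70b-instruct",
    "llama-v3p1-405b-instruct": "llama-3.1-405b-instruct",
    "mixtral-8x7b-instruct": "mixtral-8x7b-instruct",
    "mixtral-8x22b-instruct": "mixtral-8x22b-instruct",
    "deepseek-v3": "deepseek-v3",
    "deepseek-r1": "deepseek-r1",
    "qwen2p5-72b-instruct": "qwen2.5-72b-instruct",
    "firefunction-v2": "firefunction-v2",
    "firellava-13b": "firellava-13b",
}


def to_railwail_slug(fireworks_model: str) -> str:
    """Map a full Fireworks model path to the short Railwail slug."""
    name = fireworks_model
    if name.startswith(FIREWORKS_PREFIX):
        name = name[len(FIREWORKS_PREFIX):]
    if name not in FIREWORKS_TO_RAILWAIL:
        raise ValueError(f"no known Railwail slug for {fireworks_model!r}")
    return FIREWORKS_TO_RAILWAIL[name]
```

Running existing model strings through a helper like this during migration makes unmapped models fail loudly instead of producing a request with an invalid slug.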
Pricing Comparison (per 1M tokens, May 2026)
Same model, Railwail in EUR
| Model | Fireworks (USD) | Railwail (EUR) | Notes |
|---|---|---|---|
| llama-3.3-70b-instruct | $0.90 | EUR 0.83 | Identical |
| llama-3.1-405b-instruct | $3.00 | EUR 2.76 | Identical |
| mixtral-8x7b-instruct | $0.50 | EUR 0.46 | Identical |
| deepseek-v3 | $0.90 | EUR 0.83 | Identical |
| deepseek-r1 | $3.00 / $8.00 | EUR 2.76 / 7.36 | Input/output |
| firefunction-v2 | $0.90 | EUR 0.83 | Identical |
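To turn the per-1M-token rates into a budget figure, the arithmetic can be sketched as below. The EUR prices are copied from the table; the sketch assumes that a single listed price applies to both input and output tokens, which the table does not state explicitly. `estimate_cost_eur` is a hypothetical helper name.

```python
from decimal import Decimal

# EUR per 1M tokens as (input, output), copied from the pricing table.
# Assumption: models with one listed price charge it for both directions.
PRICES_EUR = {
    "llama-3.3-70b-instruct": (Decimal("0.83"), Decimal("0.83")),
    "llama-3.1-405b-instruct": (Decimal("2.76"), Decimal("2.76")),
    "mixtral-8x7b-instruct": (Decimal("0.46"), Decimal("0.46")),
    "deepseek-v3": (Decimal("0.83"), Decimal("0.83")),
    "deepseek-r1": (Decimal("2.76"), Decimal("7.36")),
    "firefunction-v2": (Decimal("0.83"), Decimal("0.83")),
}
MILLION = Decimal(1_000_000)


def estimate_cost_eur(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the EUR cost of a workload from its token counts."""
    price_in, price_out = PRICES_EUR[model]
    return (price_in * input_tokens + price_out * output_tokens) / MILLION
```

For example, 2M input tokens and 0.5M output tokens on deepseek-r1 come to 2 × 2.76 + 0.5 × 7.36 = EUR 9.20.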
Why Railwail Over Fireworks
- EU billing in EUR with VAT receipts
- Frankfurt-region hosting for low-latency EU customers
- Same OpenAI-compatible API
- Adds Claude, GPT-4o, Gemini to the catalog; Fireworks offers no closed-source models
- Built-in playground for cross-model comparison
- Comparable pricing on Fireworks-style open models
FAQ
What about FireFunction and the function-calling models?
FireFunction v2 is mirrored as firefunction-v2 with identical tool/function calling behaviour. llama-3.3-70b-instruct is also an option for function calling: it now matches FireFunction v2 on most benchmarks.
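A tool-calling request in the OpenAI chat format looks roughly like the sketch below. The `get_weather` tool and both helper names are hypothetical, chosen only for illustration; the request and tool-call shapes follow the OpenAI chat completions format, in which `arguments` arrives as a JSON-encoded string.

```python
import json


def build_tool_request(question: str) -> dict:
    """Build a chat completions body that offers the model one tool."""
    return {
        "model": "firefunction-v2",
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


def parse_tool_call(tool_call: dict) -> tuple[str, dict]:
    """Extract the function name and decoded arguments from one tool call."""
    function = tool_call["function"]
    # The arguments field is a JSON string, not a nested object.
    return function["name"], json.loads(function["arguments"])
```

With the OpenAI SDK, the body can be passed as `client.chat.completions.create(**build_tool_request("..."))`, and each entry of `message.tool_calls` carries the same `function.name` and `function.arguments` fields.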
Can I bring my own LoRA?
Custom LoRAs (Fireworks' multi-LoRA feature) are not currently hosted on Railwail. You can keep them on Fireworks and use Railwail for everything else.
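One way to run this split setup is to route each request by model name. The sketch below is an assumption about how such routing might look, not a feature of either service; the LoRA model ID is a made-up placeholder, and the environment variable names match the ones used earlier in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    base_url: str
    api_key_env: str  # name of the environment variable holding the key


FIREWORKS = Provider("https://api.fireworks.ai/inference/v1", "FIREWORKS_API_KEY")
RAILWAIL = Provider("https://api.railwail.com/v1", "RAILWAIL_API_KEY")

# Hypothetical custom LoRA IDs that must stay on Fireworks.
LORA_MODELS = {"accounts/my-team/models/support-lora"}


def provider_for(model: str) -> Provider:
    """Send custom LoRA models to Fireworks and everything else to Railwail."""
    return FIREWORKS if model in LORA_MODELS else RAILWAIL
```

The returned `base_url` and the key read from `api_key_env` can then be passed to the OpenAI client constructor, so the calling code stays the same for both providers.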
What about Fireworks' speculative decoding speedups?
Railwail uses vLLM with speculative decoding on all Llama models, giving throughput comparable to Fireworks.
How does streaming compare?
Both APIs stream OpenAI-format SSE chunks. First-token latency from EU origins is typically lower via Railwail's Frankfurt edge.
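Because the chunk format is the same on both sides, stream-handling code should not need to change. As a rough sketch of what that format involves, the parser below reads OpenAI-style SSE lines (`data: {...}` events terminated by `data: [DONE]`) and yields the text deltas; `iter_content` is a hypothetical helper, and real clients usually let the SDK do this.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-format SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk often carries only the role, with no content.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Joining the yielded pieces with `"".join(iter_content(lines))` reconstructs the full completion text.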
What about embeddings?
POST /v1/embeddings accepts identical request shapes. Railwail adds Voyage and Cohere embedding models alongside the open-source options.
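The request shape in question is the OpenAI embeddings body: a `model` and an `input` that is a string or a list of strings. The sketch below only builds the request with the standard library and does not send it; the model slug is a placeholder assumption, since this guide does not list embedding slugs, and `build_embeddings_request` is a hypothetical name.

```python
import json
import urllib.request

RAILWAIL_EMBEDDINGS_URL = "https://api.railwail.com/v1/embeddings"


def build_embeddings_request(
    api_key: str, texts: list[str], model: str = "example-embedding-model"
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    # "example-embedding-model" is a placeholder; use a slug from GET /v1/models.
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        RAILWAIL_EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; an OpenAI-format response returns one vector per input under `data[i].embedding`.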
Next Steps
- Sign up at railwail.com
- Generate an API key
- Update baseURL to https://api.railwail.com/v1
- Replace long Fireworks model paths with short Railwail slugs
- Read the reference at railwail.com/docs
- Compare per-token pricing at railwail.com/pricing