Migrate from Fireworks AI to Railwail

TL;DR — Switch in Under 5 Minutes

Both APIs are OpenAI-compatible: change the base URL, API key, and model slug
All Fireworks-hosted models mirrored: Llama 3.3, Mixtral, DeepSeek, Qwen, FireFunction
EU-hosted endpoint, EUR billing
Low-latency vLLM-based serving for open models
Plus access to closed-source frontier models (Claude, GPT-4o, Gemini) on the same key

Why Move Off Fireworks AI?

Fireworks AI is well-known for fast inference on open-source LLMs. The trade-offs are US-only hosting, USD billing, and the lack of closed-source frontier models (Claude, GPT-4o, Gemini) in the catalog. Railwail keeps the speed and adds EU hosting, EUR billing, and the closed-source models — all behind one API key.

Step 1 — Get a Railwail API Key

Sign up at railwail.com and generate an API key. One key gives access to 100+ AI models, including GPT-4o, Claude, Gemini, Llama, Flux, and DALL-E, through a single OpenAI-compatible endpoint, so there is no need to juggle multiple providers.

Step 2 — Change Base URL and Model Slug

TypeScript / JavaScript

Before (Fireworks):

    import OpenAI from "openai";

    const fireworks = new OpenAI({
      apiKey: process.env.FIREWORKS_API_KEY,
      baseURL: "https://api.fireworks.ai/inference/v1",
    });

    const res = await fireworks.chat.completions.create({
      model: "accounts/fireworks/models/llama-v3p3-70b-instruct",
      messages: [{ role: "user", content: "Hello" }],
    });

After (Railwail):

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.RAILWAIL_API_KEY,
      baseURL: "https://api.railwail.com/v1",
    });

    const res = await client.chat.completions.create({
      model: "llama-3.3-70b-instruct",
      messages: [{ role: "user", content: "Hello" }],
    });

Python

    import os

    from openai import OpenAI

    # Point the standard OpenAI client at Railwail.
    client = OpenAI(
        api_key=os.environ["RAILWAIL_API_KEY"],
        base_url="https://api.railwail.com/v1",
    )

    resp = client.chat.completions.create(
        model="llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)

cURL

    curl https://api.railwail.com/v1/chat/completions \
      -H "Authorization: Bearer $RAILWAIL_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
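
One way to keep the switch reversible during a rollout is to pick the provider from an environment variable. The sketch below is only an illustration: `LLM_PROVIDER` and `provider_settings` are hypothetical names introduced here, while the base URLs and key variable names are the ones used in the examples above.

```python
import os

# Base URLs and key variable names are taken from the examples above.
PROVIDERS = {
    "fireworks": {
        "base_url": "https://api.fireworks.ai/inference/v1",
        "key_env": "FIREWORKS_API_KEY",
    },
    "railwail": {
        "base_url": "https://api.railwail.com/v1",
        "key_env": "RAILWAIL_API_KEY",
    },
}


def provider_settings(env=None):
    """Return (base_url, api_key) for the provider named in LLM_PROVIDER.

    LLM_PROVIDER is a hypothetical variable chosen for this sketch; it
    defaults to "railwail" when unset.
    """
    env = os.environ if env is None else env
    name = env.get("LLM_PROVIDER", "railwail").lower()
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    cfg = PROVIDERS[name]
    api_key = env.get(cfg["key_env"])
    if not api_key:
        raise KeyError(f"{cfg['key_env']} is not set")
    return cfg["base_url"], api_key
```

The returned pair can be passed straight to the client, for example `OpenAI(api_key=api_key, base_url=base_url)`. Model slugs still differ between the two providers, so the model name has to be switched along with the base URL.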

API Endpoint Mapping

Fireworks endpoint → Railwail equivalent

Fireworks	Railwail	Notes
POST /inference/v1/chat/completions	POST /v1/chat/completions	Identical
POST /inference/v1/completions	POST /v1/completions	Legacy supported
POST /inference/v1/embeddings	POST /v1/embeddings	Identical
POST /inference/v1/image_generation/{model}	POST /v1/images/generations	OpenAI-style
POST /inference/v1/audio/transcriptions	POST /v1/audio/transcriptions	Whisper
GET /inference/v1/models	GET /v1/models	275+ models
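
If request paths are built by hand rather than through an SDK, the table above can be encoded as a small lookup. This is a minimal sketch; `railwail_path` is a hypothetical helper name, and the handling of the image endpoint assumes the model moves out of the URL path, which should be checked against the Railwail reference.

```python
# Path pairs copied from the endpoint table (Fireworks path -> Railwail path).
ENDPOINT_MAP = {
    "/inference/v1/chat/completions": "/v1/chat/completions",
    "/inference/v1/completions": "/v1/completions",
    "/inference/v1/embeddings": "/v1/embeddings",
    "/inference/v1/audio/transcriptions": "/v1/audio/transcriptions",
    "/inference/v1/models": "/v1/models",
}

# Fireworks puts the image model in the URL path; the Railwail route is fixed.
IMAGE_PREFIX = "/inference/v1/image_generation/"


def railwail_path(fireworks_path):
    """Translate a Fireworks request path to its Railwail equivalent."""
    if fireworks_path.startswith(IMAGE_PREFIX):
        # Assumption: the model is then named in the request body instead.
        return "/v1/images/generations"
    try:
        return ENDPOINT_MAP[fireworks_path]
    except KeyError:
        raise ValueError(f"no mapping listed for {fireworks_path!r}") from None
```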

Model Mapping

Fireworks model → Railwail

Fireworks	Railwail	Notes
accounts/fireworks/models/llama-v3p3-70b-instruct	llama-3.3-70b-instruct	Llama 3.3
accounts/fireworks/models/llama-v3p1-405b-instruct	llama-3.1-405b-instruct	Largest
accounts/fireworks/models/mixtral-8x7b-instruct	mixtral-8x7b-instruct	MoE
accounts/fireworks/models/mixtral-8x22b-instruct	mixtral-8x22b-instruct	Bigger MoE
accounts/fireworks/models/deepseek-v3	deepseek-v3	Frontier MoE
accounts/fireworks/models/deepseek-r1	deepseek-r1	Reasoning
accounts/fireworks/models/qwen2p5-72b-instruct	qwen2.5-72b-instruct	Alibaba
accounts/fireworks/models/firefunction-v2	firefunction-v2	Function calling
accounts/fireworks/models/firellava-13b	firellava-13b	Vision
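
For codebases that reference Fireworks model IDs in many places, the table can be kept as an explicit lookup. The sketch below uses a hypothetical helper, `to_railwail_slug`. It deliberately avoids a string-rewriting rule because the naming differences are not uniform (for example `llama-v3p3` becomes `llama-3.3`, while `qwen2p5` becomes `qwen2.5`).

```python
# Entries copied from the model mapping table above.
MODEL_MAP = {
    "accounts/fireworks/models/llama-v3p3-70b-instruct": "llama-3.3-70b-instruct",
    "accounts/fireworks/models/llama-v3p1-405b-instruct": "llama-3.1-405b-instruct",
    "accounts/fireworks/models/mixtral-8x7b-instruct": "mixtral-8x7b-instruct",
    "accounts/fireworks/models/mixtral-8x22b-instruct": "mixtral-8x22b-instruct",
    "accounts/fireworks/models/deepseek-v3": "deepseek-v3",
    "accounts/fireworks/models/deepseek-r1": "deepseek-r1",
    "accounts/fireworks/models/qwen2p5-72b-instruct": "qwen2.5-72b-instruct",
    "accounts/fireworks/models/firefunction-v2": "firefunction-v2",
    "accounts/fireworks/models/firellava-13b": "firellava-13b",
}


def to_railwail_slug(fireworks_model):
    """Look up the Railwail slug for a Fireworks model ID.

    Raises KeyError for IDs that are not in the table, so unmapped models
    fail loudly instead of being sent with a guessed name.
    """
    try:
        return MODEL_MAP[fireworks_model]
    except KeyError:
        raise KeyError(f"no Railwail slug listed for {fireworks_model!r}") from None
```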

Test Any AI Model Instantly

Our built-in playground lets you compare models side by side. Find the perfect model for your use case in minutes, not days.


Pricing Comparison (per 1M tokens, May 2026)

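Same models on both sides. Railwail prices are in EUR and match the Fireworks USD prices at a conversion of about 0.92 EUR per USD.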

Model	Fireworks (USD)	Railwail (EUR)	Notes
llama-3.3-70b-instruct	$0.90	EUR 0.83	Identical
llama-3.1-405b-instruct	$3.00	EUR 2.76	Identical
mixtral-8x7b-instruct	$0.50	EUR 0.46	Identical
deepseek-v3	$0.90	EUR 0.83	Identical
deepseek-r1	$3.00 / $8.00	EUR 2.76 / 7.36	Input/output
firefunction-v2	$0.90	EUR 0.83	Identical
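
The EUR column is consistent with multiplying each USD price by 0.92 and rounding to cents. The sketch below reproduces that arithmetic and estimates a bill; the 0.92 rate is inferred from the table rather than a published exchange rate, and `usd_to_eur` and `cost_eur` are hypothetical helper names.

```python
from decimal import Decimal, ROUND_HALF_UP

# Inferred from the table (for example 2.76 / 3.00); not an official rate.
ASSUMED_EUR_PER_USD = Decimal("0.92")


def usd_to_eur(usd_price):
    """Convert a per-1M-token USD price to EUR, rounded to cents."""
    eur = Decimal(usd_price) * ASSUMED_EUR_PER_USD
    return eur.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def cost_eur(tokens, eur_per_million):
    """Cost in EUR for a token count at a per-1M-token price."""
    return Decimal(tokens) / Decimal(1_000_000) * Decimal(eur_per_million)


if __name__ == "__main__":
    for usd in ("0.90", "3.00", "0.50", "8.00"):
        print(f"${usd} -> EUR {usd_to_eur(usd)}")
    # 25M tokens on llama-3.3-70b-instruct at EUR 0.83 per 1M tokens.
    print(cost_eur(25_000_000, "0.83"))
```

Decimal arithmetic is used so the rounding matches the table exactly instead of depending on binary floating point.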

Why Railwail Over Fireworks

EU billing in EUR with VAT receipts
Frankfurt-region hosting for low-latency EU customers
Same OpenAI-compatible API
Adds Claude, GPT-4o, and Gemini to the catalog; Fireworks offers no closed-source models
Built-in playground for cross-model comparison
Comparable pricing on Fireworks-style open models

FAQ

What about FireFunction and the function-calling models?

FireFunction v2 is mirrored as firefunction-v2 with identical tool/function-calling behaviour. If you would rather use a general-purpose model, llama-3.3-70b-instruct now matches FireFunction v2 on most function-calling benchmarks.
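
A tool-calling request in the OpenAI format might look like the sketch below. The `get_weather` tool, its schema, and the helper names are hypothetical and exist only for illustration; the slug and base URL are the ones given in this guide. The network call is kept in a separate function so the payload can be inspected without an API key.

```python
import os


def build_tool_request(model, question):
    """Build chat-completion arguments with one OpenAI-format tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for this sketch
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [weather_tool],
        "tool_choice": "auto",
    }


def ask_with_tools(question, model="firefunction-v2"):
    """Send the request to Railwail and return any tool calls the model made."""
    from openai import OpenAI  # imported here so the builder works without the SDK

    client = OpenAI(
        api_key=os.environ["RAILWAIL_API_KEY"],
        base_url="https://api.railwail.com/v1",
    )
    resp = client.chat.completions.create(**build_tool_request(model, question))
    return resp.choices[0].message.tool_calls
```

Passing `model="llama-3.3-70b-instruct"` to `ask_with_tools` sends the same payload to the general-purpose model, which makes a side-by-side comparison straightforward.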

Can I bring my own LoRA?

Custom LoRAs (Fireworks' multi-LoRA feature) are not currently hosted on Railwail. You can keep them on Fireworks and use Railwail for everything else.
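
A split setup of this kind can be handled with a small router that sends custom LoRA models to Fireworks and everything else to Railwail. This is a sketch under stated assumptions: `route` is a hypothetical helper, and the LoRA model ID in the usage note is a made-up placeholder, not a real model.

```python
# (base_url, name of the environment variable holding the key)
FIREWORKS = ("https://api.fireworks.ai/inference/v1", "FIREWORKS_API_KEY")
RAILWAIL = ("https://api.railwail.com/v1", "RAILWAIL_API_KEY")


def route(model, lora_models):
    """Return (base_url, key_env_var) for a model.

    lora_models is the set of Fireworks model IDs for your own LoRAs;
    anything not in that set is sent to Railwail.
    """
    return FIREWORKS if model in lora_models else RAILWAIL
```

For example, with `lora_models = {"accounts/my-team/models/support-lora"}` (a placeholder ID), that model keeps using the Fireworks base URL and key, while `llama-3.3-70b-instruct` resolves to Railwail.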

What about Fireworks' speculative decoding speedups?

Railwail uses vLLM with speculative decoding on all Llama models — comparable throughput to Fireworks.

How does streaming compare?

Both APIs stream OpenAI-format SSE chunks. First-token latency from EU origins is typically lower via Railwail's Frankfurt edge.
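
First-token latency is easy to measure yourself rather than take on trust. The sketch below times any OpenAI-format chat stream; `time_to_first_token` is a hypothetical helper, and it takes a zero-argument callable so the clock starts before the request is sent.

```python
import time


def time_to_first_token(open_stream, clock=time.perf_counter):
    """Return (seconds_to_first_token, full_text) for a chat stream.

    open_stream is a zero-argument callable that starts the request and
    returns an iterable of OpenAI-format chunks. The latency is None if
    the stream produced no text.
    """
    start = clock()
    first_token_at = None
    parts = []
    for chunk in open_stream():
        if not chunk.choices:
            continue  # some chunks carry no choices
        text = chunk.choices[0].delta.content
        if text:
            if first_token_at is None:
                first_token_at = clock() - start
            parts.append(text)
    return first_token_at, "".join(parts)
```

With the OpenAI SDK this could be called as `time_to_first_token(lambda: client.chat.completions.create(model="llama-3.3-70b-instruct", messages=[{"role": "user", "content": "Hello"}], stream=True))`, once per provider and from the region where your traffic originates.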

What about embeddings?

POST /v1/embeddings accepts identical request shapes. Railwail adds Voyage and Cohere embedding models alongside the open-source options.
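
Because the request shape is unchanged, an embeddings helper written against the OpenAI SDK should work with either provider. In the sketch below, `embed` is a hypothetical helper and the model slug is left to the caller, since this guide does not list embedding slugs; `client` is an OpenAI client created with the Railwail base URL.

```python
def embed(client, model, texts):
    """Return one embedding vector per input text, in input order.

    client is an OpenAI SDK client (or anything with the same
    embeddings.create interface); model is an embedding model slug
    taken from GET /v1/models.
    """
    resp = client.embeddings.create(model=model, input=list(texts))
    # Each item carries the index of the input it belongs to.
    ordered = sorted(resp.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```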

Pay Only for What You Use

Transparent per-token pricing with no monthly minimums. Start with free credits and scale as you grow.


Next Steps

Sign up at railwail.com
Generate an API key
Update baseURL to https://api.railwail.com/v1
Replace long Fireworks model paths with short Railwail slugs
Read the reference at railwail.com/docs
Compare per-token pricing at railwail.com/pricing