TL;DR: Switch in Under 10 Minutes
- Replace the huggingface_hub InferenceClient with the OpenAI SDK
- The most popular HF models are mirrored: Llama, Mixtral, Flux, Stable Diffusion, Whisper, BGE embeddings
- OpenAI Chat Completions schema instead of HF's task-specific endpoints (text-generation, summarization, conversational, etc.)
- Synchronous responses: no need for the HF wait_for_model retry pattern
- EU hosting, EUR billing, 275+ models on one key
Why Move Off Hugging Face Inference?
Hugging Face is the canonical model hub, and Inference Providers / Endpoints offer a way to call hosted models. The pain points: task-specific endpoints (different request shapes for text-generation, conversational, summarization, etc.), cold starts on rarely-called models, US-default hosting, and per-second compute billing that is hard to predict. Railwail mirrors the top-demand HF models behind the standardised OpenAI Chat Completions schema with always-warm endpoints and per-token pricing.
Step 1: Get a Railwail API Key
Sign up at railwail.com and generate a key.
Step 2: Replace InferenceClient
TypeScript / JavaScript
Before (HF):
import { HfInference } from "@huggingface/inference";

const hf = new HfInference(process.env.HF_TOKEN);
const res = await hf.chatCompletion({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [{ role: "user", content: "Hello" }],
  max_tokens: 256,
});

After (Railwail):
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.RAILWAIL_API_KEY,
  baseURL: "https://api.railwail.com/v1",
});
const res = await client.chat.completions.create({
  model: "llama-3.3-70b-instruct",
  messages: [{ role: "user", content: "Hello" }],
  max_tokens: 256,
});
Python: Text Generation
import os

from openai import OpenAI

# Point the standard OpenAI client at the Railwail gateway.
client = OpenAI(
    api_key=os.environ["RAILWAIL_API_KEY"],
    base_url="https://api.railwail.com/v1",
)
resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
Python: Image Generation
resp = client.images.generate(
    model="flux-schnell",
    prompt="a cyberpunk city at sunset",
    size="1024x1024",
)
print(resp.data[0].url)
Python: Embeddings
resp = client.embeddings.create(
    model="bge-large-en-v1.5",
    input=["hello world"],
)
# One embedding vector is returned per input string.
print(len(resp.data[0].embedding))
API Endpoint Mapping
HF Inference endpoint → Railwail equivalent
| Hugging Face | Railwail | Notes |
|---|---|---|
| POST /models/{repo}/v1/chat/completions (HF Router) | POST /v1/chat/completions | Identical |
| POST /pipeline/text-generation/{repo} | POST /v1/chat/completions | Use messages array |
| POST /pipeline/conversational/{repo} | POST /v1/chat/completions | Use messages array |
| POST /pipeline/summarization/{repo} | POST /v1/chat/completions | Prompt-based summarisation |
| POST /pipeline/feature-extraction/{repo} | POST /v1/embeddings | Embeddings |
| POST /pipeline/text-to-image/{repo} | POST /v1/images/generations | Flux, SDXL |
| POST /pipeline/automatic-speech-recognition/{repo} | POST /v1/audio/transcriptions | Whisper |
| GET /api/models | GET /v1/models | Filter to Railwail-hosted models |
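Several rows in the table collapse task-specific request bodies into a single messages array. The sketch below shows one way that translation could look for the HF-style {"inputs", "parameters"} body. The helper name hf_payload_to_chat_kwargs and the PARAM_MAP table are illustrative, not part of either API, and only three overlapping parameters are mapped; treat it as a starting point under those assumptions.

```python
from typing import Any, Optional

# HF text-generation parameter name -> Chat Completions parameter name.
# Assumption: only these overlapping basics are needed; extend as required.
PARAM_MAP = {
    "max_new_tokens": "max_tokens",
    "temperature": "temperature",
    "top_p": "top_p",
}


def hf_payload_to_chat_kwargs(
    payload: dict[str, Any],
    model: str,
    system_prompt: Optional[str] = None,
) -> dict[str, Any]:
    """Convert an HF-style {"inputs", "parameters"} body into chat.completions kwargs."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": payload["inputs"]})

    kwargs: dict[str, Any] = {"model": model, "messages": messages}
    parameters = payload.get("parameters") or {}
    for hf_name, chat_name in PARAM_MAP.items():
        value = parameters.get(hf_name)
        if value is not None:
            kwargs[chat_name] = value
    return kwargs


# A former summarization request becomes a chat request with an instruction.
summarise_kwargs = hf_payload_to_chat_kwargs(
    {"inputs": "Long article text goes here.", "parameters": {"max_new_tokens": 120}},
    model="llama-3.3-70b-instruct",
    system_prompt="Summarise the user's text in two sentences.",
)
```

The resulting dictionary can be passed to the client from Step 2 as client.chat.completions.create(**summarise_kwargs).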
Model Mapping
Hugging Face model → Railwail
| HF repo | Railwail | Category |
|---|---|---|
| meta-llama/Llama-3.3-70B-Instruct | llama-3.3-70b-instruct | Text |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | mixtral-8x7b-instruct | Text |
| Qwen/Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | Text |
| deepseek-ai/DeepSeek-V3 | deepseek-v3 | Text |
| black-forest-labs/FLUX.1-schnell | flux-schnell | Image |
| stabilityai/stable-diffusion-3.5-large | stable-diffusion-3.5-large | Image |
| openai/whisper-large-v3 | whisper-large-v3 | STT |
| BAAI/bge-large-en-v1.5 | bge-large-en-v1.5 | Embedding |
| intfloat/multilingual-e5-large-instruct | multilingual-e5-large-instruct | Embedding |
| microsoft/Phi-3.5-mini-instruct | phi-3.5-mini-instruct | Text (small) |
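During a migration it can help to keep this table in code so that existing call sites continue to pass HF repo names. A minimal sketch, with the dictionary copied from the rows above; the names HF_TO_RAILWAIL and to_railwail_id are hypothetical helpers, not part of any SDK.

```python
# HF repo -> Railwail model ID, copied from the mapping table.
HF_TO_RAILWAIL = {
    "meta-llama/Llama-3.3-70B-Instruct": "llama-3.3-70b-instruct",
    "mistralai/Mixtral-8x7B-Instruct-v0.1": "mixtral-8x7b-instruct",
    "Qwen/Qwen2.5-72B-Instruct": "qwen2.5-72b-instruct",
    "deepseek-ai/DeepSeek-V3": "deepseek-v3",
    "black-forest-labs/FLUX.1-schnell": "flux-schnell",
    "stabilityai/stable-diffusion-3.5-large": "stable-diffusion-3.5-large",
    "openai/whisper-large-v3": "whisper-large-v3",
    "BAAI/bge-large-en-v1.5": "bge-large-en-v1.5",
    "intfloat/multilingual-e5-large-instruct": "multilingual-e5-large-instruct",
    "microsoft/Phi-3.5-mini-instruct": "phi-3.5-mini-instruct",
}


def to_railwail_id(hf_repo: str) -> str:
    """Translate an HF repo name, failing loudly when no mapping is known."""
    try:
        return HF_TO_RAILWAIL[hf_repo]
    except KeyError:
        raise KeyError(f"No Railwail mapping known for {hf_repo!r}") from None


model_id = to_railwail_id("mistralai/Mixtral-8x7B-Instruct-v0.1")
```

Raising on unknown repos, rather than guessing an ID by lowercasing the name, avoids silent mismatches: several IDs in the table drop a version suffix or vendor prefix, so a naming rule would not reproduce them.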
Pricing Comparison
Same model, Railwail per-token in EUR
| Model | HF Inference (USD) | Railwail (EUR) | Notes |
|---|---|---|---|
| llama-3.3-70b-instruct | ~$0.90 per 1M (PRO) | EUR 0.83 per 1M | Per-token vs per-sec |
| mixtral-8x7b-instruct | ~$0.50 per 1M | EUR 0.46 per 1M | Same billing unit (per-token) |
| flux-schnell per image | ~$0.003 (compute) | EUR 0.0028 | Per-image |
| whisper-large-v3 per minute | ~$0.006 | EUR 0.0033 | Per-minute |
| bge-large-en-v1.5 per 1M | ~$0.10 | EUR 0.09 | Same billing unit (per-token) |
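Because the Railwail column is priced per token, per image, or per minute, a monthly bill can be estimated with plain arithmetic. The sketch below uses only the Railwail prices from the table; the workload figures and the names RAILWAIL_PRICES and estimate_cost_eur are made up for illustration, and the live price list may differ from these values.

```python
from decimal import Decimal

# (price in EUR, number of units that price covers, unit name), from the table above.
RAILWAIL_PRICES = {
    "llama-3.3-70b-instruct": (Decimal("0.83"), 1_000_000, "tokens"),
    "mixtral-8x7b-instruct": (Decimal("0.46"), 1_000_000, "tokens"),
    "bge-large-en-v1.5": (Decimal("0.09"), 1_000_000, "tokens"),
    "flux-schnell": (Decimal("0.0028"), 1, "images"),
    "whisper-large-v3": (Decimal("0.0033"), 1, "minutes"),
}


def estimate_cost_eur(model: str, quantity: int) -> Decimal:
    """Estimate the cost of `quantity` tokens, images, or minutes for a model."""
    price, per_units, _unit = RAILWAIL_PRICES[model]
    return price * quantity / per_units


# Hypothetical monthly workload: 40M Llama tokens, 10,000 images, 600 audio minutes.
monthly_total = (
    estimate_cost_eur("llama-3.3-70b-instruct", 40_000_000)
    + estimate_cost_eur("flux-schnell", 10_000)
    + estimate_cost_eur("whisper-large-v3", 600)
)
```

For this example workload the parts are EUR 33.20, EUR 28.00, and EUR 1.98, which sum to EUR 63.18. Decimal is used instead of float so that small per-unit prices do not pick up rounding noise.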
Why Railwail Over HF Inference
- EU billing in EUR with VAT receipts
- EU-hosted gateway for low EU latency
- Unified OpenAI Chat Completions schema: no task-specific endpoints to learn
- Always-warm popular models: no cold starts
- Per-token / per-call pricing instead of per-second compute
- Adds closed-source frontier models (Claude, GPT-4o, Gemini)
- Built-in playground at railwail.com/models
FAQ
Can I call any HF model via Railwail?
Railwail hosts roughly 200 of the most popular HF models. For long-tail community models, keep calling HF Inference Providers.
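One way to run both providers side by side is a small routing function that sends mapped models to Railwail and everything else to Hugging Face. This is a sketch only: Route and route_model are hypothetical names, the mapping is passed in by the caller, and "some-org/niche-model" stands in for any long-tail repo.

```python
from typing import NamedTuple


class Route(NamedTuple):
    provider: str  # "railwail" or "huggingface"
    model: str     # model ID to send to that provider


def route_model(hf_repo: str, railwail_ids: dict[str, str]) -> Route:
    """Prefer Railwail when a mapping exists, otherwise fall back to HF."""
    railwail_id = railwail_ids.get(hf_repo)
    if railwail_id is not None:
        return Route("railwail", railwail_id)
    return Route("huggingface", hf_repo)


# Small example mapping; a real one would cover every model you call.
known = {"meta-llama/Llama-3.3-70B-Instruct": "llama-3.3-70b-instruct"}

hosted = route_model("meta-llama/Llama-3.3-70B-Instruct", known)
long_tail = route_model("some-org/niche-model", known)  # hypothetical repo
```

The caller then picks the matching client based on the provider field, so long-tail calls keep their existing HF code path unchanged.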
Do dedicated Inference Endpoints have a Railwail equivalent?
Dedicated endpoints (private GPU instances) are not a Railwail product. Railwail is multi-tenant shared inference, comparable to HF Inference Providers rather than HF Inference Endpoints. For dedicated capacity needs, contact enterprise@railwail.com.
What about HF tasks like translation, NER, fill-mask?
Most of these tasks are now done with prompt engineering on a chat LLM. Use llama-3.3-70b-instruct or mistral-large with appropriate system prompts.
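As a rough illustration of replacing a task pipeline with a system prompt, the sketch below builds the messages array for three former HF tasks. The prompt wording, the TASK_PROMPTS table, and build_task_messages are assumptions made for this example; output quality depends on the model and will need tuning for real workloads.

```python
# Illustrative system prompts for tasks that used dedicated HF pipelines.
TASK_PROMPTS = {
    "translation": (
        "Translate the user's text into {target}. "
        "Reply with the translation only."
    ),
    "ner": (
        "Extract named entities from the user's text. Reply with a JSON list "
        "of objects with keys 'text' and 'label' (PERSON, ORG, LOC, or MISC)."
    ),
    "fill-mask": (
        "The user's text contains the token [MASK]. "
        "Reply with the single most likely replacement word only."
    ),
}


def build_task_messages(task: str, text: str, **fields: str) -> list[dict[str, str]]:
    """Build a chat messages array that emulates a task-specific pipeline."""
    system = TASK_PROMPTS[task].format(**fields)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]


translation_messages = build_task_messages(
    "translation", "Guten Morgen", target="English"
)
```

The returned list can be passed as the messages argument of client.chat.completions.create together with a model such as llama-3.3-70b-instruct.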
Are HF private repos accessible?
Private / gated HF repos require HF authentication, so they are not exposed via Railwail. Public repos hosted by Railwail are accessible to anyone with a Railwail API key.
What about safetensors and FP8 quantisations?
Railwail serves quantised variants where available. The default chosen for each model is the highest-quality variant that fits cost-efficiency targets.
Next Steps
- Sign up at railwail.com
- Generate an API key
- Replace huggingface_hub with the OpenAI SDK
- Switch from task-specific endpoints to chat.completions / embeddings / images.generate
- Read the reference at railwail.com/docs
- Browse all models at railwail.com/models
- Compare pricing at railwail.com/pricing
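Before cutting traffic over, it may be worth confirming that every model you depend on is actually listed by GET /v1/models. The sketch below assumes a client object shaped like the OpenAI SDK client, where client.models.list() yields objects with an id attribute. The helper name missing_models is hypothetical, and fake_client is an offline stand-in so the example runs without a network call or API key.

```python
from types import SimpleNamespace
from typing import Iterable


def missing_models(client, required: Iterable[str]) -> list[str]:
    """Return the required model IDs that client.models.list() does not report."""
    available = {model.id for model in client.models.list()}
    return sorted(set(required) - available)


# Offline stand-in that mimics the shape of the SDK client (dry run only).
fake_client = SimpleNamespace(
    models=SimpleNamespace(
        list=lambda: [
            SimpleNamespace(id="llama-3.3-70b-instruct"),
            SimpleNamespace(id="flux-schnell"),
        ]
    )
)

print(missing_models(fake_client, ["llama-3.3-70b-instruct", "whisper-large-v3"]))
# prints ['whisper-large-v3']
```

With the real client from Step 2 in place of fake_client, an empty list suggests that all required models are visible to your key.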