Migrate from Hugging Face Inference to Railwail

TL;DR — Switch in Under 10 Minutes

Replace the huggingface_hub InferenceClient with the OpenAI SDK
The most popular HF models are mirrored: Llama, Mixtral, Flux, Stable Diffusion, Whisper, BGE embeddings
OpenAI Chat Completions schema instead of HF's task-specific endpoints (text-generation, summarization, conversational, etc.)
Synchronous responses — no need for the HF wait_for_model retry pattern
EU hosting, EUR billing, 275+ models on one key

Why Move Off Hugging Face Inference?

Hugging Face is the canonical model hub, and its Inference Providers and Inference Endpoints products let you call hosted models. The pain points are task-specific endpoints (different request shapes for text-generation, conversational, summarization, and so on), cold starts on rarely called models, US-default hosting, and per-second compute billing that is hard to predict. Railwail mirrors the most in-demand HF models behind the standardised OpenAI Chat Completions schema, with always-warm endpoints and per-token pricing.

Step 1 — Get a Railwail API Key

Access 100+ AI Models with One API Key

GPT-4o, Claude, Gemini, Llama, Flux, DALL-E and more — all through a single, OpenAI-compatible endpoint. No more juggling multiple providers.

Get Started Free

Step 2 — Replace InferenceClient

TypeScript / JavaScript

Before (HF):
import { HfInference } from "@huggingface/inference";

const hf = new HfInference(process.env.HF_TOKEN);
const res = await hf.chatCompletion({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [{ role: "user", content: "Hello" }],
  max_tokens: 256,
});

After (Railwail):
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.RAILWAIL_API_KEY,
  baseURL: "https://api.railwail.com/v1",
});
const res = await client.chat.completions.create({
  model: "llama-3.3-70b-instruct",
  messages: [{ role: "user", content: "Hello" }],
  max_tokens: 256,
});

Python — Text Generation

import os

from openai import OpenAI

# Point the OpenAI SDK at Railwail instead of the default OpenAI endpoint
client = OpenAI(
    api_key=os.environ["RAILWAIL_API_KEY"],
    base_url="https://api.railwail.com/v1",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Python — Image Generation

# Reuses the client created in the text generation example
resp = client.images.generate(
    model="flux-schnell",
    prompt="a cyberpunk city at sunset",
    size="1024x1024",
)
print(resp.data[0].url)

Python — Embeddings

# Reuses the client created in the text generation example
resp = client.embeddings.create(
    model="bge-large-en-v1.5",
    input=["hello world"],
)
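The endpoint mapping in this guide lists POST /v1/audio/transcriptions for Whisper, but the guide has no matching snippet. The following is a minimal sketch, assuming Railwail accepts the OpenAI SDK's standard audio.transcriptions.create call with the whisper-large-v3 ID from the model mapping. The helper name transcribe_file is illustrative and not part of any SDK.

```python
from pathlib import Path


def transcribe_file(client, audio_path, model="whisper-large-v3"):
    """Send one audio file to the transcription endpoint and return its text.

    `client` is assumed to be an OpenAI SDK client whose base_url points at
    Railwail, as in the text generation example.
    """
    # The SDK expects a binary file handle for the `file` argument
    with Path(audio_path).open("rb") as audio:
        resp = client.audio.transcriptions.create(model=model, file=audio)
    # The OpenAI SDK's default transcription response exposes `.text`
    return resp.text
```

Usage would look like print(transcribe_file(client, "meeting.mp3")), where the file name is a placeholder.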

API Endpoint Mapping

HF Inference endpoint → Railwail equivalent

Hugging Face	Railwail	Notes
POST /models/{repo}/v1/chat/completions (HF Router)	POST /v1/chat/completions	Identical
POST /pipeline/text-generation/{repo}	POST /v1/chat/completions	Use messages array
POST /pipeline/conversational/{repo}	POST /v1/chat/completions	Use messages array
POST /pipeline/summarization/{repo}	POST /v1/chat/completions	Prompt-based summarisation
POST /pipeline/feature-extraction/{repo}	POST /v1/embeddings	Embeddings
POST /pipeline/text-to-image/{repo}	POST /v1/images/generations	Flux, SDXL
POST /pipeline/automatic-speech-recognition/{repo}	POST /v1/audio/transcriptions	Whisper
GET /api/models	GET /v1/models	Filter to Railwail-hosted models
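Several rows above collapse HF task endpoints into a single messages array. One way to picture that conversion is a small helper that turns a task name plus input text into chat messages. This is a sketch only: the function name to_chat_messages and the wording of the summarisation system prompt are assumptions, not something Railwail prescribes.

```python
# Hypothetical system prompt; tune the wording for your own use case
SUMMARY_SYSTEM_PROMPT = "Summarise the user's text in two or three sentences."


def to_chat_messages(task, text, history=None):
    """Convert an HF task-style input into an OpenAI-style messages list.

    `history` holds earlier turns (already in messages format) for the
    conversational task; it is copied, never mutated.
    """
    messages = list(history or [])
    if task == "summarization":
        # Summarisation becomes an instruction in the system message
        messages.insert(0, {"role": "system", "content": SUMMARY_SYSTEM_PROMPT})
    elif task not in ("text-generation", "conversational"):
        raise ValueError(f"unsupported task: {task}")
    # The raw input always becomes the latest user turn
    messages.append({"role": "user", "content": text})
    return messages
```

The returned list can be passed as the messages argument of client.chat.completions.create.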

Model Mapping

Hugging Face model → Railwail

HF repo	Railwail	Category
meta-llama/Llama-3.3-70B-Instruct	llama-3.3-70b-instruct	Text
mistralai/Mixtral-8x7B-Instruct-v0.1	mixtral-8x7b-instruct	Text
Qwen/Qwen2.5-72B-Instruct	qwen2.5-72b-instruct	Text
deepseek-ai/DeepSeek-V3	deepseek-v3	Text
black-forest-labs/FLUX.1-schnell	flux-schnell	Image
stabilityai/stable-diffusion-3.5-large	stable-diffusion-3.5-large	Image
openai/whisper-large-v3	whisper-large-v3	STT
BAAI/bge-large-en-v1.5	bge-large-en-v1.5	Embedding
intfloat/multilingual-e5-large-instruct	multilingual-e5-large-instruct	Embedding
microsoft/Phi-3.5-mini-instruct	phi-3.5-mini-instruct	Text (small)
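If existing code passes HF repo IDs around, the table above can be kept as a lookup so call sites do not need hand editing. A minimal sketch: the dictionary copies the rows of the table verbatim, while the names RAILWAIL_MODEL_IDS and to_railwail_model are illustrative.

```python
# HF repo ID -> Railwail model ID, copied from the mapping table
RAILWAIL_MODEL_IDS = {
    "meta-llama/Llama-3.3-70B-Instruct": "llama-3.3-70b-instruct",
    "mistralai/Mixtral-8x7B-Instruct-v0.1": "mixtral-8x7b-instruct",
    "Qwen/Qwen2.5-72B-Instruct": "qwen2.5-72b-instruct",
    "deepseek-ai/DeepSeek-V3": "deepseek-v3",
    "black-forest-labs/FLUX.1-schnell": "flux-schnell",
    "stabilityai/stable-diffusion-3.5-large": "stable-diffusion-3.5-large",
    "openai/whisper-large-v3": "whisper-large-v3",
    "BAAI/bge-large-en-v1.5": "bge-large-en-v1.5",
    "intfloat/multilingual-e5-large-instruct": "multilingual-e5-large-instruct",
    "microsoft/Phi-3.5-mini-instruct": "phi-3.5-mini-instruct",
}


def to_railwail_model(hf_repo):
    """Return the Railwail model ID for an HF repo ID listed in the table."""
    try:
        return RAILWAIL_MODEL_IDS[hf_repo]
    except KeyError:
        # Fail loudly rather than guess an ID for an unlisted repo
        raise KeyError(f"no Railwail mapping listed for {hf_repo!r}") from None
```

Raising on unknown repos is a deliberate choice here: silently lowercasing the repo name would work for some rows of the table but not all of them.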

Pricing Comparison

Same models, with Railwail per-unit prices in EUR

Model and unit	HF Inference (USD)	Railwail (EUR)	Notes
llama-3.3-70b-instruct, per 1M tokens	~$0.90 (PRO)	EUR 0.83	Per-token vs per-second
mixtral-8x7b-instruct, per 1M tokens	~$0.50	EUR 0.46	Identical
flux-schnell, per image	~$0.003 (compute)	EUR 0.0028	Per-image
whisper-large-v3, per minute	~$0.006	EUR 0.0033	Per-minute
bge-large-en-v1.5, per 1M tokens	~$0.10	EUR 0.09	Identical
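Per-token pricing makes a monthly estimate a one-line calculation. The sketch below uses the Railwail EUR figures from the table and assumes a single rate applies to all tokens, since the table does not distinguish input from output tokens. The names EUR_PER_MILLION_TOKENS and estimate_cost_eur are illustrative.

```python
from decimal import Decimal

# EUR per 1M tokens, taken from the pricing table
EUR_PER_MILLION_TOKENS = {
    "llama-3.3-70b-instruct": Decimal("0.83"),
    "mixtral-8x7b-instruct": Decimal("0.46"),
    "bge-large-en-v1.5": Decimal("0.09"),
}


def estimate_cost_eur(model, tokens):
    """Estimate the cost in EUR of processing `tokens` tokens with `model`.

    Assumes one flat per-token rate; Decimal avoids float rounding noise.
    """
    rate = EUR_PER_MILLION_TOKENS[model]
    return rate * Decimal(tokens) / Decimal(1_000_000)
```

For example, estimate_cost_eur("llama-3.3-70b-instruct", 2_500_000) works out to EUR 2.075 (0.83 × 2.5).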

Why Railwail Over HF Inference

EU billing in EUR with VAT receipts
EU-hosted gateway for low EU latency
Unified OpenAI Chat Completions schema — no task-specific endpoints to learn
Always-warm popular models — no cold starts
Per-token / per-call pricing instead of per-second compute
Adds closed-source frontier models (Claude, GPT-4o, Gemini)
Built-in playground at railwail.com/models

FAQ

Can I call any HF model via Railwail?

Railwail hosts roughly the 200 most popular HF models. For long-tail community models, keep using HF Inference Providers for those specific calls.
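In practice this means a mixed setup: Railwail for hosted models, HF for everything else. A minimal routing sketch, assuming you maintain your own table of the Railwail-hosted models you use. RAILWAIL_HOSTED and route_model are hypothetical names, and the two entries shown are only a sample.

```python
# Sample of Railwail-hosted models, keyed by HF repo ID (extend as needed)
RAILWAIL_HOSTED = {
    "meta-llama/Llama-3.3-70B-Instruct": "llama-3.3-70b-instruct",
    "BAAI/bge-large-en-v1.5": "bge-large-en-v1.5",
}


def route_model(hf_repo):
    """Return a (provider, model_id) pair for the given HF repo ID.

    Hosted models go to Railwail under their Railwail ID; anything else
    stays on Hugging Face under its original repo ID.
    """
    if hf_repo in RAILWAIL_HOSTED:
        return ("railwail", RAILWAIL_HOSTED[hf_repo])
    return ("huggingface", hf_repo)
```

The caller can then pick the matching client (OpenAI SDK for "railwail", huggingface_hub for "huggingface") based on the first element of the pair.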

Do dedicated Inference Endpoints have a Railwail equivalent?

Dedicated endpoints (private GPU instances) are not a Railwail product. Railwail is multi-tenant shared inference — like HF Inference Providers, not HF Inference Endpoints. For dedicated capacity needs, contact [email protected].

What about HF tasks like translation, NER, fill-mask?

Most of these tasks are now done with prompt engineering on a chat LLM. Use llama-3.3-70b-instruct or mistral-large with appropriate system prompts.

Are HF private repos accessible?

Private / gated HF repos require HF authentication — they are not exposed via Railwail. Public repos hosted by Railwail are accessible to anyone with a Railwail API key.

What about safetensors and FP8 quantisations?

Railwail serves quantised variants where available. The default chosen for each model is the highest-quality variant that fits cost-efficiency targets.

Next Steps

Sign up at railwail.com
Generate an API key
Replace huggingface_hub with the OpenAI SDK
Switch from task-specific endpoints to chat.completions / embeddings / images.generate
Read the reference at railwail.com/docs
Browse all models at railwail.com/models
Compare pricing at railwail.com/pricing