Migrate from Hugging Face Inference to Railwail

Switch from Hugging Face Inference Providers and Inference Endpoints to Railwail. Same Llama, Mixtral, Flux models behind one OpenAI-compatible API. EU hosting, EUR billing.

Railwail Team · Developer Relations · 8 min read · May 16, 2026

TL;DR: Switch in Under 10 Minutes

  • Replace the huggingface_hub InferenceClient with the OpenAI SDK
  • All HF-popular models mirrored: Llama, Mixtral, Flux, Stable Diffusion, Whisper, BGE embeddings
  • OpenAI Chat Completions schema instead of HF's task-specific endpoints (text-generation, summarization, conversational, etc.)
  • Synchronous responses, so there is no need for the HF wait_for_model retry pattern
  • EU hosting, EUR billing, 275+ models on one key

Why Move Off Hugging Face Inference?

Hugging Face is the canonical model hub, and Inference Providers / Endpoints offer a way to call hosted models. The pain points: task-specific endpoints (different request shapes for text-generation, conversational, summarization, etc.), cold starts on rarely-called models, US-default hosting, and per-second compute billing that is hard to predict. Railwail mirrors the top-demand HF models behind the standardised OpenAI Chat Completions schema with always-warm endpoints and per-token pricing.

Step 1: Get a Railwail API Key

Sign up at railwail.com and generate a key.
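
Once the key exists, it helps to fail fast when it is missing rather than at the first request. The following is a minimal sketch, assuming the key is exported as the RAILWAIL_API_KEY environment variable (the name used in the snippets in this guide). The helper `load_api_key` is hypothetical and not part of any SDK.

```python
import os


def load_api_key(env=None, var_name="RAILWAIL_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Illustrative helper only: `env` defaults to os.environ and can be
    replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; generate a key and export it first"
        )
    return key
```

The returned value can be passed as `api_key` when constructing the OpenAI client shown in Step 2.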


Step 2: Replace InferenceClient

TypeScript / JavaScript

Before (HF):

import { HfInference } from "@huggingface/inference";

const hf = new HfInference(process.env.HF_TOKEN);

const res = await hf.chatCompletion({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [{ role: "user", content: "Hello" }],
  max_tokens: 256,
});

After (Railwail):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.RAILWAIL_API_KEY,
  baseURL: "https://api.railwail.com/v1",
});

const res = await client.chat.completions.create({
  model: "llama-3.3-70b-instruct",
  messages: [{ role: "user", content: "Hello" }],
  max_tokens: 256,
});

Python: Text Generation

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["RAILWAIL_API_KEY"],
    base_url="https://api.railwail.com/v1",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Python: Image Generation

resp = client.images.generate(
    model="flux-schnell",
    prompt="a cyberpunk city at sunset",
    size="1024x1024",
)
print(resp.data[0].url)

Python: Embeddings

resp = client.embeddings.create(
    model="bge-large-en-v1.5",
    input=["hello world"],
)
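
The embeddings response holds one vector per input string. As a rough sketch of what typically happens next, the snippet below compares two vectors with cosine similarity. It uses stand-in vectors so that it runs without a network call; with a live response the vectors would be read from `resp.data[i].embedding`. The function name `cosine_similarity` is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# With a live response, the vectors would come from the SDK objects:
#   vectors = [item.embedding for item in resp.data]
# Stand-in vectors are used here so the sketch runs offline.
vectors = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(cosine_similarity(vectors[0], vectors[1]))  # same direction
print(cosine_similarity(vectors[0], vectors[2]))  # orthogonal
```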

API Endpoint Mapping

HF Inference endpoint → Railwail equivalent

| Hugging Face | Railwail | Notes |
| --- | --- | --- |
| POST /models/{repo}/v1/chat/completions (HF Router) | POST /v1/chat/completions | Identical |
| POST /pipeline/text-generation/{repo} | POST /v1/chat/completions | Use messages array |
| POST /pipeline/conversational/{repo} | POST /v1/chat/completions | Use messages array |
| POST /pipeline/summarization/{repo} | POST /v1/chat/completions | Prompt-based summarisation |
| POST /pipeline/feature-extraction/{repo} | POST /v1/embeddings | Embeddings |
| POST /pipeline/text-to-image/{repo} | POST /v1/images/generations | Flux, SDXL |
| POST /pipeline/automatic-speech-recognition/{repo} | POST /v1/audio/transcriptions | Whisper |
| GET /api/models | GET /v1/models | Filter to Railwail-hosted models |
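
Several rows above collapse into "use a messages array". The sketch below shows one way to convert task-specific inputs into that shape. It assumes the legacy conversational payload carried `past_user_inputs`, `generated_responses`, and `text` fields; the helper names and the wording of the summarisation instruction are hypothetical.

```python
def text_generation_to_messages(prompt, system=None):
    """Wrap a raw text-generation prompt in a chat messages array."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


def conversational_to_messages(past_user_inputs, generated_responses, text):
    """Interleave a legacy conversational history into chat messages."""
    messages = []
    for user_turn, bot_turn in zip(past_user_inputs, generated_responses):
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": bot_turn})
    # The newest user input always comes last.
    messages.append({"role": "user", "content": text})
    return messages


def summarization_to_messages(document, max_sentences=3):
    """Express a summarization call as an instruction plus the document."""
    system = f"Summarise the user's text in at most {max_sentences} sentences."
    return text_generation_to_messages(document, system=system)
```

Each helper returns a list that can be passed as `messages=` to `client.chat.completions.create(...)`.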

Model Mapping

Hugging Face model → Railwail

| HF repo | Railwail | Category |
| --- | --- | --- |
| meta-llama/Llama-3.3-70B-Instruct | llama-3.3-70b-instruct | Text |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | mixtral-8x7b-instruct | Text |
| Qwen/Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | Text |
| deepseek-ai/DeepSeek-V3 | deepseek-v3 | Text |
| black-forest-labs/FLUX.1-schnell | flux-schnell | Image |
| stabilityai/stable-diffusion-3.5-large | stable-diffusion-3.5-large | Image |
| openai/whisper-large-v3 | whisper-large-v3 | STT |
| BAAI/bge-large-en-v1.5 | bge-large-en-v1.5 | Embedding |
| intfloat/multilingual-e5-large-instruct | multilingual-e5-large-instruct | Embedding |
| microsoft/Phi-3.5-mini-instruct | phi-3.5-mini-instruct | Text (small) |
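
During a gradual migration, the mapping can live in code as a lookup table. This is a sketch rather than an official client feature: the dictionary copies the rows above, and the hypothetical `resolve_model` helper returns the Railwail name when one exists and otherwise signals that the call should stay on Hugging Face.

```python
# Rows copied from the model mapping table in this guide.
HF_TO_RAILWAIL = {
    "meta-llama/Llama-3.3-70B-Instruct": "llama-3.3-70b-instruct",
    "mistralai/Mixtral-8x7B-Instruct-v0.1": "mixtral-8x7b-instruct",
    "Qwen/Qwen2.5-72B-Instruct": "qwen2.5-72b-instruct",
    "deepseek-ai/DeepSeek-V3": "deepseek-v3",
    "black-forest-labs/FLUX.1-schnell": "flux-schnell",
    "stabilityai/stable-diffusion-3.5-large": "stable-diffusion-3.5-large",
    "openai/whisper-large-v3": "whisper-large-v3",
    "BAAI/bge-large-en-v1.5": "bge-large-en-v1.5",
    "intfloat/multilingual-e5-large-instruct": "multilingual-e5-large-instruct",
    "microsoft/Phi-3.5-mini-instruct": "phi-3.5-mini-instruct",
}


def resolve_model(hf_repo):
    """Return (provider, model_id) for a Hugging Face repo ID.

    Illustrative helper: repos without a mapping stay on Hugging Face.
    """
    railwail_id = HF_TO_RAILWAIL.get(hf_repo)
    if railwail_id is not None:
        return ("railwail", railwail_id)
    return ("huggingface", hf_repo)


print(resolve_model("mistralai/Mixtral-8x7B-Instruct-v0.1"))
print(resolve_model("some-user/long-tail-model"))
```

An explicit table is used instead of lowercasing the repo name because some rows are not a simple transformation: the Mixtral entry drops the `-v0.1` suffix and the FLUX entry drops `.1`.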


Pricing Comparison

Same model, Railwail per-token in EUR

| Model | HF Inference (USD) | Railwail (EUR) | Notes |
| --- | --- | --- | --- |
| llama-3.3-70b-instruct | ~$0.90 per 1M (PRO) | EUR 0.83 per 1M | Per-token vs per-sec |
| mixtral-8x7b-instruct | ~$0.50 per 1M | EUR 0.46 per 1M | Identical |
| flux-schnell per image | ~$0.003 (compute) | EUR 0.0028 | Per-image |
| whisper-large-v3 per minute | ~$0.006 | EUR 0.0033 | Per-minute |
| bge-large-en-v1.5 per 1M | ~$0.10 | EUR 0.09 | Identical |
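
For budgeting, the per-token rows translate into a simple proportion. The sketch below assumes that "per 1M" means per one million tokens and that a single rate applies to input and output tokens alike, since the table lists only one figure per model. The rates are copied from the table; the function name is hypothetical.

```python
from decimal import Decimal

# EUR per one million tokens, copied from the pricing table above.
EUR_PER_MILLION_TOKENS = {
    "llama-3.3-70b-instruct": Decimal("0.83"),
    "mixtral-8x7b-instruct": Decimal("0.46"),
    "bge-large-en-v1.5": Decimal("0.09"),
}


def estimate_cost_eur(model, tokens):
    """Estimate the cost in EUR for a token volume at the listed rate."""
    rate = EUR_PER_MILLION_TOKENS[model]
    return rate * Decimal(tokens) / Decimal(1_000_000)


# Example: 50 million tokens per month on the 70B Llama model.
print(estimate_cost_eur("llama-3.3-70b-instruct", 50_000_000))
```

Under those assumptions, 50 million tokens at EUR 0.83 per million comes to EUR 41.50. Check the live pricing page before relying on the figure.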

Why Railwail Over HF Inference

  • EU billing in EUR with VAT receipts
  • EU-hosted gateway for low EU latency
  • Unified OpenAI Chat Completions schema, with no task-specific endpoints to learn
  • Always-warm popular models, so no cold starts
  • Per-token / per-call pricing instead of per-second compute
  • Adds closed-source frontier models (Claude, GPT-4o, Gemini)
  • Built-in playground at railwail.com/models

FAQ

Can I call any HF model via Railwail?

No. Railwail hosts roughly the 200 most popular HF models. For long-tail community models, keep calling HF Inference Providers.

Do dedicated Inference Endpoints have a Railwail equivalent?

Dedicated endpoints (private GPU instances) are not a Railwail product. Railwail is multi-tenant shared inference, comparable to HF Inference Providers rather than HF Inference Endpoints. For dedicated capacity needs, contact enterprise@railwail.com.

What about HF tasks like translation, NER, fill-mask?

Most of these tasks are now done with prompt engineering on a chat LLM. Use llama-3.3-70b-instruct or mistral-large with appropriate system prompts.
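
As a hedged illustration of that approach, the sketch below builds a messages array for a few classic tasks. The system prompt wording, the `TASK_PROMPTS` table, and the `build_task_messages` helper are all assumptions made for this example; real prompts usually need tuning per model.

```python
# Hypothetical system prompts for tasks that HF exposes as separate pipelines.
TASK_PROMPTS = {
    "translation": (
        "Translate the user's text into {target}. "
        "Reply with the translation only."
    ),
    "ner": (
        "List the named entities in the user's text as a JSON array of "
        "objects with 'text' and 'type' keys. Reply with JSON only."
    ),
    "fill-mask": (
        "The user's text contains the token [MASK]. "
        "Reply with the single most likely replacement word only."
    ),
}


def build_task_messages(task, text, **params):
    """Return chat messages that express a task as a system prompt."""
    template = TASK_PROMPTS[task]
    return [
        {"role": "system", "content": template.format(**params)},
        {"role": "user", "content": text},
    ]


messages = build_task_messages("translation", "Guten Morgen", target="English")
print(messages[0]["content"])
```

The result can be passed as `messages=` to `client.chat.completions.create(...)` together with one of the chat models named above.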

Are HF private repos accessible?

Private / gated HF repos require HF authentication, so they are not exposed via Railwail. Public repos hosted by Railwail are accessible to anyone with a Railwail API key.

What about safetensors and FP8 quantisations?

Railwail serves quantised variants where available. The default chosen for each model is the highest-quality variant that fits cost-efficiency targets.


Next Steps

  • Sign up at railwail.com
  • Generate an API key
  • Replace huggingface_hub with the OpenAI SDK
  • Switch from task-specific endpoints to chat.completions / embeddings / images.generate
  • Read the reference at railwail.com/docs
  • Browse all models at railwail.com/models
  • Compare pricing at railwail.com/pricing

