Which LLM is the cheapest on Railwail?

Models like Gemini Flash, GPT-5 Mini, Claude Haiku, and DeepSeek V3 sit at the low end — typically a few cents per million input tokens. The exact ranking changes with every provider price update, so sort the model grid above by input cost to see the current cheapest option.

Which model has the longest context window?

Gemini 2.5 Pro currently leads with 2M-token windows, followed by Claude 4.6 (1M tokens) and GPT-5 (around 400K tokens). For most workloads, 128K is more than enough; reach for the long-context tier only when you genuinely need to read a whole codebase or research paper in one prompt.

Open-source vs proprietary — which should I pick?

Open-weights models (Llama 3, Qwen, DeepSeek, Mistral, Mixtral) are catching up fast and win on price-per-token and data sovereignty. Proprietary flagships still lead on reasoning, multilingual coverage, and tool-use reliability. If you're cost-sensitive or need self-hosting, start open. If you're shipping to end-users, start proprietary and optimize down.

GPT-5 vs Claude 4.6 — which is better?

GPT-5 leads on raw benchmark scores, math, and code generation; Claude 4.6 leads on long-form writing, instruction-following nuance, and refusal-rate calibration. Both are within 5% of each other on most tasks. Run them side-by-side on your real prompts at /compare/gpt-5-vs-claude-4-6 — the differences are workload-specific.

How do I switch models in my code?

Railwail is OpenAI-compatible: you only change the `model` parameter. Same endpoint, same SDK, same request body. Try a new model in production by routing 10% of traffic to it for a week and comparing quality + latency + cost in the dashboard.

Is JSON mode supported?

Yes — pass `response_format: { type: 'json_object' }` on any model that supports structured output. For stricter typing, use `json_schema` with a Zod or Pydantic definition. Roughly 80% of the text models on Railwail support one or both modes; the model detail page lists what each one accepts.

Does Railwail support streaming?

Every text model on Railwail supports server-sent-event streaming via the standard `stream: true` parameter. First-token latency is typically 200-800ms depending on the model and region. Cancel a stream by closing the connection — you only pay for the tokens that were actually generated.

Is the API GDPR-compliant?

Yes. Railwail processes traffic in EU data centers, signs a DPA with every paying customer, and never trains on your prompts. Individual providers have their own residency guarantees — the model detail page lists per-provider data handling so you can pick one that matches your compliance posture.

Text & Chat Models

Powerful language models for conversation, analysis, and content generation

Text and chat models for production AI workloads

Large language models are the workhorse of modern AI: chatbots, agents, summarisers, classifiers, translators. The category is the most crowded on Railwail — OpenAI, Anthropic, Google, Mistral, Meta, DeepSeek, xAI, and dozens of open-weights labs all compete here.

All Text & Chat Image Video Audio Text-to-Speech Speech-to-Text Embeddings Code Multimodal Robotics / VLA

49 models available

Bio_ClinicalBERT

Text & Chathuggingface

Popular

The original Bio_ClinicalBERT from Alsentzer et al., a BERT model initialized from BioBERT and further pretrained on all MIMIC-III clinical notes. Served as a fill-mask endpoint it predicts masked tokens in clinical text and produces clinical embeddings. It is the standard encoder backbone behind many downstream clinical NLP fine-tunes.

€1.00

medicalresearchnlp

Biomedical NER (all entities)

Text & Chathuggingface

Popular

Token-classification model from d4data that tags 84 biomedical entity types in clinical and medical text, including disease, sign, symptom, medication, dosage, lab value, body part and procedure. Trained on the Maccrobat clinical case corpus on a DistilBERT base, so it runs cheaply for high-volume tagging.

€1.00

medicalresearchnlp

Claude Opus 4

Text & ChatAnthropic

NewPopular

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free5.0s

flagshipreasoningagentic

Claude Opus 4.8

Text & ChatAnthropic

NewPopular

Anthropic's most capable Opus-tier model. State of the art on long-horizon agentic work, coding and knowledge tasks, with a 1M-token context window at standard pricing.

Text & Chat Models

Text and chat models for production AI workloads

Bio_ClinicalBERT

Biomedical NER (all entities)

Claude Opus 4

Claude Opus 4.8

Claude Sonnet 4

DeepSeek V3.1

DeepSeek V4 Pro

Gemini 2.0 Flash

Gemini 2.5 Pro

GPT-4.1

GPT-4o

GPT-5.5

Grok 4

Grok 4.20 Reasoning

Kimi K2 (Moonshot)

Medical NER (DeBERTa)

MiniMax-01

o3-mini

Perplexity Sonar Pro

AI21 Jamba 1.5 Large

AI21 Jamba 1.5 Mini

BioBERT Disease NER (NCBI)

Claude Haiku 3.5

Clinical Assertion and Negation BERT

Clinical NER (problem, test, treatment)

Cohere Aya 23 35B

Cohere Command Light (legacy)

Cohere Command R (08-2024)

Cohere Command R+ (08-2024)

DeepSeek R1

DeepSeek V3

DeepSeek V4 Flash

GPT-4o Mini

GPT-5 Mini

GPT-5.1

Grok 3

Grok 4.20 (Non-Reasoning)

Grok 4.20 Multi-Agent

Llama 3.3 70B

Mistral Large

OpenAI o3

OpenAI o4-mini

Perplexity Sonar

Perplexity Sonar Reasoning

Qwen 2.5 72B

Qwen 2.5-Max

SeamlessM4T v2 Large (Text)

Snowflake Arctic Instruct

Yi Large

Top text & chat models picks

Popular use cases

Related comparisons

Gemini 2.5 Pro vs GPT-5

DeepSeek V3 vs GPT-5 Mini

Frequently asked questions