Dedicated Servers

Lease a GPU server for exclusive inference workloads.

Overview

Dedicated servers provide a reserved GPU instance that runs your model container with no cold starts. Billing is per-second.
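
To illustrate what per-second billing means in practice, here is a minimal sketch of the arithmetic. The function name `billed_cost` and the hourly rate used in the example are hypothetical and are not taken from the platform's pricing. The sketch also assumes the charge is simply the running time in seconds multiplied by the per-second rate.

```python
from decimal import Decimal

SECONDS_PER_HOUR = 3600


def billed_cost(seconds_running: int, hourly_rate: Decimal) -> Decimal:
    """Return the charge for a server billed per second.

    hourly_rate is a hypothetical price; real rates depend on the
    hardware spec and are not given on this page.
    """
    if seconds_running < 0:
        raise ValueError("seconds_running must be non-negative")
    per_second_rate = hourly_rate / SECONDS_PER_HOUR
    return per_second_rate * seconds_running


# Example with an assumed rate of 3.60 per hour, i.e. 0.001 per second:
# a server that runs for 90 seconds is charged for 90 seconds, not a full hour.
cost = billed_cost(90, Decimal("3.60"))
```

Using `Decimal` rather than `float` keeps the currency arithmetic exact.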

When to use

Production workloads with consistent traffic
Latency-sensitive applications
24/7 inference requirements
Full control over the runtime environment

Lifecycle

1. Create a server deployment with a model and hardware spec
2. Platform provisions a GPU instance and deploys the model
3. Server is ready: send prediction requests directly
4. Stop the server to end billing
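
The four lifecycle steps above can be sketched as a small local state machine. This is an illustrative model only and does not call the platform's API: the class `DedicatedServer`, the state names, and the method names are all assumptions made for this example, and the model and hardware strings are placeholders.

```python
from enum import Enum


class ServerState(Enum):
    # State names are hypothetical; the platform may use different ones.
    CREATED = "created"
    PROVISIONING = "provisioning"
    READY = "ready"
    STOPPED = "stopped"


# Allowed transitions, following the lifecycle order described above.
TRANSITIONS = {
    ServerState.CREATED: {ServerState.PROVISIONING},
    ServerState.PROVISIONING: {ServerState.READY},
    ServerState.READY: {ServerState.STOPPED},
    ServerState.STOPPED: set(),
}


class DedicatedServer:
    """Local model of a dedicated server deployment (not a real client)."""

    def __init__(self, model: str, hardware: str) -> None:
        # Step 1: a deployment is created with a model and a hardware spec.
        self.model = model
        self.hardware = hardware
        self.state = ServerState.CREATED

    def _advance(self, new_state: ServerState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise RuntimeError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state

    def provision(self) -> None:
        # Step 2: the platform provisions a GPU instance and deploys the model.
        self._advance(ServerState.PROVISIONING)

    def mark_ready(self) -> None:
        # Step 3: the server becomes ready to accept requests.
        self._advance(ServerState.READY)

    def predict(self, payload: dict) -> dict:
        # Predictions are only accepted while the server is ready.
        if self.state is not ServerState.READY:
            raise RuntimeError(f"server is {self.state.value}, not ready")
        return {"model": self.model, "input": payload}

    def stop(self) -> None:
        # Step 4: stopping the server ends billing.
        self._advance(ServerState.STOPPED)


server = DedicatedServer(model="example-model", hardware="example-gpu")
server.provision()
server.mark_ready()
result = server.predict({"text": "hello"})
server.stop()
```

The point of the sketch is the ordering constraint: prediction requests only make sense between the ready and stopped states, and a stopped server accepts no further requests.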
