Dedicated Servers
Lease a GPU server for exclusive inference workloads.
Overview
Dedicated servers provide a reserved GPU instance that runs your model container with no cold starts. Billing is per second and ends when you stop the server.
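As a rough illustration of per-second billing, the sketch below converts an hourly rate into a charge for an arbitrary running time. The function name `billed_cost` and the 2.40 USD/hour rate are assumptions made for this example, not published pricing, and the platform's actual rounding rules may differ.

```python
def billed_cost(seconds_running: int, hourly_rate_usd: float) -> float:
    """Cost of a server billed per second at a given hourly rate.

    Hypothetical helper: the rate and the lack of rounding are assumptions.
    """
    if seconds_running < 0:
        raise ValueError("seconds_running must be non-negative")
    # One hour is 3600 seconds, so each second costs hourly_rate / 3600.
    return seconds_running * hourly_rate_usd / 3600


# A server that ran for 90 minutes at an assumed 2.40 USD/hour.
cost = billed_cost(90 * 60, 2.40)
print(f"{cost:.2f} USD")
```

Because the charge scales with seconds rather than whole hours, stopping a server partway through an hour avoids paying for the remainder of that hour.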
When to use
- Production workloads with consistent traffic
- Latency-sensitive applications
- 24/7 inference requirements
- Full control over the runtime environment
Lifecycle
1. Create a server deployment with a model and hardware spec
2. Platform provisions a GPU instance and deploys the model
3. Server is ready: send prediction requests directly
4. Stop the server to end billing
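The lifecycle above can be modeled as a small state machine. The sketch below is an in-memory simulation only and does not call any real platform API; the class `DedicatedServer`, the state names, and the example model and hardware identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PROVISIONING = "provisioning"  # step 2: instance is being set up
    READY = "ready"                # step 3: accepts prediction requests
    STOPPED = "stopped"            # step 4: billing has ended


@dataclass
class DedicatedServer:
    """Hypothetical stand-in for a server deployment (step 1)."""
    model: str
    hardware: str
    state: State = State.PROVISIONING

    def mark_ready(self) -> None:
        # In a real system the platform would trigger this transition.
        if self.state is not State.PROVISIONING:
            raise RuntimeError(f"cannot become ready from {self.state.value}")
        self.state = State.READY

    def predict(self, payload: dict) -> dict:
        # Requests are only valid while the server is ready.
        if self.state is not State.READY:
            raise RuntimeError(f"server is {self.state.value}, not ready")
        return {"model": self.model, "input": payload}

    def stop(self) -> None:
        # Stopping is terminal in this sketch.
        self.state = State.STOPPED


server = DedicatedServer(model="example-model", hardware="example-gpu")
server.mark_ready()
result = server.predict({"prompt": "hello"})
server.stop()
print(server.state.value)
```

A client built against a real deployment would typically poll for the ready state before sending requests, since predictions sent during provisioning or after a stop cannot be served.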