Dedicated Servers

Lease a GPU server reserved exclusively for your inference workloads.

Overview

A dedicated server gives you a reserved GPU instance that runs your model container with no cold starts. Billing is per second and continues until you stop the server.
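
To make per-second billing concrete, here is a minimal sketch of the arithmetic. The function name `estimate_cost` and the hourly rate are hypothetical and are not platform values; check the actual pricing for your hardware spec.

```python
from decimal import Decimal


def estimate_cost(seconds_running: int, hourly_rate: Decimal) -> Decimal:
    """Estimate the cost of a server that is billed per second.

    hourly_rate is an assumed price used only for illustration.
    """
    if seconds_running < 0:
        raise ValueError("seconds_running must be non-negative")
    per_second = hourly_rate / Decimal(3600)
    # Round to two decimal places for display.
    return (per_second * seconds_running).quantize(Decimal("0.01"))


# 90 minutes at an assumed rate of 3.60 per hour
print(estimate_cost(90 * 60, Decimal("3.60")))  # prints 5.40
```

Because billing is per second, a server stopped after 90 minutes is charged for 5,400 seconds rather than two full hours.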

When to use

  • Production workloads with consistent traffic
  • Latency-sensitive applications
  • 24/7 inference requirements
  • Full control over the runtime environment

Lifecycle

  1. Create a server deployment with a model and hardware spec
  2. The platform provisions a GPU instance and deploys the model
  3. The server is ready: send prediction requests directly
  4. Stop the server to end billing
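
The lifecycle steps can be pictured as a small state machine. The sketch below is an in-memory model only and makes no network calls. The class `DedicatedServer`, the state names, and the values `"my-model"` and `"gpu-large"` are hypothetical and do not represent the platform's API. It also assumes a stopped server cannot be restarted, which the steps do not specify.

```python
from enum import Enum


class ServerState(Enum):
    CREATED = "created"
    PROVISIONING = "provisioning"
    READY = "ready"
    STOPPED = "stopped"


# Allowed transitions, mirroring the four lifecycle steps (assumed, not official).
TRANSITIONS = {
    ServerState.CREATED: {ServerState.PROVISIONING},
    ServerState.PROVISIONING: {ServerState.READY},
    ServerState.READY: {ServerState.STOPPED},
    ServerState.STOPPED: set(),
}


class DedicatedServer:
    """Hypothetical in-memory model of a server deployment."""

    def __init__(self, model: str, hardware: str) -> None:
        self.model = model
        self.hardware = hardware
        self.state = ServerState.CREATED

    def advance(self, new_state: ServerState) -> None:
        """Move to new_state, rejecting transitions the lifecycle does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state

    def predict(self, payload: dict) -> dict:
        """Accept requests only while the server is ready."""
        if self.state is not ServerState.READY:
            raise RuntimeError("server is not ready")
        # Placeholder response: echoes the input instead of running a model.
        return {"model": self.model, "echo": payload}


server = DedicatedServer("my-model", "gpu-large")  # step 1
server.advance(ServerState.PROVISIONING)           # step 2
server.advance(ServerState.READY)                  # step 3
print(server.predict({"input": "hello"}))
server.advance(ServerState.STOPPED)                # step 4
```

Modeling the states explicitly makes it clear that prediction requests are only valid between steps 3 and 4.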