Simple, transparent pricing

Pick the plan that fits your scale. All plans include LiteLLM gateway access for coding agents.

Free

Get started with coding agents

$0/month
LiteLLM gateway access
500 requests/day
50 requests/minute (RPM) · 200K tokens/minute (TPM)
1 API key
Community support
Not included: Custom models · Priority support · Unlimited API keys
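The Free plan's rate limits are per-minute caps, so a coding agent that sends requests in bursts can hit them even when it is well under 500 requests/day. The following is a minimal client-side throttle. It assumes a 60-second sliding window and that you can estimate each request's token count before sending it. The page does not say how the gateway counts tokens or windows requests. `SlidingWindowLimiter` is a hypothetical helper and is not part of any IN2PETA or LiteLLM SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side throttle for per-minute request/token caps.

    Defaults mirror the Free plan figures (50 RPM, 200K TPM). The token count
    passed in is the caller's own estimate; the gateway may count differently.
    """

    def __init__(self, rpm=50, tpm=200_000, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for requests still in the window

    def _prune(self, now):
        # Forget requests that have aged out of the sliding window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def wait_time(self, tokens):
        """Seconds to wait before a request of `tokens` fits both limits."""
        now = self.clock()
        self._prune(now)
        if tokens > self.tpm:
            raise ValueError("request alone exceeds the per-minute token limit")
        used = sum(t for _, t in self.events)
        if len(self.events) < self.rpm and used + tokens <= self.tpm:
            return 0.0
        # Walk forward through the oldest requests until enough capacity frees up.
        remaining, freed = len(self.events), 0
        for ts, t in self.events:
            remaining -= 1
            freed += t
            if remaining < self.rpm and used - freed + tokens <= self.tpm:
                return ts + self.window - now
        return self.window  # not reached when rpm >= 1

    def acquire(self, tokens):
        """Block until the request fits, then record it as sent."""
        delay = self.wait_time(tokens)
        if delay > 0:
            time.sleep(delay)
        self.events.append((self.clock(), tokens))
```

Call `acquire(estimated_tokens)` before each gateway request. The daily request quota is not tracked here, so the sketch only smooths per-minute bursts.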

Go (most popular)

For serious developers

$29/month

₹2,400/month

LiteLLM gateway access
5,000 requests/day
200 RPM · 500K TPM
5 API keys
Priority email support
Not included: Custom models · Priority support · Unlimited API keys

Pro

For teams and power users

$99/month

₹8,200/month

LiteLLM gateway access
50,000 requests/day
1,000 RPM · 2M TPM
Unlimited API keys
Dedicated support
Not included: Custom models · Priority support
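The three plans differ mainly in their published limits. Picking one therefore means checking a workload against each plan in price order. The following is a minimal sketch that uses only the figures listed above. `Plan`, `PLANS` and `cheapest_plan` are hypothetical names. The sketch ignores support tiers and anything else the page does not quantify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    usd_per_month: int
    requests_per_day: int
    rpm: int
    tpm: int
    api_keys: Optional[int]  # None means unlimited


# Limits as listed on this pricing page.
PLANS = [
    Plan("Free", 0, 500, 50, 200_000, 1),
    Plan("Go", 29, 5_000, 200, 500_000, 5),
    Plan("Pro", 99, 50_000, 1_000, 2_000_000, None),
]


def cheapest_plan(requests_per_day, peak_rpm, peak_tpm, api_keys=1):
    """Return the lowest-priced plan whose published limits cover the workload."""
    for plan in sorted(PLANS, key=lambda p: p.usd_per_month):
        keys_ok = plan.api_keys is None or api_keys <= plan.api_keys
        if (requests_per_day <= plan.requests_per_day
                and peak_rpm <= plan.rpm
                and peak_tpm <= plan.tpm
                and keys_ok):
            return plan.name
    return None  # beyond Pro; not covered by the plans listed here


print(cheapest_plan(4_000, 150, 300_000))  # e.g. a small team's daily agent traffic
```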

How we compare

IN2PETA vs other GPU and AI generation platforms

Feature · IN2PETA (you are here) · Kling AI · Runway · fal.ai · WaveSpeed.ai
Serverless GPU inference
Dedicated server mode
Unlimited generations (server mode): Credit cap · Credit cap · Pay-per-use · Pay-per-use
Bring Your Own Key (BYOK)
Pay-per-second billing: Subscription · Subscription
No forced subscription: Pay as you go · Monthly plans · Monthly plans
Video editor (coming soon)
No data stored
Priority support: All paid users · Enterprise only · Enterprise only · Enterprise only · Enterprise only
Monetisation with content: 50/50 split
REST API + SDKs (Python & TS): Limited API · Limited API
No Chinese servers: 🇮🇳 India

Two ways to run

Pick the option that fits your workload

Serverless Inference

Pay per prediction · Scales to zero

Send a request, get a result. No servers to manage, no idle costs. Your workload spins up in milliseconds and shuts down when done. You're billed only for the active GPU seconds consumed by each prediction.

Zero idle cost — pay only when compute is active
Automatic scaling from zero to any load
Per-second billing, no minimum commitments
Best for: sporadic workloads, prototyping, APIs
Models called via REST API or Python/TS SDK

Example cost

Running SDXL at ~2 seconds per image costs ~0.02 credits per image
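Serverless billing counts only active GPU seconds, so cost scales linearly with prediction time and call count. The following is a back-of-the-envelope sketch. The default rate of 0.01 credits per GPU-second is only what the SDXL example implies (~0.02 credits ÷ ~2 s). It is an assumption, not a published price. Check the rate for the model you actually call.

```python
def serverless_cost(gpu_seconds_per_call: float, calls: int,
                    credits_per_gpu_second: float = 0.01) -> float:
    """Estimated credits for `calls` predictions billed per active GPU second.

    Idle time is not billed under serverless, so only active seconds count.
    The default rate is the one implied by the SDXL example (an assumption).
    """
    return gpu_seconds_per_call * calls * credits_per_gpu_second


# 1,000 SDXL images at ~2 s each comes to roughly 20 credits.
print(round(serverless_cost(2, 1_000), 2))
```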

Dedicated Server

Per-hour billing · Full GPU control

Lease a dedicated GPU machine for sustained, high-throughput workloads. The server is yours for the duration — run unlimited inferences, bring your own model, and get full control over the runtime environment. Billed per active hour.

Unlimited generations for the duration of the lease
Full GPU access — no cold starts, no queue
Bring your own model code and dependencies
Best for: production pipelines, batch jobs, fine-tuning
Per-hour billing — stop anytime from your dashboard

Example cost

RTX 4090 tier: billed in credits per hour; the exact rate is displayed at lease time

Which should I use?

Use Serverless when

You have variable or unpredictable traffic
You're running a production API with infrequent calls
You want zero infrastructure management
Cost per call matters more than throughput

Use Dedicated when

You need consistent low-latency responses
You're running batch jobs or fine-tuning
Your throughput is high enough to fill a GPU
You need full control over the runtime environment
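The "throughput high enough to fill a GPU" rule can be made concrete. A dedicated lease becomes cheaper once the GPU is busy for a large enough share of each hour that per-second serverless billing would cost more. The following sketch makes two assumptions: a prediction takes the same GPU time on either option, and the serverless rate is the ~0.01 credits/second implied by the SDXL example. The 18 credits/hour dedicated rate is a made-up placeholder, because the real RTX 4090 rate is only shown at lease time.

```python
SECONDS_PER_HOUR = 3600


def breakeven_utilisation(dedicated_credits_per_hour: float,
                          serverless_credits_per_second: float) -> float:
    """Share of each hour the GPU must be busy before a dedicated lease
    becomes cheaper than serverless per-second billing.

    A result above 1.0 means serverless stays cheaper even at 100% utilisation.
    """
    full_hour_serverless = serverless_credits_per_second * SECONDS_PER_HOUR
    return dedicated_credits_per_hour / full_hour_serverless


def cheaper_option(busy_seconds_per_hour: float,
                   dedicated_credits_per_hour: float,
                   serverless_credits_per_second: float) -> str:
    """Compare one hour of each option at a given level of GPU activity."""
    serverless_cost = busy_seconds_per_hour * serverless_credits_per_second
    if dedicated_credits_per_hour < serverless_cost:
        return "dedicated"
    return "serverless"


# Hypothetical rates: 18 credits/hour dedicated vs 0.01 credits/second serverless.
print(breakeven_utilisation(18, 0.01))  # lease pays off above ~50% utilisation
print(cheaper_option(3_000, 18, 0.01))  # ~50 minutes of GPU work per hour
print(cheaper_option(600, 18, 0.01))    # ~10 minutes of GPU work per hour
```

Cost is only part of the decision. The other reasons listed above for Dedicated, such as no cold starts and full runtime control, are not modelled here.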