Simple, transparent pricing

Pick the plan that fits your scale. All plans include IN2PETA gateway access for coding agents.

Free

Get started with coding agents

$0/month

25 RPM · 1M TPM

1 API key

Limited access; requests get the lowest priority

Community support

Go

For serious developers

$5/month

≈ ₹500/month

Upgrade to Go

100 RPM · 5M TPM

5 API keys

2x Claude Code usage

⚡ Smart routing across models

Priority email support

Pro

For teams and power users

$15/month

≈ ₹1500/month

Upgrade to Pro

200 RPM · 15M TPM

Unlimited API keys

3x Go plan usage

🚀 Priority queue

Priority customer support
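
To see which plan a given workload needs, the limits above can be checked mechanically. The sketch below is illustrative only: Plan and cheapest_plan are hypothetical helpers, not part of any IN2PETA SDK. It assumes RPM and TPM are enforced independently and estimates TPM as requests per minute times average tokens per request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    usd_per_month: int
    rpm: int  # requests per minute
    tpm: int  # tokens per minute


# Limits copied from the plan cards above.
PLANS = [
    Plan("Free", 0, 25, 1_000_000),
    Plan("Go", 5, 100, 5_000_000),
    Plan("Pro", 15, 200, 15_000_000),
]


def cheapest_plan(peak_rpm: int, avg_tokens_per_request: int) -> Optional[Plan]:
    """Return the cheapest plan whose RPM and TPM limits both cover the peak load."""
    # Assumption: token throughput is roughly requests x average tokens per request.
    peak_tpm = peak_rpm * avg_tokens_per_request
    for plan in sorted(PLANS, key=lambda p: p.usd_per_month):
        if peak_rpm <= plan.rpm and peak_tpm <= plan.tpm:
            return plan
    return None  # no listed plan covers this load


for rpm, tokens in [(10, 1_000), (20, 100_000), (150, 100_000), (300, 1_000)]:
    plan = cheapest_plan(rpm, tokens)
    print(rpm, tokens, plan.name if plan else "no listed plan fits")
```

The token limit can bind before the request limit: 20 requests per minute at 100,000 tokens each is 2M TPM, which exceeds the Free plan's 1M TPM even though 20 RPM fits within 25 RPM.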

How we compare

IN2PETA vs other GPU and AI generation platforms

Feature	IN2PETA (you are here)	Kling AI	Runway	fal.ai	WaveSpeed.ai
Serverless GPU inference
Dedicated server mode
Unlimited generations (server mode)		Credit cap	Credit cap	Pay-per-use	Pay-per-use
Bring Your Own Key (BYOK)
Pay-per-second billing		Subscription	Subscription
No forced subscription	Pay as you go	Monthly plans	Monthly plans
Video editor (coming soon)
No data stored
Priority support	All paid users	Enterprise only	Enterprise only	Enterprise only	Enterprise only
Content monetisation	50/50 split
REST API + SDKs (Python & TS)		Limited API	Limited API
No Chinese servers	🇮🇳 India

Two ways to run

Pick the model that fits your workload

Serverless Inference

Pay per prediction · Scales to zero

Send a request, get a result. No servers to manage, no idle costs. Your workload spins up in milliseconds and shuts down when done. You're billed only for the active GPU seconds consumed by each prediction.

Zero idle cost — pay only when compute is active

Automatic scaling from zero to any load

Per-second billing, no minimum commitments

Best for: sporadic workloads, prototyping, APIs

Call models via the REST API or the Python/TS SDK

Example cost

Running SDXL at ~2 sec/image = ~0.02 credits per image
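
The arithmetic behind this example can be sketched as follows. The rate of 0.01 credits per GPU-second is inferred from the SDXL figure above (about 0.02 credits for about 2 seconds) and is an assumption; actual rates may vary by model and GPU tier. serverless_cost is a hypothetical helper, not an SDK function.

```python
# Assumption: inferred from the SDXL example (~0.02 credits / ~2 s per image).
# Real per-second rates may differ by model and GPU tier.
CREDITS_PER_GPU_SECOND = 0.01


def serverless_cost(num_predictions: int,
                    seconds_per_prediction: float,
                    credits_per_second: float = CREDITS_PER_GPU_SECOND) -> float:
    """Estimate credits billed for active GPU seconds only (no idle cost)."""
    return num_predictions * seconds_per_prediction * credits_per_second


print(f"{serverless_cost(1, 2.0):.2f} credits")     # 0.02 credits
print(f"{serverless_cost(1000, 2.0):.2f} credits")  # 20.00 credits
```

Because billing stops when the workload shuts down, the estimate depends only on the number of predictions and their duration, not on how they are spread over time.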

Dedicated Server

Per-hour billing · Full GPU control

Lease a dedicated GPU machine for sustained, high-throughput workloads. The server is yours for the duration — run unlimited inferences, bring your own model, and get full control over the runtime environment. Billed per active hour.

Unlimited generations for the duration of the lease

Full GPU access — no cold starts, no queue

Bring your own model code and dependencies

Best for: production pipelines, batch jobs, fine-tuning

Per-hour billing — stop anytime from your dashboard

Example cost

RTX 4090 tier: billed in credits per hour; the rate is displayed at lease time

Which should I use?

Use Serverless when

You have variable or unpredictable traffic

You're running a production API with infrequent calls

You want zero infrastructure management

Cost per call matters more than throughput

Use Dedicated when

You need consistent low-latency responses

You're running batch jobs or fine-tuning

Your throughput is high enough to fill a GPU

You need full control over the runtime environment
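
As a rough guide to "high enough to fill a GPU", the break-even point between the two modes can be estimated. This is a sketch under stated assumptions: the dedicated hourly rate is not published here (it is shown at lease time), so the 18 credits/hour below is a made-up placeholder, and the serverless rate of 0.01 credits per GPU-second is inferred from the SDXL example. break_even_predictions_per_hour is a hypothetical helper.

```python
def break_even_predictions_per_hour(dedicated_credits_per_hour: float,
                                    seconds_per_prediction: float,
                                    serverless_credits_per_second: float) -> float:
    """Predictions per hour at which serverless spend equals the dedicated hourly rate."""
    cost_per_prediction = seconds_per_prediction * serverless_credits_per_second
    return dedicated_credits_per_hour / cost_per_prediction


# Placeholder only: NOT a real IN2PETA price. Use the rate shown at lease time.
HYPOTHETICAL_DEDICATED_RATE = 18.0
SECONDS_PER_IMAGE = 2.0         # from the SDXL example
SERVERLESS_RATE = 0.01          # assumption inferred from the SDXL example

n = break_even_predictions_per_hour(
    HYPOTHETICAL_DEDICATED_RATE, SECONDS_PER_IMAGE, SERVERLESS_RATE
)
# Share of the hour a single GPU would be busy at the break-even volume.
utilization = n * SECONDS_PER_IMAGE / 3600

print(f"break-even: {n:.0f} images/hour")      # break-even: 900 images/hour
print(f"GPU utilization: {utilization:.0%}")   # GPU utilization: 50%
```

Under these placeholder numbers, Dedicated becomes cheaper once a single GPU is busy more than about half the time; below that, Serverless costs less. If the break-even utilization comes out above 100%, Serverless is cheaper at any volume one GPU can handle.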