Simple, transparent pricing

Pick the plan that fits your scale. All plans include in2peta gateway access for coding agents.

Free

Get started with coding agents

$0/month
Get started free
25 RPM · 1M TPM
1 API key
Limited access; requests get the lowest priority
Community support
Most popular

Go

For serious developers

$5/month

500 /month

Upgrade to Go
100 RPM · 5M TPM
5 API keys
2x Claude Code usage
⚡ Smart routing across models
Priority email support

Pro

For teams and power users

$15/month

1500 /month

Upgrade to Pro
200 RPM · 15M TPM
Unlimited API keys
3x Go plan usage
🚀 Priority queue
Priority customer support
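
As a rough aid for choosing between the plans above, the sketch below encodes the listed prices and rate limits and returns the cheapest plan that covers a given load. It assumes RPM and TPM mean requests per minute and tokens per minute; `Plan`, `PLANS` and `cheapest_plan` are illustrative names, not part of any in2peta SDK.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    usd_per_month: int
    rpm: int  # requests per minute (assumed meaning of RPM)
    tpm: int  # tokens per minute (assumed meaning of TPM)


# Prices and limits copied from the plan cards above.
PLANS = [
    Plan("Free", 0, 25, 1_000_000),
    Plan("Go", 5, 100, 5_000_000),
    Plan("Pro", 15, 200, 15_000_000),
]


def cheapest_plan(needed_rpm: int, needed_tpm: int) -> Optional[Plan]:
    """Return the lowest-priced plan whose limits cover the need, or None."""
    for plan in sorted(PLANS, key=lambda p: p.usd_per_month):
        if plan.rpm >= needed_rpm and plan.tpm >= needed_tpm:
            return plan
    return None


if __name__ == "__main__":
    choice = cheapest_plan(needed_rpm=60, needed_tpm=2_000_000)
    print(choice.name if choice else "No listed plan covers this load")
```

The helper compares only the two rate limits. API key counts, usage multipliers and support tiers are left out and may matter more for some teams.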

How we compare

IN2PETA vs other GPU and AI generation platforms

Feature · IN2PETA (you are here) · Kling AI · Runway · fal.ai · WaveSpeed.ai
Serverless GPU inference
Dedicated server mode
Unlimited generations (server mode): Credit cap · Credit cap · Pay-per-use · Pay-per-use
Bring Your Own Key (BYOK)
Pay-per-second billing: Subscription · Subscription
No forced subscription: Pay as you go · Monthly plans · Monthly plans
Video editor (coming soon)
No data stored
Priority support: All paid users · Enterprise only · Enterprise only · Enterprise only · Enterprise only
Monetisation with content: 50/50 split
REST API + SDKs (Python & TS): Limited API · Limited API
No Chinese servers: 🇮🇳 India

Two ways to run

Pick the model that fits your workload

Serverless Inference

Pay per prediction · Scales to zero

Send a request, get a result. No servers to manage, no idle costs. Your workload spins up in milliseconds and shuts down when done. You're billed only for the active GPU seconds consumed by each prediction.

Zero idle cost — pay only when compute is active
Automatic scaling from zero to any load
Per-second billing, no minimum commitments
Best for: sporadic workloads, prototyping, APIs
Models called via REST API or Python/TS SDK

Example cost

Running SDXL at ~2 sec/image = ~0.02 credits per image
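
To extend the SDXL figure to a larger batch, here is a minimal sketch of the per-second billing arithmetic. The per-second rate is inferred from the example above (about 0.02 credits for about 2 seconds, so about 0.01 credits per active GPU second). It is not a published price, and actual rates may differ by model or GPU tier. The function name is hypothetical.

```python
# Rate implied by the SDXL example: ~0.02 credits / ~2 s = ~0.01 credits per GPU-second.
# This is an inference from that single figure, not a published rate.
IMPLIED_CREDITS_PER_GPU_SECOND = 0.02 / 2


def serverless_cost(predictions: int,
                    seconds_per_prediction: float,
                    credits_per_second: float = IMPLIED_CREDITS_PER_GPU_SECOND) -> float:
    """Estimate credits for a batch billed only on active GPU seconds."""
    if predictions < 0 or seconds_per_prediction < 0:
        raise ValueError("inputs must be non-negative")
    return predictions * seconds_per_prediction * credits_per_second


if __name__ == "__main__":
    # 1,000 images at ~2 s each works out to roughly 20 credits at the implied rate.
    print(round(serverless_cost(1000, 2.0), 2))
```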

Dedicated Server

Per-hour billing · Full GPU control

Lease a dedicated GPU machine for sustained, high-throughput workloads. The server is yours for the duration — run unlimited inferences, bring your own model, and get full control over the runtime environment. Billed per active hour.

Unlimited generations for the duration of the lease
Full GPU access — no cold starts, no queue
Bring your own model code and dependencies
Best for: production pipelines, batch jobs, fine-tuning
Per-hour billing — stop anytime from your dashboard

Example cost

RTX 4090 tier: billed in credits per hour, with the rate displayed at lease time

Which should I use?

Use Serverless when

You have variable or unpredictable traffic
You're running a production API with infrequent calls
You want zero infrastructure management
Cost per call matters more than throughput

Use Dedicated when

You need consistent low-latency responses
You're running batch jobs or fine-tuning
Your throughput is high enough to fill a GPU
You need full control over the runtime environment
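
The "high enough to fill a GPU" test can be put into numbers with a break-even calculation. The sketch below is a hedged illustration: both rates are inputs you supply, since the dedicated hourly rate is only shown at lease time, and the figures in the usage lines are placeholders rather than real prices. The function names are hypothetical.

```python
def break_even_utilization(dedicated_credits_per_hour: float,
                           serverless_credits_per_second: float) -> float:
    """Fraction of each hour the GPU must be busy before a dedicated lease
    costs less than paying per active second on serverless."""
    if serverless_credits_per_second <= 0:
        raise ValueError("serverless rate must be positive")
    break_even_seconds = dedicated_credits_per_hour / serverless_credits_per_second
    return break_even_seconds / 3600


def cheaper_mode(busy_seconds_per_hour: float,
                 dedicated_credits_per_hour: float,
                 serverless_credits_per_second: float) -> str:
    """Compare one hour of dedicated lease against the same work on serverless."""
    serverless_cost = busy_seconds_per_hour * serverless_credits_per_second
    return "dedicated" if dedicated_credits_per_hour < serverless_cost else "serverless"


if __name__ == "__main__":
    # Placeholder rates for illustration only; substitute the rates shown in your dashboard.
    hourly, per_second = 18.0, 0.01
    print(f"break-even utilization: {break_even_utilization(hourly, per_second):.0%}")
    print(cheaper_mode(2700, hourly, per_second))  # GPU busy 45 minutes of the hour
    print(cheaper_mode(600, hourly, per_second))   # GPU busy 10 minutes of the hour
```

This compares cost only. Cold starts, queueing and the need for a custom runtime can still tip the choice toward Dedicated at lower utilization.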