One API Key for Claude, GPT, Gemini, Grok, DeepSeek
Stop juggling 5 AI vendor accounts. One sk-aic-* key, one base URL, every major model — Claude, GPT, Gemini, Grok, DeepSeek. Setup in 2 minutes.
The Problem Nobody Warns You About
You start a side project. You pick Claude because the writing is great. Two weeks later you want to try GPT-4o for vision. Then a friend tells you Gemini Flash is dirt cheap and you should benchmark. Then DeepSeek ships v3 and the timeline goes nuts about how good the coding is.
Now you have:
Five billing pages. Five sets of usage limits. Five separate "oops you hit your monthly cap" emails. Five different SDK shapes to learn.
For a solo founder this is death by a thousand papercuts. And the worst part? You're paying full sticker price at every one of them.
This post is the pillar explainer for what we built to fix this: one key, one base URL, every major model. 70-80% off.
The Pitch in One Paragraph
aiapi.cheap gives you a single API key (sk-aic-*). You point your existing Anthropic SDK or OpenAI SDK at our proxy, then you can call Claude, GPT, Gemini, Grok, or DeepSeek by changing one field — the model name. Same code, same SDK, every vendor. Pay-per-use, crypto top-ups, no subscription.
That's the whole product. The rest of this post explains why each piece matters and how to set it up.
Why One Key Beats Five
1. Less Billing Mental Overhead
You top up one balance. You watch one usage chart. When the bill is high, you know exactly where to look. No more "wait, did Anthropic charge me twice or did Google forget to invoice me?"
2. Switching Models Is Free
Want to A/B Claude Sonnet vs GPT-4o on the same prompt? Today, that's two SDK installs, two clients, two credit cards. With aiapi.cheap, it's model: "claude-sonnet-4-6" vs model: "gpt-4o" — same client, same key, both responses cost 70-80% less than retail.
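Here's a sketch of what that A/B looks like in code. The endpoint path is an assumption pieced together from the base URL in the setup sections below, and the model names are illustrative — adjust both to whatever your account actually exposes:

```typescript
// Hypothetical A/B harness: same prompt, two models, one key.
type ChatBody = {
  model: string;
  messages: { role: "user"; content: string }[];
};

// Build one request body per model: only the model field differs.
function abRequests(prompt: string, models: string[]): ChatBody[] {
  return models.map((model) => ({
    model,
    messages: [{ role: "user", content: prompt }],
  }));
}

// Fire all requests with the same key and collect the answers.
async function abTest(prompt: string, models: string[]) {
  return Promise.all(
    abRequests(prompt, models).map(async (body) => {
      const res = await fetch("https://aiapi.cheap/api/proxy/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        },
        body: JSON.stringify(body),
      });
      const json = await res.json();
      return { model: body.model, text: json.choices[0].message.content };
    })
  );
}
```

Run abTest("your prompt", ["claude-sonnet-4-6", "gpt-4o"]) and diff the two answers — the only thing that varied is the string.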
3. Future-Proofing
The hottest model six months from now is not the hottest model today. Three years ago people were stuck on GPT-3.5. Then Claude 3 Opus. Then Sonnet 3.5 ate everything. Then GPT-4o. Then DeepSeek V3.2 came out of nowhere and made everyone re-evaluate. If you're locked into one vendor, you eat the migration cost every time the leaderboard shifts. With one universal key, you swap a string.
4. Real Savings
At 70-80% off retail across all five vendors, the math gets serious fast. A vibe coder using Claude Sonnet four hours a day might pay $80-120/month at official rates. With us, $15-25/month. Multiply that across five vendors and the savings compound when you start mixing models for different tasks.
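To make that math concrete, the discount calculation is a one-liner; 0.7 and 0.8 are this post's headline discount rates, and the retail figure is whatever your current invoices say:

```typescript
// Back-of-envelope savings: retail spend times (1 - discount).
function discountedMonthly(retailUsd: number, discount: number): number {
  return +(retailUsd * (1 - discount)).toFixed(2);
}

// $100/month retail at the 80% discount leaves $20/month.
```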
Setup: Anthropic SDK Format
If your code already uses @anthropic-ai/sdk or the Python anthropic package, here's the change:
```bash
export ANTHROPIC_API_KEY="sk-aic-your-key-here"
export ANTHROPIC_BASE_URL="https://aiapi.cheap/api/proxy"
```

Then your existing code just works:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Write a haiku about cheap APIs." }
  ]
});

console.log(message.content);
```

This hits our /v1/messages endpoint, which mirrors the Anthropic format 1:1. Streaming, tool use, vision, prompt caching — all the same shape as the official SDK.
For the canonical request format reference, the Anthropic API getting-started docs describe what we mirror.
Setup: OpenAI SDK Format (Universal)
This is the magic trick. The OpenAI Chat Completions format has become the industry default. Almost every model speaks it now. We expose a single endpoint that routes to all 5 vendors based on the model name.
```bash
export OPENAI_API_KEY="sk-aic-your-key-here"
export OPENAI_BASE_URL="https://aiapi.cheap/api/proxy"
```

Now the same client, same code, hits any vendor:
```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Call GPT-4o
const gpt = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello from OpenAI!" }]
});

// Same client, change model — now you're calling Claude
const claude = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello from Anthropic!" }]
});

// Same client, change model — now you're calling Gemini
const gemini = await client.chat.completions.create({
  model: "gemini-3-pro-preview",
  messages: [{ role: "user", content: "Hello from Google!" }]
});

// Same client, change model — Grok
const grok = await client.chat.completions.create({
  model: "grok-4.2",
  messages: [{ role: "user", content: "Hello from xAI!" }]
});

// Same client, change model — DeepSeek
const deepseek = await client.chat.completions.create({
  model: "deepseek-v3.2",
  messages: [{ role: "user", content: "Hello from DeepSeek!" }]
});
```

Five vendors. One client. One key. The OpenAI Chat Completions request shape is documented in the OpenAI API reference, and we accept the same body for all five.
Vendor Coverage at a Glance
Which vendor for which job? Quick map:
| Vendor | Example Model | Best For |
| --- | --- | --- |
| Claude (Anthropic) | claude-sonnet-4-6, claude-opus-4-7 | Long-form writing, code review, agentic loops, refactoring |
| GPT (OpenAI) | gpt-4o, gpt-4o-mini | Vision, function calling, fast cheap tasks, broad ecosystem |
| Gemini (Google) | gemini-3-pro-preview | Massive context windows, multimodal, real-time low-latency |
| Grok (xAI) | grok-4.2 | Real-time data flavor, casual conversational tone |
| DeepSeek | deepseek-v3.2 | Coding, math, the cheapest tier per token |
This isn't a tier list — it's just what people typically reach for first. The point is: your stack should not force you to commit to one of these and abandon the rest.
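If you'd rather keep that map in code than in your head, a tiny router works. The task labels and model picks below just restate the table above — they're a starting point to tune against your own benchmarks, not canon:

```typescript
// Route by task type: one place to change when the leaderboard shifts.
type Task = "writing" | "vision" | "long-context" | "coding" | "chat";

const MODEL_FOR_TASK: Record<Task, string> = {
  writing: "claude-sonnet-4-6",           // long-form prose, refactoring
  vision: "gpt-4o",                       // images, function calling
  "long-context": "gemini-3-pro-preview", // huge context windows
  coding: "deepseek-v3.2",                // cheapest strong coder
  chat: "grok-4.2",                       // casual conversational tone
};

function pickModel(task: Task): string {
  return MODEL_FOR_TASK[task];
}
```

Swapping vendors later is a one-line edit to the map; none of the call sites change.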
What "Same Quality" Actually Means
We forward your request to the real vendor's API unchanged. We don't swap in a different model, trim your context, or downgrade any capability. If Claude Sonnet 4.6 has a 200K context window on the official API, it has a 200K context window through us. If GPT-4o supports vision, vision works. If Gemini 3 Pro supports the latest tool-use shape, that works.
The one thing we strip from streaming responses is the vendor fingerprint metadata (the headers and IDs that identify the upstream provider on the wire). Functionally that's invisible to your code — you still get the full text, full streaming chunks, full structured responses. We just don't broadcast which upstream credit pool served the request.
Streaming Works the Same Way
Both endpoints support Server-Sent Events streaming exactly like the official APIs:
```typescript
const stream = await client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Stream me a long response." }]
});

for await (const event of stream) {
  if (event.type === "content_block_delta") {
    process.stdout.write(event.delta.text || "");
  }
}
```

For OpenAI-format streaming, set stream: true on the chat completion call and iterate the chunks. Same as you'd do with the real OpenAI SDK.
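Here's a sketch of that OpenAI-format version. The chunk shape follows the standard Chat Completions streaming format (choices[0].delta.content), and the client is assumed to be an OpenAI SDK instance pointed at the proxy as shown in the setup section:

```typescript
// Pull the text delta out of one streaming chunk (standard Chat
// Completions streaming shape: choices[0].delta.content).
function deltaText(chunk: {
  choices: { delta: { content?: string | null } }[];
}): string {
  return chunk.choices[0]?.delta?.content ?? "";
}

// `client` is an OpenAI SDK instance configured via the env vars
// above. stream: true yields an async iterable of chunks.
async function streamChat(client: any, model: string, prompt: string) {
  const stream = await client.chat.completions.create({
    model,
    stream: true,
    messages: [{ role: "user", content: prompt }],
  });
  for await (const chunk of stream) {
    process.stdout.write(deltaText(chunk));
  }
}
```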
Tools That Already Work
Because we mirror the official wire formats, anything that speaks Anthropic or OpenAI just plugs in:
- Claude Code, Cursor, and anything else that reads ANTHROPIC_BASE_URL or OPENAI_BASE_URL from the environment.
- LangChain: pass base_url to the ChatAnthropic / ChatOpenAI constructor.

The general rule: if the tool lets you set OPENAI_BASE_URL or ANTHROPIC_BASE_URL, it works.
A Real Workflow Example
Let's say you're building a content drafting tool. You want:

- Claude to write the long-form draft
- GPT-4o-mini to punch up the headline, fast and cheap
- Gemini to score the draft against the brief

With three separate vendor accounts, that's three SDK setups, three clients, three billing reconciliations.
With one sk-aic-* key:
```typescript
import OpenAI from "openai";

const client = new OpenAI();

async function draftAndScore(brief: string) {
  const draft = await client.chat.completions.create({
    model: "claude-sonnet-4-6",
    messages: [{ role: "user", content: `Draft a blog post: ${brief}` }]
  });
  const title = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: `Improve this headline: ${draft.choices[0].message.content}` }]
  });
  const score = await client.chat.completions.create({
    model: "gemini-3-pro-preview",
    messages: [{ role: "user", content: `Score this draft 1-10 against the brief.\nBrief: ${brief}\nDraft: ${draft.choices[0].message.content}` }]
  });
  return { draft, title, score };
}
```

Three vendors, one client, one billing line. This is what we mean when we say multi-AI shouldn't be hard.
Pricing in Plain English
Basic (free forever): 70% off all 5 vendors. 200 requests/minute. Pay-per-use, top up when you need it.
Pro ($19 lifetime): 80% off all 5 vendors. 500 requests/minute. One-time payment, never expires.
No monthly subscription. You pay for what you actually use, in tokens, at our discounted rate. Top up via crypto (Oxapay supports 100+ coins). Minimum $5 to start, useful for benchmarking before you commit to anything.
Common Questions
Is this an unofficial API wrapper? No. We forward your request to the real provider. Same models, same versions, same outputs. We just buy the credits in bulk and pass the discount to you.
What about rate limits? Plan-based at our edge (200 RPM Basic, 500 RPM Pro). Beyond that, the upstream vendor's per-model rate still applies — same as if you were calling direct.
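If you're pushing near the RPM cap, a small backoff wrapper keeps things smooth. This is a generic sketch, not a proxy feature: the status check assumes your HTTP client surfaces a 429 as err.status (the OpenAI SDK does), and the delay schedule is arbitrary:

```typescript
// Exponential backoff schedule: 500ms, 1s, 2s, ...
function backoffDelays(attempts: number, baseMs = 500): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// Retry only on 429 (rate limited); rethrow anything else immediately.
async function withRetry<T>(fn: () => Promise<T>, attempts = 4): Promise<T> {
  const delays = backoffDelays(attempts);
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err?.status !== 429 || i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, delays[i]));
    }
  }
}
```

Wrap any call site: withRetry(() => client.chat.completions.create({ ... })).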
Do prompts get logged or trained on? No. We don't store prompt content. We log usage metadata (token counts, timestamps, model) for billing — that's it.
What happens if a vendor goes down? That request fails (same as direct). The other four vendors keep working. This is one of the underrated advantages of multi-vendor — when GPT has a bad afternoon, your Claude calls don't care.
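That failover doesn't have to be manual. Since every model is one client and one key away, a fallback chain is a few lines; this is a generic sketch you'd write yourself, not a built-in proxy feature:

```typescript
// Try each call in order; return the first one that succeeds.
async function firstHealthy<T>(calls: Array<() => Promise<T>>): Promise<T> {
  let lastErr: unknown = new Error("no calls provided");
  for (const call of calls) {
    try {
      return await call();
    } catch (err) {
      lastErr = err; // this vendor is down or erroring; try the next
    }
  }
  throw lastErr;
}

// Usage sketch: prefer GPT, fall back to Claude on failure.
// firstHealthy([
//   () => client.chat.completions.create({ model: "gpt-4o", messages }),
//   () => client.chat.completions.create({ model: "claude-sonnet-4-6", messages }),
// ]);
```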
Can I see token usage per model? Yes, your dashboard breaks down spend per vendor and per model so you can spot which one is eating your budget.
Welcome to One-Key Multi-AI
That's the whole shape of the thing. One key, one base URL, every major model, 70-80% cheaper. If your existing code uses the Anthropic SDK or the OpenAI SDK, the migration is two environment variables.
For the broader product context and why we built this, see the welcome post. For tool-specific setup walkthroughs (Claude Code, Cursor, LangChain), the docs cover each in detail.
If you've been putting off trying GPT or Gemini because "ugh, another account," this is the post that says you don't have to. Top up $5, change two env vars, and benchmark for an afternoon. Worst case you've spent a coffee. Best case you save a few hundred dollars a month.
That's the pitch.