Gemini API Cheap: 80% Off via aiapi.cheap (No Google Cloud Required)
Skip the Google Cloud project setup. Hit Gemini 2.5 Pro and Flash via aiapi.cheap with one key, OpenAI-compatible SDK, 80% off. Setup in 60 seconds.
The Friction Nobody Mentions
Want to use Gemini? Cool. Here's what Google asks of you:
1. Sign up for Google Cloud
2. Create a project
3. Enable the Generative Language API (yes, that's the actual name)
4. Create credentials (API key or service account, both with their own quirks)
5. Set billing
6. Worry about Google Cloud's API quotas vs Gemini's separate quotas
For a side project, that's a Saturday gone.
aiapi.cheap lets you call Gemini through the same OpenAI SDK you already use, with one sk-aic-* key. No Google Cloud project, no IAM, no service account JSON files. And 80% cheaper.
The Setup
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-aic-YOUR_API_KEY",
    base_url="https://aiapi.cheap/api/proxy/v1",
)

resp = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello, Gemini!"}],
)
print(resp.choices[0].message.content)
```

That's it. The OpenAI SDK talks to Gemini through our proxy: behind the scenes we translate the OpenAI Chat Completions format to Gemini's native format and back.
Models Available
| Model | Best For | Pro Plan Pricing (input / output per 1M tokens) |
|---|---|---|
| gemini-2.5-pro | Complex reasoning, long context (1M+ tokens), multimodal | $0.25 / $1.00 |
| gemini-2.5-flash | High-volume cheap tasks, low latency | $0.015 / $0.06 |
Gemini Flash is already one of the cheapest frontier models on the market. At 80% off, it's effectively free for prototyping. Pro is great when you need to stuff massive context (the 1M token window is real and useful).
For reference, official pricing is $1.25 / $5.00 per 1M tokens for Gemini 2.5 Pro and $0.075 / $0.30 for Flash; the Pro plan is 80% off both.
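To sanity-check the discount, here's a quick per-request cost calculation using the prices from the table above (the helper function is illustrative, not part of any SDK):

```python
# Per-1M-token prices (input, output) in USD, from the table above.
OFFICIAL = {"gemini-2.5-pro": (1.25, 5.00), "gemini-2.5-flash": (0.075, 0.30)}
PRO_PLAN = {"gemini-2.5-pro": (0.25, 1.00), "gemini-2.5-flash": (0.015, 0.06)}

def cost(prices, model, input_tokens, output_tokens):
    """Dollar cost of one request at the given price table."""
    in_price, out_price = prices[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A typical Flash call: 300 tokens in, 600 tokens out.
official = cost(OFFICIAL, "gemini-2.5-flash", 300, 600)
discounted = cost(PRO_PLAN, "gemini-2.5-flash", 300, 600)
print(f"official: ${official:.6f}, pro plan: ${discounted:.6f}")
# The pro-plan price comes out to 20% of official, i.e. 80% off.
```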
When to Pick Gemini
If you need long-form prose, Claude is usually better. If you need reliable function calling, GPT-4o is more battle-tested. But for the sweet spots above (massive context and dirt-cheap high-volume calls), nothing else competes.
Streaming
Gemini supports SSE streaming via the OpenAI-compatible endpoint:
```python
stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Stream a story."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Node.js Example
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.AIAPI_KEY!,
  baseURL: "https://aiapi.cheap/api/proxy/v1",
});

const resp = await client.chat.completions.create({
  model: "gemini-2.5-pro",
  messages: [
    {
      role: "user",
      content: "Summarize the implications of frontier-model context length growth.",
    },
  ],
});
console.log(resp.choices[0].message.content);
```

Long-Context Pattern
The killer feature is the 1M+ token window. Use it like this:
```python
with open("entire_codebase.txt") as f:
    codebase = f.read()  # could be hundreds of thousands of tokens

resp = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {"role": "system", "content": f"You are reviewing this codebase:\n{codebase}"},
        {"role": "user", "content": "Find security issues in the auth module."},
    ],
)
```

No chunking, no embedding pipeline, no RAG complexity. Just dump it in and ask.
Pricing Math (Real Workload)
A chatbot doing 50,000 requests/day on Gemini Flash (300 input, 600 output tokens per request) burns 15M input and 30M output tokens a day. At official pricing that's about $10.13/day (~$304/month); on the Pro plan it's about $2.03/day (~$61/month), a saving of roughly $243/month.
Not huge in absolute terms, because Flash is already cheap. But scale up to 500K req/day and the savings compound to ~$2,400/month, with still no monthly subscription.
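The monthly math above can be sketched in a few lines (prices from the table, 30-day month assumed):

```python
REQ_PER_DAY = 500_000
IN_TOK, OUT_TOK = 300, 600  # tokens per request
DAYS = 30

def monthly(in_price: float, out_price: float) -> float:
    """Monthly bill in dollars; prices are per 1M tokens."""
    daily = (REQ_PER_DAY * IN_TOK * in_price + REQ_PER_DAY * OUT_TOK * out_price) / 1e6
    return daily * DAYS

official = monthly(0.075, 0.30)   # Flash official pricing
pro_plan = monthly(0.015, 0.06)   # Flash at 80% off
print(f"official ~${official:,.0f}/mo, pro plan ~${pro_plan:,.0f}/mo, "
      f"saving ~${official - pro_plan:,.0f}/mo")
```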
Next Steps
No Google Cloud project. No IAM. Just one key.