Updates — Freetokens.dev

Gemini 2.5 Pro is free on AI Studio. 1M context, Google's flagship, no credit card — the headline free-tier deal in AI right now. The catch: Google trains on everything you send through the free tier. Perfect for prototypes. Disastrous if you ship user data through it.

Gemini 2.5 Pro on Google AI Studio Apr 23, 2026

14,400 free requests a day on Llama 3.1 8B via Groq. That's 10 per minute sustained, zero dollars, zero credit card. Stop hunting 70Bs and build your side project on this — it's the most usable free tier in the game.

Llama 3.1 8B on Groq Apr 22, 2026

This is the most underrated free deal in AI. Qwen 3 235B A22B — Alibaba's frontier MoE — running at ~2000 tok/sec on Cerebras wafer-scale, 1M tokens/day free. A flagship model for zero dollars. Stop reading and go sign up.

Qwen 3 235B A22B on Cerebras Apr 20, 2026

Cerebras hands you 1 million free tokens a day on Llama 3.1 8B at 2000+ tok/sec on wafer-scale silicon. Literally nobody talks about this. Get a key, point your SDK at it, move on. Treat it as free speed and stop overthinking.

Llama 3.1 8B on Cerebras Apr 17, 2026

Moonshot's Kimi K2 is quietly on Groq's free tier at 60 RPM. Faster than anyone else serves it and completely free. If you've never run a Chinese frontier model on LPU silicon, this is your on-ramp — five minutes from signup to first call.

Kimi K2 Instruct on Groq Apr 15, 2026

NVIDIA's Nemotron 3 Super — a 120B hybrid Mamba-Transformer MoE — is free on OpenRouter. Weird architecture worth poking at, long context, zero dollars. Prompts may be logged for upstream training, so keep it to experiments and synthetic data.

Nemotron 3 Super 120B A12B on OpenRouter Apr 13, 2026

Microsoft throws $200 of Azure credit at new accounts and — as of March 2026 — the old Azure OpenAI access-request form is finally dead. Anyone with an account can deploy GPT-4o now. Catches: credit card required, 30-day clock on the credit, and free-tier subs still choke on Foundry deployments sometimes. But $200 is $200 for a serious throughput test.

GPT-4o on Microsoft Azure Apr 10, 2026

Nous and Xiaomi just dropped a 2-week free window on MiMo V2 Pro — Xiaomi's ~1T-parameter MoE flagship — routed through Hermes Agent on the Nous Portal. Install Hermes, run hermes update, sign into a free Nous account, and you're calling a trillion-parameter model for nothing until ~April 21. Not OpenAI-compatible (Hermes Agent CLI only), but you're not going to get another shot at 1T free any time soon. Clock's ticking.

MiMo V2 Pro on Nous Research Apr 7, 2026

NVIDIA hands out 1,000 free inference credits to Developer Program members, 40 RPM, OpenAI-compatible, 100+ models behind one endpoint — Llama 3.1 70B, Nemotron, Kimi K2.5, MiniMax. The Dev Program form is 5 minutes of friction for 1,000 free calls. Good trade. Underused because nobody talks about it.

Llama 3.1 70B on NVIDIA NIM Apr 5, 2026

SambaNova gives you $5 of credits on signup — enough to touch Llama 3.1 405B on their RDU silicon, one of the only places you can run a 405B model free. Credits expire in 30 days, rate-limited free tier continues after. No credit card. If you haven't tried RDU inference yet, this is the cheapest test drive there is.

Llama 3.3 70B on SambaNova Cloud Apr 3, 2026

Llama 3.3 70B free on Groq at ~275 tok/sec. The 1K-request-a-day ceiling stops you running a SaaS on it, but for agent loops, evals, and weekend builds, it's the fastest free 70B on the planet. Go.

Llama 3.3 70B on Groq Apr 1, 2026

Alibaba's Qwen 3.6 Plus Preview just landed free on OpenRouter. 1M context. The 'Preview' label means the second Alibaba flips it to GA, the free endpoint dies — and nobody knows when that drops. This is exactly the kind of deal you check your inbox for. Use it now, not next week.

Qwen 3.6 Plus Preview on OpenRouter Mar 31, 2026

Cloudflare Workers AI gives you 10,000 free Neurons a day across Llama, Mistral, Qwen, and more — edge-deployed in a one-line Worker call. Neurons aren't tokens, so small models stretch way further than you'd think. If you already live on CF, this is effectively free inference co-located with your app.

Llama 3.1 8B on Cloudflare Workers AI Mar 28, 2026

GitHub Models lets you hit GPT-4.1, Llama, Mistral, and xAI behind your GitHub PAT for free. Read the fine print: the free tier caps context at 8K/4K — even on GPT-4.1 — and tops out at 50 req/day on the big models. Wrong tool for production. Right tool for one-token multi-provider experiments.

GPT-4.1 on GitHub Models Mar 26, 2026

HuggingFace's free Inference tier is ten cents a month. Yes, ten cents. It's a tasting flight across DeepSeek V3, Llama, Qwen routed through HF's Inference Providers — you get a handful of calls, then you either upgrade to PRO at $9/mo or bounce. At least it's honest about what it is.

DeepSeek V3 on Hugging Face Mar 23, 2026

Dispatches