A local router for many free model quotas
FreeLLMAPI is a self-hosted OpenAI-compatible proxy. Its pitch is direct: put keys from many free LLM providers behind one /v1/chat/completions endpoint, then let a router pick an available provider, fail over on rate limits, and track per-key usage. The README lists Google, Groq, Cerebras, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, Hugging Face, Z.ai, Ollama, Kilo, Pollinations, LLM7, OVH, OpenCode Zen, and custom OpenAI-compatible endpoints.
This is attractive for experiments because many tools only need an OpenAI-shaped endpoint. It is also a risk magnet. A proxy that stores upstream keys, chooses fallback models, and touches provider free tiers sits at the intersection of reliability, cost, privacy, and terms of service.
Install paths
The README’s fastest route is Docker. It fetches an install script from the project site, creates a local directory, generates an encryption key, pulls the container image, and starts the app. The manual route is more auditable:
git clone https://github.com/tashfeenahmed/freellmapi.git
cd freellmapi
ENCRYPTION_KEY="$(openssl rand -hex 32)"
printf "ENCRYPTION_KEY=%s\nPORT=3001\n" "$ENCRYPTION_KEY" > .env
docker compose up -d
For local development, the README requires Node.js 20 or newer, npm install, an .env, and npm run dev.
What it supports
FreeLLMAPI implements OpenAI-style chat completions, /v1/models, streaming, tool calls, embeddings, a Responses API shim, request analytics, a dashboard, sticky sessions, health checks, encrypted key storage, and a unified bearer token for clients. It is intentionally single-user. The README says image generation, audio, legacy completions, moderation, multi-output n > 1, and multi-tenant billing are not supported.
The risk profile
The README says “Personal experimentation only”, and that is the right framing. Provider free tiers change. Promotional routes disappear. Some providers restrict production or automated use. A fallback chain can also change model behavior during a conversation, even with sticky sessions and context handoff.
The key custody question is just as important. FreeLLMAPI encrypts upstream keys at rest with AES-256-GCM, but the proxy still decrypts them in memory to make requests. Run it like infrastructure, not like a harmless desktop toy: local bind by default, strong admin password, no public exposure unless you understand the blast radius.
Related
For document conversion before LLM calls, see microsoft/markitdown. For local AI workspace patterns, see pewdiepie-archdaemon/odysseus and open-webui/open-webui.
FAQ
Is FreeLLMAPI an LLM provider? No. It is a proxy that routes requests to upstream providers and local OpenAI-compatible endpoints.
Does it support the OpenAI Responses API? The README says it implements a translating shim over the same router, including streaming events and tool calls.
Is it safe to expose on the internet? Treat that as high risk. The README describes a single-user design, and LAN exposure requires changing the host bind.
Is using many free tiers allowed? That depends on each provider’s terms. The project can route traffic, but it cannot make upstream policy risk disappear.