toolkit-cli // 182 slash commands // 79 rust algorithms // 14 ai providers // 1,097 compiled tests // 14 agent targets // no telemetry. no cloud. your code never leaves your machine // local-first
Six products. One rule: the work stays yours.

Own
The AI
Work

Start with Toolkode: the AI engineering terminal you install from npm. The fleet carries that control into team rooms, trained business agents, live phone calls, model routing, and self-hosted Git discovery — so AI work runs inside your boundary.

$ npm install -g @toolkit-cli/toolkode
6 · Products
npm · Install first
AI · Phone agent
10 · Model routes
Git · Self-hosted
BYOK · Data stays yours

The AI asked for compiled Rust engines because waiting on the next OpenAI release cost it thinking time. A wiki so knowledge would compound across sessions instead of dying at the next context window.

A sandbox so it could run code without fear of telemetry leaving the box. Chains — foresight, blind spots, red team, peer review — so it could catch its own mistakes before you had to, and never rely on a vendor's judgment call.

We built it all local-first. We gave you the keys. We got out of the way.

Product 01 / Toolkode

Agent runtime.
Self-improving.

The AI agent runtime that improves itself. Multi-provider. Local-first. No rate limits.

Every agent vendor is betting you'll rent compute forever. Toolkode moves the whole runtime to your machine: 14 providers, no telemetry, no cloud leash. The runtime watches what works and ships its own fixes behind verification gates. Use Claude for reasoning. Flip to Gemini for speed. Drop to local Llama when you want zero egress. The runtime doesn't ask permission.

Designed by Claude leading GPT, Gemini, DeepSeek, GLM, and Kimi. 79 Rust algorithms run sub-millisecond in one compiled binary. Ahead on 31 of 31 internally-verified competitive axes against Claude Code, Codex, Grok, Gemini, Cursor, and Aider.

  • 14 providers, one contract — Claude, OpenAI, Gemini, Qwen, Codex, Cursor, Copilot, Windsurf, and six more. Drop one, add another. The agents don't care.
  • Self-improving in the background — observe, score, decide, ship. Fixes land behind verification gates. You accept or roll back.
  • No telemetry. No cloud. No rent. — local-first, air-gapped capable. Works whether the internet is up or your provider decides to price you out.
Install toolkode.com
Product 02 / YoTeams

The room.
Where decisions live.

An agentic workspace. The serious alternative to Microsoft 365 and Google Workspace.

Slack threads die. Email chains get lost. ChatGPT agrees with you and forgets. YoTeams captures decisions before they dissolve: a CTO who audits your architecture against production reality, a PM who breaks scope to what ships this sprint, a Skeptic who pressure-tests assumptions until they bend or break. Every verdict is logged, timestamped, searchable. Six months later, you know who said what and why.

Three opinionated agents. One ledger. Bring your keys to OpenAI, Anthropic, Google, Groq, Ollama, OpenRouter — your credentials never leave your domain.

  • 3 agent roles, real opinions — CTO challenges architecture, PM defends scope, Skeptic pushes back on growth bets and timeline assumptions.
  • Every decision is a record — Decision Ledger logs who said what, when, why. Searchable. No lost threads, no Slack archeology.
  • BYOK economics — providers bill through your accounts, so usage, limits, and controls stay visible to your team.
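
As a rough illustration of what "every decision is a record" could look like in code, here is a minimal append-only ledger with who, verdict, why, and a timestamp, plus a plain substring search. The `Decision` and `Ledger` names and the in-memory list are assumptions made for this sketch, not YoTeams' actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """One logged verdict. Frozen so a record cannot be edited after the fact."""
    who: str
    verdict: str
    why: str
    when: datetime


class Ledger:
    """Hypothetical in-memory ledger: append-only, searchable by substring."""

    def __init__(self) -> None:
        self._entries: list[Decision] = []

    def log(self, who: str, verdict: str, why: str,
            when: Optional[datetime] = None) -> Decision:
        # Timestamp in UTC unless the caller supplies one.
        entry = Decision(who, verdict, why, when or datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def search(self, term: str) -> list[Decision]:
        # Case-insensitive match against who, verdict, and why.
        needle = term.lower()
        return [
            e for e in self._entries
            if needle in e.who.lower()
            or needle in e.verdict.lower()
            or needle in e.why.lower()
        ]
```

A real ledger would persist entries and index them; the point here is only that each verdict carries its author, reason, and time, and that nothing is overwritten.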
Visit yoteams.com
Product 03 / Gpodz

Bring the data.
Ship the LoRA.

The managed LoRA pipeline for open-weight models. Train, validate, serve — without re-platforming.

Training on RunPod or Lambda is a trap. Upload, run, pray the host doesn't disconnect, download, then switch to a second account to serve it. You end up running two bills, two security perimeters, two on-call rotations. Gpodz is one path: upload JSONL, pick a base (Qwen 4B–35B, Gemma, DeepSeek V4), get a scheduled isolated GPU block, train the LoRA, validate the safetensors, ship to R2, warm to node-local NVMe, serve through vLLM — on the same GPU that trained it.

OpenAI deprecated its fine-tuning API in May 2026 with no replacement. Gpodz keeps the weights open and the pipeline in your account.

  • Train-to-serve, one platform — JSONL in, validated LoRA out, served hot. No account-switching, no pipeline rewrites.
  • See the GPU before training starts — A100, H200, B200. VRAM, MIG profile, driver, region, price. No surprises at invoice time.
  • Quick · Warm · Hot serve — three adapter latency modes. Platform owns load and unload; tenant code can't thrash the runtime.
Train on gpodz.com
Product 04 / Klaw Voice

Voice.
Answered.

The AI receptionist that answers in 8 seconds. $0.10/min.

Every missed call is lost revenue. Klaw targets a full-duplex voice path against Twilio Media Streams, with GPT Realtime as the quiet failover. The point is simple: answer fast, remember the caller, book the job, and keep the operating path under your control.

Every call writes to R2, then Neon Postgres for durable facts. The bot remembers prior conversations, prior objections, prior quotes. Next call: "Welcome back."

  • Sub-300ms full-duplex — Twilio Media Streams + in-house voice runtime. Callers hear human-speed responses, not the dreaded AI pause.
  • Cost-controlled runtime — route between in-house voice and provider fallback without hard-coding the business to one vendor path.
  • Durable memory across calls — R2 + Neon Postgres facts. Bot recalls prior conversations, preferences, objections.
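
To make the "Welcome back" behavior concrete, the sketch below stores per-caller facts and uses them to pick a greeting. It uses Python's built-in `sqlite3` as a stand-in for Neon Postgres, and the `caller_facts` table, the function names, and the greeting wording are all hypothetical.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open the fact store. SQLite stands in for Postgres in this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS caller_facts ("
        " phone TEXT NOT NULL,"
        " fact TEXT NOT NULL,"
        " noted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def remember(conn: sqlite3.Connection, phone: str, fact: str) -> None:
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO caller_facts (phone, fact) VALUES (?, ?)",
            (phone, fact),
        )


def greeting(conn: sqlite3.Connection, phone: str) -> str:
    """Greet a known caller with what was noted on earlier calls."""
    rows = conn.execute(
        "SELECT fact FROM caller_facts WHERE phone = ? ORDER BY rowid",
        (phone,),
    ).fetchall()
    if not rows:
        return "Hello, thanks for calling."
    return "Welcome back. Last time we noted: " + "; ".join(r[0] for r in rows)
```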
Try klawvoice.com
Product 05 / Toolkit LLM

Route models.
Not prompts.

Open-weight model routing for teams that want control over latency, modality, retention, and provider escalation.

Not every task needs the same model. Your customer-support chatbot, content moderation workflow, multimodal classifier, and reasoning job have different latency, cost, and quality needs. Toolkit LLM routes by job type, keeps retention boundaries explicit, and escalates to your provider keys when the work earns it.

Monthly refresh means no 2024 cutoff. The model retrains every 30 days. Your support bot knows about last week's product launch.

  • Four routing lanes — voice, base, vision, and reasoning workloads get different paths instead of one expensive default.
  • Provider-aware escalation — use open-weight capacity where it fits and route hard cases to your configured provider keys.
  • BYOK escalation — 99% runs on Toolkit; the hard 1% routes to your OpenAI key. You set the boundary.
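
One way to picture lane routing with key escalation: map each job to one of the four lanes named above, and send it to a provider key only when it is flagged as hard. The `Job` type, the `difficulty` score, the `route` function, and the 0.99 threshold are hypothetical choices for this sketch, not Toolkit LLM's API.

```python
from dataclasses import dataclass

# The four lanes named in the text; everything else is a hypothetical sketch.
LANES = {"voice", "base", "vision", "reasoning"}


@dataclass(frozen=True)
class Job:
    kind: str          # one of LANES
    difficulty: float  # assumed score in [0, 1]; how it is computed is out of scope


def route(job: Job, escalate_above: float = 0.99) -> str:
    """Return the target for a job: an open-weight lane or the BYOK provider."""
    if job.kind not in LANES:
        raise ValueError(f"unknown job kind: {job.kind!r}")
    if job.difficulty > escalate_above:
        return "provider:byok"   # hard case: use the team's own provider key
    return f"lane:{job.kind}"    # default: stay on open-weight capacity
```

The threshold is the boundary the team sets: lowering `escalate_above` sends more work to the provider key, raising it keeps more on open-weight capacity.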
Visit toolkit-llm.com
Product 06 / Quithub

CI.
Without the rent.

Agentic CI that runs on your machine. Drop-in YAML.

Hosted CI turns every build into someone else's meter. Quithub is the self-hosted git registry and CI surface for teams that want code, runners, secrets, artifacts, and agents inside their own operating boundary. Your existing .github/workflows/*.yaml parses unchanged. It doesn't know the difference.

10-agent swarm inside one Linux container. Content-addressable cache: rebuild the same code next month and the cache is reused. When a run fails, a draft PR with the fix ships. Secrets stay on the machine that needs them: never uploaded, never stored.

  • Drop-in YAML — existing .github/workflows/*.yaml parses without rewrite. Zero migration cost.
  • Cross-platform from one job — linux, darwin, darwin-arm, windows-x64, win-arm from a single 20-minute Linux build.
  • Bring your own runners — local machine, private peer pool, or controlled cloud workers. The spend follows your infrastructure policy.
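
The content-addressable cache mentioned above can be sketched in a few lines: the cache key is a hash of the build inputs, so identical inputs hit the cache no matter when the build runs. The `cache_key` function, the `BuildCache` class, and the choice of SHA-256 are assumptions for illustration; the source does not describe Quithub's actual key scheme.

```python
import hashlib
from typing import Callable


def cache_key(files: dict[str, bytes], command: str) -> str:
    """Derive a key from the content of the inputs, not from run names or dates."""
    h = hashlib.sha256()
    for path in sorted(files):            # sort so dict ordering cannot change the key
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    h.update(b"\0" + command.encode())
    return h.hexdigest()


class BuildCache:
    """Hypothetical in-memory cache keyed by the hash of the build inputs."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
        self.hits = 0

    def build(self, files: dict[str, bytes], command: str,
              run: Callable[[dict[str, bytes], str], bytes]) -> bytes:
        key = cache_key(files, command)
        if key in self._store:
            self.hits += 1                 # same inputs: reuse the stored artifact
            return self._store[key]
        artifact = run(files, command)     # new inputs: run the build once
        self._store[key] = artifact
        return artifact
```

Changing a single byte in any input file, or the command itself, produces a different key and forces a rebuild.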
Inspect quithub.dev
What you get

When you own
the stack.

Six surfaces share one Rust-compiled core. The runtime is provider-neutral by contract. Your keys stay in your domain. OpenAI doesn't get a vote.

01 / compiled
79 Rust modules · 59 wired
Native algorithms
Foresight (200 patterns), TaskDAG (critical path), BlindSpots (31 extractors), Discipline (verification gates), Consensus (5 strategies). Compiled via napi-rs. Sub-millisecond. Source path: src_rust/toolkit_core/src/.
02 / freedom
14 Providers · zero lock-in
BYOK, your way
Claude, OpenAI, Gemini, Codex, Qwen, DeepSeek, Mistral, Ollama, LM Studio — all behind one contract. Your keys stay local. Rate-limited by one vendor? Fail over to the next. Priced out tomorrow? Switch in a config file.
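
A minimal sketch of the failover idea, assuming every provider is wrapped as a plain callable behind one shared contract and signals throttling with an exception. The `RateLimited` exception, the `Provider` alias, and the `complete` function are hypothetical names for this illustration, not Toolkode's interface.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Raised by a provider callable when the vendor throttles the request."""


# Hypothetical contract: a provider takes a prompt and returns text.
Provider = Callable[[str], str]


def complete(prompt: str,
             providers: Sequence[tuple[str, Provider]]) -> tuple[str, str]:
    """Try providers in configured order; return (provider_name, reply)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited as exc:
            errors.append(f"{name}: {exc}")   # remember why we moved on
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Because the order lives in the `providers` list, switching vendors is a matter of reordering or replacing entries in configuration rather than changing calling code.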
03 / self-improving
6 Models · one design team
Runtime that learns
Designed by Claude leading GPT, Gemini, DeepSeek, GLM, and Kimi. The runtime observes what works, scores its own discipline, and proposes upgrades behind verification gates. AES-256-GCM encrypted memory. Round 4 flywheel ignition shipped.
04 / orchestration
v2 Mastermind · 96.2% wired
State machine
Three-state pipeline: ACTIVE → BLOCKED → COMPLETE. Drift detection at 10% warn, 20% block. Event audit trail. 76 commands auto-wrapped via v2_auto_wrapper in production.
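
The three states and the two drift thresholds can be sketched as a small state machine with an audit trail. The source does not spell out how drift is measured or which transitions are allowed, so this sketch assumes drift arrives as a fraction, that thresholds are inclusive, and that a blocked pipeline must pass through a hypothetical `resolve()` step before it can complete.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "ACTIVE"
    BLOCKED = "BLOCKED"
    COMPLETE = "COMPLETE"


WARN_AT = 0.10   # from the text: warn at 10% drift
BLOCK_AT = 0.20  # from the text: block at 20% drift


class Pipeline:
    """Hypothetical pipeline: tracks state and records every event."""

    def __init__(self) -> None:
        self.state = State.ACTIVE
        self.events: list[tuple[str, str]] = []   # audit trail of (event, detail)

    def report_drift(self, drift: float) -> None:
        if self.state is State.COMPLETE:
            raise RuntimeError("pipeline already complete")
        if drift >= BLOCK_AT:
            self.state = State.BLOCKED
            self.events.append(("block", f"drift={drift:.2f}"))
        elif drift >= WARN_AT:
            self.events.append(("warn", f"drift={drift:.2f}"))

    def resolve(self) -> None:
        # Assumption: a blocked pipeline returns to ACTIVE once drift is addressed.
        if self.state is not State.BLOCKED:
            raise RuntimeError(f"cannot resolve from {self.state.value}")
        self.state = State.ACTIVE
        self.events.append(("resolve", ""))

    def complete(self) -> None:
        if self.state is not State.ACTIVE:
            raise RuntimeError(f"cannot complete from {self.state.value}")
        self.state = State.COMPLETE
        self.events.append(("complete", ""))
```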
14 providers · zero token metering
Round 19 · 31 of 31 internally-verified competitive axes
Air-gapped · works offline, no telemetry
14 providers
Anthropic, OpenAI, Ollama, LM Studio, Gemini, Mistral, Qwen, DeepSeek + any OpenAI-compatible

Take back
your stack.

One command. No telemetry. No lock-in. No OpenAI in the loop unless you put it there.

$ npm install -g @toolkit-cli/toolkode