Six products. One rule: the work stays yours.

Own
The AI
Work

Start with Toolkode: the AI engineering terminal you install from npm. The fleet carries that control into team rooms, trained business agents, live phone calls, model routing, and self-hosted Git discovery — so AI work runs inside your boundary.

$ npm install -g @toolkit-cli/toolkode

Products

npm · Install first

Phone agent

Model routes

Git · Self-hosted

BYOK · Data stays yours

The AI asked for compiled Rust engines because waiting on the next OpenAI release cost it thinking time. A wiki so knowledge would compound across sessions instead of dying at the next context window.

A sandbox so it could run code without fear of telemetry leaving the box. Chains — foresight, blind spots, red team, peer review — so it could catch its own mistakes before you had to, and never rely on a vendor's judgment call.

We built it all local-first. We gave you the keys. We got out of the way.

Product 01 / Toolkode

Agent runtime.
Self-improving.

The AI agent runtime that improves itself. Multi-provider. Local-first. No rate limits.

Every agent vendor is betting you'll rent compute forever. Toolkode moves the whole runtime to your machine: 14 providers, no telemetry, no cloud leash. The runtime watches what works and ships its own fixes behind verification gates. Use Claude for reasoning. Flip to Gemini for speed. Drop to local Llama when you want zero egress. The runtime doesn't ask permission.

Designed by Claude leading GPT, Gemini, DeepSeek, GLM, and Kimi. 79 Rust algorithms run sub-millisecond in one compiled binary. Ahead on 31 of 31 internally verified competitive axes against Claude Code, Codex, Grok, Gemini, Cursor, and Aider.

14 providers, one contract — Claude, OpenAI, Gemini, Qwen, Codex, Cursor, Copilot, Windsurf, and six more. Drop one, add another. The agents don't care.
Self-improving in the background — observe, score, decide, ship. Fixes land behind verification gates. You accept or roll back.
No telemetry. No cloud. No rent. Local-first and air-gap capable. Works when the internet is down and when your provider decides to price you out.

Install toolkode.com →

agent runtime · toolkode.com

Toolkode · local first

The wedge product: a local execution layer for agents that need planning, memory, verification, handoff, and provider freedom without sending the whole workflow to a vendor cloud.

14 providers behind one contract

79 compiled Rust modules

0 telemetry by default

1 install command

Product 02 / YoTeams

The room.
Where decisions live.

An agentic workspace. The serious alternative to Microsoft 365 and Google Workspace.

Slack threads die. Email chains get lost. ChatGPT agrees with you and forgets. YoTeams captures decisions before they dissolve: a CTO who audits your architecture against production reality, a PM who breaks scope to what ships this sprint, a Skeptic who pressure-tests assumptions until they bend or break. Every verdict is logged, timestamped, searchable. Six months later, you know who said what and why.

Three opinionated agents. One ledger. Bring your keys to OpenAI, Anthropic, Google, Groq, Ollama, OpenRouter — your credentials never leave your domain.

3 agent roles, real opinions — CTO challenges architecture, PM defends scope, Skeptic pushes back on growth bets and timeline assumptions.
Every decision is a record — Decision Ledger logs who said what, when, why. Searchable. No lost threads, no Slack archeology.
BYOK economics — providers bill through your accounts, so usage, limits, and controls stay visible to your team.
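A minimal sketch of what a searchable decision record could look like. The `Decision` fields and the `DecisionLedger` class are hypothetical names chosen for illustration; the source does not show the YoTeams schema.

```typescript
// Hypothetical shape of one ledger entry: who said what, when, and why.
type Role = "CTO" | "PM" | "Skeptic";

interface Decision {
  id: number;
  role: Role;        // who said it
  verdict: string;   // what was decided
  rationale: string; // why
  loggedAt: Date;    // when
}

class DecisionLedger {
  private entries: Decision[] = [];

  // Append-only in this sketch: entries are never edited after logging.
  log(role: Role, verdict: string, rationale: string, loggedAt: Date = new Date()): Decision {
    const entry: Decision = { id: this.entries.length + 1, role, verdict, rationale, loggedAt };
    this.entries.push(entry);
    return entry;
  }

  // Case-insensitive substring search over verdict and rationale.
  search(term: string): Decision[] {
    const needle = term.toLowerCase();
    return this.entries.filter(
      (e) => e.verdict.toLowerCase().includes(needle) || e.rationale.toLowerCase().includes(needle),
    );
  }
}
```

Six months later, `ledger.search("postgres")` would return every entry that mentions it, with role and timestamp attached.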

Visit yoteams.com →

team workspace · yoteams.com

YoTeams · decision memory

A work surface for human-agent teams where the CTO, PM, and Skeptic roles challenge each other and leave a searchable record instead of disappearing into chat scrollback.

3 opinionated agent roles

6 structured team skills

BYOK · providers stay in your account

BYOK · usage stays visible

Product 03 / Gpodz

Bring the data.
Ship the LoRA.

The managed LoRA pipeline for open-weight models. Train, validate, serve — without re-platforming.

Training on RunPod or Lambda is a trap. Upload, run, pray the host doesn't disconnect, download, then switch to a second account to serve it. You end up running two bills, two security perimeters, two on-call rotations. Gpodz is one path: upload JSONL, pick a base (Qwen 4B–35B, Gemma, DeepSeek V4), get a scheduled isolated GPU block, train the LoRA, validate the safetensors, ship to R2, warm to node-local NVMe, serve through vLLM — on the same GPU that trained it.

OpenAI deprecated its fine-tuning API in May 2026 with no replacement. Gpodz keeps the weights open and the pipeline in your account.

Train-to-serve, one platform — JSONL in, validated LoRA out, served hot. No account-switching, no pipeline rewrites.
See the GPU before training starts — A100, H200, B200. VRAM, MIG profile, driver, region, price. No surprises at invoice time.
Quick · Warm · Hot serve — three adapter latency modes. Platform owns load and unload; tenant code can't thrash the runtime.
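The pipeline starts with a JSONL upload, so a pre-upload sanity check is cheap insurance. The sketch below assumes only that each non-empty line must parse as a JSON object; the `checkJsonl` helper is hypothetical and the record schema Gpodz expects is not stated in the source.

```typescript
interface JsonlReport {
  total: number;      // non-empty lines seen
  badLines: number[]; // 1-based line numbers that failed the check
}

// Flags lines that are not valid JSON or are not a JSON object.
function checkJsonl(text: string): JsonlReport {
  const report: JsonlReport = { total: 0, badLines: [] };
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // tolerate blank lines
    report.total += 1;
    try {
      const value: unknown = JSON.parse(line);
      const isObject = typeof value === "object" && value !== null && !Array.isArray(value);
      if (!isObject) report.badLines.push(index + 1);
    } catch {
      report.badLines.push(index + 1);
    }
  });
  return report;
}
```

An empty `badLines` array means every record at least parses; it says nothing about whether the fields match what a given base model needs.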

Train on gpodz.com →

fine-tune pipeline · gpodz.com

Gpodz · train to serve

The model surface: upload data, choose an open-weight base, train an adapter, validate it, store it, and serve it without stitching together separate GPU accounts.

JSONL · dataset in

LoRA · validated adapter out

3 serve modes: quick, warm, hot

Open · Qwen, Gemma, DeepSeek bases

Product 04 / Klaw Voice

Voice.
Answered.

The AI receptionist that answers in 8 seconds. $0.10/min.

Every missed call is lost revenue. Klaw targets a full-duplex voice path against Twilio Media Streams, with GPT Realtime as the quiet failover. The point is simple: answer fast, remember the caller, book the job, and keep the operating path under your control.

Every call writes to R2, then Neon Postgres for durable facts. The bot remembers prior conversations, prior objections, prior quotes. Next call: "Welcome back."

Sub-300ms full-duplex — Twilio Media Streams + in-house voice runtime. Callers hear human-speed responses, not the dreaded AI pause.
Cost-controlled runtime — route between in-house voice and provider fallback without hard-coding the business to one vendor path.
Durable memory across calls — R2 + Neon Postgres facts. Bot recalls prior conversations, preferences, objections.

Try klawvoice.com →

realtime voice · klawvoice.com

Klaw · answers calls

The voice surface: answer missed calls, remember the caller, book the job, and keep latency low enough that the conversation does not feel like a bot waiting for permission.

274 ms end-to-end target path

BYOK · fallback under your control

R2 · call artifacts persisted

Neon · durable caller facts

Product 05 / Toolkit LLM

Route models.
Not prompts.

Open-weight model routing for teams that want control over latency, modality, retention, and provider escalation.

Not every task needs the same model. Your customer-support chatbot, content moderation workflow, multimodal classifier, and reasoning job have different latency, cost, and quality needs. Toolkit LLM routes by job type, keeps retention boundaries explicit, and escalates to your provider keys when the work earns it.

Monthly refresh means no 2024 cutoff. The model retrains every 30 days. Your support bot knows about last week's product launch.

Four routing lanes — voice, base, vision, and reasoning workloads get different paths instead of one expensive default.
Provider-aware escalation — use open-weight capacity where it fits and route hard cases to your configured provider keys.
BYOK escalation: 99% of the work stays on Toolkit, and the hardest 1% routes to your OpenAI key. You set the boundary.
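The lanes and escalation boundary above can be sketched as a small routing function. Everything here is an assumption made for illustration: the `difficulty` score, the `RoutePolicy` shape, and the `route` function are hypothetical, not the Toolkit LLM API.

```typescript
type Lane = "voice" | "base" | "vision" | "reasoning";

interface Job {
  lane: Lane;
  difficulty: number; // 0..1, assumed to come from an upstream classifier
}

interface RoutePolicy {
  escalateAbove: number;      // the boundary you set
  escalationProvider: string; // a provider you hold a key for
}

type Route =
  | { target: "open-weight"; lane: Lane }
  | { target: "byok"; provider: string };

// Keep work on the open-weight lane unless it crosses the boundary.
function route(job: Job, policy: RoutePolicy): Route {
  if (job.difficulty > policy.escalateAbove) {
    return { target: "byok", provider: policy.escalationProvider };
  }
  return { target: "open-weight", lane: job.lane };
}
```

Raising `escalateAbove` keeps more traffic on open-weight capacity; lowering it sends more to your own provider account.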

Visit toolkit-llm.com →

model operations · toolkit-llm.com

Toolkit LLM · route by job

The model surface: stop treating every task like a frontier-model problem. Route by latency, modality, difficulty, and retention boundary, then escalate only the work that earns it.

4 tiers: voice, base, vision, reasoning

BYOK · escalation for hard cases

0 customer-data training

Policy · explicit routing boundaries

Product 06 / Quithub

CI.
Without the rent.

Agentic CI that runs on your machine. Drop-in YAML.

Hosted CI turns every build into someone else's meter. Quithub is the self-hosted Git registry and CI surface for teams that want code, runners, secrets, artifacts, and agents inside their own operating boundary. Your existing .github/workflows/*.yaml files parse unchanged. The workflow doesn't know the difference.

A 10-agent swarm runs inside one Linux container. The cache is content-addressable: rebuild the same code next month and the cache is reused. One run fails? A draft PR with the fix ships. Secrets stay on the machine that needs them. Never uploaded, never stored.

Drop-in YAML — existing .github/workflows/*.yaml parses without rewrite. Zero migration cost.
Cross-platform from one job — linux, darwin, darwin-arm, windows-x64, win-arm from a single 20-minute Linux build.
Bring your own runners — local machine, private peer pool, or controlled cloud workers. The spend follows your infrastructure policy.
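A content-addressable cache keys each artifact by a hash of its inputs, which is why identical code rebuilt later hits the cache. The sketch below is a generic illustration using Node's built-in `crypto` module; `BuildInputs`, `cacheKey`, and `build` are hypothetical names, not Quithub internals.

```typescript
import { createHash } from "node:crypto";

// Inputs that determine a build's output in this sketch.
interface BuildInputs {
  command: string;
  files: Record<string, string>; // path -> file contents
}

function cacheKey(inputs: BuildInputs): string {
  const hash = createHash("sha256");
  hash.update(inputs.command);
  // Sort paths so the key does not depend on object insertion order.
  for (const path of Object.keys(inputs.files).sort()) {
    hash.update("\0" + path + "\0" + inputs.files[path]);
  }
  return hash.digest("hex");
}

const cache = new Map<string, string>(); // key -> stored artifact

// Runs the build only when no artifact exists for these exact inputs.
function build(inputs: BuildInputs, run: () => string): { artifact: string; cached: boolean } {
  const key = cacheKey(inputs);
  const hit = cache.get(key);
  if (hit !== undefined) return { artifact: hit, cached: true };
  const artifact = run();
  cache.set(key, artifact);
  return { artifact, cached: false };
}
```

Change one byte in any file and the key changes, so a stale artifact is never served for new code.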

Inspect quithub.dev →

self-hosted Git registry · quithub.dev

Quithub · own the repo

The developer-infrastructure surface: a self-hosted Git registry and CI for teams that want code, runners, secrets, artifacts, and agents under their own operational boundary.

YAML · existing workflows preserved

10-agent CI DAG target

BYO · runner economics

5-target matrix story

What you get

When you own
the stack.

Six surfaces share one Rust-compiled core. The runtime is provider-neutral by contract. Your keys stay in your domain. OpenAI doesn't get a vote.

01 / compiled

79 Rust modules · 59 wired

Native algorithms

Foresight (200 patterns), TaskDAG (critical path), BlindSpots (31 extractors), Discipline (verification gates), Consensus (5 strategies). Compiled via napi-rs. Sub-millisecond. Source path: src_rust/toolkit_core/src/.
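For readers unfamiliar with critical-path analysis, here is a generic sketch of the technique: the longest chain of dependent tasks sets the minimum total duration. This is a TypeScript illustration with hypothetical names (`Task`, `criticalPath`); it is not the compiled Rust TaskDAG module.

```typescript
interface Task {
  id: string;
  duration: number; // any consistent unit
  dependsOn: string[];
}

// Longest path through a dependency DAG, found by memoized depth-first search.
function criticalPath(tasks: Task[]): { length: number; path: string[] } {
  const byId = new Map<string, Task>();
  for (const t of tasks) byId.set(t.id, t);
  const finish = new Map<string, number>();      // earliest finish per task
  const prev = new Map<string, string | null>(); // predecessor on the longest chain
  const visiting = new Set<string>();

  const visit = (id: string): number => {
    const done = finish.get(id);
    if (done !== undefined) return done;
    const task = byId.get(id);
    if (!task) throw new Error(`unknown task: ${id}`);
    if (visiting.has(id)) throw new Error(`cycle at: ${id}`);
    visiting.add(id);
    let start = 0;
    let via: string | null = null;
    for (const dep of task.dependsOn) {
      const depFinish = visit(dep);
      if (depFinish > start) { start = depFinish; via = dep; }
    }
    visiting.delete(id);
    finish.set(id, start + task.duration);
    prev.set(id, via);
    return start + task.duration;
  };

  let length = 0;
  let last: string | null = null;
  for (const t of tasks) {
    const f = visit(t.id);
    if (f > length) { length = f; last = t.id; }
  }
  const path: string[] = [];
  for (let cur = last; cur !== null; cur = prev.get(cur) ?? null) path.unshift(cur);
  return { length, path };
}
```

Tasks off the returned path have slack; shortening them does not shorten the whole plan.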

02 / freedom

14 Providers · zero lock-in

BYOK, your way

Claude, OpenAI, Gemini, Codex, Qwen, DeepSeek, Mistral, Ollama, LM Studio — all behind one contract. Your keys stay local. Rate-limited by one vendor? Fail over to the next. Priced out tomorrow? Switch in a config file.
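Failing over to the next provider is straightforward once every provider sits behind the same interface. The sketch below assumes a hypothetical `Provider` contract with a single `complete` method; Toolkode's actual contract is not shown in the source.

```typescript
// Hypothetical provider contract used only for this illustration.
interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Try providers in order and return the first successful answer.
async function completeWithFailover(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, text: await p.complete(prompt) };
    } catch (err) {
      // Record why this provider failed, then move on to the next one.
      failures.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed (${failures.join("; ")})`);
}
```

Switching vendors then means reordering or editing the `providers` array, which is the kind of change a config file can carry.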

03 / self-improving

6 Models · one design team

Runtime that learns

Designed by Claude leading GPT, Gemini, DeepSeek, GLM, and Kimi. The runtime observes what works, scores its own discipline, and proposes upgrades behind verification gates. AES-256-GCM encrypted memory. Round 4 flywheel ignition shipped.
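For context on what AES-256-GCM provides, here is a minimal sketch using Node's built-in `crypto` module. The `seal` and `unseal` helpers and the `Sealed` shape are hypothetical; how Toolkode manages keys and stores memory is not described in the source.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface Sealed {
  iv: Buffer;         // 96-bit nonce, must be unique per message
  tag: Buffer;        // authentication tag produced by GCM
  ciphertext: Buffer;
}

// Encrypts and authenticates a string with a 32-byte key.
function seal(key: Buffer, plaintext: string): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Decrypts; final() throws if the ciphertext or tag was modified.
function unseal(key: Buffer, sealed: Sealed): string {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}
```

GCM gives confidentiality and tamper detection together, so a corrupted memory record fails loudly instead of decrypting to garbage.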

04 / orchestration

v2 Mastermind · 96.2% wired

State machine

Three-state pipeline: ACTIVE → BLOCKED → COMPLETE. Drift detection at 10% warn, 20% block. Event audit trail. 76 commands auto-wrapped via v2_auto_wrapper in production.
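The three states and the two drift thresholds can be sketched as follows. Several details are assumptions labeled in the comments: that drift is a fraction between 0 and 1, that thresholds are inclusive, and which transitions beyond ACTIVE → BLOCKED → COMPLETE are allowed. The `Pipeline` class is hypothetical, not the Mastermind implementation.

```typescript
type State = "ACTIVE" | "BLOCKED" | "COMPLETE";
type DriftVerdict = "ok" | "warn" | "block";

const WARN_AT = 0.10;  // 10% warn, from the text
const BLOCK_AT = 0.20; // 20% block, from the text

// Assumption: thresholds are inclusive and drift is a 0..1 fraction.
function checkDrift(drift: number): DriftVerdict {
  if (drift >= BLOCK_AT) return "block";
  if (drift >= WARN_AT) return "warn";
  return "ok";
}

// Assumption: ACTIVE may complete directly and BLOCKED may resume.
const TRANSITIONS: Record<State, State[]> = {
  ACTIVE: ["BLOCKED", "COMPLETE"],
  BLOCKED: ["ACTIVE", "COMPLETE"],
  COMPLETE: [],
};

interface AuditEvent { from: State; to: State; reason: string }

class Pipeline {
  state: State = "ACTIVE";
  readonly audit: AuditEvent[] = [];

  // Every legal transition is appended to the audit trail.
  transition(to: State, reason: string): void {
    if (!TRANSITIONS[this.state].includes(to)) {
      throw new Error(`illegal transition ${this.state} -> ${to}`);
    }
    this.audit.push({ from: this.state, to, reason });
    this.state = to;
  }

  // Blocks an active pipeline once drift reaches the block threshold.
  reportDrift(drift: number): DriftVerdict {
    const verdict = checkDrift(drift);
    if (verdict === "block" && this.state === "ACTIVE") {
      this.transition("BLOCKED", `drift ${(drift * 100).toFixed(1)}%`);
    }
    return verdict;
  }
}
```

A warn verdict leaves the state untouched, so the caller decides whether to surface it; only a block verdict changes state and writes an audit event.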

14 providers · zero token metering

Round 19 · 31 of 31 internally verified competitive axes

Air-gapped · works offline, no telemetry

14 providers

Anthropic, OpenAI, Ollama, LM Studio, Gemini, Mistral, Qwen, DeepSeek + any OpenAI-compatible

Take back
your stack.

One command. No telemetry. No lock-in. No OpenAI in the loop unless you put it there.

$ npm install -g @toolkit-cli/toolkode
