4 Jul 2026 · 5 min read

Self-hosted AI assistant: keep your data on your own machine

Why run a private, self-hosted AI assistant — mail, keys and memory that never leave your box — plus an honest look at local models, cost and the cloud-vs-own-machine trade-off.

A self-hosted AI assistant is an AI helper that runs on hardware you control instead of a vendor's cloud, so your mail, credentials, and long-term memory stay on your box. It matters because the assistant sees your most sensitive context, and where that context lives is a decision worth making on purpose rather than by default.

Why host your own?

The strongest reason is privacy. A capable assistant reads your inbox, holds API keys, and builds a memory of how you work. When you run it yourself, that data and those secrets never leave your machine. A good setup keeps credentials encrypted at rest (for example, with sops and age) so even the disk isn't a soft target.

The second reason is control and ownership. A hosted product can change its pricing, retention, or terms overnight. Your own instance changes only when you change it. You decide what gets logged, what gets deleted, and what integrations are allowed.

The third is cost predictability for something that runs around the clock. A private AI assistant that lives on a small always-on server has a flat, known footprint: power and the box itself. If you bring your own model subscription, you also avoid a metered per-request bill for everyday use.

Cloud vs. your own machine

Dimension	Cloud (hosted)	Your own machine
Where data lives	Vendor infrastructure	Local disk you control
Setup speed	Minutes, nothing to run	A bit more up-front effort
Local-model option	Rarely available	Available as a fallback
Cost	Often metered per use	Flat hardware + your own subscription
Best for	Trying it, low sensitivity	Privacy-minded, long-lived setups

Neither column is strictly better. The cloud is the fastest way to find out whether an assistant earns a place in your day. Your own machine is where it belongs once it holds things you'd rather not hand to a third party.

What "local models" really means here

It's tempting to read "run AI assistant on your own server" as "runs entirely offline with no quality loss." That's not honest, so here's the real shape of it.

You can run a small local model on the device as a fallback for sensitive tasks. That means certain "brains" -- the ones touching your most private context -- can do their work without any data leaving the machine. For triage, redaction, and simple routing, an on-device model is often good enough.

But the strongest results still come from a top-tier cloud model. A small local model is a genuine step down in reasoning quality, not a free swap. So "fully offline" is a trade-off you choose per task, not a switch that costs nothing. The pragmatic pattern is a strong cloud model as the primary brain, with a local model available for the handful of jobs where keeping data on the device matters more than peak quality.

Frame it as a dial, not a religion. Most people want most of their assistant's intelligence and a small, private lane for the sensitive parts.

The catch

Self-hosting means you own the uptime. If your box reboots, nothing reboots it for you. If a dependency drifts, you fix it. That responsibility is the honest price of keeping data on your own machine.

The way to keep this sane is to keep it simple. A good self-hosted assistant should come up with a single command and lean on your existing AI subscription rather than a tangle of services you have to babysit. The less bespoke plumbing, the fewer 3 a.m. surprises. Aim for a setup you could re-create from scratch in an afternoon, because someday you might have to.

A pragmatic path

Start in the cloud. Let the assistant prove it's worth having before you invest in owning it. Wire up your real workflows, see where it helps, and notice which tasks make you uneasy about where the data goes.

When that unease shows up -- and for anyone handling client work, keys, or personal correspondence, it will -- move the assistant to your own machine. Nothing about the workflow needs to change; only the address of where it runs.

Figaro is one example that runs both ways. You can try it hosted, then move it onto a small home server where your data and sops/age-encrypted credentials stay local, with an optional on-device model fallback for the sensitive brains. It uses your own Claude Max subscription, so there's no per-token inference bill for everyday use, and it lives where you already are -- in Telegram. More at myfigaro.ai.

The point isn't sovereignty for its own sake. It's a calm, unremarkable option: keep the private parts private, keep the setup simple, and let the assistant be useful without making you think about it.