Privacy-Routed LLM Inference: Keeping Sensitive Data Out of the Cloud
10 posts
How to build a routing layer for AI agents that ensures sensitive data stays on local hardware while leveraging cloud LLMs for non-private tasks.
Comparing vector databases and activation-based memory for AI agents. Trade-offs in latency, scale, and interpretability.
Moving beyond prompt engineering to implement token-level schema enforcement, pre-execution gates, and shell-safe execution pipelines for AI agents.
How to automate your homelab wiki with self-improving AI infrastructure.
Open-sourcing the memory system behind my Claude Code setup: CLAUDE.md, path-scoped rules, wiki, vector search, cognitive memory. With the mistakes.
Implementing Karpathy's LLM Wiki in a homelab, with real-world lessons and gotchas.
Managing agent credentials with two-tier service accounts: a secure approach for AI agent orchestration.
Fixing default runtime misconfigurations in NVIDIA Container Toolkit for GPU workloads.
FastMCP makes building Model Context Protocol servers feel like FastAPI. Here's how to go from zero to a working MCP server in under an hour.
A practical guide to designing multi-agent AI systems: orchestrator patterns, trust boundaries, and the trade-offs I learned running agents in production.