Custom AI agents, LLM-powered internal tools, workflows where the model is one component in a real system. I build AI applications that solve specific business problems — not generic chatbots, not demos that fall apart in production.
Everyone has seen the demo: ask an AI agent to book a flight or summarize a doc, watch it succeed. Most of those demos break the moment they hit real data, real edge cases, or real cost constraints. Moving from 'demo that works 80% of the time' to 'production system that a business can rely on' is a different project entirely.
The real work is the unglamorous part: prompt engineering with version control, evaluation on gold-standard datasets, fallback logic for when the model gets it wrong, observability so you can debug failures, cost tracking so you don't wake up to a $30k monthly bill, integration with your real systems (CRM, database, APIs), and a UX that matches the actual confidence level the AI has.
That's where I work. I build AI agents that connect to your real data, do real work, and survive contact with production. Often the result looks less like a ChatGPT interface and more like an ordinary web app with AI quietly running specific decisions underneath.
Task-specific agents — lead qualification, customer support triage, document processing, data enrichment, research. Connected to your real systems, not sandbox toys.
Web apps with AI running specific decisions underneath. Looks like an ordinary tool, works much smarter. Your team uses it without knowing or caring which model is behind it.
AI assistants that actually know your company — trained on your docs, wiki, tickets, databases. Real citations. No hallucinations on factual questions.
Use the right model for each step. Cheap fast model for classification. Expensive slow model for complex reasoning. Your code doesn't care — the router decides.
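What that routing looks like in code, as a minimal sketch — the tier names and per-token prices here are made up for illustration, not any provider's actual pricing:

```python
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    usd_per_1k_tokens: float

# Illustrative tiers: a cheap fast model for classification,
# an expensive capable one for complex reasoning.
TIERS = {
    "classify": ModelTier("small-fast", 0.0002),
    "reason":   ModelTier("large-slow", 0.0150),
}

def route(task_kind: str) -> ModelTier:
    # Callers ask for a kind of work, never a specific model.
    # Swapping providers means editing this table, not the call sites.
    return TIERS.get(task_kind, TIERS["reason"])
```

The point of the indirection: your application code names the task, and one table owns the model choice.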
Gold-standard test sets. Automated evaluation on every deploy. Full tracing of every AI call. Catch regressions before users do.
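The deploy gate itself is simple; here is a hedged sketch where the agent and the gold set are stand-ins (real evals score with more nuance than exact match):

```python
def run_eval(agent, gold_set, threshold=0.90):
    """Score an agent against gold-standard (input, expected) pairs.

    Returns (accuracy, ship). If ship is False, the deploy is blocked
    before any user sees the regression.
    """
    correct = sum(1 for question, expected in gold_set
                  if agent(question) == expected)
    accuracy = correct / len(gold_set)
    return accuracy, accuracy >= threshold
```

Wired into CI, this runs on every deploy, which is how regressions get caught before users do.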
Your agents, your prompts, your eval data, your deployment. No SaaS dependency beyond the model provider. Clean handoff if you want to operate it yourself.
I avoid heavy frameworks that abstract away what the model actually sees. Prompts in version control. Real code, not drag-and-drop chains.
Discovery is free. Problem scoping (Phase 1) is fixed-price and produces a go/no-go answer whether or not we continue.
Independent senior engineer. 20+ years software engineering, five years production AI — across agents, RAG, custom integrations, self-hosted deployments.
I'm an engineer who discovered AI late, by the standards of the hype cycle — meaning I came in with twenty years of discipline around production code, evaluation, and observability. That turns out to be exactly the skill set AI applications need.
What I build isn't glamorous. It's agents that handle lead intake for real businesses, RAG systems that actually return correct answers, workflows where LLMs are one trusted component in a larger deterministic system. The stuff that makes a business operation better, not the stuff that goes on a demo reel.
Based in Chicago. Working worldwide. Direct contracts or US entity.
Custom makes sense when your workflow is specific enough that no product fits, or when your data is sensitive enough that you can't use SaaS AI. For generic use cases — just writing, just meeting notes, just summarization — use the product. Custom for everything else.
Depends on the task. Most production agents use 2-3 models: a cheap fast one for routing/classification, a capable one for the main work, maybe a frontier model for edge cases. I'll recommend the mix based on your workload and budget.
Evaluation first. I build eval sets from real data, measure accuracy honestly, and design fallback logic for failures — retry, escalate to human, route to different model, or refuse to act. No 'trust the model and hope'.
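The fallback chain, sketched with placeholder functions — the confidence check and model calls here are assumptions standing in for whatever the real system uses:

```python
def answer_with_fallback(question, primary, stronger, is_confident):
    """Try a cheap model first; on low confidence, escalate to a
    stronger model; if that also fails, refuse and hand off to a human
    rather than act on a guess."""
    out = primary(question)
    if is_confident(out):
        return out, "primary"
    out = stronger(question)
    if is_confident(out):
        return out, "stronger"
    return None, "escalate_to_human"
```

The last branch is the one that matters: refusing to act is a designed outcome, not an error state.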
For factual tasks (RAG, data extraction), I ground models in your actual data with citations. For generative tasks (drafting, brainstorming), hallucination is sometimes the feature — we constrain where it matters, accept where it doesn't. Evaluation tells us which is which.
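The grounding shape, reduced to a sketch — this keyword-overlap retrieval is a deliberate simplification (production RAG uses embeddings), and the corpus, doc ids, and prompt wording are illustrative:

```python
def retrieve(question, corpus, k=2):
    """Rank documents by keyword overlap with the question.
    (Stand-in for embedding search; only the shape matters here.)"""
    q_words = set(question.lower().split())
    scored = sorted(corpus.items(),
                    key=lambda item: len(q_words & set(item[1].lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(question, corpus):
    """Build a prompt that confines the model to retrieved sources
    and requires citations by doc id."""
    hits = retrieve(question, corpus)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return (f"Answer ONLY from the sources below, citing ids like [doc-1].\n"
            f"If the sources don't contain the answer, say so.\n\n"
            f"{context}\n\nQuestion: {question}")
```

Citations make answers auditable: a claim without a matching source id is a bug you can see.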
Yes — with guardrails. I typically design tiered autonomy: read-only actions automatic, write actions staged for review, high-stakes actions always human-approved. The right cut depends on your risk tolerance and the specific task.
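The tiering reduces to an allowlist with a safe default; the action names below are examples, not a fixed taxonomy:

```python
# Tiered autonomy: explicit allowlists per tier, default-deny for the rest.
READ_ONLY = {"search_crm", "fetch_ticket"}     # runs automatically
STAGED    = {"draft_reply", "update_record"}   # queued for human review

def gate(action: str) -> str:
    if action in READ_ONLY:
        return "auto"
    if action in STAGED:
        return "review_queue"
    # Anything unlisted, including actions the agent invents,
    # fails safe to explicit human approval.
    return "human_approval"
```

The default-deny branch is the guardrail: an agent can only gain a capability by a human adding it to a tier.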
Rarely. I prefer plain code for orchestration — easier to debug, less hidden behavior, faster to change. Frameworks make sense when they save real time without hiding important choices; usually they don't.
Prompt updates as models change. New examples added to eval sets. Performance tuning. New edge cases as usage grows. Typically 10-30% of the initial build cost per year on a retainer, or zero if you take it over internally.
Yes. Standard mutual NDA before anything sensitive is discussed.
Every agent has an evaluation set. I measure accuracy on real data before claiming anything works.
Every AI call traced — prompt, output, latency, cost. You debug by reading logs, not guessing.
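A minimal version of that tracing is a wrapper around the model call — the pricing constant and whitespace token count below are placeholders for real provider metering:

```python
import time

def traced(llm_call, usd_per_token=2e-6):
    """Wrap an LLM call so every invocation is recorded with its
    prompt, output, latency, and a rough cost estimate."""
    trace = []
    def wrapper(prompt):
        start = time.perf_counter()
        output = llm_call(prompt)
        trace.append({
            "prompt": prompt,
            "output": output,
            "latency_s": time.perf_counter() - start,
            "est_cost_usd": len((prompt + output).split()) * usd_per_token,
        })
        return output
    wrapper.trace = trace
    return wrapper
```

With every call captured this way, debugging a bad output means reading the trace entry, not re-running the agent and hoping.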
If a task isn't solvable at the quality level you need, I'll tell you in Phase 1 and refund if you want out. Better than an expensive failure.
Custom code, not proprietary platforms. Your agent doesn't depend on me or my tools continuing to exist.
Per-call budgets, usage alerts, rate limits. No surprise bills.
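The budget control is a hard cap plus an alert threshold, sketched here with illustrative numbers:

```python
class BudgetGuard:
    """Refuse AI calls once a monthly spending cap is hit; flag when
    an alert threshold is crossed so nobody finds out from the invoice."""
    def __init__(self, monthly_cap_usd, alert_fraction=0.8):
        self.cap = monthly_cap_usd
        self.alert_at = monthly_cap_usd * alert_fraction
        self.spent = 0.0

    def charge(self, cost_usd):
        """Record a call's cost before it runs. Returns True if the
        alert threshold is crossed; raises rather than overspend."""
        if self.spent + cost_usd > self.cap:
            raise RuntimeError("monthly AI budget exceeded; call refused")
        self.spent += cost_usd
        return self.spent >= self.alert_at
```

Raising instead of logging is the design choice: an over-budget call never reaches the provider in the first place.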
I don't ship fully autonomous agents on high-stakes actions. Review gates are a feature, not a limitation.
Discovery calls are free. Come with an actual problem — a workflow you'd like to automate, a decision you'd like AI to make, a tool you'd like your team to have. We'll work through whether AI is the right answer, and what it would take.