The headline is regulated AI work shipped at scale; the public repos back up the hands-on claim.
Owned the enterprise generative-AI portfolio for a global Medical Affairs organization: a production RAG platform serving ~2,000 users across 50+ markets and an ~$8M annual GenAI and digital portfolio, all governed under GxP and 21 CFR Part 11.
Built and governed retrieval-grounded generation for regulated Medical Affairs workflows: hybrid retrieval, ranking strategies, evaluation rubrics, LLM-as-a-judge scoring, grounding and drift review, human-in-the-loop gates, and audit traces. Stack: FastAPI, AWS. Hands-on contribution confirmed by a technical peer.
Led Celgene Medical Affairs IT workstreams spanning 20+ platforms, 1,500+ users, and 50+ markets over 18 months. Delivered $7M in operating efficiencies and $4.3M in annual cost avoidance.
Go gateway that sits in front of every MCP tool call and enforces Cedar policies, identity binding, and Slack approval gates, writing each decision to audit sinks. It acts as the prompt-injection and excessive-agency boundary, aligned to the OWASP Top 10 for LLM Applications.
RAG evaluation pattern that scores retrieval ranking with NDCG, MRR, MAP, and P@k, and compares systems with paired bootstrap, Holm correction, Cliff's delta, and TOST equivalence testing. Grounding checks flag answers the retrieved context does not support.
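For readers less familiar with the ranking metrics named here, the following is a minimal per-query sketch of P@k, reciprocal rank (averaged over queries to get MRR), average precision (averaged to get MAP), and NDCG@k. It is not code from the repo: it assumes binary relevance for the first three metrics and linear graded gains with a log2 discount for NDCG, and every function name is hypothetical.

```python
import math
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved ids that are relevant (divides by k)."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant id, or 0.0 if none was retrieved.
    MRR is the mean of this value over all queries."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@rank at each relevant hit, divided by the number of
    relevant ids. MAP is the mean of this value over all queries."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """NDCG@k with linear gains (assumption) and a log2(rank + 1) discount."""

    def dcg(scores: Sequence[float]) -> float:
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(scores, start=1))

    actual = dcg([gains.get(doc_id, 0.0) for doc_id in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    ranked = ["d3", "d1", "d7", "d2"]
    relevant = {"d1", "d2"}
    print(precision_at_k(ranked, relevant, 2))  # 0.5
    print(reciprocal_rank(ranked, relevant))    # 0.5
    print(average_precision(ranked, relevant))  # 0.5
    print(ndcg_at_k(ranked, {"d1": 1.0, "d2": 1.0}, 4))
```

Because these scores are per query, comparing two retrievers means pairing them query by query, which is what makes a paired bootstrap over the per-query differences a natural next step.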
Semantic memory MCP server (MCP spec revision 2025-11-25) with hybrid recall that combines SQLite FTS5 keyword search with vector search, running locally with no per-call token spend.
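One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF with the conventional constant k = 60; the repo may blend scores differently. It is a hypothetical illustration, not the server's code. In a SQLite-backed store the keyword list would typically come from an FTS5 `MATCH` query ordered by FTS5's built-in `bm25()` ranking; here that list is hard-coded as a stand-in, and the toy two-dimensional embeddings are invented for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Memory ids ordered by descending cosine similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


if __name__ == "__main__":
    # Hypothetical stand-in for ids returned by an FTS5 MATCH query, best first.
    keyword_hits = ["m2", "m5", "m1"]
    # Hypothetical toy embeddings for the same memories.
    doc_vecs = {"m1": [0.9, 0.1], "m2": [0.2, 0.8], "m5": [0.7, 0.3]}
    semantic_hits = vector_rank([1.0, 0.0], doc_vecs)
    print(semantic_hits)                            # ['m1', 'm5', 'm2']
    print(rrf_fuse([keyword_hits, semantic_hits]))
```

RRF only uses ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales; the trade-off is that it discards how much better one hit scored than the next.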
Label-driven GitHub Actions workflow that turns issues into plans, then into human-approved work, tests, and pull requests, with scope controls and explicit verification paths.
Live product that turns discovery-call transcripts into client-ready proposals, SOWs, estimates, and HIPAA BAAs, with deterministic WBS estimation, PHI gates, and audit logs.
Live product surface for cost-aware AI video generation: itemized cost preview, budget caps, BYO provider keys, encrypted key vault, Temporal workflow, and production health checks.
Checkpointed agent runs with eval gates, and a typed MCP server scaffold with Zod-validated tool inputs.
Reproducible ML-evaluation research on how metric choices and weak bootstrap assumptions reverse model rankings, plus a recomputation of a labor-market capability framework against the 2026 frontier.
See all 23 repos and live product surfaces on the main site.