Sam Meyer has shipped and governed generative AI in regulated pharma.

At Bristol Myers Squibb I owned the enterprise generative-AI portfolio for a global Medical Affairs organization. That included a production retrieval-augmented generation (RAG) platform serving ~2,000 users across 50+ markets, with an ~$8M annual book, all governed under GxP and 21 CFR Part 11.

How the governance worked

I defined the AI governance framework for regulated GenAI under GxP and 21 CFR Part 11. It covered evaluation rubrics, grounding and drift review, human-in-the-loop approval gates, and explicit tool boundaries. Retrieval and citations were treated as control surfaces, so answers carried visible provenance. Human review and audit trails were designed into the workflow from the start.

Regulated-enterprise track record

Hands-on evidence

A technical peer at BMS confirmed the hands-on work directly: experimenting with retrieval-augmented generation, improving retrieval relevance through ranking strategies, exploring agentic workflows, and using LLM-as-a-judge to assess response quality. A senior AI/ML architect confirmed that under my leadership the team deployed multiple production-grade GenAI applications that handled high-volume Medical Affairs data within strict regulatory frameworks.

Where the measurement discipline comes from

The evaluation habit goes back a long way. At the U.S. Bureau of Labor Statistics (2002 to 2007) I did natural-language processing, along with the statistical modeling and data-pipeline work underlying a machine-learning workflow. That meant NLP-based classification of free-text survey write-ins into an 840-class occupational taxonomy, regression modeling to detect data-quality anomalies in large federal microdata sets, and automated pipelines for structured data at a scale of 400K+ records. I bring the same instinct to LLM evaluation now: define the labels, measure the failure mode, and trust a number only after you have tested it.