Sam Meyer has shipped and governed generative AI in regulated pharma.

At Bristol Myers Squibb I owned the enterprise generative-AI portfolio for a global Medical Affairs organization: a production RAG platform serving ~2,000 users across 50+ markets, an ~$8M annual book, governed under GxP and 21 CFR Part 11.

How the governance worked

I defined AI governance for regulated GenAI under GxP / 21 CFR Part 11: evaluation rubrics, grounding and drift review, human-in-the-loop gates, and clear tool boundaries. Retrieval and citations were treated as control surfaces, so answers carried visible provenance. Human review and audit traces were designed into the workflow from the start.

Regulated-enterprise track record

Owned an ~$8M annual GenAI and digital portfolio for Global Medical Affairs (2024 to 2026), including a production RAG platform serving ~2,000 users across 50+ markets.
Owned an ~$10M multi-year Medical Affairs capital book with VP/SVP business-case accountability (2021 to 2024).
Owned a $15M capital book across Medical Affairs commercialization capabilities (2017 to 2021).
Led technology integration for the $74B Celgene acquisition across 20+ platforms, 1,500+ users, and 50+ markets in 18 months.
Managed a $17M+ annual federal regulatory IT portfolio at PwC, with 200+ users and 100 to 150 concurrent investigations.

Hands-on evidence

A technical peer at BMS confirmed the hands-on work directly: experimenting with retrieval-augmented generation, improving retrieval relevance through ranking strategies, exploring agentic workflows, and using LLM-as-a-judge to assess response quality. A senior AI/ML architect confirmed that under my leadership the team deployed multiple production-grade GenAI applications handling high-volume Medical Affairs data within strict regulatory frameworks.

Where the measurement discipline comes from

The evaluation habit has a long arc. At the U.S. Bureau of Labor Statistics (2002 to 2007) I did the natural-language processing, statistical modeling, and data-pipeline work that sits underneath a machine-learning workflow: NLP-based classification of free-text survey write-ins into the 840-class occupational taxonomy, regression modeling to detect data-quality anomalies across large federal microdata, and automated structured pipelines at 400K+ record scale. I bring the same instinct to LLM evaluation now: define the labels, measure the failure mode, and trust the number only after you have tested it.