Research & Publications

SAM Audio Judge: A Reference-Free Audio Separation Evaluation Metric

Wang, Chen, Dunn, Hoffman, et al.

Reference-free evaluation metric for audio separation, trained on human annotation data from campaigns I co-led.

January 2026

arXiv

SAM Audio: Segment Anything in Audio

SAM Audio Team, including John Hoffman

Foundation model for general audio separation. Core contributor: led all human evaluations, created SAM Audio Bench, and developed a novel evaluation protocol.

January 2026

arXiv

Revisiting Reliability in Large-Scale Machine Learning Research Clusters

Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, Shubho Sengupta, Kalyan Saladi, Carole-Jean Wu

Analysis of dependability challenges across two large GPU clusters that processed 4M+ jobs and 150M+ A100 GPU hours, including a failure taxonomy, mean-time-to-failure (MTTF) projections, and metrics for measuring the effectiveness of software mitigations.

HPCA 2025 March 2025

arXiv DOI

Movie Gen: A Cast of Media Foundation Models

Movie Gen Team, including John Hoffman

Co-led audio evaluation for Movie Gen, Meta's suite of media generation foundation models. 448 citations, 1,100+ Hacker News points.

October 2024

arXiv

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

SeamlessM4T Team, including John Hoffman

Co-led human evaluations for SeamlessM4T, the first unified model for all four translation modalities. 450+ citations, published in Nature.

Nature August 2024

DOI

No Language Left Behind: Scaling Human-Centered Machine Translation

NLLB Team, including John Hoffman

Worked on evaluations for NLLB, a multilingual translation model supporting 200 languages. 1,500+ citations, published in Nature.

Nature July 2024

DOI