Research & Publications
SAM Audio Judge: A Reference-Free Audio Separation Evaluation Metric
Reference-free evaluation metric for audio separation, trained on human annotation data collected in campaigns I co-led.
arXiv
SAM Audio: Segment Anything in Audio
Foundation model for general audio separation. Core contributor: led all human evaluations, created SAM Audio Bench, and developed a novel evaluation protocol.
arXiv
Revisiting Reliability in Large-Scale Machine Learning Research Clusters
Analysis of dependability challenges across two large GPU clusters that together processed 4M+ jobs and 150M+ A100 GPU hours, including a failure taxonomy, mean-time-to-failure (MTTF) projections, and metrics for measuring the effectiveness of software mitigations.
arXiv DOI
Movie Gen: A Cast of Media Foundation Models
Co-led audio evaluation for Movie Gen, Meta's suite of media generation foundation models. 448 citations, 1,100+ Hacker News points.
arXiv
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Co-led human evaluations for SeamlessM4T, the first unified model for all four translation modalities (speech-to-speech, speech-to-text, text-to-speech, and text-to-text). 450+ citations, published in Nature.
DOI
No Language Left Behind: Scaling Human-Centered Machine Translation
Worked on evaluations for NLLB, a multilingual translation model supporting 200 languages. 1,500+ citations, published in Nature.
DOI