https://scholar.google.com/citations?view_op=list_works&hl=en&user=eKs-ZGQAAAAJ

https://github.com/AaronWhy

https://www.linkedin.com/in/hengyi-wang-86605b175/

https://x.com/HENGYIWANG24776

Bio

I am Hengyi Wang (恒屹 王), Applied Scientist II (GenAI) at Amazon Web Services working on production LLM systems.

My work focuses on post-training and evaluation of large language models, with an emphasis on alignment, agentic systems, and scalable evaluation pipelines.

I previously collaborated with Microsoft Research, where I had the opportunity to work with https://www.microsoft.com/en-us/research/people/akshayn/ and https://www.microsoft.com/en-us/research/people/taganu/ on long-context multimodal evaluation (Multimodal Needle-in-a-Haystack), deploying and stress-testing foundation models through Azure OpenAI.

I am a PhD candidate (ABD) in Computer Science at Rutgers University, where my research explores trustworthy and interpretable multimodal foundation models. I am fortunate to be advised by http://www.wanghao.in/. I received my B.S. in Computer Science from the Turing Honor Class at https://english.pku.edu.cn/.

Research Interests

My research sits at the intersection of probabilistic machine learning and AI safety. I currently work on interpretability and evaluation for multimodal large language models and personalized recommendation systems. The aim of my research is to develop understandable and controllable Artificial General Intelligence systems, with particular emphasis on interpretability and alignment, for the benefit of humanity.

Key areas:
• LLM post-training: RLHF, RLAIF, DPO, PPO, GRPO
• Agentic systems: prompt engineering, guardrails, tool orchestration
• Evaluation: LLM-as-a-judge frameworks, automated model benchmarking
• Multimodal LLM research and interpretability
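As a small illustration of one of the post-training techniques above, here is a minimal sketch of the DPO objective for a single preference pair. The function name and scalar inputs are illustrative, not from any particular codebase; in practice the log-probabilities would come from batched model forward passes.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are the summed token log-probabilities of the chosen and
    rejected responses under the policy and the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the scaled margin (Bradley-Terry likelihood).
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

A zero margin gives a loss of log 2, and the loss shrinks as the policy's preference for the chosen response grows beyond the reference model's, which is what the optimizer pushes toward.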

News