I’m an Applied Scientist at AWS, where I work on post-training LLMs with GRPO/RLAIF and build reliable post-training and inference infrastructure on NVIDIA B200/B300 with VeRL, vLLM, flash-attention, and custom kernels. I also work on LLM-based retrieval and entity matching at billion scale for the AWS Entity Resolution Service, where I design, build, and ship ML solutions for enterprise customers.
Before AWS, I completed my Ph.D. at Rutgers University under Prof. Yongfeng Zhang and Prof. Dong Deng, focusing on LLMs for data integration, retrieval, and similarity search. I earned my M.Phil. at the University of Queensland with Prof. Xiaofang Zhou (IEEE Fellow) and Prof. Sibo Wang, and my B.S. from Peking University.
I’m interested in the hardware-software seam of LLM training and inference — RL post-training systems, kernels, and serving infrastructure. I previously interned at AWS (Applied Scientist, 2022) and at Megagon Labs (Research Scientist, 2021), mentored by Dr. Yuliang Li and Dr. Jin Wang.
Research interests
- RL post-training for LLMs and beyond (GRPO, RLAIF, DPO)
- RL post-training infrastructure
- LLM agents
- LLM-based retrieval and entity matching
- Efficient large-scale similarity search
Open Source
- CSR-Bench - Multi-agent system for automatic exploration of computer science research repositories (NAACL 2025 Oral)
- DeltaPQ — Lossless product quantization compression for similarity search (VLDB 2021)
- PAFO — Parallel approximate personalized PageRank (VLDB Journal 2019)
- Declarative Data Pipeline
Publications
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories NAACL 2025 Oral
Yijia Xiao, Runhui Wang, Luyang Kong, Davor Golac, Wei Wang.
Language is All a Graph Needs EACL 2024 Findings
Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, Yongfeng Zhang.
BPID: A Benchmark for Personal Identity Deduplication EMNLP 2024
Runhui Wang*, Yefan Tao*, Adit Krishnan*, Luyang Kong*, Xuanqing Liu, Yuqian Deng, Yunzhao Yang, Henrik Johnson, Andrew Borthwick, Shobhit Gupta, Aditi Sinha Gundlapalli, Davor Golac.
Large Language Models for Entity Blocking: A Reproducibility Study NAACL 2024
Runhui Wang, Yongfeng Zhang.
GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security KDD 2024
Xuanqing Liu, Runhui Wang, Yang Song, Luyang Kong.
Learning from Natural Language Explanations for Generalizable Entity Matching EMNLP 2024
Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C Wallace, Luyang Kong.
Neural Locality Sensitive Hashing for Entity Blocking SDM 2024
Runhui Wang, Luyang Kong, Yefan Tao, Andrew Borthwick, Davor Golac, Henrik Johnson, Shadie Hijazi, Dong Deng, Yongfeng Zhang.
Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation ICDE 2023 · Code
Runhui Wang, Yuliang Li, Jin Wang.
Earlier work
DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search VLDB 2021 · Code
Runhui Wang, Dong Deng.
Parallelizing Approximate Single-Source Personalized PageRank Queries on Shared-Memory VLDB Journal 2019 · Code
Runhui Wang, Sibo Wang, Xiaofang Zhou.
Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries TODS 2019
Sibo Wang, Renchi Yang, Runhui Wang, Xiaokui Xiao, Zhewei Wei, Wenqing Lin, Yin Yang, Nan Tang.
Miscellaneous
I love sports and enjoy professional training.
Since high school, I’ve won championships at multiple provincial badminton competitions and received the Chinese National Second-level Athlete Certification.
Since college, I’ve trained extensively in running and powerlifting:
- finished the Beijing International Marathon in 4 hours 33 minutes (despite a sprained ankle);
- placed 3rd in the Nike University Elite Challenge-Wei Ming Relay (May 2015);
- placed 6th in the 4000m race at the First World Renowned Universities’ Dragon Boat Competition (Oct 2015);
- powerlifting: 455 lb deadlift, 415 lb squat, 255 lb bench press.
