I’m an Applied Scientist at AWS, where I build RL post-training infrastructure on Nvidia B200/B300 — VeRL, vLLM, flash-attention, custom kernels. I also work on LLM-based retrieval and entity matching at billion scale for the AWS Entity Resolution Service.

Before AWS, I completed my Ph.D. at Rutgers University under Prof. Yongfeng Zhang and Prof. Dong Deng, focusing on LLMs for data integration, retrieval, and similarity search. I earned my M.Phil. at the University of Queensland with Prof. Xiaofang Zhou (IEEE Fellow) and Prof. Sibo Wang, and my B.S. from Peking University.

I’m interested in the hardware-software seam of LLM training and inference — RL post-training systems, kernels, and serving infrastructure. I previously interned at AWS (Applied Scientist, 2022) and at Megagon Labs (Research Scientist, 2021), mentored by Dr. Yuliang Li and Dr. Jin Wang.

Research interests

  • RL post-training infrastructure
  • LLM serving & inference optimization
  • Kernels and hardware-software co-design
  • LLM-based retrieval and entity matching
  • Efficient large-scale similarity search

Open Source

  • Personal 4×RTX 4090 home cluster — VeRL, vLLM, and flash-attention experiments
  • DeltaPQ — Lossless product quantization compression for similarity search (VLDB 2021)
  • PAFO — Parallel approximate personalized PageRank (VLDB Journal 2019)

Publications

  1. CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories NAACL 2025 Oral
    Yijia Xiao, Runhui Wang, Luyang Kong, Davor Golac, Wei Wang.

  2. Language is All a Graph Needs EACL 2024 Findings
    Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, Yongfeng Zhang.

  3. BPID: A Benchmark for Personal Identity Deduplication EMNLP 2024
    Runhui Wang*, Yefan Tao*, Adit Krishnan*, Luyang Kong*, Xuanqing Liu, Yuqian Deng, Yunzhao Yang, Henrik Johnson, Andrew Borthwick, Shobhit Gupta, Aditi Sinha Gundlapalli, Davor Golac.

  4. Large Language Models for Entity Blocking: A Reproducibility Study NAACL 2024
    Runhui Wang, Yongfeng Zhang.

  5. GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security KDD 2024
    Xuanqing Liu, Runhui Wang, Yang Song, Luyang Kong.

  6. Learning from Natural Language Explanations for Generalizable Entity Matching EMNLP 2024
    Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C Wallace, Luyang Kong.

  7. Neural Locality Sensitive Hashing for Entity Blocking SDM 2024
    Runhui Wang, Luyang Kong, Yefan Tao, Andrew Borthwick, Davor Golac, Henrik Johnson, Shadie Hijazi, Dong Deng, Yongfeng Zhang.

  8. Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation ICDE 2023 · Code
    Runhui Wang, Yuliang Li, Jin Wang.

Earlier work

  1. DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search VLDB 2021 · Code
    Runhui Wang, Dong Deng.

  2. Parallelizing Approximate Single-Source Personalized PageRank Queries on Shared-Memory VLDB Journal 2019 · Code
    Runhui Wang, Sibo Wang, Xiaofang Zhou.

  3. Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries TODS 2019
    Sibo Wang, Renchi Yang, Runhui Wang, Xiaokui Xiao, Zhewei Wei, Wenqing Lin, Yin Yang, Nan Tang.

Miscellaneous

I love sports and enjoy professional training.

Since high school, I’ve won championships in multiple provincial badminton competitions and received the Chinese National Second-Level Athlete Certification.

Since college, I’ve trained extensively in running and powerlifting:

  • finished the Beijing International Marathon in 4 hours 33 minutes (despite a sprained ankle);
  • participated in the Nike University Elite Challenge-Wei Ming Relay (May 2015) and won 3rd place;
  • joined the First World Renowned Universities’ Dragon Boat Competition (Oct 2015) and placed 6th in the 4000 m race;
  • powerlifting: 455 lb deadlift, 415 lb squat, 255 lb bench press.