I am an Applied Scientist at Amazon Web Services. I obtained my Ph.D. from the Computer Science Department of Rutgers University, advised by Prof. Yongfeng Zhang. My current research focuses on large language models and deep learning for data integration. Previously, I worked on similarity search for high-dimensional data, advised by Prof. Dong Deng. I obtained my M.Phil. from the University of Queensland (advised by Prof. Xiaofang Zhou and Prof. Sibo Wang) and my B.S. in Computer Science from Peking University.

In 2022, I worked as an Applied Scientist Intern at Amazon Web Services, where I developed deep learning models for identity resolution. During the summer of 2021, I worked as a Research Scientist Intern at Megagon Labs, mentored by Dr. Yuliang Li and Dr. Jin Wang.

Research interests

Recommender Systems
LLM Agents
Entity Resolution
Deep Learning
Similarity Search
Parallel Computing
Graph Algorithms

Publications

CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories NAACL 2025
Yijia Xiao, Runhui Wang, Luyang Kong, Davor Golac, Wei Wang.
Neural Locality Sensitive Hashing for Entity Blocking SDM 2024
Runhui Wang, Luyang Kong, Yefan Tao, Andrew Borthwick, Davor Golac, Henrik Johnson, Shadie Hijazi, Dong Deng, Yongfeng Zhang.
Pre-trained Language Models for Entity Blocking: A Reproducibility Study NAACL 2024
Runhui Wang, Yongfeng Zhang.
BPID: A Benchmark for Personal Identity Deduplication EMNLP 2024
Runhui Wang*, Yefan Tao*, Adit Krishnan*, Luyang Kong*, Xuanqing Liu, Yuqian Deng, Yunzhao Yang, Henrik Johnson, Andrew Borthwick, Shobhit Gupta, Aditi Sinha Gundlapalli, Davor Golac.
GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security KDD 2024
Xuanqing Liu, Runhui Wang, Yang Song, Luyang Kong.
Learning from Natural Language Explanations for Generalizable Entity Matching EMNLP 2024
Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C Wallace, Luyang Kong.
Language is All a Graph Needs EACL 2024 Findings
Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, Yongfeng Zhang.
Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation ICDE 2023 Code
Runhui Wang, Yuliang Li, Jin Wang.
DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search VLDB 2021 Code
Runhui Wang, Dong Deng.
Parallelizing Approximate Single-Source Personalized PageRank Queries on Shared-Memory VLDB Journal Code
Runhui Wang, Sibo Wang, Xiaofang Zhou.
Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries TODS
Sibo Wang, Renchi Yang, Runhui Wang, Xiaokui Xiao, Zhewei Wei, Wenqing Lin, Yin Yang, Nan Tang.

Miscellaneous

I love sports and enjoy professional training.

Since high school, I’ve won championships in multiple provincial badminton competitions and received the Chinese National Second-Level Athlete certification.

Since college, I’ve trained extensively in running and powerlifting:

finished the Beijing International Marathon in 4 hours 33 minutes (despite a sprained ankle);
participated in the Nike University Elite Challenge-Wei Ming Relay (May 2015) and won 3rd place;
joined the First World Renowned Universities’ Dragon Boat Competition (Oct 2015) and won 6th place in the 4000 m race;
powerlifting: 455 lbs deadlift, 415 lbs squat, 255 lbs bench press.

Runhui Wang