I am an Applied Scientist at Amazon Web Services. I obtained my Ph.D. from the Computer Science Department of Rutgers University, advised by Prof. Yongfeng Zhang. My current research focuses on large language models and deep learning for data integration. Previously, I worked on similarity search for high-dimensional data, advised by Prof. Dong Deng. I obtained my M.Phil. from the University of Queensland (advised by Prof. Xiaofang Zhou and Prof. Sibo Wang) and my B.S. in Computer Science from Peking University.

In 2022, I worked as an Applied Scientist Intern at Amazon Web Services, where I developed deep learning models for identity resolution. During the summer of 2021, I worked as a Research Scientist Intern at Megagon Labs, mentored by Dr. Yuliang Li and Dr. Jin Wang.

Research interests

  • Recommender Systems
  • Natural Language Processing
  • Entity Resolution
  • Deep Learning
  • Similarity Search
  • Parallel Computing
  • Graph Algorithms

Publications

  1. Neural Locality Sensitive Hashing for Entity Blocking. SDM 2024
    Runhui Wang, Luyang Kong, Yefan Tao, Andrew Borthwick, Davor Golac, Henrik Johnson, Shadie Hijazi, Dong Deng, Yongfeng Zhang.

  2. Pre-trained Language Models for Entity Blocking: A Reproducibility Study. NAACL 2024
    Runhui Wang, Yongfeng Zhang.

  3. BPID: A Benchmark for Personal Identity Deduplication. EMNLP 2024
    Runhui Wang*, Yefan Tao*, Adit Krishnan*, Luyang Kong*, Xuanqing Liu, Yuqian Deng, Yunzhao Yang, Henrik Johnson, Andrew Borthwick, Shobhit Gupta, Aditi Sinha Gundlapalli, Davor Golac.

  4. GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security. KDD 2024
    Xuanqing Liu, Runhui Wang, Yang Song, Luyang Kong.

  5. Learning from Natural Language Explanations for Generalizable Entity Matching. EMNLP 2024
    Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C Wallace, Luyang Kong.

  6. Language is All a Graph Needs. EACL 2024 Findings
    Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, Yongfeng Zhang.

  7. Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation. ICDE 2023 Code
    Runhui Wang, Yuliang Li, Jin Wang.

  8. DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search. VLDB 2021 Code
    Runhui Wang, Dong Deng.

  9. Parallelizing Approximate Single-Source Personalized PageRank Queries on Shared-Memory. VLDB Journal Code
    Runhui Wang, Sibo Wang, Xiaofang Zhou.

  10. Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries. TODS
    Sibo Wang, Renchi Yang, Runhui Wang, Xiaokui Xiao, Zhewei Wei, Wenqing Lin, Yin Yang, Nan Tang.

Miscellaneous

I love sports and enjoy professional training.

Since high school, I’ve won championships in multiple provincial badminton competitions and received the Chinese National Second-Level Athlete certification.

Since college, I’ve trained extensively in running and powerlifting:

  • finished the Beijing International Marathon in 4 hours 33 minutes (despite a sprained ankle);
  • participated in the Nike University Elite Challenge-Wei Ming Relay (May 2015) and won 3rd place;
  • joined the First World Renowned Universities’ Dragon Boat Competition (Oct 2015) and won 6th place in the 4000m race;
  • powerlifting: 455 lbs deadlift, 415 lbs squat, 255 lbs bench press.