Date of Award
2025
Document Type
Campus Access Dissertation
Degree Name
Doctor of Philosophy in Computer Science (PhD)
Administrative Home Department
Department of Computer Science
Advisor 1
Jianhui Yue
Committee Member 1
Soner Onder
Committee Member 2
Zhenlin Wang
Committee Member 3
Xiaoyong Yuan
Abstract
Approximate Nearest Neighbor (ANN) search has become a critical component in modern retrieval systems, supporting applications such as recommendation engines, semantic search, and large language models. Despite the effectiveness of embedding-based methods, performing ANN search over billion-scale, high-dimensional datasets remains computationally intensive. To address these challenges, this dissertation explores three system-level accelerator designs that improve the performance, scalability, and efficiency of ANN search.

Our first work focuses on graph-based ANN search and addresses key bottlenecks in memory access and distance computation. We propose a near-memory accelerator that performs distance calculations within DRAM and transfers only compact results to the compute engine. To further enhance throughput, we propose parallel vertex expansion with bounded staleness and a prefetching strategy that improves memory access efficiency. This design achieves a 93.9% improvement in throughput, and further optimizations yield a 2.22× speedup over a state-of-the-art graph-based accelerator.

Our second work targets Inverted File Product Quantization (IVFPQ)-based ANN search. Existing PQ accelerators suffer from excessive data movement and inefficient memory usage. We design a near-memory accelerator with in-DRAM distance computation, along with memory-aware cluster placement to balance the workload. We also propose distance filters that eliminate non-contributing values and an asymmetric quantization method that reduces LUT storage without degrading recall. This design delivers an 11.3× throughput improvement and a 91.1% reduction in memory traffic compared to the ANNA accelerator.

Our third work addresses scalability limitations by leveraging Compute Express Link (CXL) to access disaggregated memory. We propose CXL-ANNX, a distributed near-memory accelerator that executes sub-queries across multiple remote DRAM modules connected via CXL. To mitigate CXL’s latency and bandwidth constraints, we develop several system-level techniques: Early Termination of Unnecessary Sub-queries, Speculative Search, and Memory-Aware Cluster Placement. Additionally, we integrate a lightweight learning-based Early Exit mechanism that reduces remote memory accesses by dynamically predicting termination points. Across four real-world datasets, CXL-ANNX with ALL-OPT achieves a 13.1× throughput improvement over DiskANN on billion-scale datasets.

Together, these contributions demonstrate a scalable and efficient architecture stack for high-performance ANN search, spanning graph- and IVFPQ-based methods, near-memory computing, and disaggregated memory systems.
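To make the first design's setting concrete, the following minimal sketch shows the best-first traversal that graph-based ANN search performs. All names here (`greedy_graph_search`, `neighbors`, `ef`) are hypothetical illustrations, not the dissertation's code; the point is that the per-edge `dist` calls dominate the loop, which is the step the near-memory accelerator moves into DRAM.

```python
# Illustrative sketch only: a minimal best-first search over a proximity graph.
# The `dist` step is the memory- and compute-bound hot spot that the first
# design's accelerator computes inside DRAM, returning only compact results.
import heapq
import numpy as np

def greedy_graph_search(query, vectors, neighbors, entry, k=10, ef=64):
    """query: (d,) array; vectors: (n, d) database points;
    neighbors: adjacency list; entry: fixed entry vertex id."""
    dist = lambda v: float(np.linalg.norm(vectors[v] - query))
    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]   # min-heap: frontier, closest first
    results = [(-d0, entry)]     # max-heap (negated): best ef seen so far
    while candidates:
        d, v = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break                # frontier can no longer improve the results
        for u in neighbors[v]:   # vertex expansion: one fetch + distance per edge
            if u in visited:
                continue
            visited.add(u)
            du = dist(u)
            heapq.heappush(candidates, (du, u))
            heapq.heappush(results, (-du, u))
            if len(results) > ef:
                heapq.heappop(results)  # evict current worst
    return sorted((-nd, u) for nd, u in results)[:k]
```

The bounded-staleness idea in the dissertation relaxes this loop's strict sequential expansion so several vertices can be expanded in parallel.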
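For the second design, the sketch below illustrates the lookup-table (LUT) style of asymmetric distance computation that underlies IVFPQ scoring. Shapes and names are assumptions for illustration; the dissertation's contributions (in-DRAM lookups, distance filters on non-contributing partial distances, and quantizing the LUT itself) all operate on tables of this form.

```python
# Illustrative sketch only: LUT-based asymmetric distance computation (ADC)
# for PQ codes. The (m, 256) LUT is the structure whose storage the
# dissertation's asymmetric quantization method shrinks.
import numpy as np

def build_lut(query, codebooks):
    """Squared distances from each query subvector to every centroid.
    query: (d,); codebooks: (m, 256, d // m); returns (m, 256)."""
    m, ksub, dsub = codebooks.shape
    sub = query.reshape(m, dsub)
    return ((codebooks - sub[:, None, :]) ** 2).sum(axis=-1)

def adc_scores(lut, codes):
    """Score PQ-encoded vectors with m table lookups each.
    codes: (n, m) uint8 centroid ids from the inverted lists."""
    m = lut.shape[0]
    return lut[np.arange(m), codes].sum(axis=1)  # one lookup per subspace

# Hypothetical usage: 64-dim vectors, 8 subspaces, 1000 encoded points.
rng = np.random.default_rng(0)
codebooks = rng.standard_normal((8, 256, 8)).astype(np.float32)
codes = rng.integers(0, 256, size=(1000, 8), dtype=np.uint8)
q = rng.standard_normal(64).astype(np.float32)
top10 = np.argsort(adc_scores(build_lut(q, codebooks), codes))[:10]
```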
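Finally, the third design's learning-based Early Exit can be pictured with the hypothetical sketch below: a small learned predictor watches search progress and stops issuing remote sub-queries once further probes are unlikely to change the top-k. The features, weights, and driver are illustrative assumptions, not the dissertation's actual model.

```python
# Illustrative sketch only: a lightweight early-exit rule of the kind the
# third design uses to cut remote (CXL) memory accesses. The feature set and
# logistic form here are hypothetical; the real predictor is learned offline.
import numpy as np

class EarlyExit:
    def __init__(self, weights, bias, threshold=0.5):
        self.w, self.b, self.t = np.asarray(weights), bias, threshold

    def should_stop(self, probes_done, kth_dist, last_improvement):
        # Features: clusters probed so far, current k-th best distance,
        # and how much the last probe improved it.
        x = np.array([probes_done, kth_dist, last_improvement])
        p_no_improvement = 1.0 / (1.0 + np.exp(-(self.w @ x + self.b)))
        return p_no_improvement > self.t

def search_with_early_exit(clusters, score_cluster, predictor, k=10):
    """score_cluster(c) -> 1-D array of candidate distances from cluster c;
    each call stands in for one remote-memory sub-query."""
    best = np.full(k, np.inf)            # current top-k distances
    for i, c in enumerate(clusters):
        prev_kth = best[-1]
        best = np.sort(np.concatenate([best, score_cluster(c)]))[:k]
        if predictor.should_stop(i + 1, best[-1], prev_kth - best[-1]):
            break                        # skip the remaining sub-queries
    return best
```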
Recommended Citation
Deng, Yifu, "Near Memory Accelerators for Approximate Nearest Neighbor Search", Campus Access Dissertation, Michigan Technological University, 2025.
https://digitalcommons.mtu.edu/etdr/1961