Date of Award

2025

Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy in Computer Science (PhD)

Administrative Home Department

Department of Computer Science

Advisor 1

Jianhui Yue

Committee Member 1

Soner Onder

Committee Member 2

Zhenlin Wang

Committee Member 3

Xiaoyong Yuan

Abstract

Approximate Nearest Neighbor (ANN) search has become a critical component in modern retrieval systems, supporting applications such as recommendation engines, semantic search, and large language models. Despite the effectiveness of embedding-based methods, performing ANN search over billion-scale, high-dimensional datasets remains computationally intensive. To address these challenges, this dissertation explores three system-level accelerator designs that improve the performance, scalability, and efficiency of ANN search.

Our first work focuses on graph-based ANN search and addresses key bottlenecks in memory access and distance computation. We propose a near-memory accelerator that performs distance calculations within DRAM and transfers only compact results to the compute engine. To further enhance throughput, we propose parallel vertex expansion with bounded staleness and a prefetching strategy that improves memory access efficiency. This design achieves a 93.9% improvement in throughput, and further optimizations yield a 2.22× speedup over a state-of-the-art graph-based accelerator.

Our second work targets Inverted File Product Quantization (IVFPQ)-based ANN search. Existing PQ accelerators suffer from excessive data movement and inefficient memory usage. We design a near-memory accelerator with in-DRAM distance computation, along with memory-aware cluster placement to balance the workload. We also propose distance filters that eliminate non-contributing values and an asymmetric quantization method that reduces lookup-table (LUT) storage without degrading recall. This design delivers an 11.3× throughput improvement and a 91.1% reduction in memory traffic compared to the ANNA accelerator.

Our third work addresses scalability limitations by leveraging Compute Express Link (CXL) to access disaggregated memory. We propose CXL-ANNX, a distributed near-memory accelerator that executes sub-queries across multiple remote DRAM modules connected via CXL. To mitigate CXL's latency and bandwidth constraints, we develop several system-level techniques: early termination of unnecessary sub-queries, speculative search, and memory-aware cluster placement. Additionally, we integrate a lightweight learning-based early-exit mechanism that reduces remote memory accesses by dynamically predicting termination points. Across four real-world datasets, CXL-ANNX with all optimizations enabled (ALL-OPT) achieves a 13.1× throughput improvement over DiskANN on billion-scale datasets.

Together, these contributions demonstrate a scalable and efficient architecture stack for high-performance ANN search, spanning graph- and IVFPQ-based methods, near-memory computing, and disaggregated memory systems.
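To make the graph-based search loop concrete, the sketch below shows the standard greedy best-first (beam) search that accelerators of this kind target. It is a minimal illustration under assumed data structures, not the dissertation's implementation: `graph`, `vectors`, `entry`, and `distance` are hypothetical stand-ins. The per-neighbor `distance` calls inside vertex expansion are the work a near-memory design moves into DRAM.

```python
# Minimal greedy best-first (beam) search over a proximity graph.
# All names here are hypothetical stand-ins for illustration only.
import heapq
import numpy as np

def distance(q, v):
    """Squared Euclidean distance; the dominant per-vertex cost."""
    d = q - v
    return float(np.dot(d, d))

def greedy_search(graph, vectors, query, entry, k=10, beam=64):
    """graph: dict of vertex id -> neighbor ids; vectors: (n, dim) array."""
    visited = {entry}
    d0 = distance(query, vectors[entry])
    cand = [(d0, entry)]        # min-heap of unexpanded candidates
    top = [(-d0, entry)]        # max-heap (negated) of kept results
    while cand:
        d, v = heapq.heappop(cand)
        if d > -top[0][0] and len(top) >= beam:
            break  # closest candidate is worse than the worst kept result
        for u in graph[v]:  # vertex expansion: fetch neighbors, score them
            if u in visited:
                continue
            visited.add(u)
            du = distance(query, vectors[u])  # offloaded near-memory in the design
            heapq.heappush(cand, (du, u))
            heapq.heappush(top, (-du, u))
            if len(top) > beam:
                heapq.heappop(top)  # evict the current worst result
    return sorted((-d, u) for d, u in top)[:k]
```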
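For the IVFPQ work, the core operation is asymmetric distance computation (ADC): each query builds a per-sub-space lookup table, and every compressed vector's distance is a sum of table lookups. The sketch below, written under assumed shapes (`M` sub-spaces, `K` centroids per sub-space), illustrates why LUT storage and the stream of PQ codes dominate memory traffic; it is a textbook ADC scan, not the accelerator's datapath.

```python
# Textbook IVFPQ asymmetric distance computation (ADC); shapes assumed.
import numpy as np

def build_lut(query, codebooks):
    """Per-query table: LUT[m][k] = ||query_m - codebooks[m][k]||^2.

    codebooks: (M, K, dsub) array; query is split into M chunks of dsub dims.
    """
    M, K, dsub = codebooks.shape
    q = query.reshape(M, 1, dsub)
    diff = q - codebooks                          # (M, K, dsub)
    return np.einsum('mkd,mkd->mk', diff, diff)   # (M, K) partial distances

def adc_scan(codes, lut):
    """Approximate distance of every encoded vector via M table lookups.

    codes: (n, M) uint8 array; each row is one PQ-compressed vector.
    Every code byte triggers a LUT access, which is why LUT size and the
    code stream out of DRAM dominate the memory traffic being removed.
    """
    n, M = codes.shape
    return lut[np.arange(M), codes].sum(axis=1)   # (n,) approximate distances
```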
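Finally, a rough software analogue of the CXL-ANNX fan-out: one query is split into sub-queries against remote memory nodes, their partial top-k results are merged, and a simplified early-termination check stops work that cannot improve the result set. `node.search` and `stop_threshold` are hypothetical; the actual design performs this inside CXL-attached DRAM modules with hardware-level cancellation rather than thread pools.

```python
# Illustrative fan-out of one query into per-node sub-queries, with a
# simplified early-termination check. node.search() is a hypothetical
# stand-in for a search running inside a CXL-attached DRAM module.
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed

def distributed_search(nodes, query, k=10, stop_threshold=None):
    merged = []  # max-heap via negated distances: global top-k so far
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(n.search, query, k) for n in nodes]
        for fut in as_completed(futures):
            for d, vid in fut.result():  # partial top-k from one node
                if len(merged) < k:
                    heapq.heappush(merged, (-d, vid))
                elif d < -merged[0][0]:
                    heapq.heapreplace(merged, (-d, vid))
            # Early termination of unnecessary sub-queries (simplified):
            # once the global k-th best distance is under the threshold,
            # the remaining sub-queries cannot improve the result set.
            if (stop_threshold is not None and len(merged) == k
                    and -merged[0][0] <= stop_threshold):
                for f in futures:
                    f.cancel()  # best-effort; already-running tasks finish
                break
    return sorted((-d, vid) for d, vid in merged)
```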

Available for download on Saturday, August 01, 2026
