Date of Award

2025

Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy in Computer Science (PhD)

Administrative Home Department

Department of Computer Science

Advisor 1

Jianhui Yue

Committee Member 1

Soner Onder

Committee Member 2

Zhenlin Wang

Committee Member 3

Xiaoyong Yuan

Abstract

Approximate Nearest Neighbor (ANN) search has become a critical component in modern retrieval systems, supporting applications such as recommendation engines, semantic search, and large language models. Despite the effectiveness of embedding-based methods, performing ANN search over billion-scale, high-dimensional datasets remains computationally intensive. To address these challenges, this dissertation explores three system-level accelerator designs that improve the performance, scalability, and efficiency of ANN search.

Our first work focuses on graph-based ANN search and addresses key bottlenecks in memory access and distance computation. We propose a near-memory accelerator that performs distance calculations within DRAM and transfers only compact results to the compute engine. To further enhance throughput, we propose parallel vertex expansion with bounded staleness and a prefetching strategy that improves memory access efficiency. This design achieves a 93.9% improvement in throughput, and further optimizations yield a 2.22× speedup over a state-of-the-art graph-based accelerator.

Our second work targets Inverted File Product Quantization (IVFPQ)-based ANN search. Existing PQ accelerators suffer from excessive data movement and inefficient memory usage. We design a near-memory accelerator with in-DRAM distance computation, along with memory-aware cluster placement to balance the workload. We also propose distance filters that eliminate non-contributing values and an asymmetric quantization method that reduces lookup-table (LUT) storage without degrading recall. This design delivers an 11.3× throughput improvement and a 91.1% reduction in memory traffic compared to the ANNA accelerator.

Our third work addresses scalability limitations by leveraging Compute Express Link (CXL) to access disaggregated memory. We propose CXL-ANNX, a distributed near-memory accelerator that executes sub-queries across multiple remote DRAM modules connected via CXL. To mitigate CXL's latency and bandwidth constraints, we develop several system-level techniques: early termination of unnecessary sub-queries, speculative search, and memory-aware cluster placement. Additionally, we integrate a lightweight learning-based early-exit mechanism that reduces remote memory accesses by dynamically predicting termination points. Across four real-world datasets, CXL-ANNX with all optimizations enabled (ALL-OPT) achieves a 13.1× throughput improvement over DiskANN on billion-scale datasets.

Together, these contributions demonstrate a scalable and efficient architecture stack for high-performance ANN search, spanning graph- and IVFPQ-based methods, near-memory computing, and disaggregated memory systems.
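To make the graph-based search loop concrete, the sketch below shows the standard greedy best-first (beam) search that accelerators of this kind target. It is a minimal illustration under assumed data structures, not the dissertation's implementation: `graph`, `vectors`, `entry`, and `distance` are hypothetical stand-ins. The per-neighbor `distance` calls inside vertex expansion are the work a near-memory design moves into DRAM.

```python
# Minimal greedy best-first (beam) search over a proximity graph.
# All names here are hypothetical stand-ins for illustration only.
import heapq
import numpy as np

def distance(q, v):
    """Squared Euclidean distance; the dominant per-vertex cost."""
    d = q - v
    return float(np.dot(d, d))

def greedy_search(graph, vectors, query, entry, k=10, beam=64):
    """graph: dict of vertex id -> neighbor ids; vectors: (n, dim) array."""
    visited = {entry}
    d0 = distance(query, vectors[entry])
    cand = [(d0, entry)]        # min-heap of unexpanded candidates
    top = [(-d0, entry)]        # max-heap (negated) of kept results
    while cand:
        d, v = heapq.heappop(cand)
        if d > -top[0][0] and len(top) >= beam:
            break  # closest candidate is worse than the worst kept result
        for u in graph[v]:  # vertex expansion: fetch neighbors, score them
            if u in visited:
                continue
            visited.add(u)
            du = distance(query, vectors[u])  # offloaded near-memory in the design
            heapq.heappush(cand, (du, u))
            heapq.heappush(top, (-du, u))
            if len(top) > beam:
                heapq.heappop(top)  # evict the current worst result
    return sorted((-d, u) for d, u in top)[:k]
```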
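For the IVFPQ work, the core operation is asymmetric distance computation (ADC): each query builds a per-sub-space lookup table, and every compressed vector's distance is a sum of table lookups. The sketch below, written under assumed shapes (`M` sub-spaces, `K` centroids per sub-space), illustrates why LUT storage and the stream of PQ codes dominate memory traffic; it is a textbook ADC scan, not the accelerator's datapath.

```python
# Textbook IVFPQ asymmetric distance computation (ADC); shapes assumed.
import numpy as np

def build_lut(query, codebooks):
    """Per-query table: LUT[m][k] = ||query_m - codebooks[m][k]||^2.

    codebooks: (M, K, dsub) array; query is split into M chunks of dsub dims.
    """
    M, K, dsub = codebooks.shape
    q = query.reshape(M, 1, dsub)
    diff = q - codebooks                          # (M, K, dsub)
    return np.einsum('mkd,mkd->mk', diff, diff)   # (M, K) partial distances

def adc_scan(codes, lut):
    """Approximate distance of every encoded vector via M table lookups.

    codes: (n, M) uint8 array; each row is one PQ-compressed vector.
    Every code byte triggers a LUT access, which is why LUT size and the
    code stream out of DRAM dominate the memory traffic being removed.
    """
    n, M = codes.shape
    return lut[np.arange(M), codes].sum(axis=1)   # (n,) approximate distances
```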
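Finally, a rough software analogue of the CXL-ANNX fan-out: one query is split into sub-queries against remote memory nodes, their partial top-k results are merged, and a simplified early-termination check stops work that cannot improve the result set. `node.search` and `stop_threshold` are hypothetical; the actual design performs this inside CXL-attached DRAM modules with hardware-level cancellation rather than thread pools.

```python
# Illustrative fan-out of one query into per-node sub-queries, with a
# simplified early-termination check. node.search() is a hypothetical
# stand-in for a search running inside a CXL-attached DRAM module.
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed

def distributed_search(nodes, query, k=10, stop_threshold=None):
    merged = []  # max-heap via negated distances: global top-k so far
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(n.search, query, k) for n in nodes]
        for fut in as_completed(futures):
            for d, vid in fut.result():  # partial top-k from one node
                if len(merged) < k:
                    heapq.heappush(merged, (-d, vid))
                elif d < -merged[0][0]:
                    heapq.heapreplace(merged, (-d, vid))
            # Early termination of unnecessary sub-queries (simplified):
            # once the global k-th best distance is under the threshold,
            # the remaining sub-queries cannot improve the result set.
            if (stop_threshold is not None and len(merged) == k
                    and -merged[0][0] <= stop_threshold):
                for f in futures:
                    f.cancel()  # best-effort; already-running tasks finish
                break
    return sorted((-d, vid) for d, vid in merged)
```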

Available for download on Saturday, August 01, 2026
