ANG: Accelerating NFA processing on GPUs via Exploring Multi-Level Fine-Grained Parallelism

Document Type

Conference Proceeding

Publication Date

12-16-2025

Department

Department of Computer Science

Abstract

Finite Automata (FA) processing is a core computation in various real-world applications. Over the past decades, extensive efforts have been dedicated to accelerating FA processing on modern parallel platforms, particularly GPUs, due to their high memory bandwidth and massive hardware parallelism. As applications based on Non-deterministic Finite Automata (NFA) place strong and growing demands on real-time data analytics, reducing latency in automata processing has become a critical priority. However, existing approaches face significant challenges when NFA computations expose only limited parallelism. In this work, we explore opportunities for introducing fine-grained parallelism from various sources and for addressing the limitations that stand in the way of fast NFA processing. Specifically, by analyzing different NFA parallelization schemes, we identify the major performance issue caused by insufficient state-level parallelism in conventional designs. To overcome this bottleneck, this work introduces speculative parallelization tailored for GPU-based NFA processing, effectively exploiting fine-grained parallelism across multiple levels, with a particular focus on input-chunk-level parallelism. To realize speculative parallelization in practice, we develop ANG, a latency-oriented NFA processing framework that overcomes key implementation challenges on GPUs. We evaluate the efficiency of ANG on a set of representative NFAs with diverse properties. Experimental results demonstrate that ANG achieves significant performance improvement over state-of-the-art techniques, reaching an average speedup of 11.74× (and up to 49.88× in extreme cases).
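To make the idea of input-chunk-level parallelism concrete, the following is a minimal CPU-side sketch in Python. It is not ANG's implementation, which the abstract does not detail. It uses the simplest enumerative variant, in which each chunk is processed from every possible start state instead of from a predicted subset, and the per-chunk results are then stitched together in order. The toy NFA (`DELTA`, `ALL_STATES`) and all function names are hypothetical, chosen only for illustration, and the thread pool only shows the structure of the computation since Python threads do not provide real GPU-style parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def step(delta, states, symbol):
    """Advance a set of active NFA states by one input symbol."""
    nxt = set()
    for s in states:
        nxt |= delta.get((s, symbol), set())
    return nxt


def run_sequential(delta, start, text):
    """Baseline: process the input one symbol at a time."""
    active = set(start)
    for ch in text:
        active = step(delta, active, ch)
    return active


def chunk_summary(delta, all_states, chunk):
    """For each possible start state q, the states reached after the chunk."""
    return {q: run_sequential(delta, {q}, chunk) for q in all_states}


def run_chunked(delta, all_states, start, text, num_chunks):
    """Process chunks independently, then merge the summaries in order."""
    size = max(1, -(-len(text) // num_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        summaries = list(
            pool.map(lambda c: chunk_summary(delta, all_states, c), chunks)
        )
    # Merge step: an NFA transition on a set is the union of the
    # transitions on its members, so each summary can be applied per state.
    active = set(start)
    for summary in summaries:
        merged = set()
        for q in active:
            merged |= summary[q]
        active = merged
    return active


# Hypothetical toy NFA over {a, b} that reaches state 2 once "ab" is seen.
ALL_STATES = {0, 1, 2}
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "a"): {2},
    (2, "b"): {2},
}

if __name__ == "__main__":
    text = "bbabba"
    print(run_sequential(DELTA, {0}, text) == run_chunked(DELTA, ALL_STATES, {0}, text, 3))
```

The merge step is the only sequential part, and it touches one summary per chunk instead of one symbol at a time. Enumerating every start state is wasteful for large NFAs, which is why a speculative scheme would try to narrow down the start states each chunk actually has to consider.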

Publication Title

2025 34th International Conference on Parallel Architectures and Compilation Techniques (PACT)
