Uncovering Risks of Data-Free Feature Vector Inversion Attacks Against Vector Databases

Document Type

Article

Publication Date

1-1-2025

Abstract

A vector database stores data as high-dimensional feature vectors. Recently proposed attack techniques enable an adversary to launch feature vector inversion (FVI) attacks against vector databases. In an FVI attack, the adversary trains an FVI attack network to reconstruct the original private data from their feature vectors, assuming that an auxiliary dataset is available for training. However, this data-available assumption is too strong, which makes such FVI attacks unrealistic in many real-world scenarios. In this paper, we present the first systematic study of FVI attacks against vector databases in the data-free setting. To address the lack of training data, we develop an output-to-input data generation technique that produces synthetic fake samples for training the FVI attack network. To ensure that the generated fake samples are of high quality, we further develop an accelerable complete bipartite graph (CBG) search strategy and a downstream-classifier-aided generator training strategy. As the key insight of this work, we find that the proposed output-to-input data generation technique can also be used to launch three other ML attacks. Intriguingly, the proposed data-free FVI attack technique can also be applied directly to boost the performance of FVI attacks in the auxiliary-dataset-available setting. Finally, we propose and study defenses against the proposed attacks.
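To make the data-free threat model concrete, here is a minimal toy sketch. It is not the authors' method. It assumes a hypothetical linear encoder `encode` that the attacker can query as a black box. It also uses random Gaussian queries as a naive stand-in for generated fake samples; the paper's output-to-input generation, CBG search, and generator training are not reproduced here. With no auxiliary dataset, the attacker fits an inverse mapping from synthetic (input, feature vector) pairs and applies it to a leaked vector. All names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_OUT = 8, 16  # hypothetical input and feature-vector dimensions

# Hypothetical encoder weights; the attacker never reads these directly.
_W = rng.normal(size=(D_OUT, D_IN))


def encode(x: np.ndarray) -> np.ndarray:
    """Black-box encoder: maps input rows (n, D_IN) to feature vectors (n, D_OUT)."""
    return x @ _W.T


def train_inverter(n_queries: int = 200) -> np.ndarray:
    """Fit a linear inverter using only synthetic queries (no auxiliary dataset).

    Random Gaussian inputs are a naive stand-in for generated fake samples.
    """
    synthetic_x = rng.normal(size=(n_queries, D_IN))
    feats = encode(synthetic_x)
    # Least squares for feats @ A ~= synthetic_x. feats has rank D_IN < D_OUT,
    # so an explicit rcond discards numerically-zero singular values.
    A, *_ = np.linalg.lstsq(feats, synthetic_x, rcond=1e-8)
    return A  # shape (D_OUT, D_IN)


def invert(A: np.ndarray, feats: np.ndarray) -> np.ndarray:
    """Reconstruct inputs from feature vectors with the fitted inverter."""
    return feats @ A


private_x = rng.normal(size=(1, D_IN))  # private record the attacker never sees
leaked_vec = encode(private_x)          # what the vector database stores
A = train_inverter()
recovered = invert(A, leaked_vec)
err = float(np.max(np.abs(recovered - private_x)))
print(f"max reconstruction error: {err:.2e}")
```

Because this toy encoder is linear and injective, random queries are enough to recover the private input almost exactly. Real encoders are nonlinear and the input distribution matters, so the quality of the synthetic samples plays a much larger role there.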

Publication Title

IEEE Transactions on Dependable and Secure Computing
