Uncovering Risks of Data-Free Feature Vector Inversion Attacks Against Vector Databases
Document Type
Article
Publication Date
1-1-2025
Abstract
Vector databases store data as high-dimensional feature vectors. Recently proposed techniques enable an adversary to launch feature vector inversion (FVI) attacks against them. In an FVI attack, the adversary trains an attack network to reconstruct the original private data from its feature vectors, under the assumption that an auxiliary dataset is available. This data-available assumption is too strong, however, and makes such attacks unrealistic in many real-world scenarios. In this paper, we present the first systematic study of FVI attacks against vector databases in the data-free setting. To address the lack of training data, we develop an output-to-input data generation technique that produces synthetic samples for training the FVI attack network. To ensure that the generated samples are of high quality, we also develop an accelerable complete bipartite graph (CBG) search strategy and a downstream-classifier-aided generator training strategy. As the key insight of this work, we find that the proposed output-to-input data generation technique can also be used to launch three other ML attacks. Intriguingly, we further find that the proposed data-free FVI attack technique can be directly employed to boost the performance of FVI attacks in the auxiliary-dataset-available setting. Finally, we propose and study defenses against the proposed attacks.
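As an illustration only, the auxiliary-dataset-available FVI setting described in the abstract can be sketched with a toy example. Everything below is an assumption made for the sketch and is not the paper's method: the encoder is a random linear map (`toy_encoder`), the "attack network" is replaced by a least-squares linear inverter (`fit_inverter`), and all data are synthetic. The data-free setting studied in the paper, where no auxiliary dataset exists, is not shown.

```python
import numpy as np

def toy_encoder(x, W):
    # Hypothetical stand-in for the feature extractor behind a vector database:
    # maps each input row to a feature vector with a fixed linear map W.
    return x @ W.T

def fit_inverter(features, data):
    # Stand-in for training an attack network: least-squares fit of a linear
    # map A such that features @ A approximates data.
    A, *_ = np.linalg.lstsq(features, data, rcond=None)
    return A

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))        # assumed encoder weights: 8-dim input, 16-dim feature
X_aux = rng.normal(size=(200, 8))   # auxiliary dataset assumed available to the adversary

# The adversary encodes the auxiliary data and fits the inverter on (feature, data) pairs.
A = fit_inverter(toy_encoder(X_aux, W), X_aux)

# Private records are only ever seen by the adversary as feature vectors.
x_private = rng.normal(size=(5, 8))
x_rec = toy_encoder(x_private, W) @ A

# Largest absolute reconstruction error over all private entries.
reconstruction_error = float(np.max(np.abs(x_rec - x_private)))
```

In this toy setup the encoder is linear and injective, so the fitted inverter recovers the private inputs almost exactly. Real encoders are nonlinear and lossy, which is why a trained attack network is used in practice.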
Publication Title
IEEE Transactions on Dependable and Secure Computing
Recommended Citation
Qin, S., Lei, X., Mu, N., Huang, H., Xie, T., & Zhang, X. (2025). Uncovering Risks of Data-Free Feature Vector Inversion Attacks Against Vector Databases. IEEE Transactions on Dependable and Secure Computing. http://doi.org/10.1109/TDSC.2025.3605268
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/2043