Beyond Class-Level Privacy Leakage: Breaking Record-Level Privacy in Federated Learning
Federated learning (FL) enables multiple clients to collaboratively train a global model without sharing their raw data, thereby protecting their privacy. Unfortunately, recent research has still found privacy leakage in FL, especially in image classification tasks, for example through the reconstruction of class representatives. However, such analyses of image classification tasks cannot uncover the privacy threats to natural language processing (NLP) tasks, whose records consist of sequential text and cannot be grouped into class representatives. The finer, record-level granularity of NLP tasks makes it more challenging to extract individual text records, and it also exposes more serious threats. This article presents the first attempt to explore record-level privacy leakage in NLP tasks under FL. We propose a framework that investigates the exposure of records of interest in federated aggregations by leveraging the perplexity of language modeling. By monitoring the exposure patterns, we propose two correlation attacks that identify the corresponding clients when extracting their specific records. Extensive experimental results demonstrate the effectiveness of the proposed attacks. We have also examined several countermeasures and shown that they are ineffective at mitigating such attacks; hence, further research is needed.
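The following minimal Python sketch illustrates the general idea of perplexity-based exposure monitoring. It is not the authors' framework. The function names (`perplexity`, `exposure_signal`, `pearson`, `most_correlated_client`) are hypothetical. Two further choices are illustrative assumptions: measuring exposure as the round-to-round drop in a record's perplexity, and using Pearson correlation to link that drop to client participation. The sketch also assumes the attacker can read the per-token probabilities that the global model assigns to the record of interest after each aggregation and knows which clients took part in each round. All numbers in the usage lines are made-up toy values.

```python
import math
from typing import Mapping, Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of one text record, given the model's probability for each
    observed token: exp of the mean negative log-likelihood."""
    if not token_probs:
        raise ValueError("record must contain at least one token")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def exposure_signal(ppl_per_round: Sequence[float]) -> list[float]:
    """Hypothetical exposure measure: how much the record's perplexity under
    the global model drops from one aggregation round to the next."""
    return [prev - cur for prev, cur in zip(ppl_per_round, ppl_per_round[1:])]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def most_correlated_client(
    signal: Sequence[float], participation: Mapping[str, Sequence[int]]
) -> str:
    """Hypothetical correlation attack: participation[c][t] is 1 if client c
    took part in the aggregation that produced perplexity change signal[t].
    Returns the client whose participation best tracks the exposure signal."""
    scores = {
        client: pearson(signal, [float(x) for x in rounds])
        for client, rounds in participation.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy perplexities of one record after five aggregation rounds.
    signal = exposure_signal([50, 30, 29, 15, 14.5])  # [20, 1, 14, 0.5]
    participation = {"A": [1, 0, 1, 0], "B": [0, 1, 0, 1], "C": [1, 1, 0, 0]}
    print(most_correlated_client(signal, participation))  # prints "A"
```

With these toy values, client A participates exactly in the rounds where the record's perplexity falls sharply, so it receives the highest correlation. Pearson correlation is only one plausible way to relate exposure to participation. The two correlation attacks proposed in the article may be constructed differently.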