Shiqiang Wang, Nathalie Baracaldo Angel, et al.
NeurIPS 2022
Federated Learning (FL) has emerged as a privacy-preserving framework for training models on data generated at the edge. However, the heterogeneity of data silos (e.g., label skew and domain shift) often leads to inconsistent learning objectives and suboptimal model performance. Taking a data-driven approach, we propose Flick, a novel data generation framework for heterogeneous Federated Learning with Commonsense Knowledge from Large Language Models (LLMs). In Flick, each client performs a local data summary to capture client-specific knowledge in textual form. The central server then distills task-relevant, high-quality knowledge from an off-the-shelf LLM -- guided by cross-client insights -- to generate informative text prompts. These prompts direct a generative model to produce synthetic data, enabling global model fine-tuning and local data compensation. This process gradually aligns the label and feature distributions across clients. Extensive results demonstrate that Flick improves global model accuracy by up to 11.43% and accelerates convergence by up to 12.9x, validating its effectiveness in addressing data heterogeneity.
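The abstract describes a three-stage round: clients summarize local data in text, the server distills prompts with LLM guidance, and a generative model synthesizes data from those prompts. A minimal sketch of that flow is below; every function name and data structure is a hypothetical placeholder, since the abstract does not specify Flick's actual interfaces, and the LLM and generative model are replaced by trivial stand-ins.

```python
# Illustrative sketch of the pipeline described in the abstract.
# All names are hypothetical; the LLM and generative model are stubbed out.

def summarize_local_data(client_labels):
    """Client side: capture client-specific knowledge as a textual summary
    (here, just the local label distribution)."""
    counts = {}
    for y in client_labels:
        counts[y] = counts.get(y, 0) + 1
    body = ", ".join(f"{k}:{v}" for k, v in sorted(counts.items()))
    return f"label distribution: {body}"

def distill_prompts(summaries):
    """Server side: stand-in for querying an off-the-shelf LLM, guided by
    cross-client insights, to produce informative text prompts."""
    return [f"generate samples consistent with ({s})" for s in summaries]

def generate_synthetic_data(prompts):
    """Stand-in for a text-conditioned generative model producing
    synthetic data used for global fine-tuning and local compensation."""
    return [{"prompt": p, "sample": f"synthetic[{p}]"} for p in prompts]

# One round over two heterogeneous clients (label-skewed toy data).
client_data = [["cat", "cat", "dog"], ["bird"]]
summaries = [summarize_local_data(d) for d in client_data]
prompts = distill_prompts(summaries)
synthetic = generate_synthetic_data(prompts)
print(len(synthetic))  # one synthetic batch per client summary
```

The stand-ins only preserve the data flow; in the paper the prompts would be distilled from an actual LLM and the samples produced by a real generative model.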