Clustered Federated Learning

Heterogeneous federated learning (FL) without any structural assumptions is challenging because the clients' non-identical data distributions conflict with one another. In practice, clients often form near-homogeneous clusters, so training one server-side model per cluster mitigates these conflicts; this approach is called clustered FL. We propose a unified bi-level optimization framework for clustered FL methodologies. Based on this framework, we present a fundamental method called Weighted Clustered Federated Learning (WeCFL). We also introduce a novel theoretical framework for its convergence analysis. This framework factors in the clusterability among clients to measure the effects of intra-cluster non-IIDness, and a convergence rate of O(1/T) is achieved. To enhance the robustness of clustering, we propose Clustered FL with Contrastive Learning (CFL-CON), which can be integrated into our proposed clustered FL framework and many other clustered FL methods. We propose two variants, one based on the representation space and one on the parameter space. To address the lack of knowledge sharing that results from robust clustering and to improve performance, we propose another generic add-on technique, Clustered FL with Clustered Knowledge Sharing (CFL-CKS). We analyze the simplification, convergence, and interpretation of the knowledge-sharing term theoretically. Furthermore, to balance the trade-off between these two add-ons, we propose Clustered FL with Contrastive Learning and Clustered Knowledge Sharing (CFL-CON&CKS). This method applies contrastive learning to the head of the neural network to create distance between clusters, and knowledge sharing to the backbone to transfer knowledge across clusters. Lastly, to address the problem of clustering collapse and to stabilize clustered FL, we propose Clustered Additive Modeling (CAM). This method applies a globally shared model along with the cluster-wise models. The global model captures the features shared by all clusters, so the cluster-wise models are forced to focus on the differences among clusters. We prove its asymptotic convergence rate. Experimental simulations demonstrate the superiority of our methods in terms of robustness, stability of clustering, effectiveness in mitigating clustering collapse, and performance. All methods are implemented with unified datasets, non-IID settings, models, optimizers, and baselines, as detailed in the appendix, to ensure consistency.
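The alternating structure of clustered FL described above (assign clients to clusters, then aggregate one model per cluster) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names, the use of squared Euclidean distance in parameter space as the clustering criterion, and the choice of client sample counts as aggregation weights are all assumptions made for illustration, and local client training is omitted (the rows of `client_params` stand in for locally updated parameters).

```python
import numpy as np

def assign_clusters(client_params, cluster_params):
    """Assign each client to the nearest cluster model (assumed criterion:
    squared Euclidean distance between flattened parameter vectors)."""
    diff = client_params[:, None, :] - cluster_params[None, :, :]
    dist = (diff ** 2).sum(axis=2)          # shape: (n_clients, n_clusters)
    return dist.argmin(axis=1)

def weighted_aggregate(client_params, n_samples, assignment, cluster_params):
    """Update each cluster model as the sample-size-weighted average of the
    parameters of the clients assigned to it (assumed weighting scheme)."""
    new_params = cluster_params.copy()
    for k in range(cluster_params.shape[0]):
        members = assignment == k
        if members.any():                   # keep the old model if a cluster is empty
            new_params[k] = np.average(
                client_params[members], axis=0, weights=n_samples[members]
            )
    return new_params

def clustered_round(client_params, n_samples, cluster_params):
    """One server-side round: clustering step, then per-cluster aggregation."""
    assignment = assign_clusters(client_params, cluster_params)
    updated = weighted_aggregate(client_params, n_samples, assignment, cluster_params)
    return assignment, updated
```

Calling `clustered_round` repeatedly, with clients retraining from their assigned cluster model between calls, gives a K-means-like loop over models. Keeping the previous model for an empty cluster is one simple choice made here; it does not by itself prevent the clustering collapse that CAM is designed to address.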