Unlocking the Potential of Prompt-Tuning in Federated Learning
A new paper from Vector Faculty Member Xiaoxiao Li presents an approach that combines generalized and personalized learning into an efficient system capable of handling data heterogeneity. Called shared and group prompt tuning (SGPT), the method improves performance while enhancing safety and interpretability.
“Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning,” co-authored by Wenlong Deng and Christos Thrampoulidis, showcases how this innovative approach combines the strengths of generalized learning (where an AI learns from various sources) and personalized learning (where an AI is tailored to specific users). The design allows the algorithm to capture both common and specialized features, facilitating better alignment with diverse local data distributions without requiring local fine-tuning.
Federated learning (FL) aims to train machine learning models across multiple clients without sharing their raw data, making it crucial in privacy-sensitive applications. However, data heterogeneity, characterized by domain discrepancies or imbalanced class distributions, presents a significant hurdle. Traditional generalized federated learning methods, which learn a single global model, often struggle under significant heterogeneity; personalized methods, which tailor models to individual clients, can overfit to local data.
Background and Motivation
Traditional FL approaches can be broadly categorized into generalized FL (GFL) and personalized FL (PFL). GFL aims to learn a single global model that generalizes well across all clients, while PFL focuses on tailoring models to individual clients or client groups. Both approaches have limitations: GFL struggles with significant data heterogeneity, while PFL may overfit to local data and fail to generalize to out-of-federation clients.
To tackle these challenges, the authors introduce SGPT, a novel algorithm that blends the advantages of both GFL and PFL. SGPT harnesses the power of vision transformers (ViTs), which, while traditionally seen as computationally intensive, have recently benefited from parameter-efficient tuning methods like prompt tuning that greatly improve their efficiency, making them well-suited for FL. By applying prompt-tuning techniques, SGPT establishes a flexible and efficient FL framework optimized for model tuning in distributed environments.
SGPT Methodology
The core idea behind SGPT is to learn both shared prompts and group-specific prompts, allowing the model to capture common features across all clients while also adapting to group-specific characteristics. Here’s a breakdown of the key components:
- Shared prompts: designed to capture common representations across all clients. They are attached to the early layers of the ViT model, where features tend to be more uniform across different classes.
- Group prompts: designed to extract specialized information for different data groups. They are inserted into the higher layers of the ViT, where features become more diverse and specialized.
- Prompt selection module: uses a similarity-based clustering approach to assign data points to specific groups. It learns a set of keys for each group and selects the appropriate group prompt based on the similarity between the input features and the learned keys.
- Block coordinate descent (BCD) optimization: to train the prompts effectively, SGPT first optimizes the shared prompts to learn common information, then optimizes the group prompts to extract more specialized knowledge.
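The prompt selection module and the shared/group split described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the dimensions, prompt lengths, cosine-similarity rule, and the interface to a frozen ViT are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_GROUPS = 4    # assumed number of client groups
FEAT_DIM = 16     # assumed feature dimension
PROMPT_LEN = 4    # assumed number of prompt tokens

# Learnable parameters (randomly initialised for the sketch):
shared_prompt = rng.normal(size=(PROMPT_LEN, FEAT_DIM))               # early ViT layers
group_prompts = rng.normal(size=(NUM_GROUPS, PROMPT_LEN, FEAT_DIM))   # higher ViT layers
group_keys = rng.normal(size=(NUM_GROUPS, FEAT_DIM))                  # selection keys

def select_group(feature):
    """Pick the group whose key is most similar (cosine) to the input feature."""
    sims = group_keys @ feature / (
        np.linalg.norm(group_keys, axis=1) * np.linalg.norm(feature) + 1e-8
    )
    return int(np.argmax(sims))

def assemble_prompts(feature):
    """Return the prompts to prepend: shared for early layers, group-specific for later ones."""
    g = select_group(feature)
    return shared_prompt, group_prompts[g], g

feature = rng.normal(size=FEAT_DIM)   # e.g. an embedding produced by the frozen ViT
early, late, g = assemble_prompts(feature)
print(early.shape, late.shape, g)
```

Under BCD, training would alternate between updating `shared_prompt` (with the group parameters fixed) and updating `group_prompts` and `group_keys` (with the shared prompt fixed).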
The authors introduce several techniques to improve the stability and effectiveness of their approach:
- Calibration of the selection function using accumulated selection probabilities, which prevents all inputs from collapsing into a few groups.
- Momentum aggregation of both the keys and the group prompts, which keeps group selection and group knowledge consistent across communication rounds.
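The momentum aggregation step can be sketched as blending the plain federated average of client updates with the previous server state. The exact update rule and the value of the momentum coefficient are assumptions for illustration.

```python
import numpy as np

def momentum_aggregate(server_param, client_params, beta=0.9):
    """Blend the federated average of client updates with the previous server state.

    A high beta keeps keys and group prompts stable across rounds, so the same
    inputs keep selecting the same groups (selection consistency).
    """
    avg = np.mean(client_params, axis=0)      # plain federated averaging
    return beta * server_param + (1.0 - beta) * avg

server_key = np.ones(3)
client_keys = [np.zeros(3), np.full(3, 2.0)]  # their average is [1, 1, 1]
updated = momentum_aggregate(server_key, client_keys, beta=0.9)
print(updated)  # → [1. 1. 1.]
```

With `beta=0.9`, even a client average that differs sharply from the server state moves the aggregated parameters only a tenth of the way toward it per round.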
Theoretical Analysis
The paper provides a theoretical analysis of the gap between the global and local performance of the SGPT model. The authors identify two key factors affecting this gap:
- Generalization: related to the number of samples available in each group.
- Distribution discrepancy: the difference between the global group distribution and each client's local group distribution.
SGPT addresses these factors by using shared prompts in early layers to maximize the sample size for common features, and group prompts in higher layers to minimize distribution discrepancy for diverse features.
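Schematically (this is an illustration of the two factors named above, not the paper's exact theorem), the gap decomposes into a term that shrinks as each group accumulates samples and a term measuring how far a client's group mixture sits from the global one:

```latex
\text{global--local gap} \;\lesssim\;
\underbrace{\mathcal{O}\!\left(\tfrac{1}{\sqrt{n_g}}\right)}_{\text{generalization (group sample size } n_g)}
\;+\;
\underbrace{d\!\left(\mathcal{P}_{\mathrm{global}},\,\mathcal{P}_k\right)}_{\text{discrepancy for client } k}
```

Shared prompts attack the first term by pooling all clients' samples for common features; group prompts attack the second by letting diverse features follow their own group distribution.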
Experimental Setup and Results
The authors conducted extensive experiments on various datasets to evaluate SGPT’s performance under both label heterogeneity and feature heterogeneity conditions:
Label Heterogeneity:
- CIFAR-100: 100 clients, with each client assigned data from only s of the 100 classes, where s controls the level of heterogeneity.
- Five-dataset: five datasets (SVHN, CIFAR-10, notMNIST, Fashion-MNIST, and MNIST) distributed across 20 clients.
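A label-heterogeneous split of this kind can be reproduced with a simple pathological partition in which each client sees only s of the 100 classes. This is a sketch of the standard protocol; the paper's exact sampling procedure may differ.

```python
import random

NUM_CLIENTS = 100
NUM_CLASSES = 100
S = 10  # classes per client; smaller S means more heterogeneity

random.seed(0)

def partition_labels(num_clients, num_classes, s):
    """Assign each client a random subset of s class labels."""
    return {c: random.sample(range(num_classes), s) for c in range(num_clients)}

client_classes = partition_labels(NUM_CLIENTS, NUM_CLASSES, S)
print(len(client_classes[0]))  # → 10: each client sees exactly S classes
```

Each client's training set would then be drawn only from its assigned classes, so no single global model fits every client's label distribution well.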
Feature Heterogeneity:
- Office-Caltech10: four data domains with 10 classes each.
- DomainNet: six domains, using the top ten most frequent classes.
The experiments compared SGPT against several baseline methods, including FedVPT, FedMix, pFedPG, FedEM, and FedPR. The results demonstrated that SGPT consistently outperformed these baselines across different heterogeneity levels and datasets.
Key findings include:
- SGPT achieved higher global accuracy and worst-local accuracy than other methods, indicating better performance on both global and local data distributions.
- SGPT was robust to increasing data heterogeneity, with smaller performance drops than the baselines as heterogeneity increased.
- In the feature heterogeneity experiments, SGPT achieved the highest average accuracies on both Office-Caltech10 and DomainNet.
The authors also conducted ablation studies to analyze the impact of different components of SGPT:
- The combination of shared and group prompts led to significant improvements in both global and worst-local accuracy.
- The block coordinate descent optimization strategy proved crucial for effectively training the prompts.
- The prompt selection module with momentum updating improved clustering performance and stability.
Conclusion and Implications
The SGPT algorithm represents a significant advancement in federated learning, effectively bridging the gap between generalized and personalized approaches. By leveraging prompt-tuning techniques and the power of vision transformers, SGPT demonstrates superior performance in handling data heterogeneity across clients.
The key innovations of SGPT – shared and group prompts, the prompt selection module, and the BCD optimization strategy – provide a flexible framework that can adapt to both global and local data distributions without requiring local fine-tuning. This approach not only improves performance but also maintains efficiency, with significantly fewer trainable parameters compared to traditional FL methods.
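The efficiency claim follows from training only prompt tokens and keys while the ViT backbone stays frozen. A back-of-the-envelope count, using ViT-Base-like dimensions (all numbers here are assumptions, not figures from the paper):

```python
# Rough trainable-parameter count for prompt tuning vs. full fine-tuning.
HIDDEN = 768                    # token embedding dimension (ViT-Base-like)
VIT_BASE_PARAMS = 86_000_000    # approx. full ViT-Base parameter count

PROMPT_LEN = 10                 # tokens per prompt (assumed)
NUM_GROUPS = 4                  # client groups (assumed)
PROMPTED_LAYERS = 12            # layers receiving a prompt (assumed)

# shared prompts + per-group prompts + one selection key per group
trainable = (PROMPTED_LAYERS * PROMPT_LEN * HIDDEN
             + NUM_GROUPS * PROMPTED_LAYERS * PROMPT_LEN * HIDDEN
             + NUM_GROUPS * HIDDEN)

print(trainable, f"{trainable / VIT_BASE_PARAMS:.2%} of full fine-tuning")
# → 463872 0.54% of full fine-tuning
```

Even with generous prompt lengths, the trainable parameters stay well under one percent of the backbone, which also keeps per-round communication small in the federated setting.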
As federated learning continues to gain importance in privacy-preserving machine learning applications, methods like SGPT that can effectively handle heterogeneous data distributions will be crucial for real-world deployments. Future research could explore the application of similar prompt-tuning techniques to other types of models beyond vision transformers, as well as investigating the scalability and communication efficiency of such approaches in large-scale federated learning systems.
Created by AI, edited by humans, about AI
This blog post is part of our ‘ANDERS – AI Noteworthy Developments Explained & Research Simplified’ series. Here we utilize AI Agents to create initial drafts from research papers, which are then carefully edited and refined by our humans. The goal is to bring you clear, concise explanations of cutting-edge research conducted by Vector researchers. Through ANDERS, we strive to bridge the gap between complex scientific advancements and everyday understanding, highlighting why these developments are important and how they impact our world.