Causality-inspired Federated Learning for Dynamic Spatio-Temporal Graphs
arXiv:2603.29384v1 Announce Type: new
Abstract: Federated Graph Learning (FGL) has emerged as a powerful paradigm for decentralized training of graph neural networks while preserving data privacy. However, existing FGL methods are predominantly designed for static graphs and rely on parameter averaging or distribution alignment, which implicitly assume that all features are equally transferable across clients, overlooking both spatial and temporal heterogeneity and the presence of client-specific knowledge in real-world graphs. In this work, we identify that such assumptions create a vicious cycle of spurious representation entanglement, client-specific interference, and negative transfer, degrading generalization performance in Federated Learning over Dynamic Spatio-Temporal Graphs (FSTG). To address this issue, we propose a novel causality-inspired framework named SC-FSGL, which explicitly decouples transferable causal knowledge from client-specific noise through representation-level interventions. Specifically, we introduce a Conditional Separation Module that simulates soft interventions through client-conditioned masks, enabling the disentanglement of invariant spatio-temporal causal factors from spurious signals and mitigating representation entanglement caused by client heterogeneity. In addition, we propose a Causal Codebook that clusters causal prototypes and aligns local representations via contrastive learning, promoting cross-client consistency and facilitating knowledge sharing across diverse spatio-temporal patterns. Experiments on five heterogeneous Spatio-Temporal Graph (STG) datasets show that SC-FSGL outperforms state-of-the-art methods.
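The Conditional Separation Module described above splits a representation into a transferable ("causal") part and a client-specific ("spurious") part via a client-conditioned soft mask. The toy sketch below illustrates only the masking arithmetic, not the paper's actual architecture: all names (`conditional_separation`, `client_code`) are hypothetical, and the learned gate network is replaced by a fixed elementwise score so the example stays self-contained.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conditional_separation(h, client_code):
    """Toy sketch of a client-conditioned soft intervention.

    A per-feature mask m in (0, 1) is derived from the feature vector h
    and a client-specific code; m * h acts as the 'causal' part and
    (1 - m) * h as the 'client-specific' part. In the paper the gate
    would be a learned network; here it is a fixed elementwise product
    used only as a stand-in.
    """
    m = [sigmoid(hi * ci) for hi, ci in zip(h, client_code)]
    h_causal = [mi * hi for mi, hi in zip(m, h)]
    h_spurious = [(1 - mi) * hi for mi, hi in zip(m, h)]
    return h_causal, h_spurious

h = [0.5, -1.2, 2.0]           # a local spatio-temporal embedding (toy values)
client_code = [1.0, 0.3, -0.5]  # a client condition vector (toy values)
h_causal, h_spurious = conditional_separation(h, client_code)

# Because the mask is soft, the split is an exact additive decomposition:
# m*h + (1-m)*h == h for every feature.
assert all(abs((a + b) - x) < 1e-9
           for a, b, x in zip(h_causal, h_spurious, h))
```

One property this makes concrete: a soft mask (unlike a hard binary one) preserves all information, since the two parts always sum back to the original representation; only the downstream losses push the causal part toward cross-client invariance.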
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2603.29384 [cs.LG]
(or arXiv:2603.29384v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.29384
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yuxuan Liu [view email] [v1] Tue, 31 Mar 2026 07:52:56 UTC (2,666 KB)