Manifold Generalization Provably Proceeds Memorization in Diffusion Models
Authors: Zebang Shen, Ya-Ping Hsieh, Niao He
Abstract: Diffusion models often generate novel samples even when the learned score is only coarse -- a phenomenon not accounted for by the standard view of diffusion training as density estimation. In this paper, we show that, under the manifold hypothesis, this behavior can instead be explained by coarse scores capturing the geometry of the data while discarding the fine-scale distributional structure of the population measure $\mu_{\scriptscriptstyle\mathrm{data}}$. Concretely, whereas estimating the full data distribution $\mu_{\scriptscriptstyle\mathrm{data}}$ supported on a $k$-dimensional manifold is known to require the classical minimax rate $\tilde{\mathcal{O}}(N^{-1/k})$, we prove that diffusion models trained with coarse scores can exploit the regularity of the manifold support and attain a near-parametric rate toward a different target distribution. This target distribution has density uniformly comparable to that of $\mu_{\scriptscriptstyle\mathrm{data}}$ throughout any $\tilde{\mathcal{O}}\bigl(N^{-\beta/(4k)}\bigr)$-neighborhood of the manifold, where $\beta$ denotes the manifold regularity. Our guarantees therefore depend only on the smoothness of the underlying support, and are especially favorable when the data density itself is irregular, for instance non-differentiable. In particular, when the manifold is sufficiently smooth, we obtain that generalization -- formalized as the ability to generate novel, high-fidelity samples -- occurs at a statistical rate strictly faster than that required to estimate the full population distribution $\mu_{\scriptscriptstyle\mathrm{data}}$.
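To give a feel for the gap between the two rates named in the abstract, the following minimal sketch tabulates them for a few sample sizes. It rests on several assumptions that are not in the abstract: constants and the logarithmic factors hidden by the tilde-O notation are dropped, the "near-parametric" rate is approximated by $N^{-1/2}$ (the abstract does not state the exact exponent), and the values of $k$ and $\beta$ are hypothetical. The function names are illustrative only.

```python
# Illustrative comparison of the rates mentioned in the abstract.
# Constants and log factors hidden by the tilde-O notation are dropped.


def minimax_rate(n: int, k: int) -> float:
    """Classical rate N^(-1/k) for estimating a distribution on a k-dim manifold."""
    return n ** (-1.0 / k)


def parametric_rate(n: int) -> float:
    """ASSUMPTION: N^(-1/2) as a stand-in for the 'near-parametric' rate."""
    return n ** -0.5


def neighborhood_width(n: int, k: int, beta: float) -> float:
    """Width N^(-beta/(4k)) of the manifold neighborhood named in the abstract."""
    return n ** (-beta / (4.0 * k))


if __name__ == "__main__":
    k, beta = 10, 2.0  # hypothetical manifold dimension and regularity
    for n in (10**3, 10**6, 10**9):
        print(
            f"N={n:>10d}  "
            f"minimax={minimax_rate(n, k):.4f}  "
            f"parametric={parametric_rate(n):.6f}  "
            f"width={neighborhood_width(n, k, beta):.4f}"
        )
```

Under these assumptions, with $k = 10$ and $N = 10^6$ the classical rate is roughly 0.25 while the parametric stand-in is 0.001, which illustrates why a rate that does not degrade with $k$ is attractive. For $k = 2$ the two expressions coincide, so the comparison is only informative for higher-dimensional manifolds.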
Comments: The first two authors contributed equally
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2603.23792 [cs.LG]
(or arXiv:2603.23792v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.23792
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Zebang Shen
[v1] Tue, 24 Mar 2026 23:50:09 UTC (708 KB)