Coarse-Guided Visual Generation via Weighted h-Transform Sampling
arXiv:2603.12057v2 (announce type: replace-cross)
Authors: Yanghao Wang, Ziqi Jiang, Zhen Wang, Long Chen
Abstract: Coarse-guided visual generation, which synthesizes fine visual samples from degraded or low-fidelity coarse references, is essential for many real-world applications. Training-based approaches are effective, but they are inherently limited by high training costs and by restricted generalization caused by the need for paired data collection. Recent training-free works therefore leverage pretrained diffusion models and incorporate guidance during the sampling process. However, these training-free methods either require knowing the forward (fine-to-coarse) transformation operator, e.g., bicubic downsampling, or struggle to balance guidance against synthesis quality. To address these challenges, we propose a novel guidance method based on the h-transform, a tool that can constrain a stochastic process (e.g., the sampling process) to satisfy desired conditions. Specifically, we modify the transition probability at each sampling timestep by adding a drift function to the original differential equation, which approximately steers the generation toward the ideal fine sample. To address unavoidable approximation errors, we introduce a noise-level-aware schedule that gradually de-weights the drift term as the error increases, ensuring both guidance adherence and high-quality synthesis. Extensive experiments across diverse image and video generation tasks demonstrate the effectiveness and generalization of our method.
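The mechanism the abstract describes (an extra drift term added at each sampling step, scaled by a noise-level-aware weight) can be illustrated with a toy sketch. This is not the authors' implementation. The toy prior N(0, I), the Gaussian surrogate for the gradient of log h, the weight shape `scale / (1 + sigma^2)`, and every function name below are assumptions made for illustration. The sketch also assumes that the approximation error grows with the noise level, which the abstract does not state explicitly. It uses the standard fact that Doob's h-transform adds a drift of the form g^2 * grad log h to the original dynamics.

```python
import numpy as np


def guidance_weight(sigma, scale=1.0):
    """Hypothetical noise-level-aware weight: close to `scale` at low noise,
    decaying toward 0 as the noise level sigma grows."""
    return scale / (1.0 + sigma ** 2)


def prior_score(x, sigma):
    """Exact score of the noised toy prior N(0, (1 + sigma^2) I)."""
    return -x / (1.0 + sigma ** 2)


def approx_grad_log_h(x, coarse, sigma, tau=1.0):
    """Assumed Gaussian surrogate for grad log h: pulls x toward the coarse
    reference, less sharply at high noise. Not the paper's approximation."""
    return (coarse - x) / (sigma ** 2 + tau ** 2)


def sample(coarse, scale=1.0, n_steps=200, sigma_max=10.0, sigma_min=0.01, seed=0):
    """Euler-Maruyama sampling of a variance-exploding reverse process with a
    weighted guidance drift. Setting scale=0 gives unguided sampling."""
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps + 1)
    # Start from the noised prior at the highest noise level.
    x = rng.normal(scale=np.sqrt(1.0 + sigma_max ** 2), size=coarse.shape)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        step = s_cur ** 2 - s_next ** 2  # g^2 * dt for a variance-exploding process
        drift = prior_score(x, s_cur)
        # Extra drift from the (approximate) h-transform, de-weighted at high noise.
        drift = drift + guidance_weight(s_cur, scale) * approx_grad_log_h(x, coarse, s_cur)
        x = x + step * drift + np.sqrt(step) * rng.normal(size=coarse.shape)
    return x


if __name__ == "__main__":
    coarse = np.full(1000, 3.0)  # stand-in for a coarse reference
    print("unguided mean:", sample(coarse, scale=0.0).mean())
    print("guided mean:  ", sample(coarse, scale=1.0).mean())
```

In this toy setting the guided samples are pulled toward the coarse reference without collapsing onto it, because the prior score still contributes at every step. A real system would replace `prior_score` with a pretrained diffusion model's score estimate.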
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.12057 [cs.CV]
(or arXiv:2603.12057v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.12057
arXiv-issued DOI via DataCite
Submission history
From: Wang Yanghao
[v1] Thu, 12 Mar 2026 15:26:19 UTC (10,754 KB)
[v2] Mon, 30 Mar 2026 15:10:02 UTC (10,754 KB)