
GenMask: Adapting DiT for Segmentation via Direct Mask

HuggingFace Papers, March 25, 2026

Generative models trained directly for segmentation tasks outperform indirect adaptation methods by using a novel timestep sampling strategy that enables joint training for both image generation and binary mask synthesis. (2 upvotes on HuggingFace)

Published on Mar 25


Abstract


Recent approaches to segmentation have leveraged pretrained generative models as feature extractors, treating segmentation as a downstream adaptation task via indirect feature retrieval. This implicit use suffers from a fundamental misalignment in representation. It also depends heavily on indirect feature extraction pipelines, which complicate the workflow and limit adaptation. In this paper, we argue that instead of indirect adaptation, segmentation tasks should be trained directly in a generative manner. We identify a key obstacle to this unified formulation: VAE latents of binary masks are sharply distributed, noise-robust, and linearly separable, unlike natural image latents. To bridge this gap, we introduce a timestep sampling strategy for binary masks that emphasizes extreme noise levels for segmentation and moderate noise levels for image generation, enabling harmonious joint training. We present GenMask, a DiT trained to generate black-and-white segmentation masks as well as colorful images in RGB space under the original generative objective. GenMask preserves the original DiT architecture while removing the need for feature extraction pipelines tailored to segmentation tasks. Empirically, GenMask attains state-of-the-art performance on referring and reasoning segmentation benchmarks, and ablations quantify the contribution of each component.
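The abstract does not give the exact timestep distributions, so the following is only a rough sketch of what "extreme noise for masks, moderate noise for images" could look like in a joint training loop. The specific choices here are assumptions made for illustration, not the paper's method: a logit-normal distribution for image timesteps, a U-shaped Beta distribution for mask timesteps, and the function names and the `concentration` parameter are all hypothetical.

```python
import math
import random


def sample_image_timestep(rng):
    """Assumed choice: logit-normal sample in (0, 1).

    Most of the probability mass sits around t = 0.5, i.e. moderate noise.
    """
    z = rng.gauss(0.0, 1.0)
    return 1.0 / (1.0 + math.exp(-z))


def sample_mask_timestep(rng, concentration=0.5):
    """Assumed choice: symmetric Beta sample in [0, 1].

    With concentration < 1 the density is U-shaped, so samples cluster
    near t = 0 and t = 1, i.e. the extreme noise levels.
    """
    return rng.betavariate(concentration, concentration)


def sample_timesteps(is_mask, rng):
    """Draw one timestep per batch element, switching sampler by data type.

    is_mask: list of booleans, True for binary-mask samples and False for
    natural-image samples in a mixed batch.
    """
    return [
        sample_mask_timestep(rng) if flag else sample_image_timestep(rng)
        for flag in is_mask
    ]


def extreme_fraction(timesteps, margin=0.1):
    """Fraction of timesteps within `margin` of either end of [0, 1]."""
    hits = sum(1 for t in timesteps if t < margin or t > 1.0 - margin)
    return hits / len(timesteps)


if __name__ == "__main__":
    rng = random.Random(0)
    mask_ts = sample_timesteps([True] * 5000, rng)
    image_ts = sample_timesteps([False] * 5000, rng)
    print("mask timesteps near extremes: ", extreme_fraction(mask_ts))
    print("image timesteps near extremes:", extreme_fraction(image_ts))
```

Under these assumed distributions, a far larger share of mask timesteps than image timesteps falls near the ends of the noise schedule. Whether the paper weights both ends or only the high-noise end for masks is not stated in the abstract.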


Get this paper in your agent:

hf papers read 2603.23906

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
