AI News Hub · by Eigenvector

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

HuggingFace Papers · March 29, 2026 · 2 min read

Discrete Native Autoregressive framework enables unified multimodal processing by representing diverse modalities in a shared discrete space through a novel visual transformer architecture. (43 upvotes on HuggingFace)



Abstract


The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented architectures and suboptimal integration. To transcend this limitation, we introduce Discrete Native Autoregressive (DiNA), a unified framework that represents multimodal information within a shared discrete space, enabling consistent and principled autoregressive modeling across modalities. A key innovation is the Discrete Native Any-resolution Visual Transformer (dNaViT), which performs tokenization and de-tokenization at arbitrary resolutions, transforming continuous visual signals into hierarchical discrete tokens. Building on this foundation, we develop LongCat-Next, a native multimodal model that processes text, vision, and audio under a single autoregressive objective with minimal modality-specific design. As an industrial-strength foundation model, it excels at seeing, painting, and talking within a single framework, achieving strong performance across a wide range of multimodal benchmarks. In particular, LongCat-Next addresses the long-standing performance ceiling of discrete vision modeling on understanding tasks and provides a unified approach to effectively reconcile the conflict between understanding and generation. As an attempt toward native multimodality, we open-source LongCat-Next and its tokenizers, hoping to foster further research and development in the community. GitHub: https://github.com/meituan-longcat/LongCat-Next
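The core idea in the abstract, representing every modality as discrete tokens in one shared space so that a single next-token objective covers them all, can be illustrated with a toy sketch. This is not the paper's implementation: the codebook, patch representation, and vocabulary offset below are invented for illustration only.

```python
# Illustrative sketch (NOT the DiNA/dNaViT implementation): a toy
# vector quantizer maps continuous "visual patches" to discrete codebook
# indices, which are then offset past the text vocabulary so text and
# visual tokens live in one shared sequence for next-token prediction.

TEXT_VOCAB_SIZE = 1000  # assumed text vocabulary size (hypothetical)

def quantize_patch(patch, codebook):
    """Return the index of the nearest codebook entry (squared L2
    distance) -- the patch's discrete token."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(patch, codebook[i]))

def to_unified_sequence(text_ids, patches, codebook):
    """Concatenate text token ids with quantized visual token ids.
    Visual indices are shifted by TEXT_VOCAB_SIZE so one autoregressive
    model can predict either kind of token from a single distribution."""
    visual_ids = [TEXT_VOCAB_SIZE + quantize_patch(p, codebook) for p in patches]
    return text_ids + visual_ids

# Tiny 2-D codebook with three entries, purely for demonstration.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
seq = to_unified_sequence([5, 17], [[0.9, 1.1], [0.1, -0.2]], codebook)
# seq -> [5, 17, 1001, 1000]: two text tokens followed by two visual tokens.
```

A real system would learn the codebook (the paper's dNaViT produces hierarchical tokens at arbitrary resolutions), but the payoff is the same: once everything is an integer in one vocabulary, the standard NTP loss applies uniformly across modalities.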


Get this paper in your agent:

hf papers read 2603.27538

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

