
LitePT: Lighter Yet Stronger Point Transformer

arXiv · March 31, 2026

arXiv:2512.13689v2 Announce Type: replace

Authors: Yuanwen Yue, Damien Robert, Jianyuan Wang, Sunghwan Hong, Jan Dirk Wegner, Christian Rupprecht, Konrad Schindler


Abstract: Modern neural architectures for 3D point cloud processing contain both convolutional layers and attention blocks, but the best way to assemble them remains unclear. We analyse the role of different computational blocks in 3D point cloud networks and find an intuitive behaviour: convolution is adequate to extract low-level geometry at high-resolution in early layers, where attention is expensive without bringing any benefits; attention captures high-level semantics and context in low-resolution, deep layers more efficiently, where convolution inflates the parameter count. Guided by this design principle, we propose a new, improved 3D point cloud backbone that employs convolutions in early stages and switches to attention for deeper layers. To avoid the loss of spatial layout information when discarding redundant convolution layers, we introduce a novel, parameter-free 3D positional encoding, PointROPE. The resulting LitePT model has $3.6\times$ fewer parameters, runs $2\times$ faster, and uses $2\times$ less memory than the state-of-the-art Point Transformer V3, but nonetheless matches or outperforms it on a range of tasks and datasets. Code and models are available at: this https URL.
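The abstract does not spell out how PointROPE is constructed. As a rough illustration of what a parameter-free 3D positional encoding can look like, the sketch below assumes an axis-split extension of 1D rotary position embedding (RoPE): the feature channels are divided into x, y and z chunks, and each chunk is rotated pairwise by angles proportional to that coordinate. The function names `rope_3d` and `dot`, the equal three-way channel split, and the frequency base of 10000 are all hypothetical choices made for this sketch, not the paper's specification.

```python
import math
from typing import List, Sequence


def rope_3d(features: Sequence[float], xyz: Sequence[float],
            base: float = 10000.0) -> List[float]:
    """Axis-split 3D rotary position embedding (illustrative sketch only).

    ASSUMPTION: channels are split into three equal chunks (x, y, z) and
    each chunk is encoded with standard 1D RoPE using that coordinate.
    This is not claimed to be the exact PointROPE formulation.
    """
    d = len(features)
    if d % 6 != 0:
        # Each axis needs an even number of channels so they can be paired.
        raise ValueError("feature dimension must be divisible by 6")
    if len(xyz) != 3:
        raise ValueError("xyz must contain exactly three coordinates")

    chunk = d // 3  # channels assigned to each axis
    out = list(features)
    for axis, coord in enumerate(xyz):
        offset = axis * chunk
        for k in range(chunk // 2):
            # Frequency schedule borrowed from 1D RoPE (base is an assumption).
            theta = base ** (-2.0 * k / chunk)
            angle = coord * theta
            c, s = math.cos(angle), math.sin(angle)
            i = offset + 2 * k
            a, b = features[i], features[i + 1]
            # Rotate the channel pair (a, b) by `angle`; no learned weights.
            out[i] = a * c - b * s
            out[i + 1] = a * s + b * c
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product, standing in for a query-key attention logit."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 0.8, -0.1, 0.4]
    k = [1.0, 0.2, -0.7, 0.3, 0.9, -0.5]
    p_q, p_k = (1.0, 2.0, -0.5), (0.2, -1.0, 3.0)
    shift = (5.0, -3.0, 2.5)
    score = dot(rope_3d(q, p_q), rope_3d(k, p_k))
    shifted = dot(rope_3d(q, [a + t for a, t in zip(p_q, shift)]),
                  rope_3d(k, [a + t for a, t in zip(p_k, shift)]))
    print(score, shifted)  # equal up to floating-point rounding
```

Because every channel pair is rotated by an orthogonal 2×2 matrix, the query-key dot product depends only on the coordinate difference between the two points. Translating the whole cloud therefore leaves the attention logits unchanged, and the encoding adds no learnable weights, which is consistent with the "parameter-free" property the abstract claims.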

Comments: CVPR 2026, Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2512.13689 [cs.CV]

(or arXiv:2512.13689v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2512.13689

arXiv-issued DOI via DataCite

Submission history

From: Yuanwen Yue

[v1] Mon, 15 Dec 2025 18:59:57 UTC (17,362 KB)

[v2] Mon, 30 Mar 2026 09:02:05 UTC (17,676 KB)

Original source: arXiv, https://arxiv.org/abs/2512.13689
