Research Papers research paper arxiv VLM-diffusion-based model multimodal large language models latent hidden-state injection

Know3D: Prompting 3D Generation with Knowledge from Vision-Language Models

HuggingFace PapersMarch 24, 20268 min read0 views

Source Quiz

Know3D integrates multimodal large language models with 3D generation through latent hidden-state injection, enabling language-controlled back-view synthesis by bridging semantic understanding and geometric reconstruction. (4 upvotes on HuggingFace)

Published on Mar 24

Authors:

Abstract

AI-generated summary

Recent advances in 3D generation have improved the fidelity and geometric details of synthesized 3D assets. However, due to the inherent ambiguity of single-view observations and the lack of robust global structural priors caused by limited 3D training data, the unseen regions generated by existing models are often stochastic and difficult to control, which may sometimes fail to align with user intentions or produce implausible geometries. In this paper, we propose Know3D, a novel framework that incorporates rich knowledge from multimodal large language models into 3D generative processes via latent hidden-state injection, enabling language-controllable generation of the back-view for 3D assets. We utilize a VLM-diffusion-based model, where the VLM is responsible for semantic understanding and guidance. The diffusion model acts as a bridge that transfers semantic knowledge from the VLM to the 3D generation model. In this way, we successfully bridge the gap between abstract textual instructions and the geometric reconstruction of unobserved regions, transforming the traditionally stochastic back-view hallucination into a semantically controllable process, demonstrating a promising direction for future 3D generation models.

View arXiv page View PDF Project page GitHub 65 Add to collection

Get this paper in your agent:

hf papers read 2603.22782

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2603.22782 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2603.22782 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 2

Original source

HuggingFace Papers

https://huggingface.co/papers/2603.22782

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Research Papers

Exclusive | OpenAI’s Former Research Chief Aims to Automate Manufacturing With AI - WSJ

Exclusive | OpenAI’s Former Research Chief Aims to Automate Manufacturing With AI WSJ

GNews AI manufacturing

1m29 days ago

ModelsFresh

A Player Selection Network for Scalable Game-Theoretic Prediction and Planning

arXiv:2505.00213v3 Announce Type: replace Abstract: While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving large optimization problems where the number of variables increases with the number of agents, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose 1) PSN Game-a learning-based, game-theoretic prediction and planning framework that reduces game size by learning a Player Selection Network (PSN); and 2) a Goal Inference Network (GIN) that makes it possible to use the PSN in incomplete-information games where other agents' intentions are unknown to the ego agent. A PSN outputs a player selection mask that distinguishes influential players from less relevant

arXiv cs.RO

2mabout 10 hours ago

Research PapersFresh

The Indirect Method for Generating Libraries of Optimal Periodic Trajectories and Its Application to Economical Bipedal Walking

arXiv:2410.09512v2 Announce Type: replace Abstract: Trajectory optimization is an essential tool for generating efficient, dynamically consistent gaits in legged locomotion. This paper explores the indirect method of trajectory optimization, emphasizing its application in creating optimal periodic gaits for legged systems and contrasting it with the more common direct method. While the direct method provides flexibility in implementation, it is limited by its need for an input space parameterization. In contrast, the indirect method improves accuracy by computing the control input from states and costates obtained along the optimal trajectory. In this work, we tackle the convergence challenges associated with indirect shooting methods by utilizing numerical continuation methods. This is pa

arXiv cs.RO

2mabout 10 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 230 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research Papers

Exclusive | OpenAI’s Former Research Chief Aims to Automate Manufacturing With AI - WSJ

Exclusive | OpenAI’s Former Research Chief Aims to Automate Manufacturing With AI WSJ

GNews AI manufacturing

1m29 days ago

Research PapersFresh

The Indirect Method for Generating Libraries of Optimal Periodic Trajectories and Its Application to Economical Bipedal Walking

arXiv cs.RO

2mabout 10 hours ago

Research PapersFresh

Bistable Quad-Nets Composed of Four-Bar Linkages

arXiv:2604.00527v1 Announce Type: cross Abstract: We study mechanical structures composed of spatial four-bar linkages that are bistable, that is, they allow for two distinct configurations. They have an interpretation as quad nets in the Study quadric which can be used to prove existence of arbitrarily large structures of this type. We propose a purely geometric construction of such examples, starting from infinitesimally flexible quad nets in Euclidean space and applying Whiteley de-averaging. This point of view situates the problem within the broader framework of discrete differential geometry and enables the construction of bistable structures from well-known classes of quad nets, such as discrete minimal surfaces. The proposed construction does not rely on numerical optimization and a

arXiv cs.RO

1mabout 10 hours ago

Research PapersFresh

Implicit Primal-Dual Interior-Point Methods for Quadratic Programming

arXiv:2604.00364v1 Announce Type: cross Abstract: This paper introduces a new method for solving quadratic programs using primal-dual interior-point methods. Instead of handling complementarity as an explicit equation in the Karush-Kuhn-Tucker (KKT) conditions, we ensure that complementarity is implicitly satisfied by construction. This is achieved by introducing an auxiliary variable and relating it to the duals and slacks via a retraction map. Specifically, we prove that the softplus function has favorable numerical properties compared to the commonly used exponential map. The resulting KKT system is guaranteed to be spectrally bounded, thereby eliminating the most pressing limitation of primal-dual methods: ill-conditioning near the solution. These attributes facilitate the solution of

arXiv cs.RO

1mabout 10 hours ago