VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation
arXiv:2604.02467v1 Announce Type: new Abstract: Cinematic camera control relies on a tight feedback loop between director and cinematographer, where camera motion and framing are continuously reviewed and refined. Recent generative camera systems can produce diverse, text-conditioned trajectories, but they lack this "director in the loop" and have no explicit supervision of whether a shot is visually desirable. This results in in-distribution camera motion but poor framing, off-screen characters, and undesirable visual aesthetics. In this paper, we introduce VERTIGO, the first framework for visual preference optimization of camera trajectory generators. Our framework leverages a real-time graphics engine (Unity) to render 2D visual previews from generated camera motion. A cinematically fin
View PDF HTML (experimental)
Abstract:Cinematic camera control relies on a tight feedback loop between director and cinematographer, where camera motion and framing are continuously reviewed and refined. Recent generative camera systems can produce diverse, text-conditioned trajectories, but they lack this "director in the loop" and have no explicit supervision of whether a shot is visually desirable. This results in in-distribution camera motion but poor framing, off-screen characters, and undesirable visual aesthetics. In this paper, we introduce VERTIGO, the first framework for visual preference optimization of camera trajectory generators. Our framework leverages a real-time graphics engine (Unity) to render 2D visual previews from generated camera motion. A cinematically fine-tuned vision-language model then scores these previews using our proposed cyclic semantic similarity mechanism, which aligns renders with text prompts. This process provides the visual preference signals for Direct Preference Optimization (DPO) post-training. Both quantitative evaluations and user studies on Unity renders and diffusion-based Camera-to-Video pipelines show consistent gains in condition adherence, framing quality, and perceptual realism. Notably, VERTIGO reduces the character off-screen rate from 38% to nearly 0% while preserving the geometric fidelity of camera motion. User study participants further prefer VERTIGO over baselines across composition, consistency, prompt adherence, and aesthetic quality, confirming the perceptual benefits of our visual preference post-training.
Comments: 28 pages, 10 figures, ECCV 2026
Subjects:
Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.02467 [cs.CV]
(or arXiv:2604.02467v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2604.02467
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yuwei Lu [view email] [v1] Thu, 2 Apr 2026 18:58:56 UTC (32,475 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
modellanguage modeltraining
Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework
arXiv:2512.16284v2 Announce Type: replace Abstract: Synthetic data generation is gaining traction as a privacy enhancing technology (PET). When properly generated, synthetic data preserve the analytic utility of real data while avoiding the retention of information that would allow the identification of specific individuals. However, the concept of data privacy remains elusive, making it challenging for practitioners to evaluate and benchmark the degree of privacy protection offered by synthetic data. In this paper, we propose a framework to empirically assess the efficacy of tabular synthetic data privacy quantification methods through controlled, deliberate risk insertion. To demonstrate this framework, we survey existing approaches to synthetic data privacy quantification and the relate

Voting by mail: a Markov chain model for managing the security risks of election systems
arXiv:2410.13900v3 Announce Type: replace Abstract: The scrutiny surrounding vote-by-mail (VBM) in the United States has increased in recent years, highlighting the need for a rigorous quantitative framework to evaluate the resilience of the absentee voting infrastructure. This paper addresses these issues by introducing a dynamic mathematical modeling framework for performing a risk assessment of VBM processes. We introduce a discrete-time Markov chain (DTMC) to model the VBM process and assess election performance and risk with a novel layered network approach that considers the interplay between VBM processes, malicious and non-malicious threats, and security mitigations. The time-inhomogeneous DTMC framework captures dynamic risks and evaluates performance over time. The DTMC model acc
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Models

Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework
arXiv:2512.16284v2 Announce Type: replace Abstract: Synthetic data generation is gaining traction as a privacy enhancing technology (PET). When properly generated, synthetic data preserve the analytic utility of real data while avoiding the retention of information that would allow the identification of specific individuals. However, the concept of data privacy remains elusive, making it challenging for practitioners to evaluate and benchmark the degree of privacy protection offered by synthetic data. In this paper, we propose a framework to empirically assess the efficacy of tabular synthetic data privacy quantification methods through controlled, deliberate risk insertion. To demonstrate this framework, we survey existing approaches to synthetic data privacy quantification and the relate

A technical, 100% local writeup on how I replicated and then surpassed the Secret Detection model from Wiz (and the challenges along the way) - including labeling an entire dataset with local AI
Hey everybody, I have a strong interest in offloading work to small, specialized models that I can parallelize - this lets me scale work significantly (plus, I am less dependent on proprietary APIs) Some time ago, I saw a blog post from Wiz about fine-tuning Llama 3.2-1B for secret detection in code. They got 86% Precision and 82% Recall. I wanted to see if I can replicate (or beat) those numbers using purely local AI and produce a local specialized model. After a couple of weekends of trying it out I managed to get a Llama 3.2-1B hitting 88% Precision and 84.4% Recall simultaneously! I also benchmarked Qwen 3.5-2B and 4B - expectedly, they outperformed Llama 1B at the cost of more VRAM and longer inference time. I’ve put together a full write-up with the training stats, examples, and a st

I open-sourced a tool that compiles raw documents into an AI-navigable wiki with persistent memory; runs 100% locally
After seeing Karpathy's tweet about using LLMs to build personal wikis from research documents, I realized I'd already been using something similar like this internally for our R D. So I cleaned it up and open-sourced it. What it does: You drop a folder of raw documents (PDFs, papers, notes, code, 60+ formats) and the LLM compiles them into a structured markdown wiki with backlinked articles, concept pages, and a master index. It then compresses everything into a .aura archive optimized for RAG retrieval (~97% smaller than raw source data). How it works: pip install aura-research research init my-project # copy docs into raw/ research ingest raw/ research compile research query "your question" Key design decisions: No embeddings, no vector databases. Uses SimHash + Bloom Filters instead. Z

Gemma4:26b's reasoning capabilities are crazy.
Been experimenting with it, first on my buddy's compute he let me borrow, and then with the Gemini SDK so that I don't need to keep stealing his macbook from 600 miles away. Originally my home agent was run through Gemini-3-Flash because no other model I've tried has been able to match it's reasoning ability. The script(s) I have it running through are a re-implementation of a multi-speaker smart home speaker setup, with several rasperry pi zeroes functioning as speaker satellites for a central LLM hub, right now a raspberry pi 5, soon to be an M4 mac mini prepped for full local operation. It also has a dedicated discord bot I use to interact with it from my phone and PC for more complicated tasks, and those requiring information from an image, like connector pinouts I want help with. I've



Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!