
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

arXiv | Submitted on 27 Mar 2026 | March 30, 2026

Authors: Zehai He, Wenyi Hong, Zhen Yang, Ziyang Pan, Mingdao Liu, Xiaotao Gu, Jie Tang


Abstract: Recent advances in large language models have improved the capabilities of coding agents, yet systematic evaluation of complex, end-to-end website development remains limited. To address this gap, we introduce Vision2Web, a hierarchical benchmark for visual website development that spans three levels: static UI-to-code generation, interactive multi-page frontend reproduction, and long-horizon full-stack website development. The benchmark is constructed from real-world websites and comprises 193 tasks across 16 categories, with 918 prototype images and 1,255 test cases. To support flexible, thorough, and reliable evaluation, we propose a workflow-based agent verification paradigm built on two complementary components: a GUI agent verifier and a VLM-based judge. We evaluate multiple visual language models instantiated under different coding-agent frameworks, revealing substantial performance gaps at all task levels, with state-of-the-art models still struggling on full-stack development.
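The abstract names three task levels and reports results per level, but it does not say how test-case outcomes are recorded or aggregated. The following is a minimal sketch of one plausible bookkeeping scheme, assuming each test case yields a single pass/fail verdict. The names `Level`, `CaseResult`, and `pass_rate_by_level` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    # The three task levels named in the abstract.
    STATIC_UI_TO_CODE = "static UI-to-code generation"
    INTERACTIVE_FRONTEND = "interactive multi-page frontend reproduction"
    FULL_STACK = "long-horizon full-stack website development"


@dataclass(frozen=True)
class CaseResult:
    # Hypothetical record: one pass/fail verdict for one test case.
    task_id: str
    level: Level
    passed: bool


def pass_rate_by_level(results):
    """Return {Level: fraction of test cases passed} for levels that have results."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        totals[r.level] += 1
        passes[r.level] += int(r.passed)
    return {level: passes[level] / totals[level] for level in totals}


if __name__ == "__main__":
    # Toy verdicts for illustration only; these are not results from the paper.
    toy = [
        CaseResult("task-a", Level.STATIC_UI_TO_CODE, True),
        CaseResult("task-a", Level.STATIC_UI_TO_CODE, False),
        CaseResult("task-b", Level.FULL_STACK, False),
    ]
    for level, rate in pass_rate_by_level(toy).items():
        print(f"{level.value}: {rate:.2f}")
```

Reporting a separate rate per level, rather than one pooled score, matches the abstract's framing of performance gaps "at all task levels"; how the paper actually scores tasks may differ.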

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.26648 [cs.SE]

(or arXiv:2603.26648v1 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2603.26648

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Zehai He

[v1] Fri, 27 Mar 2026 17:50:45 UTC (25,879 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26648

