
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

arXiv | Submitted on 27 Mar 2026 | March 30, 2026

Authors: Zehai He, Wenyi Hong, Zhen Yang, Ziyang Pan, Mingdao Liu, Xiaotao Gu, Jie Tang


Abstract: Recent advances in large language models have improved the capabilities of coding agents, yet systematic evaluation of complex, end-to-end website development remains limited. To address this gap, we introduce Vision2Web, a hierarchical benchmark for visual website development that spans three levels: static UI-to-code generation, interactive multi-page frontend reproduction, and long-horizon full-stack website development. The benchmark is constructed from real-world websites and comprises 193 tasks across 16 categories, with 918 prototype images and 1,255 test cases. To support flexible, thorough, and reliable evaluation, we propose a workflow-based agent verification paradigm built on two complementary components: a GUI agent verifier and a VLM-based judge. We evaluate multiple visual language models instantiated under different coding-agent frameworks, revealing substantial performance gaps at all task levels, with state-of-the-art models still struggling on full-stack development.

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.26648 [cs.SE]

(or arXiv:2603.26648v1 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2603.26648

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Zehai He
[v1] Fri, 27 Mar 2026 17:50:45 UTC (25,879 KB)
