
EcoScratch: Cost-Effective Multimodal Repair for Scratch Using Execution Feedback

arXiv cs.SE

by Yuan Si, Ming Wang, Daming Li, Hanyuan Shi, Jialu Zhang

April 1, 2026



Abstract: Scratch is the most popular programming environment for novices, with over 1.15 billion projects created worldwide. Unlike traditional languages, correctness in Scratch is defined by visible behavior on the stage rather than by code structure alone, so programs that appear correct in the workspace can still fail at runtime due to timing, event ordering, or cross-sprite interactions. Visual execution evidence such as gameplay videos can therefore be essential for diagnosis and repair. However, capturing and processing this evidence inside an automated repair loop introduces substantial overhead. Probing execution, recording stage behavior, rebuilding executable .sb3 projects, and verifying candidate fixes consume time, monetary cost, and resources across an entire repair trajectory rather than a single model call. We present EcoScratch, a repair pipeline that uses lightweight runtime signals to decide whether the next attempt stays text-only or escalates to multimodal prompting. The controller also sets the JSON Patch budget and verification effort, so evidence choice and repair budget are coupled inside the same decision. EcoScratch rebuilds candidate fixes into executable .sb3 projects and records per-trajectory traces, monetary cost, and local-runtime energy. We evaluate 12 models on 100 executable Scratch repair projects under four controller settings, yielding 4800 repair trajectories. In this matrix, a selective multimodal policy gives the strongest observed success-cost-energy tradeoff. It reaches the highest generation success (30.3%) while using less average cost and local-runtime energy than the two non-adaptive multimodal baselines under the same bounded trajectory budget; text-only remains the lowest-cost floor. Across the evaluated matrix, multimodal evidence helps most when it is used to control escalation within a bounded trajectory budget rather than applied uniformly.
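The abstract does not state which runtime signals the controller reads, what thresholds it uses, or how large its budgets are. The sketch below is therefore illustrative, not EcoScratch's implementation: it shows, under assumed signals and made-up numbers, what it means to couple the evidence choice (text-only vs. multimodal) with the JSON Patch budget and verification effort in one decision. `RuntimeSignals`, `AttemptPlan`, `plan_next_attempt`, `within_budget`, and every threshold are hypothetical; only the six operation names come from the JSON Patch standard (RFC 6902).

```python
from dataclasses import dataclass


# Hypothetical signals from a cheap probe run of the Scratch project.
@dataclass(frozen=True)
class RuntimeSignals:
    crashed: bool             # project failed to load or threw at start
    timing_sensitive: bool    # symptom involves waits, timers, or frame timing
    cross_sprite_events: int  # broadcasts exchanged between sprites
    prior_text_failures: int  # text-only attempts already rejected by verification


# One decision object: evidence mode, patch budget, and verification effort together.
@dataclass(frozen=True)
class AttemptPlan:
    mode: str               # "text" or "multimodal"
    max_patch_ops: int      # JSON Patch operation budget for this attempt
    verification_runs: int  # replays of the rebuilt .sb3 used to verify a fix


def plan_next_attempt(sig: RuntimeSignals) -> AttemptPlan:
    """Pick evidence mode and repair budget jointly (illustrative thresholds)."""
    behavioral = sig.timing_sensitive or sig.cross_sprite_events >= 2
    if sig.crashed or not behavioral:
        # Load-time or structural fault: code text is likely enough, so stay cheap.
        return AttemptPlan(mode="text", max_patch_ops=4, verification_runs=1)
    if sig.prior_text_failures == 0:
        # Behavioral symptom, but try the low-cost text-only floor once first.
        return AttemptPlan(mode="text", max_patch_ops=6, verification_runs=2)
    # Behavioral fault that text-only repair has not fixed: escalate to video.
    return AttemptPlan(mode="multimodal", max_patch_ops=10, verification_runs=3)


# Operation names defined by RFC 6902 (JSON Patch).
JSON_PATCH_OPS = {"add", "remove", "replace", "move", "copy", "test"}


def within_budget(patch: list[dict], plan: AttemptPlan) -> bool:
    """Reject a candidate patch that exceeds the op budget or uses unknown ops."""
    return len(patch) <= plan.max_patch_ops and all(
        op.get("op") in JSON_PATCH_OPS for op in patch
    )


if __name__ == "__main__":
    sig = RuntimeSignals(crashed=False, timing_sensitive=True,
                         cross_sprite_events=3, prior_text_failures=1)
    plan = plan_next_attempt(sig)
    print(plan.mode, plan.max_patch_ops, plan.verification_runs)  # multimodal 10 3
    # Hypothetical patch path into a project.json; not taken from the paper.
    candidate = [{"op": "replace", "path": "/targets/1/variables/score/1", "value": 0}]
    print(within_budget(candidate, plan))  # True
```

Returning a single plan object is the point of the sketch: a costlier multimodal attempt can also receive a larger patch budget and more verification replays, instead of the three knobs being tuned independently.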

Subjects: Software Engineering (cs.SE)

Cite as: arXiv:2603.29624 [cs.SE]

(or arXiv:2603.29624v1 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2603.29624

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yuan Si

[v1] Tue, 31 Mar 2026 11:45:36 UTC (3,754 KB)

Original source

arXiv cs.SE

https://arxiv.org/abs/2603.29624

