The Closure Challenge: a benchmark task for machine learning in turbulence modelling
Abstract: We introduce a field-wide benchmark challenge for machine learning in Reynolds-averaged Navier-Stokes (RANS) turbulence modelling. Though open-source datasets exist for training data-driven turbulence closure models, the field has been notably lacking a standard benchmark metric and test dataset. The Closure Challenge is a curated collection of open-source datasets and evaluation code that remedies this problem. We provide a variety of high-fidelity training data in a standardized format, including mean velocity gradients. The test cases (periodic hills, square duct, and NASA wall-mounted hump) evaluate Reynolds number and geometry generalization, two key issues in the field. We present results from three early submissions to the challenge. This is an ongoing challenge, intended to continuously spur innovation in machine learning for turbulence modelling. Our goal is for this benchmark to become the standard evaluation for new machine learning frameworks in RANS. The Closure Challenge is available at this https URL.
Subjects:
Fluid Dynamics (physics.flu-dyn); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an)
Cite as: arXiv:2603.28884 [physics.flu-dyn]
(or arXiv:2603.28884v1 [physics.flu-dyn] for this version)
https://doi.org/10.48550/arXiv.2603.28884
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Ryley McConkey [view email] [v1] Mon, 30 Mar 2026 18:11:15 UTC (697 KB)
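The abstract notes that the challenge ships evaluation code for comparing data-driven closure predictions against high-fidelity reference data. The actual benchmark metric is not given in this listing, so the sketch below is only a minimal illustration of the kind of comparison such an evaluation performs: forming the non-dimensional Reynolds stress anisotropy tensor b_ij = tau_ij/(2k) - delta_ij/3 (a standard quantity in RANS closure modelling) and scoring a prediction against reference data with a root-mean-square error. All function names are hypothetical and are not the challenge's API.

```python
import numpy as np

def anisotropy(tau):
    """Non-dimensional Reynolds stress anisotropy b_ij = tau_ij/(2k) - delta_ij/3.

    tau : (n, 3, 3) array of Reynolds stress tensors at n mesh points.
    Returns an (n, 3, 3) array of anisotropy tensors.
    """
    # Turbulent kinetic energy k = 0.5 * tr(tau) at each point.
    k = 0.5 * np.trace(tau, axis1=1, axis2=2)
    # Broadcast k over the tensor components and subtract the isotropic part.
    return tau / (2.0 * k[:, None, None]) - np.eye(3) / 3.0

def rmse(pred, ref):
    """Root-mean-square error between a predicted and a reference field."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(ref)) ** 2)))
```

For an isotropic stress state (tau = (2/3) k I) the anisotropy is identically zero, which makes a convenient sanity check when wiring up an evaluation of this kind.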