Benchmarking Scientific Machine Learning Models for Air Quality Data
arXiv:2603.21039v2 Announce Type: replace Abstract: Accurate air quality index (AQI) forecasting is essential for the protecting public health in rapidly growing urban regions, and the practical model evaluation and selection are often challenged by the lack of rigorous, region-specific benchmarking on standardized datasets. Physics-guided machine learning and deep learning models could be a good and effective solution to resolve such issues with more accurate and efficient AQI forecasting. This research study presents an explainable and comprehensive benchmark that enables a guideline and pro — Khawja Imran Masud, Venkata Sai Rahul Unnam, Sahara Ali
View PDF HTML (experimental)
Abstract:Accurate air quality index (AQI) forecasting is essential for the protecting public health in rapidly growing urban regions, and the practical model evaluation and selection are often challenged by the lack of rigorous, region-specific benchmarking on standardized datasets. Physics-guided machine learning and deep learning models could be a good and effective solution to resolve such issues with more accurate and efficient AQI forecasting. This research study presents an explainable and comprehensive benchmark that enables a guideline and proposed physics-guided best model by benchmarking classical time-series, machine-learning, and deep-learning approaches for multi-horizon AQI forecasting in North Texas (Dallas County). Using publicly available U.S. Environmental Protection Agency (EPA) daily observations of air quality data from 2022 to 2024, we curate city-level time series for PM2.5 and O3 by aggregating station measurements and constructing lag-wise forecasting datasets for LAG in {1,7,14,30} days. For benchmarking the best model, linear regression (LR), SARIMAX, multilayer perceptrons (MLP), and LSTM networks are evaluated with the proposed physics-guided variants (MLP+Physics and LSTM+Physics) that incorporate the EPA breakpoint-based AQI formulation as a consistency constraint through a weighted loss. Experiments using chronological train-test splits and error metrics MAE, RMSE showed that deep-learning models outperform simpler baselines, while physics guidance improves stability and yields physically consistent pollutant with AQI relationships, with the largest benefits observed for short-horizon prediction and for PM2.5 and O3. Overall, the results provide a practical reference for selecting AQI forecasting models in North Texas and clarify when lightweight physics constraints meaningfully improve predictive performance across pollutants and forecast horizons.
Comments: Accepted at IEEE IGARSS 2026; 22 pages, 6 figures;
Subjects:
Machine Learning (cs.LG)
Cite as: arXiv:2603.21039 [cs.LG]
(or arXiv:2603.21039v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.21039
arXiv-issued DOI via DataCite
Submission history
From: Khawja Imran Masud [view email] [v1] Sun, 22 Mar 2026 03:31:24 UTC (10,657 KB) [v2] Thu, 26 Mar 2026 18:51:03 UTC (10,657 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxivBuilding an Engineering & Security News Aggregator (10 Sources, No APIs)
<p>We built a curated engineering and security news aggregator that pulls from 10 high-signal sources, deduplicates content, and updates every 6 hours.</p> <p>No paid APIs. No scraping. No login. Just clean, structured news for developers.</p> <p>This post breaks down exactly how it works.</p> <h2> What This Is </h2> <p>A lightweight news wire combining:</p> <ul> <li>Hacker News </li> <li>Lobsters </li> <li>InfoQ </li> <li>Cloudflare Blog </li> <li>Krebs on Security </li> <li>The Hacker News (Security) </li> <li>NIST NVD (vulnerabilities) </li> <li>GitHub Blog </li> <li>OpenAI Blog </li> <li>Anthropic Research </li> </ul> <p>The goal: <strong>high-quality signal, zero noise, zero cost</strong>.</p> <h2> Why Build This? </h2> <p>Most engineering/news aggregators fail in one of these ways:</
DCReg: Decoupled Characterization for Efficient Degenerate LiDAR Registration
arXiv:2509.06285v2 Announce Type: replace Abstract: LiDAR point cloud registration is fundamental to robotic perception and navigation. In geometrically degenerate environments (e.g., corridors), registration becomes ill-conditioned: certain motion directions are weakly constrained, causing unstable solutions and degraded accuracy. Existing detect-then-mitigate methods fail to reliably detect, physically interpret, and stabilize this ill-conditioning without corrupting the optimization. We introduce DCReg (Decoupled Characterization for Ill-conditioned Registration), establishing a detect-characterize-mitigate paradigm that systematically addresses ill-conditioned registration via three innovations. First, DCReg achieves reliable ill-conditioning detection by employing Schur complement dec
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
Researchers to use robotics and AI to help sheep producers - University of Nevada, Reno
<a href="https://news.google.com/rss/articles/CBMic0FVX3lxTFB4UmxpREpFODBJN0lKakYwRVVtdlZPNmNiTExRelVFaDYzYW9kX2RCc0pEZjlmX01fT1dWYTlxZE1ET2ZKVVgzSVZIenY3bDlHa3FXS1dUdVBmTEdLa1hUR2x3OWxHbkE2RnROSjl6VHVHQ2c?oc=5" target="_blank">Researchers to use robotics and AI to help sheep producers</a> <font color="#6f6f6f">University of Nevada, Reno</font>
AIRA_2: Breaking Bottlenecks In AI Research Agents - Forbes
<a href="https://news.google.com/rss/articles/CBMiowFBVV95cUxNNmtndHhmQ2lpZGdPdTJwY25xejcyV1c1SWNLdWFOWnNwbjRUQTF0ZWdOZFNaclNBNWVsaUgtU0JUM2xrakhoOXVLMVJzVTNkajdrMmJGeS1lYUpMUG1NMkZNMDJFREZZdXU2ZVdEbkNZSDNBRjJBLVYyZE9XeEY4T0RJY3J5aDVWcEZVQ2lWUjhUYXBsUk16d09NdGdsQ3lxb3gw?oc=5" target="_blank">AIRA_2: Breaking Bottlenecks In AI Research Agents</a> <font color="#6f6f6f">Forbes</font>
Oracle Layoffs Recast Costs To Back US$50b AI Infrastructure Bet - simplywall.st
<a href="https://news.google.com/rss/articles/CBMivwFBVV95cUxQNWpZb2ZQVDBIOGVZTTBtLThzaGwxS3NkMnJBSS1wek5pQlJXRWdTOEh5aTdPTE9Cd3JHdjZDeWRtVzdMUUdESHJOQXZDdGNVdGZtTTBhanpfb3UxQnRobVlzNGdVUXJLZWptV2V6NXlNSWllX3FxOU5XYTF0RkM2TnJIaFJkcVBFOGc2alBSLTZEeU85QU1oTjBrMVZSTl84dm9GeFl5OGtUMjc3LVd1dS1fcHZ1RG9HcV82T2JFWdIBxAFBVV95cUxOSE5XVXh0QkM4Yi1WbXNhWkJ2Z2dLRlBGNjAwaTcyNFJWMWRPdXo5WjRQQkRGTG9IamxxbmdhMHpsaEJ6RDQwZl9ENGl5WDc5a2lrTXZ1bVpFbGdsdndHYjFINnZPSnNKX1dZamszUXByR1BlRXF6d1pKOHpBU3M5UFhUSldlUWtIMlRNQzdvTk9haEJKeDI1ZEg0WWQ1SXYzLUZCWElQc3pzR19ucGExdVpnc2hBQXlQNVpOZFVBVzRkLXFE?oc=5" target="_blank">Oracle Layoffs Recast Costs To Back US$50b AI Infrastructure Bet</a> <font color="#6f6f6f">simplywall.st</font>

Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!