
Differentially Private Linear Regression and Synthetic Data Generation with Statistical Guarantees

arXiv · March 31, 2026 · 10 min read

arXiv:2510.16974v3 (Announce Type: replace)

Authors: Shurong Lin, Aleksandra Slavković, Deekshith Reddy Bhoomireddy


Abstract: In the social sciences, small- to medium-scale datasets are common, and linear regression is canonical. In privacy-aware settings, much work has focused on differentially private (DP) linear regression, but mostly on point estimation with limited attention to uncertainty quantification. Meanwhile, synthetic data generation (SDG) is increasingly important for reproducibility studies, yet current DP linear regression methods do not readily support it. Mainstream DP-SDG approaches are either tailored to discrete or discretized data, which makes them less suitable for analyses involving continuous variables, or rely on deep learning models that require large datasets, which limits their use for the smaller-scale data typical in social science. We propose a method for linear regression with valid inference under Gaussian DP. It includes a bias-corrected estimator with asymptotic confidence intervals (CIs) and a general SDG procedure such that the corresponding regression on the synthetic data matches our DP linear regression procedure. Our approach is effective in small- to moderate-dimensional settings. Experiments show that our method (1) improves accuracy over existing methods for DP linear regression, (2) provides valid CIs, and (3) produces more reliable synthetic data for downstream statistical and machine learning tasks than current DP synthesizers.
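To make the two ideas in the abstract concrete (DP linear regression under Gaussian DP, and synthetic data whose regression reproduces the DP fit), here is a minimal sketch of the classical sufficient-statistics-perturbation baseline. It is not the authors' method: it applies no bias correction and computes no confidence intervals. Everything below is an assumption for illustration: the function names `dp_suffstats` and `fit_and_synthesize` are hypothetical, rows are clipped to the hypothetical bounds `bx` and `by`, neighboring datasets differ by replacing one row, `n` is treated as public, and the budget `mu` is split evenly across three releases. It relies on two standard Gaussian DP facts: Gaussian noise with standard deviation Δ/μ on a statistic with L2 sensitivity Δ is μ-GDP, and μ_i-GDP releases compose to sqrt(Σ μ_i²)-GDP.

```python
import numpy as np


def dp_suffstats(X, y, mu, bx, by, rng):
    """Release noisy X^T X, X^T y, y^T y under mu-GDP (illustrative sketch).

    Assumptions: clip so ||x_i||_2 <= bx and |y_i| <= by; replace-one
    neighbors; n is public. Hypothetical helper, not the paper's method.
    """
    # Clip each feature row to L2 norm bx and each response to [-by, by].
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X * np.minimum(1.0, bx / np.maximum(norms, 1e-12))
    y = np.clip(y, -by, by)
    p = X.shape[1]
    # Three Gaussian releases; mu-GDP composes as sqrt(sum mu_i^2), so split evenly.
    mu_each = mu / np.sqrt(3.0)
    # Upper bounds on replace-one L2 sensitivity (Frobenius norm for the matrix).
    sig_xx = 2 * bx**2 / mu_each
    sig_xy = 2 * bx * by / mu_each
    sig_yy = by**2 / mu_each
    # Noise only the upper triangle, then mirror it so the result stays symmetric.
    E = np.triu(rng.normal(0.0, sig_xx, size=(p, p)))
    E = E + np.triu(E, 1).T
    S = X.T @ X + E
    s = X.T @ y + rng.normal(0.0, sig_xy, size=p)
    t = y @ y + rng.normal(0.0, sig_yy)
    return S, s, t


def fit_and_synthesize(S, s, t, n, rng, floor=1e-6):
    """Post-process the noisy statistics into an estimate and a synthetic dataset."""
    # Project the noisy Gram matrix onto PSD matrices by flooring eigenvalues.
    w, V = np.linalg.eigh(S)
    w = np.maximum(w, floor)
    S_psd = (V * w) @ V.T
    beta = np.linalg.solve(S_psd, s)
    # Synthetic X = Q R with orthonormal Q and symmetric R, so X^T X = R^2 = S_psd.
    p = S.shape[0]
    Q, _ = np.linalg.qr(rng.normal(size=(n, p)))
    R = (V * np.sqrt(w)) @ V.T
    X_syn = Q @ R
    # Residuals orthogonal to col(X_syn), scaled to the noisy residual sum of squares.
    e = rng.normal(size=n)
    e -= Q @ (Q.T @ e)
    rss = max(t - s @ beta, 0.0)
    e *= np.sqrt(rss) / np.linalg.norm(e)
    y_syn = X_syn @ beta + e
    return beta, X_syn, y_syn


rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.uniform(-1, 1, size=(n, p))
beta_true = np.array([0.5, -0.3, 0.2])
y = X @ beta_true + rng.normal(0, 0.1, size=n)

S, s, t = dp_suffstats(X, y, mu=1.0, bx=np.sqrt(p), by=2.0, rng=rng)
beta, X_syn, y_syn = fit_and_synthesize(S, s, t, n, rng)
# Ordinary least squares on the synthetic data returns the DP estimate.
beta_syn = np.linalg.lstsq(X_syn, y_syn, rcond=None)[0]
print(beta, beta_syn)
```

Because the synthetic design satisfies X_synᵀX_syn = S_psd and the residuals are orthogonal to its columns, least squares on the synthetic data recovers `beta` exactly. Everything after the noisy release is post-processing, so it costs no additional privacy budget.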

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Cite as: arXiv:2510.16974 [cs.LG]

(or arXiv:2510.16974v3 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2510.16974 (arXiv-issued DOI via DataCite)

Submission history

From: Shurong Lin

[v1] Sun, 19 Oct 2025 19:30:41 UTC (61 KB)

[v2] Sun, 8 Feb 2026 02:26:54 UTC (61 KB)

[v3] Sat, 28 Mar 2026 20:26:46 UTC (61 KB)

Original source

arXiv

https://arxiv.org/abs/2510.16974



