On computing and the complexity of computing higher-order $U$-statistics, exactly
Abstract: Higher-order $U$-statistics abound in fields such as statistics, machine learning, and computer science, but are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their computational complexity is surprisingly lacking. This paper aims to fill this gap by presenting several results on the computational aspects of $U$-statistics. First, we derive a useful decomposition of an $m$-th order $U$-statistic into a linear combination of $V$-statistics with orders not exceeding $m$, which are generally more feasible to compute. Second, we explore the connection between exactly computing $V$-statistics and Einstein summation, a tool often used in computational mathematics and quantum computing to accelerate tensor computations. Third, we provide an optimistic estimate of the time complexity for exactly computing $U$-statistics, based on the treewidth of a particular graph associated with the $U$-statistic kernel. The above ingredients lead to (1) a new, much more runtime-efficient algorithm to exactly compute general higher-order $U$-statistics, and (2) a more streamlined characterization of the runtime complexity of computing $U$-statistics. We develop an accompanying open-source package called \texttt{u-stats} in both Python (this https URL) and R (this https URL). We demonstrate through three examples in statistics that \texttt{u-stats} achieves impressive runtime performance compared to existing benchmarks. This paper also aspires to achieve two goals: (1) to capture the interest of researchers in statistics and related areas to further advance the algorithmic development of $U$-statistics, and (2) to lift the burden of implementing higher-order $U$-statistics from practitioners.
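To make the first two ingredients concrete, the following is a minimal sketch, not the \texttt{u-stats} package API and not necessarily the authors' algorithm. It assumes a hypothetical order-3 kernel of product form, $h(X_i, X_j, X_k) = A_{ij} B_{jk}$, where `A` and `B` are precomputed $n \times n$ matrices; the function names are illustrative. The sum over distinct index triples is rewritten by inclusion-exclusion as a linear combination of unrestricted sums (unnormalized $V$-statistics of order at most 3), and each of those is evaluated with NumPy's `einsum`.

```python
import itertools

import numpy as np


def u_stat_order3_einsum(A, B):
    """Order-3 U-statistic for the assumed kernel h(i, j, k) = A[i, j] * B[j, k].

    Uses inclusion-exclusion over index coincidences, so every term is an
    unrestricted (V-statistic-style) sum that einsum evaluates in O(n^2).
    """
    n = A.shape[0]
    s_all = np.einsum("ij,jk->", A, B)   # all (i, j, k), no restriction
    s_ij = np.einsum("ii,ik->", A, B)    # terms with i == j
    s_jk = np.einsum("ij,jj->", A, B)    # terms with j == k
    s_ik = np.einsum("ij,ji->", A, B)    # terms with i == k
    s_ijk = np.einsum("ii,ii->", A, B)   # terms with i == j == k
    # Sum over pairwise-distinct (i, j, k) via inclusion-exclusion.
    distinct_sum = s_all - s_ij - s_jk - s_ik + 2.0 * s_ijk
    return distinct_sum / (n * (n - 1) * (n - 2))


def u_stat_order3_bruteforce(A, B):
    """Reference O(n^3) evaluation over all ordered distinct triples."""
    n = A.shape[0]
    total = sum(
        A[i, j] * B[j, k]
        for i, j, k in itertools.permutations(range(n), 3)
    )
    return total / (n * (n - 1) * (n - 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(8, 8))
    B = rng.normal(size=(8, 8))
    print(np.isclose(u_stat_order3_einsum(A, B), u_stat_order3_bruteforce(A, B)))
```

In this sketch the brute-force version visits all $n(n-1)(n-2)$ ordered triples, while each `einsum` term costs $O(n^2)$ because the kernel factorizes along a chain of indices. Kernels without such structure would not reduce this way; how much they can be reduced is what the treewidth-based estimate in the abstract addresses.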
Comments: Comments are welcome! 71 pages, 8 tables, 5 figures. An accompanying Python package is available at: this https URL or this https URL
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA); Computation (stat.CO); Methodology (stat.ME)
Cite as: arXiv:2508.12627 [stat.ML]
(or arXiv:2508.12627v2 [stat.ML] for this version)
https://doi.org/10.48550/arXiv.2508.12627
arXiv-issued DOI via DataCite
Submission history
From: Ruiqi Zhang
[v1] Mon, 18 Aug 2025 05:01:10 UTC (49 KB)
[v2] Tue, 31 Mar 2026 06:15:41 UTC (68 KB)