IMPACT: Influence Modeling for Open-Set Time Series Anomaly Detection
Abstract: Open-set anomaly detection (OSAD) is an emerging paradigm that uses limited labeled data from anomaly classes seen in training to identify both seen and unseen anomalies at test time. Current approaches rely on simple augmentation methods to generate pseudo-anomalies that stand in for unseen anomalies. Although promising on image data, these methods are ineffective on time series data because they fail to preserve its sequential nature, producing trivial or unrealistic anomaly patterns. They degrade further when the training data is contaminated with unlabeled anomalies. This work introduces IMPACT, a novel framework that leverages Influence Modeling for oPen-set time series Anomaly deteCTion, to tackle these challenges. The key insight is to i) learn an influence function that accurately estimates the impact of individual training samples on the model, and then ii) use these influence scores to generate semantically divergent yet realistic unseen anomalies for time series, while repurposing highly influential samples as supervised anomalies for anomaly decontamination. Extensive experiments show that IMPACT significantly outperforms existing state-of-the-art methods, achieving superior accuracy under varying OSAD settings and contamination rates.
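To make the notion of a per-sample influence score concrete, here is a minimal sketch in Python. It is not IMPACT's learned influence function, which the abstract does not specify. It swaps in a brute-force leave-one-out estimate on a toy ridge forecaster, and every name in it (`make_windows`, `fit_ridge`, `loo_harm_scores`, the sine-wave data, the injected label error) is a hypothetical chosen for illustration.

```python
import numpy as np

def make_windows(series, width):
    """Slice a 1-D series into (past window, next value) pairs, preserving order."""
    X = np.stack([series[i:i + width] for i in range(len(series) - width)])
    y = series[width:]
    return X, y

def fit_ridge(X, y, lam):
    """Closed-form ridge regression weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_loss(w, X_val, y_val):
    """Mean squared one-step forecasting error on held-out windows."""
    return float(np.mean((X_val @ w - y_val) ** 2))

def loo_harm_scores(X, y, X_val, y_val, lam=0.1):
    """Leave-one-out score per training sample.

    score[i] = validation loss with all samples - validation loss without sample i.
    A large positive score means removing sample i helps, i.e. the sample is harmful.
    """
    base = val_loss(fit_ridge(X, y, lam), X_val, y_val)
    scores = np.empty(len(X))
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        w_without = fit_ridge(X[keep], y[keep], lam)
        scores[i] = base - val_loss(w_without, X_val, y_val)
    return scores

# Toy data (assumption): a noisy sine wave, forecast one step ahead from 4 past values.
rng = np.random.default_rng(0)
series = np.sin(0.1 * np.arange(300)) + 0.02 * rng.standard_normal(300)
X, y = make_windows(series, width=4)
X_train, y_train = X[:200], y[:200].copy()
X_val, y_val = X[200:], y[200:]

# Contaminate one training target to mimic an unlabeled anomaly.
bad_idx = 50
y_train[bad_idx] += 5.0

scores = loo_harm_scores(X_train, y_train, X_val, y_val)
top = int(np.argmax(scores))
print("most harmful training sample:", top, "score:", scores[top])
```

On this toy problem the contaminated sample should receive the largest score, which is the intuition behind flagging highly influential samples during decontamination. Retraining once per sample is only feasible at this scale; a learned or approximate influence estimator avoids that cost.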
Comments: 28 pages, 15 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29183 [cs.LG]
(or arXiv:2603.29183v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.29183
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Xiaohui Zhou [v1] Tue, 31 Mar 2026 02:49:46 UTC (1,665 KB)