Semantic Labeling for Third-Party Cybersecurity Risk Assessment: A Semi-Supervised Approach to Intent-Aware Question Retrieval
Abstract: Third-Party Risk Assessment (TPRA) relies on large repositories of cybersecurity compliance questions used to assess external suppliers against standards such as ISO/IEC 27001 and NIST. In practice, not all questions are relevant to a specific supplier, and selecting questions for a given assessment context remains a manual, time-consuming task. Existing question retrieval approaches based on lexical or semantic similarity can identify topically related questions, but they often fail to capture the underlying assessment intent, including control domain and evaluation scope. To address this limitation, we investigate whether an explicit semantic label space can improve intent-aware TPRA question selection. In particular, we separate label space discovery from large-scale label assignment. First, we discover overlapping clusters of semantically similar questions and use LLMs to assign a unique label to each cluster. Second, we propagate these labels through k-nearest neighbors (kNN) to annotate questions at a larger scale. Finally, questions are retrieved by measuring the similarity of the query to the extracted labels rather than to the questions themselves. This design reduces repeated LLM calls while preserving label consistency. Experimental results show that the proposed semi-supervised framework reduces labeling cost and runtime compared with per-question LLM annotation while maintaining label quality. Furthermore, label-based retrieval achieves better alignment with cybersecurity control domains and assessment scope than similarity-based retrieval, highlighting the value of semantic labels as an intermediate representation.
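The two later stages of the pipeline (kNN label propagation and label-based retrieval) can be illustrated with a minimal sketch. It assumes that cluster discovery and LLM labeling have already happened: the hand-written seed labels below are hypothetical stand-ins for the labels an LLM would assign to each cluster. The `embed` function is a toy bag-of-words substitute for a real sentence encoder, and the names `knn_propagate`, `retrieve_by_label`, `k`, and `min_votes`, as well as the multi-label voting rule, are illustrative assumptions rather than the authors' implementation.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_propagate(seeds: dict, unlabeled: list, k: int = 3, min_votes: int = 2) -> dict:
    """Label each unlabeled question from its k most similar seed questions.

    seeds maps question -> set of labels (hypothetical stand-ins for LLM-assigned
    cluster labels; a set because overlapping clusters can give one question
    several labels). A label is kept only if at least `min_votes` of the k
    neighbours carry it (an assumed voting rule).
    """
    seed_vecs = {q: embed(q) for q in seeds}
    assigned = {}
    for q in unlabeled:
        v = embed(q)
        # sorted() is stable, so ties keep the seeds' insertion order.
        neighbours = sorted(seeds, key=lambda s: cosine(v, seed_vecs[s]), reverse=True)[:k]
        votes = Counter(label for s in neighbours for label in seeds[s])
        assigned[q] = {label for label, n in votes.items() if n >= min_votes}
    return assigned


def retrieve_by_label(query: str, question_labels: dict, top_labels: int = 1) -> list:
    """Score the query against label texts (not questions), then return every
    question that carries one of the best-matching labels."""
    qv = embed(query)
    # Sort first so ties between labels are broken deterministically.
    all_labels = sorted({lab for labs in question_labels.values() for lab in labs})
    ranked = sorted(all_labels, key=lambda lab: cosine(qv, embed(lab)), reverse=True)
    best = set(ranked[:top_labels])
    return [q for q, labs in question_labels.items() if labs & best]


# Usage: hypothetical seed questions with cluster labels, plus two unlabeled questions.
seeds = {
    "Is multi-factor authentication enforced for remote access?": {"access control"},
    "Are user access rights reviewed periodically?": {"access control"},
    "Is data encrypted at rest?": {"encryption"},
    "Is data encrypted in transit using TLS?": {"encryption"},
    "Is there a documented incident response plan?": {"incident response"},
}
unlabeled = [
    "Are access rights revoked when employees leave?",
    "Is customer data encrypted with AES-256?",
]
labels = knn_propagate(seeds, unlabeled, k=3, min_votes=2)
hits = retrieve_by_label("Which questions cover encryption of stored data?", {**seeds, **labels})
print(hits)
```

In this sketch, only the seed set would need LLM calls; every other question inherits labels from its neighbours, and a query is compared against a handful of label strings instead of the full question bank. With a real encoder, the label embeddings would be computed once and cached.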
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.10149 [cs.CR]
(or arXiv:2602.10149v3 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2602.10149
arXiv-issued DOI via DataCite
Submission history
From: Ali Nour Eldin
[v1] Mon, 9 Feb 2026 18:36:50 UTC (234 KB)
[v2] Wed, 4 Mar 2026 14:54:20 UTC (225 KB)
[v3] Tue, 31 Mar 2026 09:06:12 UTC (219 KB)