OntoKG: Ontology-Oriented Knowledge Graph Construction with Intrinsic-Relational Routing
arXiv:2604.02618v1 Announce Type: new Abstract: Organizing a large-scale knowledge graph into a typed property graph requires structural decisions -- which entities become nodes, which properties become edges, and what schema governs these choices. Existing approaches embed these decisions in pipeline code or extract relations ad hoc, producing schemas that are tightly coupled to their construction process and difficult to reuse for downstream ontology-level tasks. We present an ontology-oriented approach in which the schema is designed from the outset for ontology analysis, entity disambiguation, domain customization, and LLM-guided extraction -- not merely as a byproduct of graph building. The core mechanism is intrinsic-relational routing, which classifies every property as either intri
View PDF HTML (experimental)
Abstract:Organizing a large-scale knowledge graph into a typed property graph requires structural decisions -- which entities become nodes, which properties become edges, and what schema governs these choices. Existing approaches embed these decisions in pipeline code or extract relations ad hoc, producing schemas that are tightly coupled to their construction process and difficult to reuse for downstream ontology-level tasks. We present an ontology-oriented approach in which the schema is designed from the outset for ontology analysis, entity disambiguation, domain customization, and LLM-guided extraction -- not merely as a byproduct of graph building. The core mechanism is intrinsic-relational routing, which classifies every property as either intrinsic or relational and routes it to the corresponding schema module. This routing produces a declarative schema that is portable across storage backends and independently reusable. We instantiate the approach on the January 2026 Wikidata dump. A rule-based cleaning stage identifies a 34.6M-entity core set from the full dump, followed by iterative intrinsic-relational routing that assigns each property to one of 94 modules organized into 8 categories. With tool-augmented LLM support and human review, the schema reaches 93.3% category coverage and 98.0% module assignment among classified entities. Exporting this schema yields a property graph with 34.0M nodes and 61.2M edges across 38 relationship types. We validate the ontology-oriented claim through five applications that consume the schema independently of the construction pipeline: ontology structure analysis, benchmark annotation auditing, entity disambiguation, domain customization, and LLM-guided extraction.
Subjects:
Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.02618 [cs.AI]
(or arXiv:2604.02618v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2604.02618
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yitao Li [view email] [v1] Fri, 3 Apr 2026 01:17:51 UTC (587 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
benchmarkannounceproduct
Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework
arXiv:2512.16284v2 Announce Type: replace Abstract: Synthetic data generation is gaining traction as a privacy enhancing technology (PET). When properly generated, synthetic data preserve the analytic utility of real data while avoiding the retention of information that would allow the identification of specific individuals. However, the concept of data privacy remains elusive, making it challenging for practitioners to evaluate and benchmark the degree of privacy protection offered by synthetic data. In this paper, we propose a framework to empirically assess the efficacy of tabular synthetic data privacy quantification methods through controlled, deliberate risk insertion. To demonstrate this framework, we survey existing approaches to synthetic data privacy quantification and the relate

Voting by mail: a Markov chain model for managing the security risks of election systems
arXiv:2410.13900v3 Announce Type: replace Abstract: The scrutiny surrounding vote-by-mail (VBM) in the United States has increased in recent years, highlighting the need for a rigorous quantitative framework to evaluate the resilience of the absentee voting infrastructure. This paper addresses these issues by introducing a dynamic mathematical modeling framework for performing a risk assessment of VBM processes. We introduce a discrete-time Markov chain (DTMC) to model the VBM process and assess election performance and risk with a novel layered network approach that considers the interplay between VBM processes, malicious and non-malicious threats, and security mitigations. The time-inhomogeneous DTMC framework captures dynamic risks and evaluates performance over time. The DTMC model acc

Out-of-Domain Stress Test for Temporal Braid Group Privilege Escalation Detection
arXiv:2604.02366v1 Announce Type: cross Abstract: In a companion paper, we prove that the Burau-Lyapunov exponent LE discriminates focused from dispersed privilege escalation ratchets in cloud IAM graphs, and that no abelian statistic can replicate this discrimination. To strengthen this claim beyond its synthetic validation corpus, we apply the identical pipeline, with zero parameter retuning, to solar coronal magnetic fields: a physical system with no connection to cloud identity and access management, whose binary eruptive/confined outcome is independently established by decades of astrophysical observation.
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Products

ContractShield: Bridging Semantic-Structural Gaps via Hierarchical Cross-Modal Fusion for Multi-Label Vulnerability Detection in Obfuscated Smart Contracts
arXiv:2604.02771v1 Announce Type: new Abstract: Smart contracts are increasingly targeted by adversaries employing obfuscation techniques such as bogus code injection and control flow manipulation to evade vulnerability detection. Existing multimodal methods often process semantic, temporal, and structural features in isolation and fuse them using simple strategies such as concatenation, which neglects cross-modal interactions and weakens robustness, as obfuscation of a single modality can sharply degrade detection accuracy. To address these challenges, we propose ContractShield, a robust multimodal framework with a novel fusion mechanism that effectively correlates multiple complementary features through a three-level fusion. Self-attention first identifies patterns that indicate vulnerab

Why Your AI Agent Keeps Getting It Wrong: The Three-Layer Architecture Every Data Leader Needs to…
Why Your AI Agent Keeps Getting It Wrong: The Three-Layer Architecture Every Data Leader Needs to Know Your AI agent is not failing because the model is bad. It is failing because the architecture feeding the model is incomplete. The agent does not know what your “revenue” number means. It cannot see the CRM data it needs. It does not know that this question should be answered by the finance persona, not the sales one. The model is doing its job. The infrastructure around it is not. This is the defining challenge of enterprise AI in 2026. Everyone has deployed agents. Most of those agents produce responses that are confidently wrong, inconsistently right, or too generic to act on. The gap between a demo that impresses and an agent that actually drives business outcomes comes down to three




Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!