Efficient Path Query Processing in Relational Database Systems

arXiv cs.DB | by Diego Rivera Correa, Mirek Riedewald | April 6, 2026


Abstract: Path queries are crucial for property graphs, and there is growing interest in queries that combine regular expressions over labels with constraints on property values of vertices and edges. Efficient evaluation of such general path queries requires that intermediate results be eliminated early when they cannot possibly be completed to a full result path. Neither state-of-the-art (SOA) graph DBMS nor relational DBMS can currently do this effectively for a large class of queries. We show that this problem can be addressed by giving a relational optimizer "a little help": specifying early filtering opportunities explicitly in the query. To this end, we propose ReCAP, an abstraction that greatly simplifies the implementation of early filtering techniques for any type of property constraint for which such early filtering can be derived. No matter how complex the constraint, one only needs to implement (1) an NFA-style state transition function and (2) a handful of functions that mirror those needed for user-defined aggregates. We show that when using ReCAP, a standard relational DBMS like DuckDB can effectively push property constraints deep into the query plan, beating the SOA graph and relational DBMS by a factor of up to 400,000 over a variety of queries and input graphs.
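To make the idea of early filtering concrete, here is a minimal Python sketch. It is not ReCAP's interface and not the authors' implementation: the toy graph, the label pattern a+ b, the weight budget, and every function name (init, accumulate, can_still_satisfy, finalize, path_query) are assumptions chosen for illustration. It only shows the general shape the abstract describes: an NFA-style transition table over edge labels, plus a few aggregate-style functions for a property constraint, with partial paths dropped as soon as they can no longer lead to a result.

```python
from collections import deque

# Hypothetical toy graph: (source, label, target, weight) edges.
# It is acyclic, so the search below terminates without a length bound.
EDGES = [
    ("s", "a", "u", 2),
    ("u", "a", "v", 2),
    ("v", "b", "t", 1),
    ("u", "b", "t", 9),
    ("s", "a", "w", 7),
    ("w", "b", "t", 1),
]

# NFA-style transition table for the label pattern a+ b.
# State 0 is the start state; state 2 is accepting.
NFA = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2}
ACCEPTING = {2}


# Aggregate-style functions for the constraint "sum of weights <= budget".
# The names are illustrative, loosely modeled on user-defined aggregates.
def init():
    return 0


def accumulate(total, weight):
    return total + weight


def can_still_satisfy(total, budget):
    # Sound only because weights are assumed non-negative: once the
    # running sum exceeds the budget, no extension can repair it.
    return total <= budget


def finalize(total, budget):
    return total <= budget


def path_query(edges, start, budget, early_filter=True):
    """Return (matching paths, number of partial paths kept)."""
    out = {}
    for src, label, dst, weight in edges:
        out.setdefault(src, []).append((label, dst, weight))

    results, kept = [], 0
    queue = deque([(start, 0, init(), (start,))])
    while queue:
        node, state, total, path = queue.popleft()
        if state in ACCEPTING and finalize(total, budget):
            results.append((path, total))
        for label, dst, weight in out.get(node, []):
            next_state = NFA.get((state, label))
            if next_state is None:
                continue  # label does not fit the pattern
            new_total = accumulate(total, weight)
            if early_filter and not can_still_satisfy(new_total, budget):
                continue  # early filtering: drop a hopeless partial path
            kept += 1
            queue.append((dst, next_state, new_total, path + (dst,)))
    return results, kept


if __name__ == "__main__":
    print(path_query(EDGES, "s", budget=5, early_filter=True))
    print(path_query(EDGES, "s", budget=5, early_filter=False))
```

On this toy input both calls return the same single path, but the early-filtering run keeps fewer partial paths. A relational system would express the same pruning as a predicate inside the recursive part of the query rather than as a filter applied to complete paths.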

Subjects: Databases (cs.DB)

Cite as: arXiv:2604.02553 [cs.DB] (or arXiv:2604.02553v1 [cs.DB] for this version)

https://doi.org/10.48550/arXiv.2604.02553

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Diego Rivera Correa
[v1] Thu, 2 Apr 2026 22:07:13 UTC (556 KB)

Original source

arXiv cs.DB

https://arxiv.org/abs/2604.02553



