Efficient Path Query Processing in Relational Database Systems
Abstract: Path queries are crucial for property graphs, and there is growing interest in queries that combine regular expressions over labels with constraints on property values of vertices and edges. Efficient evaluation of such general path queries requires that intermediate results be eliminated early when there is no possible completion to a full result path. Neither state-of-the-art (SOA) graph DBMS nor relational DBMS can currently do this effectively for a large class of queries. We show that this problem can be addressed by giving a relational optimizer "a little help" by specifying early filtering opportunities explicitly in the query. To this end, we propose ReCAP, an abstraction that greatly simplifies the implementation of early filtering techniques for any type of property constraint for which such early filtering can be derived. No matter how complex the constraint, one only needs to implement (1) an NFA-style state transition function and (2) a handful of functions that mirror those needed for user-defined aggregates. We show that when using ReCAP, a standard relational DBMS like DuckDB can effectively push property constraints deep into the query plan, beating the SOA graph and relational DBMS by a factor of up to 400,000 over a variety of queries and input graphs.
Subjects: Databases (cs.DB)
Cite as: arXiv:2604.02553 [cs.DB]
(or arXiv:2604.02553v1 [cs.DB] for this version)
https://doi.org/10.48550/arXiv.2604.02553
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Diego Rivera Correa [v1] Thu, 2 Apr 2026 22:07:13 UTC (556 KB)
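
The abstract describes two ingredients for a constraint: an NFA-style state transition function and a few functions resembling those of a user-defined aggregate. The sketch below illustrates that general idea in Python on a toy graph. It is not ReCAP's interface or the paper's implementation: every name in it (`EDGES`, `nfa_step`, `agg_init`, `agg_step`, `agg_can_complete`, `agg_final`, `path_query`) is hypothetical, the label pattern `a+ b` and the "sum of edge weights <= budget" constraint are chosen purely for illustration, and the pruning rule is only sound under the stated assumption that weights are non-negative.

```python
from collections import deque

# Hypothetical toy graph: (source, label, weight, target).
# Assumptions: weights are non-negative and the graph is acyclic,
# so the search below terminates even without early filtering.
EDGES = [
    ("s", "a", 2, "u"),
    ("u", "a", 2, "v"),
    ("v", "b", 1, "t"),
    ("u", "b", 9, "t"),
    ("s", "a", 7, "w"),
    ("w", "b", 1, "t"),
]

# NFA-style transition function for the label pattern a+ b.
# State 0 is the start state, state 2 is the only accepting state.
def nfa_step(state, label):
    table = {(0, "a"): {1}, (1, "a"): {1}, (1, "b"): {2}}
    return table.get((state, label), set())

ACCEPTING = {2}

# Aggregate-style functions for the constraint "sum of weights <= budget".
def agg_init():
    return 0

def agg_step(acc, weight):
    return acc + weight

def agg_can_complete(acc, budget):
    # Early filter: with non-negative weights the sum never shrinks,
    # so a partial path already over budget can never be completed.
    return acc <= budget

def agg_final(acc, budget):
    return acc <= budget

def path_query(edges, source, budget, early_filter=True):
    """Return (sorted result paths, number of partial paths kept)."""
    out = {}
    for src, label, weight, dst in edges:
        out.setdefault(src, []).append((label, weight, dst))

    results, expanded = [], 0
    queue = deque([(source, 0, agg_init(), (source,))])
    while queue:
        vertex, state, acc, path = queue.popleft()
        for label, weight, dst in out.get(vertex, []):
            for nxt in nfa_step(state, label):
                new_acc = agg_step(acc, weight)
                if early_filter and not agg_can_complete(new_acc, budget):
                    continue  # discard the partial path before extending it
                expanded += 1
                new_path = path + (dst,)
                if nxt in ACCEPTING and agg_final(new_acc, budget):
                    results.append((new_path, new_acc))
                queue.append((dst, nxt, new_acc, new_path))
    return sorted(results), expanded

filtered, kept_with_filter = path_query(EDGES, "s", budget=5)
unfiltered, kept_without_filter = path_query(EDGES, "s", budget=5, early_filter=False)
```

On this toy input both runs return the same single path (s, u, v, t) with total weight 5, but the early-filtered run keeps 3 partial paths where the unfiltered run keeps 6. The gap is tiny here; the abstract's point is that discarding such dead-end intermediate results early is what matters on large graphs.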