
Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures

arXiv cs.AI | by Jakob Prange, Emmanuele Chersoni | April 6, 2026

arXiv:2305.18915v1 Announce Type: cross

Abstract: In this work we build upon negative results from an attempt at language modeling with predicted semantic structure, in order to establish empirical lower bounds on what could have made the attempt successful. More specifically, we design a concise binary vector representation of semantic structure at the lexical level and evaluate in depth how good an incremental tagger needs to be in order to achieve better-than-baseline performance with an end-to-end semantic-bootstrapping language model. We envision such a system as consisting of a (pretrained) sequential-neural component and a hierarchical-symbolic component working together to generate text with low surprisal and high linguistic interpretability. We find that (a) the dimensionality of the semantic vector representation can be dramatically reduced without losing its main advantages and (b) lower bounds on prediction quality cannot be established via a single score alone, but need to take the distributions of signal and noise into account.
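Finding (b) can be made concrete with a toy illustration. The sketch below is not the authors' representation or evaluation protocol; the four-bit vectors, the `flip` helper, and both hypothetical taggers are assumptions invented for this example. It only shows how two sets of predicted binary vectors can share the same aggregate score while their noise is distributed very differently across dimensions.

```python
def bit_accuracy(gold, pred):
    """Fraction of matching bits over all tokens and dimensions."""
    total = sum(len(g) for g in gold)
    correct = sum(gb == pb for g, p in zip(gold, pred) for gb, pb in zip(g, p))
    return correct / total


def per_dimension_error(gold, pred):
    """Error rate of each dimension, computed over all tokens."""
    n_tokens = len(gold)
    n_dims = len(gold[0])
    return [
        sum(g[d] != p[d] for g, p in zip(gold, pred)) / n_tokens
        for d in range(n_dims)
    ]


def flip(vec, positions):
    """Return a copy of vec with the bits at the given positions inverted."""
    return [1 - b if i in positions else b for i, b in enumerate(vec)]


# Hypothetical gold vectors: 4 tokens, 4 binary semantic features each.
gold = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

# Hypothetical tagger A: one error per token, each in a different dimension.
pred_spread = [flip(v, {i}) for i, v in enumerate(gold)]
# Hypothetical tagger B: the same number of errors, all in dimension 0.
pred_concentrated = [flip(v, {0}) for v in gold]

acc_spread = bit_accuracy(gold, pred_spread)
acc_concentrated = bit_accuracy(gold, pred_concentrated)
err_spread = per_dimension_error(gold, pred_spread)
err_concentrated = per_dimension_error(gold, pred_concentrated)

print(acc_spread, acc_concentrated)  # 0.75 0.75
print(err_spread)                    # [0.25, 0.25, 0.25, 0.25]
print(err_concentrated)              # [1.0, 0.0, 0.0, 0.0]
```

Both hypothetical taggers reach a bit accuracy of 0.75, yet tagger B leaves three dimensions fully intact and makes one useless, while tagger A degrades every dimension slightly. A downstream language model could plausibly react quite differently to the two, which is why a single score may be too coarse to serve as a lower bound.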

Comments: To appear at *SEM 2023, Toronto

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2305.18915 [cs.CL]

(or arXiv:2305.18915v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2305.18915

arXiv-issued DOI via DataCite

Submission history

From: Jakob Prange
[v1] Tue, 30 May 2023 10:09:48 UTC (546 KB)
