PRISM: PRIor from corpus Statistics for topic Modeling
Abstract: Topic modeling seeks to uncover latent semantic structure in text, with LDA providing a foundational probabilistic framework. While recent methods often incorporate external knowledge (e.g., pre-trained embeddings), such reliance limits applicability in emerging or underexplored domains. We introduce PRISM, a corpus-intrinsic method that derives a Dirichlet parameter from word co-occurrence statistics to initialize LDA without altering its generative process. Experiments on text and single-cell RNA-seq data show that PRISM improves topic coherence and interpretability, rivaling models that rely on external knowledge. These results underscore the value of corpus-driven initialization for topic modeling in resource-constrained settings. Code is available at: this https URL.
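The abstract does not spell out how the Dirichlet parameter is computed, so the following is only an illustrative sketch of the general idea (a corpus-derived prior built from co-occurrence counts), not the PRISM procedure itself. The function name `cooccurrence_prior`, the greedy seed-word heuristic, and the `base` and `scale` parameters are all assumptions made for this example.

```python
import numpy as np

def cooccurrence_prior(doc_term, n_topics, base=0.01, scale=1.0):
    """Hypothetical sketch (NOT the paper's method): build an
    (n_topics x vocab) topic-word Dirichlet prior from co-occurrence counts."""
    X = (np.asarray(doc_term) > 0).astype(float)  # word presence per document
    if n_topics > X.shape[1]:
        raise ValueError("n_topics must not exceed the vocabulary size")
    C = X.T @ X                    # document-level co-occurrence counts
    np.fill_diagonal(C, 0.0)       # ignore a word co-occurring with itself

    # Assumed heuristic: start from the most connected word, then repeatedly
    # add the word that co-occurs least with the seeds chosen so far.
    seeds = [int(np.argmax(C.sum(axis=1)))]
    while len(seeds) < n_topics:
        overlap = C[:, seeds].sum(axis=1)
        overlap[seeds] = np.inf    # never pick the same seed twice
        seeds.append(int(np.argmin(overlap)))

    rows = C[seeds]                # co-occurrence profile of each seed (a copy)
    # Give each seed weight in its own topic: its document frequency.
    rows[np.arange(n_topics), seeds] = X[:, seeds].sum(axis=0)
    profile = rows / np.maximum(rows.sum(axis=1, keepdims=True), 1e-12)
    return base + scale * profile  # strictly positive Dirichlet parameters

# Toy corpus: words 0-1 appear together, words 2-3 appear together.
toy_counts = np.array([[2, 1, 0, 0],
                       [1, 1, 0, 0],
                       [0, 0, 1, 3],
                       [0, 0, 2, 1]])
prior = cooccurrence_prior(toy_counts, n_topics=2)
```

A matrix of this shape can be supplied as the topic-word prior to an LDA implementation that accepts one (for instance, gensim's `LdaModel` takes an `eta` argument of shape `num_topics x num_words`), which leaves the generative process itself unchanged and only alters the prior.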
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as: arXiv:2603.29406 [cs.LG]
(or arXiv:2603.29406v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.29406
arXiv-issued DOI via DataCite
Submission history
From: Tal Ishon
[v1] Tue, 31 Mar 2026 08:10:37 UTC (1,857 KB)