[D] Hash table aspects of ReLU neural networks

Reddit r/MachineLearningby /u/oatmealcraving https://www.reddit.com/user/oatmealcravingApril 5, 20261 min read1 views

If you collect the ReLU decisions into a diagonal matrix with 0 or 1 entries then a ReLU layer is DWx, where W is the weight matrix and x the input. What then is Wₙ₊₁Dₙ where Wₙ₊₁ is the matrix of weights for the next layer? It can be seen as a (locality sensitive) hash table lookup of a linear mapping (effective matrix). It can also be seen as an associative memory in itself with Dₙ as the key. There is a discussion here: https://discourse.numenta.org/t/gated-linear-associative-memory/12300 The viewpoints are not fully integrated yet and there are notation problems. Nevertheless the concepts are very simple and you could hope that people can follow along without difficulty, despite the arguments being in such a preliminary state. submitted by /u/oatmealcraving [link] [comments]

Could not retrieve the full article text.

Read on Reddit r/MachineLearning →

Original source

Reddit r/MachineLearning

https://www.reddit.com/r/MachineLearning/comments/1scvhk8/d_hash_table_aspects_of_relu_neural_networks/

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

neural network

ModelsLive

Paper close reading: "Why Language Models Hallucinate"

People often talk about paper reading as a skill, but there aren’t that many examples of people walking through how they do it. Part of this is a problem of supply: it’s expensive to document one’s thought process for any significant length of time, and there’s the additional cost of probably looking quite foolish when doing so. Part of this is simply a question of demand: far more people will read a short paragraph or tweet thread summarizing a paper and offering some pithy comments, than a thousand-word post of someone’s train of thought as they look through a paper. Thankfully, I’m willing to risk looking a bit foolish, and I’m pretty unresponsive to demand at this present moment, so I’ll try and write down my thought processes as I read through as much of a a paper I can in 1-2 hours.

LessWrong AI

15m37 minutes ago

ModelsFresh

Goal-Conditioned Neural ODEs with Guaranteed Safety and Stability for Learning-Based All-Pairs Motion Planning

arXiv:2604.02821v1 Announce Type: new Abstract: This paper presents a learning-based approach for all-pairs motion planning, where the initial and goal states are allowed to be arbitrary points in a safe set. We construct smooth goal-conditioned neural ordinary differential equations (neural ODEs) via bi-Lipschitz diffeomorphisms. Theoretical results show that the proposed model can provide guarantees of global exponential stability and safety (safe set forward invariance) regardless of goal location. Moreover, explicit bounds on convergence rate, tracking error, and vector field magnitude are established. Our approach admits a tractable learning implementation using bi-Lipschitz neural networks and can incorporate demonstration data. We illustrate the effectiveness of the proposed method

arXiv cs.RO

1mabout 3 hours ago

Analyst NewsFresh

Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks

arXiv:2604.03068v1 Announce Type: cross Abstract: We analyze the one-pass stochastic gradient descent dynamics of a two-layer neural network with quadratic activations in a teacher--student framework. In the high-dimensional regime, where the input dimension $N$ and the number of samples $M$ diverge at fixed ratio $\alpha = M/N$, and for finite hidden widths $(p,p^*)$ of the student and teacher, respectively, we study the low-dimensional ordinary differential equations that govern the evolution of the student--teacher and student--student overlap matrices. We show that overparameterization ($p>p^*$) only modestly accelerates escape from a plateau of poor generalization by modifying the prefactor of the exponential decay of the loss. We then examine how unconstrained weight norms introduce

arXiv stat.ML

1mabout 3 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 224 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Models

Models

How to optimize for AI search: 12 proven LLM visibility tactics - Search Engine Land

How to optimize for AI search: 12 proven LLM visibility tactics Search Engine Land

GNews AI search

1m2 months ago

ModelsFresh

User-Aware Conditional Generative Total Correlation Learning for Multi-Modal Recommendation

arXiv:2604.03014v1 Announce Type: new Abstract: Multi-modal recommendation (MMR) enriches item representations by introducing item content, e.g., visual and textual descriptions, to improve upon interaction-only recommenders. The success of MMR hinges on aligning these content modalities with user preferences derived from interaction data, yet dominant practices based on disentangling modality-invariant preference-driving signals from modality-specific preference-irrelevant noises are flawed. First, they assume a one-size-fits-all relevance of item content to user preferences for all users, which contradicts the user-conditional fact of preferences. Second, they optimize pairwise contrastive losses separately toward cross-modal alignment, systematically ignoring higher-order dependencies i

arXiv cs.IR

2mabout 3 hours ago

ModelsFresh

Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference

arXiv:2604.02985v1 Announce Type: new Abstract: With the wide adoption of language models for IR -- and specifically RAG systems -- the latency of the underlying LLM becomes a crucial bottleneck, since the long contexts of retrieved passages lead large prompts and therefore, compute increase. Prompt compression, which reduces the size of input prompts while aiming to preserve performance on downstream tasks, has established itself as a cost-effective and low-latency method for accelerating inference in large language models. However, its usefulness depends on whether the additional preprocessing time during generation is offset by faster decoding. We present the first systematic, large-scale study of this trade-off, with thousands of runs and 30,000 queries across several open-source LLMs

arXiv cs.IR

2mabout 3 hours ago

ModelsFresh

Bilateral Intent-Enhanced Sequential Recommendation with Embedding Perturbation-Based Contrastive Learning

arXiv:2604.02833v1 Announce Type: new Abstract: Accurately modeling users' evolving preferences from sequential interactions remains a central challenge in recommender systems. Recent studies emphasize the importance of capturing multiple latent intents underlying user behaviors. However, existing methods often fail to effectively exploit collective intent signals shared across users and items, leading to information isolation and limited robustness. Meanwhile, current contrastive learning approaches struggle to construct views that are both semantically consistent and sufficiently discriminative. In this work, we propose BIPCL, an end-to-end Bilateral Intent-enhanced, Embedding Perturbation-based Contrastive Learning framework. BIPCL explicitly integrates multi-intent signals into both it

arXiv cs.IR

1mabout 3 hours ago