NLP SOTA By far, (NOT WORK ON CNN) , Can save 66% FLOPs and same or BETTER accuracy than baseline
NLP SOTA. CNN NOT WORK, IN NLP WE REDUCE 66% FLOPs with same or MORE accuracy, you have the link of github on zenodo page. DeepFocus-BP: Error-Aware Adaptive Backpropagation via Dynamic Alpha-Beta Routing (Achieving 66% FLOPs Reduction with Improved Accuracy) - SOTA NLP Confirmed v3. (Resnet FAIL) 3 posts - 2 participants Read full topic
Could not retrieve the full article text.
Read on discuss.huggingface.co →discuss.huggingface.co
https://discuss.huggingface.co/t/nlp-sota-by-far-not-work-on-cnn-can-save-66-flops-and-same-or-better-accuracy-than-baseline/174971Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
github
ROMAN: A Multiscale Routing Operator for Convolutional Time Series Models
arXiv:2604.02577v1 Announce Type: new Abstract: We introduce ROMAN (ROuting Multiscale representAtioN), a deterministic operator for time series that maps temporal scale and coarse temporal position into an explicit channel structure while reducing sequence length. ROMAN builds an anti-aliased multiscale pyramid, extracts fixed-length windows from each scale, and stacks them as pseudochannels, yielding a compact representation on which standard convolutional classifiers can operate. In this way, ROMAN provides a simple mechanism to control the inductive bias of downstream models: it can reduce temporal invariance, make temporal pooling implicitly coarse-position-aware, and expose multiscale interactions through channel mixing, while often improving computational efficiency by shortening th

OpenClaw Changed How We Use AI. KiloClaw Made It Effortless to Get Started
OpenClaw is a powerful open-source AI agent, but self-hosting it is a pain. KiloClaw is OpenClaw fully hosted and managed by Kilo — sign up, connect your chat apps, and your agent is running in about a minute. No Docker, no YAML, no server babysitting. People are using it for personalized morning briefs, inbox digests, auto-building CRMs, browser automation, GitHub triage, and more. Hosting is $8/month with a 7-day free trial, inference runs through Kilo Gateway at zero markup across 500+ models, and it's free for open-source maintainers. Read All

go-typedpipe: A Typed, Context-Aware Pipe for Go
Background Go channels are one of the best things about the language. But the moment you need context cancellation, error propagation, and safe concurrent shutdown all at once, a simple chan T starts asking you to write a lot of code just to use it correctly. A common pattern looks something like this: out := make ( chan Result , len ( urls )) errc := make ( chan error , 1 ) go func () { defer close ( out ) for _ , url := range urls { select { case ctx . Done () : errc ctx . Err () return default : } resp , err := fetch ( ctx , url ) if err != nil { errc err return } select { case out Result { Data : resp } : case ctx . Done () : errc ctx . Err () return } } }() This works. But if you're not careful, it's easy to introduce bugs: Double-close panics — closing a channel twice crashes the pro
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Open Source AI

OpenClaw Changed How We Use AI. KiloClaw Made It Effortless to Get Started
OpenClaw is a powerful open-source AI agent, but self-hosting it is a pain. KiloClaw is OpenClaw fully hosted and managed by Kilo — sign up, connect your chat apps, and your agent is running in about a minute. No Docker, no YAML, no server babysitting. People are using it for personalized morning briefs, inbox digests, auto-building CRMs, browser automation, GitHub triage, and more. Hosting is $8/month with a 7-day free trial, inference runs through Kilo Gateway at zero markup across 500+ models, and it's free for open-source maintainers. Read All

WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
arXiv:2604.02570v1 Announce Type: new Abstract: Singular Value Decomposition (SVD) has become an important technique for reducing the computational burden of Vision Language Models (VLMs), which play a central role in tasks such as image captioning and visual question answering. Although multiple prior works have proposed efficient SVD variants to enable low-rank operations, we find that in practice it remains difficult to achieve substantial latency reduction during model execution. To address this limitation, we introduce a new computational pattern and apply SVD at a finer granularity, enabling real and measurable improvements in execution latency. Furthermore, recognizing that weight elements differ in their relative importance, we adaptively allocate relative importance to each elemen

I tested speculative decoding on my home GPU cluster. Here's why it didn't help.
I spent Saturday night testing n-gram speculative decoding on consumer GPUs. The claim: speculative decoding can speed up LLM inference by 2-3x by predicting future tokens and verifying them in parallel. I wanted to see if that holds up on real hardware running diverse workloads. For the most part, it doesn't. But the journey was worth it, and I caught a benchmarking pitfall that I think a lot of people are falling into. The setup My home lab runs Kubernetes on a machine called Shadowstack. Two NVIDIA RTX 5060 Ti GPUs (16GB VRAM each, 32GB total). I use LLMKube, an open source K8s operator I built, to manage LLM inference workloads with llama.cpp. For this test I deployed two models: Gemma 4 26B-A4B : Google's Mixture of Experts model. 26B total params but only ~4B active per token. Runs a


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!