Altman Declares Transformer's Death: AGI to Arrive in Two Years, Next-Gen Architecture on the Horizon - 36 Kr
Could not retrieve the full article text.
Read on GNews →
https://news.google.com/rss/articles/CBMiU0FVX3lxTE5CUmR0Tjk2QmpYZ2NRT0d1MkNvN3oyZFZmR3JzZF9BWnJKWElxYWt3Z04xV1hiZlQ2NWV5SGM3aEFLQ1pIVEdXRHdUNmpxVHJ4OVpN?oc=5

More about transformer
Fine-tuning Whisper-large-v3 for child reading assessment with numerals and proper names
Hi everyone, I’m working on a reading assessment product for children. Current setup:
- a child reads a known passage for about 1 minute
- our system then counts how many words were read correctly
- right now we use whisper-1 as a baseline
- we now want to move to an open model and fine-tune Whisper-large-v3 on our own infrastructure

This is not a generic ASR task:
- we always know the reference text in advance
- our main metric is correct-word-count accuracy against the reference passage

The main cases we want to improve through fine-tuning are:
- numerals / spoken-written forms, for example “three” vs “3”
- proper names and other rare words
- child reading speech in general

I’d like advice specifically on the fine-tuning strategy for this type of task. My questions: For this use case, what training target
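
Since the post’s core metric is correct-word-count accuracy against a known reference passage, here is a minimal sketch of what that scoring step could look like. It assumes a simple word-level alignment using Python’s difflib and a small, hypothetical digit-to-word normalization table; the poster’s actual metric and normalization rules are not given in the post.

```python
import difflib
import re

# Hypothetical normalization table for spoken vs. written numerals; a real
# system would likely generate this or use a library such as num2words.
NUMERAL_FORMS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
                 "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize(word: str) -> str:
    """Lowercase, strip punctuation, and map single digits to their spoken form."""
    w = re.sub(r"[^\w']", "", word.lower())
    return NUMERAL_FORMS.get(w, w)

def correct_word_count(reference: str, hypothesis: str) -> int:
    """Count reference words that the ASR hypothesis got right, in order."""
    ref = [normalize(w) for w in reference.split()]
    hyp = [normalize(w) for w in hypothesis.split()]
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())

# Example: the child skipped one word and the ASR wrote a numeral as a digit.
reference = "The three little pigs built three houses"
hypothesis = "the 3 little pigs built houses"
print(correct_word_count(reference, hypothesis))  # 6 of 7 reference words counted as correct
```

With the reference known in advance, an ordered alignment like this is usually more forgiving than exact string matching, and the normalization step is where the “three” vs “3” cases the poster mentions get absorbed.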

Grokking Beyond Addition
Hi everyone, I’m excited to share my research paper: “Grokking Beyond Addition: Circuit-Level Analysis of Algebraic Learning in Transformers”. Paper: https://zenodo.org/records/19256207

This work explores grokking across multiple algebraic structures and shows a clear result: at small model scale (d_model = 64), transformers reliably grok abelian tasks but fail to generalize on non-abelian groups, even with 100% training accuracy.

It also highlights:
- Early circuit formation before generalization
- Evidence for discrete-log structure in multiplication
- Strong embedding similarity across different tasks (CKA)

I’m opening this project for collaboration and contributions:
- Scaling experiments (d_model = 128 / 256)
- Extending to more algebraic structures
- Interpretability improvements
- Reproduction an
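
The post reports strong embedding similarity across tasks measured with CKA. As a reference point, here is a minimal sketch of linear CKA between two embedding matrices, assuming rows are tokens and columns are d_model features; this is the standard linear-CKA formula, not code taken from the paper.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between two representation matrices.

    X, Y: (n_tokens, d_model) embedding matrices for the same n tokens,
    e.g. taken from models trained on different algebraic tasks.
    """
    # Center each feature so the similarity is invariant to translation.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return float(numerator / denominator)

# Toy check: embeddings for 97 tokens from two hypothetical runs with d_model = 64.
rng = np.random.default_rng(0)
emb_task_a = rng.normal(size=(97, 64))
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
emb_task_b = emb_task_a @ q  # same geometry, rotated feature basis
print(round(linear_cka(emb_task_a, emb_task_b), 3))  # 1.0: linear CKA ignores rotations of the basis
```

The invariance to orthogonal transformations is what makes CKA a reasonable way to compare embeddings of models trained on different tasks, where the feature axes have no reason to line up.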
More in Models

Anyone got Gemma 4 26B-A4B running on vLLM?
If yes, which quantized model are you using and what’s your vllm serve command? I’ve been struggling to get that model up and running on my DGX Spark GB10. I tried the Intel INT4 quant for the 31B and it seems to be working well but is way too slow. Anyone have any luck with the 26B? submitted by /u/toughcentaur9018
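
For anyone in a similar spot, below is a minimal sketch of loading a quantized checkpoint through vLLM’s offline Python API (the library equivalent of `vllm serve`). The model ID and quantization format are placeholders taken from the post, not a configuration known to work on a DGX Spark GB10; whether this model runs under vLLM at all is exactly what the poster is asking.

```python
from vllm import LLM, SamplingParams

# Placeholder config: the repo name and quant format below are assumptions
# based on the post, not a known-good setup for this hardware.
llm = LLM(
    model="google/gemma-4-26b-a4b",   # hypothetical HF repo name for the model in the post
    quantization="awq",               # swap for the quant format of the checkpoint you downloaded
    max_model_len=4096,               # keep the KV cache small while testing
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Say hello in one short sentence."], params)
print(outputs[0].outputs[0].text)
```

If this loads but generation is slow, the usual suspects are the quantization kernel chosen for the hardware and an oversized context window; the equivalent `vllm serve` invocation takes the same options as flags.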

Be careful what could run on your GPUs, fellow CUDA LLMers
According to this report, it seems that by “hammering” bits in DRAM chips through malicious CUDA kernels, it could be possible to compromise systems equipped with several NVIDIA GPUs, escalating all the way to unsupervised privileged access as an administrative (root) user: https://arstechnica.com/security/2026/04/new-rowhammer-attacks-give-complete-control-of-machines-running-nvidia-gpus/ submitted by /u/DevelopmentBorn3978


