Altman Declares Transformer's Death: AGI to Arrive in Two Years, Next-Gen Architecture on the Horizon - 36 Kr
Could not retrieve the full article text.
Read on GNews →
https://news.google.com/rss/articles/CBMiU0FVX3lxTE5CUmR0Tjk2QmpYZ2NRT0d1MkNvN3oyZFZmR3JzZF9BWnJKWElxYWt3Z04xV1hiZlQ2NWV5SGM3aEFLQ1pIVEdXRHdUNmpxVHJ4OVpN?oc=5

More about transformer
Fine-tuning Whisper-large-v3 for child reading assessment with numerals and proper names
Hi everyone, I’m working on a reading assessment product for children. Current setup:
- a child reads a known passage for about 1 minute
- our system then counts how many words were read correctly
- right now we use whisper-1 as a baseline
- we now want to move to an open model and fine-tune Whisper-large-v3 on our own infrastructure

This is not a generic ASR task:
- we always know the reference text in advance
- our main metric is correct-word-count accuracy against the reference passage

The main cases we want to improve through fine-tuning are:
- numerals / spoken-written forms, for example “three” vs “3”
- proper names and other rare words
- child reading speech in general

I’d like advice specifically on the fine-tuning strategy for this type of task. My questions: For this use case, what training target
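
Since the post’s core metric is correct-word-count accuracy against a known reference passage, here is a minimal sketch of what that scoring step could look like. It assumes a simple word-level alignment using Python’s difflib and a small, hypothetical digit-to-word normalization table; the poster’s actual metric and normalization rules are not given in the post.

```python
import difflib
import re

# Hypothetical normalization table for spoken vs. written numerals; a real
# system would likely generate this or use a library such as num2words.
NUMERAL_FORMS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
                 "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize(word: str) -> str:
    """Lowercase, strip punctuation, and map single digits to their spoken form."""
    w = re.sub(r"[^\w']", "", word.lower())
    return NUMERAL_FORMS.get(w, w)

def correct_word_count(reference: str, hypothesis: str) -> int:
    """Count reference words that the ASR hypothesis got right, in order."""
    ref = [normalize(w) for w in reference.split()]
    hyp = [normalize(w) for w in hypothesis.split()]
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())

# Example: the child skipped one word and the ASR wrote a numeral as a digit.
reference = "The three little pigs built three houses"
hypothesis = "the 3 little pigs built houses"
print(correct_word_count(reference, hypothesis))  # 6 of 7 reference words counted as correct
```

With the reference known in advance, an ordered alignment like this is usually more forgiving than exact string matching, and the normalization step is where the “three” vs “3” cases the poster mentions get absorbed.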

Grokking Beyond Addition
Hi everyone, I’m excited to share my research paper: “Grokking Beyond Addition: Circuit-Level Analysis of Algebraic Learning in Transformers”. Paper: https://zenodo.org/records/19256207

This work explores grokking across multiple algebraic structures and shows a clear result: at small model scale (d_model = 64), transformers reliably grok abelian tasks but fail to generalize on non-abelian groups, even with 100% training accuracy.

It also highlights:
- Early circuit formation before generalization
- Evidence for discrete-log structure in multiplication
- Strong embedding similarity across different tasks (CKA)

I’m opening this project for collaboration and contributions:
- Scaling experiments (d_model = 128 / 256)
- Extending to more algebraic structures
- Interpretability improvements
- Reproduction an
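
The post reports strong embedding similarity across tasks measured with CKA. As a reference point, here is a minimal sketch of linear CKA between two embedding matrices, assuming rows are tokens and columns are d_model features; this is the standard linear-CKA formula, not code taken from the paper.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between two representation matrices.

    X, Y: (n_tokens, d_model) embedding matrices for the same n tokens,
    e.g. taken from models trained on different algebraic tasks.
    """
    # Center each feature so the similarity is invariant to translation.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return float(numerator / denominator)

# Toy check: embeddings for 97 tokens from two hypothetical runs with d_model = 64.
rng = np.random.default_rng(0)
emb_task_a = rng.normal(size=(97, 64))
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
emb_task_b = emb_task_a @ q  # same geometry, rotated feature basis
print(round(linear_cka(emb_task_a, emb_task_b), 3))  # 1.0: linear CKA ignores rotations of the basis
```

The invariance to orthogonal transformations is what makes CKA a reasonable way to compare embeddings of models trained on different tasks, where the feature axes have no reason to line up.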
More in Models

Anyone got Gemma 4 26B-A4B running on vLLM?
If yes, which quantized model are you using and what’s your vllm serve command? I’ve been struggling to get that model up and running on my DGX Spark GB10. I tried the Intel INT4 quant for the 31B and it seems to be working well but is way too slow. Anyone have any luck with the 26B? submitted by /u/toughcentaur9018
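
For anyone in a similar spot, below is a minimal sketch of loading a quantized checkpoint through vLLM’s offline Python API (the library equivalent of `vllm serve`). The model ID and quantization format are placeholders taken from the post, not a configuration known to work on a DGX Spark GB10; whether this model runs under vLLM at all is exactly what the poster is asking.

```python
from vllm import LLM, SamplingParams

# Placeholder config: the repo name and quant format below are assumptions
# based on the post, not a known-good setup for this hardware.
llm = LLM(
    model="google/gemma-4-26b-a4b",   # hypothetical HF repo name for the model in the post
    quantization="awq",               # swap for the quant format of the checkpoint you downloaded
    max_model_len=4096,               # keep the KV cache small while testing
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Say hello in one short sentence."], params)
print(outputs[0].outputs[0].text)
```

If this loads but generation is slow, the usual suspects are the quantization kernel chosen for the hardware and an oversized context window; the equivalent `vllm serve` invocation takes the same options as flags.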

Be careful what could run on your GPUs, fellow CUDA LLMers
According to this report, it seems that by “hammering” bits in DRAM chips through malicious CUDA kernels, it could be possible to compromise systems equipped with several NVIDIA GPUs, escalating all the way to unsupervised privileged access as an administrative (root) user: https://arstechnica.com/security/2026/04/new-rowhammer-attacks-give-complete-control-of-machines-running-nvidia-gpus/ submitted by /u/DevelopmentBorn3978


