The Sequence Chat #835: Illia Polosukhin on NEAR AI, Authoring the Transformer Paper and Decentralized and Private AI - TheSequence
Could not retrieve the full article text.

New research could empower people without AI expertise to help create trustworthy AI applications
Involving people without AI expertise in the development and evaluation of artificial intelligence applications could help create better, fairer, and more trustworthy automated decision-making systems, new research suggests. After enlisting members of the public to evaluate the potential impacts of two real-world applications, researchers from UK universities will present a paper at a major international computing conference suggesting how "participatory AI auditing" could improve AI decision-making in the future.
Axolotl v0.16.0 Release Notes

We're very excited to share this packed new release, with ~80 new commits since v0.15.0 (March 6, 2026).

Highlights: Async GRPO — Asynchronous Reinforcement Learning Training (#3486). Full support for asynchronous Group Relative Policy Optimization with vLLM integration. Includes an async data producer with replay buffer, streaming partial-batch training, native LoRA weight sync to vLLM, and FP8 compatibility. Supports multi-GPU via FSDP1/FSDP2 and DeepSpeed ZeRO-3. Achieves up to 58% faster step times (1.59s/step vs. a 3.79s baseline on Qwen2-0.5B, over 500 steps):

Optimization                        Step Time   Improvement
Baseline                            3.79s       —
+ Batched weight sync               2.52s       34% faster
+ Liger kernel fusion               2.01s       47% faster
+ Streaming partial batch           1.79s       53% faster
+ Element chunking + re-roll fix    1.59s       58% faster
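The core idea behind the async design above — a producer generating rollouts into a replay buffer while the trainer samples partial batches from it — can be sketched in a few lines of Python. This is a minimal toy illustration, not Axolotl's actual implementation: the names `ReplayBuffer` and `produce_rollouts` are hypothetical, and the "rollouts" here are plain dicts standing in for vLLM generations.

```python
import random
import threading

class ReplayBuffer:
    """Toy fixed-capacity buffer; oldest rollouts are evicted FIFO.
    Illustrative only -- not Axolotl's actual data structure."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = []
        self._lock = threading.Lock()

    def add(self, item):
        with self._lock:
            self._items.append(item)
            if len(self._items) > self.capacity:
                self._items.pop(0)  # evict oldest

    def sample(self, k):
        with self._lock:
            k = min(k, len(self._items))
            return random.sample(self._items, k)

def produce_rollouts(buffer, n):
    # Stand-in for asynchronous vLLM generation: each "rollout"
    # is just a dict with a fake reward.
    for i in range(n):
        buffer.add({"prompt_id": i, "reward": random.random()})

buffer = ReplayBuffer(capacity=64)
producer = threading.Thread(target=produce_rollouts, args=(buffer, 100))
producer.start()
producer.join()

# In the real async setup the trainer would draw partial batches
# concurrently with generation; here we sample once at the end.
batch = buffer.sample(8)
```

The design choice this sketches is decoupling: generation and optimization no longer alternate in lockstep, so the GPU running training never idles waiting for a full batch of fresh rollouts.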
More in Models

Arcee's new, open source Trinity-Large-Thinking is the rare, powerful U.S.-made AI model that enterprises can download and customize - VentureBeat

Google strongly implies the existence of large Gemma 4 models
In the Hugging Face card: "Increased Context Window – The small models feature a 128K context window, while the medium models support 256K." Small and medium... implying at least one large model! 124B confirmed :P (submitted by /u/coder543)



