The Sequence Chat #835: Illia Polosukhin on NEAR AI, Authoring the Transformer Paper and Decentralized and Private AI
A chat about decentralized, private AI and the evolution of frontier models.
Today, The Sequence welcomes Illia Polosukhin. As a key figure at the inception of the transformer model, Illia joins us to explore his new frontier: building a private, decentralized AI ecosystem.
Can you introduce yourself and tell us about your journey from academia to Google to NEAR AI?
I grew up in Ukraine and did programming competitions as a teenager. I first got interested in AI after I saw the movie “Artificial Intelligence” in 2001 and did freelance software development work through university. After I got my master’s I became a machine learning researcher in the US and ended up doing natural language research at Google, working on TensorFlow and eventually coauthoring “Attention Is All You Need” with some colleagues.
I left Google to found a startup with Alex Skidanov, which became NEAR AI––we wanted to build apps using natural language. We were paying student researchers internationally for training data, and many of them didn’t have bank accounts. This was 2017-18 in San Francisco and everyone was talking about blockchains, so we looked into paying the researchers in crypto. We found that the fees were astronomical and there were major scaling issues. The state of blockchain tech at the time was slowing us down, so we decided to try to build a blockchain that would actually scale. We thought it would take six months and then we’d go back to AI. In fact, it took a couple of years, and the NEAR MainNet launched in October of 2020. To this day, the network has never had a second of downtime. In 2024, we brought back NEAR AI to ensure that users can control their own assets and data, and we’ve now launched several products including NEAR AI Cloud, IronClaw, and confidential compute.
Take us back to 2017. When you were co-authoring Attention Is All You Need, the primary objective was essentially to solve machine translation without the sequential bottlenecks of RNNs. If we look at the landscape today—where scaling laws are holding up and attention mechanisms are eating everything from text to vision to robotics—what is the most surprising emergent property of the architecture that you absolutely did not foresee back then?
At the time, I definitely wasn’t thinking about other modalities. I was focused on text as a main avenue for capturing knowledge and reasoning and question answering as a way to test for it.
It is still surprising to me that the scale of data and compute is the main ingredient we need to continue increasing intelligence. A lot of other researchers and I are still looking for something closer to how the learning process itself happens, and for ways to massively improve the data efficiency of that process.
I have this mental model that “Decentralized AI” right now is a bit of a chaotic latent space—lots of noise, some signal. From a pure engineering and infrastructure standpoint, what is actually working in decentralized AI today? Conversely, what approaches are fundamentally broken or hitting a wall because of compute, latency, or consensus bottlenecks?
Things that definitely work today: data collection and inference compute aggregation. Various tooling, like verifiable benchmarks and incentivized RL environments, along with other components of the stack, also provides value. Plus, we see early signs of interesting phenomena from autoresearch-like initiatives that have emerged just in the past 10 days.
There are a few initiatives to train a model across a distributed cluster: Prime Intellect, Nous, and Pluralis are the more prominent ones. One practical challenge for these is that most people don’t care whether you can train a model in a distributed way; they just want the best model. Even important properties like open source and verifiability haven’t seen enough demand compared to raw performance. And we know this is hard even with large centralized clusters, given the competition between AI labs.
We spend so much time obsessing over monolithic, trillion-parameter frontier models, but there is an accelerating parallel universe of small, heavily optimized models. Does the future of AI look like a few massive, centralized oracles, or are we moving toward a decentralized swarm of highly specialized, 8B-parameter-class models running locally and coordinating across peer-to-peer networks?
In open source, we already see Mixture-of-Experts models being preferred over monolithic ones, offering comparable performance with faster inference.
I do think smaller models are going to keep getting smarter, enabling local, faster intelligence for agents. Specialization can also be achieved by agentic systems, through access to specialized tools and data.
Privacy feels like the massive, looming bottleneck for the next generation of AI applications. If we want to build robust Private AI, what does the actual infrastructure stack look like under the hood? Are we mostly relying on TEEs (Trusted Execution Environments) for the foreseeable future, or are you seeing practical, scalable breakthroughs in FHE (Fully Homomorphic Encryption) or MPC for inference and training?
Currently TEEs are the only feasible solution. We use MPC as a component of our TEE setup to ensure there is a robust key generation process that is not dependent on an individual machine or specific hardware provider. This setup also allows users to control their encryption keys. FHE and similar approaches are way too slow from a performance perspective. I think that will be true for some time.
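To illustrate the kind of distributed key handling this describes — a sketch of threshold secret sharing in general, not NEAR AI's actual MPC protocol — a Shamir-style scheme splits a key into shares so that no individual machine ever holds the whole key, yet any quorum can reconstruct it:

```python
import random

PRIME = 2**127 - 1  # prime field large enough for a 16-byte secret

def split_secret(secret: int, n: int, k: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    # Share i is the random degree-(k-1) polynomial evaluated at x = i.
    return [(x, sum(c * pow(x, e, PRIME) for e, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

key = random.randrange(PRIME)
shares = split_secret(key, n=5, k=3)  # 5 machines, any 3 can recover the key
```

Any two shares reveal nothing about the key, which is what removes the dependence on any single machine or hardware provider.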
Let’s double-click on that privacy aspect across the actual model lifecycle. We essentially have three distinct battlegrounds: private pretraining, private post-training (like SFT or RLHF on proprietary data), and private inference. Where does NEAR AI fit into this pipeline?
At NEAR AI we are building confidential computing infrastructure that should work for the full workflow. That said, there are currently limitations to how confidential computing works for multi-node systems, which makes pretraining very inefficient. We have started by offering confidential inference because it is the most direct offering and has immediate value for customers, from enterprises to privacy-focused prosumers.
Confidential inference is also not limited to open-weight models. Our platform can host closed-weight models in such a way that neither hardware providers nor consumers can access the weights directly, while the model builders don’t get access to consumer data. This approach enables strong privacy guarantees on both sides.
Fine-tuning would be the next step, and pretraining beyond that. The new generation of NVIDIA hardware will support multi-node confidentiality for the whole cluster.
Let’s dig into some of your recent releases, particularly NEAR Intents. The shift from imperative transaction execution to declarative “intents” is conceptually elegant, but I’m curious about the mechanics. How do these intents actually route, match, and settle across heterogeneous networks? How does this primitive fundamentally change the UX and the way developers should be thinking about cross-chain interactions?
An intent is a declaration of a desired outcome with constraints. It can be as simple as wanting to receive some asset on a given chain or as complex as building a house. The intent is then broadcast through a matching system to find who can actually do it and what would be their cost/requirements to do so.
NEAR Intents started with crypto-to-crypto swaps across chains. Building on top of our other cross-chain tech, we enabled an experience that unifies liquidity across chains and assets. We already see it changing how wallets operate: for example, Zodl, the main Zcash wallet, has implemented CrossPay, which allows their users to pay any address on any chain, in any asset a merchant wants to receive. It automatically converts the user’s ZEC into the merchant’s desired asset and delivers it using intents.
A few weeks ago we also launched an expansion called Agent Market, where users and agents can post natural language requests for various kinds of information or work. Those are matched to AI agents for fulfillment. This approach doesn’t just transform the cross-chain landscape, but all kinds of commerce. Most commerce will happen via agents, without even a UX that users have to interact with. An entire complex set of processes is rapidly compressing.
The idea of autonomous agents doing complex, multi-step reasoning is fascinating, but they eventually need to interact with the real world—which means they need financial rails. It seems inevitable that agents will need wallets. What is the specific role of blockchains in multi-agent workflows?
Blockchain rails are the easiest for agents to use because they are natively designed for machines. They have also already largely figured out the challenges the AI space is only now starting to learn about: Sybil resistance, security and reputation, guardrails and multi-party confirmations, and so on.
We are squeezing an unbelievable amount of capability out of predicting the next token, but we are also seeing the limits of standard autoregressive generation. Are Transformers the engine that takes us all the way to ASI, or do we need a fundamental paradigm shift—like state space models, persistent world models, or native neurosymbolic reasoning—to break through the current reasoning walls?
I do think there is going to be a different training procedure beyond next-token prediction. We already saw this with GRPO. In the end it’s all a credit assignment problem: which parts of the network’s weights contributed to making correct or incorrect predictions?
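The group-relative flavor of that credit assignment can be sketched in a few lines — a simplified take on GRPO's advantage term, not a full training loop. Rewards for a group of completions sampled from the same prompt are normalized against the group's mean and standard deviation, so each completion's update is weighted by how much better or worse it did than its siblings:

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Advantage of each completion = (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions sampled for one prompt, scored 1.0 (correct) or 0.0 (incorrect):
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions receive positive advantage, incorrect ones negative.
```

No learned value network is needed: the group itself serves as the baseline, which is part of why this procedure is attractive beyond plain next-token training.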
I think the core neural network component will probably see iterations, but the architecture seems generic enough to learn anything you throw at it. There was recently an example of learning a fully deterministic WASM interpreter directly into the weights. So you can actually get neurosymbolic reasoning directly in the weights themselves; we just don’t know how to train for that.
We can’t really talk about local, agentic AI right now without mentioning OpenClaw. Despite the obvious security growing pains we’ve seen over the last few weeks, do you think the OpenClaw moment has fundamentally shifted the industry’s respect for small, locally-hosted AI? Are people waking up to the fact that you don’t always need a massive frontier model if you have the right execution environment and deep personal context?
There is definitely a growing understanding that the harness is as important as the model. There was recently an example that improved a smaller model’s performance on coding tasks by 10x by just changing how it was editing files. I think the agent harness will change how we interact with computing in general.
Give us one prediction for the AI landscape in 2026 that 90% of our readers would likely disagree with.
As in the last question, the harness is a game changer. I also think privacy and user ownership become more prominent when you talk about an AI assistant that will know everything about you and work alongside you every day.
Let’s talk about the broader industry narrative. Is Web3 dead? I mean Web3 in the sense of a16z’s Read-Write-Own thesis. It feels like generative AI has completely sucked the oxygen out of the room over the last two years. How does the crypto ecosystem evolve from here—does it need to become an invisible backend infrastructure layer, or is there a new paradigm emerging in the shadow of the AI boom?
Web3 has more adoption than ever. Stablecoins and asset tokenization are growing massively and blockchains are being used as infrastructure for payments. This is the “own” part of the thesis.
I always thought blockchain should be an invisible layer for most users. That’s how we have always designed NEAR––we called it “chain abstraction.” The goal is to make the blockchain and infrastructure disappear so it’s just working under the hood with easy interfaces for users. Those users could be mainstream users of a Web2 app, or a degen who wants to interact with 35+ chains. If you haven’t seen it yet, our recently launched near.com is an example of such an experience. All assets from all chains are in one place with one account, and it supports confidential balances, transfers, and trading.
And importantly, as blockchain becomes this invisible layer, agents will choose it as the easiest way to do trustless interactions with other agents and organizations. So agentic commerce will run on blockchains.
To wrap things up, I love to ask for a truly contrarian take. Given the sheer velocity of the space right now, give us one concrete prediction for the AI landscape in 2026 that 90% of our readers—who are deeply embedded in the technical weeds of ML—would likely strongly disagree with today.
One contrarian take I have is that we will see governments making nationalization proposals for AI labs. The only way to avoid that is to ensure the privacy of AI and provide decentralized access. This is also probably a contrarian take for this newsletter’s audience: in one year, I believe agents are going to be facilitating global trade and paying via stablecoins. Commerce will be completely different.