Running local models on Macs gets faster with Ollama's MLX support
Ollama, a runtime for running large language models on local hardware, has introduced support for Apple's open source MLX machine learning framework. Ollama also says it has improved caching performance and now supports Nvidia's NVFP4 format for model compression, making memory usage far more efficient for certain models.
Combined, these developments promise significantly improved performance on Macs with Apple Silicon chips (M1 or later)—and the timing couldn't be better, as local models are gaining traction beyond researcher and hobbyist communities for the first time.
The recent runaway success of OpenClaw—which raced to over 300,000 stars on GitHub, made headlines with experiments like Moltbook, and became an obsession in China in particular—has many people experimenting with running models on their own machines.
As developers get frustrated with rate limits and the high cost of top-tier subscriptions to tools like Claude Code or ChatGPT Codex, experimentation with local coding models has heated up. (Ollama also expanded Visual Studio Code integration recently.)
The new support is available in preview (in Ollama 0.19) and currently works with only one model: the 35 billion-parameter variant of Alibaba's Qwen3.5. Hardware requirements are intense by normal users' standards. Users need an Apple Silicon-equipped Mac, sure, but they also need at least 32GB of RAM, according to Ollama's announcement.
Ars Technica AI
https://arstechnica.com/apple/2026/03/running-local-models-on-macs-gets-faster-with-ollamas-mlx-support/
