OpenUMA – bring Apple-style unified memory to x86 AI inference (Rust, Linux)
OpenUMA (Unified Memory Abstraction) is a Rust middleware for detecting shared memory hardware (AMD APUs, Intel iGPUs), configuring unified memory pools, and generating optimal configs for AI inference engines.
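On Linux, iGPU vendor detection of this kind typically comes down to reading PCI vendor IDs from sysfs (`0x1002` for AMD, `0x8086` for Intel). A minimal sketch of the idea — this is an illustration, not OpenUMA's actual API:

```rust
use std::fs;

#[derive(Debug, PartialEq)]
enum GpuVendor {
    Amd,
    Intel,
    Other,
}

/// Map a PCI vendor ID string (as found in
/// /sys/class/drm/card*/device/vendor) to a GPU vendor.
fn vendor_from_pci_id(id: &str) -> GpuVendor {
    match id.trim() {
        "0x1002" => GpuVendor::Amd,   // AMD/ATI
        "0x8086" => GpuVendor::Intel, // Intel
        _ => GpuVendor::Other,
    }
}

fn main() {
    // Probe the first DRM device, if present (Linux only).
    if let Ok(id) = fs::read_to_string("/sys/class/drm/card0/device/vendor") {
        println!("card0 vendor: {:?}", vendor_from_pci_id(&id));
    }
}
```

A real probe would also need to distinguish an iGPU from a discrete card (e.g. via the PCI device ID or the presence of shared system memory), which is the part a profile database helps with.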
```
┌─────────────────────────────────────────────┐
│               OpenUMA v0.6.2                │
│ Unified Memory Abstraction for AI Inference │
└─────────────────────────────────────────────┘
```

Key Features
- Hardware Detection - Automatic detection of AMD APUs and Intel iGPUs
- Memory Partitioning - Intelligent iGPU/CPU memory allocation for LLM inference
- Zero-Copy DMA-BUF - Direct memory transfers between CPU and iGPU
- Multiple Engines - Generate configs for llama.cpp, Ollama, and KTransformers
- Interactive TUI - Full terminal UI for hardware monitoring and configuration
- Benchmarking - Real inference benchmarks with llama.cpp
Supported Hardware
| Vendor | Series | Examples |
|--------|--------|----------|
| AMD | Zen 3 (Cezanne, Renoir) | Ryzen 5 5600G, Ryzen 7 5700G |
| AMD | Zen 4 (Phoenix, Hawk Point) | Ryzen 7 7840HS, Ryzen 7 8845HS |
| AMD | Zen 5 (Strix Point) | Ryzen AI 9 HX 370, Ryzen AI 7 350 |
| Intel | Alder Lake, Raptor Lake | Core i5-1240P, Core i7-12700H |
| Intel | Meteor Lake, Lunar Lake | Core Ultra 5 125H, Core Ultra 7 258V |
Quick Start
```bash
# Build
cargo build --release

# Detect hardware
./target/release/openuma probe

# Launch interactive TUI
./target/release/openuma tui

# Generate config for llama.cpp
./target/release/openuma configure --engine llamacpp --model model.gguf
```
Terminal UI
```
┌─────────────────────────────────────────────────────────────────────────┐
│ [D]ashboard  [M]emory  [B]enchmark  [P]rofiles  [C]onfigure  [S]ettings │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│ ╔═══════════════════════════════════════════════════════════════════╗  │
│ ║ Hardware Overview                                                 ║  │
│ ╠═══════════════════════════════════════════════════════════════════╣  │
│ ║ CPU   AMD Ryzen 5 5600G (Cezanne)                                 ║  │
│ ║       6 cores (12 threads), AVX2, 16MB L3                         ║  │
│ ║ iGPU  AMD Vega7 (Raven Ridge)                                     ║  │
│ ║       7 CUs, 512MB / 16384MB shared VRAM                          ║  │
│ ║       Vulkan ✓  OpenCL ✓  Zero-copy ✓                             ║  │
│ ║ RAM   32GB DDR4-3200 (Dual-channel)                               ║  │
│ ║       51.2 GB/s theoretical, 46.8 GB/s measured                   ║  │
│ ╠═══════════════════════════════════════════════════════════════════╣  │
│ ║ ✓ Unified Memory Available              Tier: CONSUMER_UMA        ║  │
│ ╚═══════════════════════════════════════════════════════════════════╝  │
│                                                                         │
│ Memory Partition:  iGPU: 7168 MB (35.0%)   CPU: 13312 MB (65.0%)        │
│ [████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 35%                 │
│                                                                         │
│ Strategy: HybridIgpu        Zero-copy: Available                        │
│                                                                         │
│ [r] Refresh  [q] Quit                                                   │
└─────────────────────────────────────────────────────────────────────────┘
```

Commands
| Command | Description |
|---------|-------------|
| `openuma probe` | Detect hardware profile |
| `openuma tui` | Launch interactive terminal UI |
| `openuma partition --model <model>` | Show memory partition for model |
| `openuma configure --engine <engine> --model <model>` | Generate engine config |
| `openuma benchmark --model <model>` | Run inference benchmark |
| `openuma zerocopy --test` | Test DMA-BUF zero-copy |
| `openuma serve` | Start REST API server |
| `openuma profile list` | List known hardware profiles |
Supported Inference Engines
llama.cpp

```bash
openuma configure --engine llamacpp --model llama3-8b-q4_k_m.gguf
```

Ollama

```bash
openuma configure --engine ollama --model llama3-8b-q4_k_m.gguf
```

KTransformers (MoE models)

```bash
openuma configure --engine ktransformers --model deepseek-v3-q4km.gguf
```
How It Works
Memory Model
```
┌────────────────────────────────────────────────────────────────┐
│                      Unified Memory Pool                       │
│                                                                │
│  ┌──────────────┐                    ┌──────────────────────┐  │
│  │  iGPU VRAM   │ ◄── Zero-Copy ──►  │      System RAM      │  │
│  │   (Shared)   │      DMA-BUF       │     (DDR4/DDR5)      │  │
│  └──────────────┘                    └──────────────────────┘  │
│                                                                │
│  Attention layers benefit from iGPU                            │
│  MoE experts stay on CPU                                       │
└────────────────────────────────────────────────────────────────┘
```

Key Insight
For LLM inference on APUs:
- Attention layers → benefit from iGPU (parallel matrix ops)
- MoE expert layers → should stay on CPU (sparse activation)
- KV cache → benefits from unified memory zero-copy
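As a rough illustration of this split (hypothetical sizes and function names, not OpenUMA's real partitioning code), a layer-placement heuristic can be sketched as: greedily offload dense layers to the iGPU until its shared-VRAM budget is exhausted, while sparse MoE expert layers always stay on the CPU.

```rust
#[derive(Debug, PartialEq)]
enum Placement {
    Igpu,
    Cpu,
}

/// Greedily place dense (attention/FFN) layers on the iGPU until the
/// shared-VRAM budget is exhausted; MoE expert layers always stay on
/// the CPU. All sizes are in MiB.
fn place_layers(layer_sizes: &[(u64, bool)], igpu_budget_mib: u64) -> Vec<Placement> {
    let mut used = 0u64;
    layer_sizes
        .iter()
        .map(|&(size, is_moe_expert)| {
            if !is_moe_expert && used + size <= igpu_budget_mib {
                used += size;
                Placement::Igpu
            } else {
                Placement::Cpu
            }
        })
        .collect()
}

fn main() {
    // Four 200 MiB dense layers and one 800 MiB expert layer, with a
    // 500 MiB iGPU budget: only the first two dense layers fit.
    let layers = [(200, false), (200, false), (200, false), (200, false), (800, true)];
    println!("{:?}", place_layers(&layers, 500));
}
```

A production partitioner would also weigh memory bandwidth and the KV cache, but the ordering principle — dense compute first, sparse experts last — is the same one the bullets above describe.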
Benchmarking
```bash
# Quick benchmark
openuma benchmark --model llama3-8b-q4_k_m.gguf

# Full multi-backend comparison
openuma benchmark --model model.gguf --full
```
Architecture
```
openuma/
├── crates/
│   ├── hw_probe/     # Hardware detection
│   ├── mem_mgr/      # Memory partitioning + zero-copy
│   ├── config_gen/   # Model metadata (GGUF)
│   ├── profile_db/   # Hardware profile database
│   ├── benchmark/    # Inference benchmarking
│   ├── api_server/   # REST API
│   ├── cli/          # CLI interface
│   └── tui/          # Terminal UI
└── profiles/         # Hardware profiles
```

Installation
Option A — Download Binary (Linux x86_64)
```bash
# Download latest release
curl -L https://github.com/hamtun24/openuma/releases/latest/download/openuma-linux-x86_64.tar.gz \
  | tar xz

# Run it
./openuma probe
```
Option B — Build from Source
```bash
# Prerequisites: Rust 1.70+
git clone https://github.com/hamtun24/openuma.git
cd openuma
cargo build --release
./target/release/openuma probe
```

System Requirements
| Requirement | Notes |
|-------------|-------|
| OS | Linux (kernel 5.10+) |
| CPU | Any x86_64 with AMD APU or Intel iGPU |
| RAM | 16GB minimum, 32GB recommended |
| Optional | llama.cpp in PATH for real benchmarks |
| Optional | Vulkan drivers for iGPU acceleration |
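Memory bandwidth matters more than capacity for inference speed, which is why the TUI reports it. The "51.2 GB/s theoretical" figure in the dashboard follows directly from the memory configuration: each DDR4 channel is 64 bits (8 bytes) wide, so dual-channel DDR4-3200 gives 2 × 8 B × 3200 MT/s = 51.2 GB/s. As a sanity check:

```rust
/// Theoretical DDR memory bandwidth in GB/s:
/// channels × bus width (8 bytes per channel) × transfer rate (MT/s) / 1000.
fn theoretical_bandwidth_gbs(channels: u64, mts: u64) -> f64 {
    (channels * 8 * mts) as f64 / 1000.0
}

fn main() {
    // Dual-channel DDR4-3200, as in the hardware overview above.
    println!("{} GB/s", theoretical_bandwidth_gbs(2, 3200)); // prints "51.2 GB/s"
}
```

Measured bandwidth (46.8 GB/s in the example dashboard) is always somewhat lower than theoretical due to refresh cycles and controller overhead.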
Install Vulkan Drivers (if missing)
```bash
# AMD iGPU
sudo apt install mesa-vulkan-drivers

# Intel iGPU
sudo apt install intel-media-va-driver mesa-vulkan-drivers
```
Install llama.cpp (optional)
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BUILD_EXAMPLES=ON
make -j$(nproc)
export PATH="$PATH:$(pwd)/bin"
```

Real World Results
OpenUMA's value is in the configuration it generates — not just detecting hardware, but knowing the exact flags that extract maximum performance from it.
Example: AMD Ryzen 5 5600G + 32GB DDR4
| Setup | Command | Tokens/sec |
|-------|---------|------------|
| llama.cpp defaults | `llama-cli -m model.gguf` | ~3.1 t/s |
| OpenUMA-configured | `openuma configure --engine llamacpp --model model.gguf` | ~7.2 t/s |
| Improvement | | +132% |
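The improvement figure is just the ratio of the two throughputs: 7.2 / 3.1 ≈ 2.32×, i.e. about +132%:

```rust
fn main() {
    let (baseline, tuned) = (3.1_f64, 7.2_f64);
    let pct = (tuned - baseline) / baseline * 100.0;
    println!("{:.0}% improvement", pct); // prints "132% improvement"
}
```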
What OpenUMA changed:
- Enabled Vulkan backend (default is CPU)
- Set correct --n-gpu-layers for available shared VRAM
- Configured dual-channel memory-aware thread count
- Disabled mmap in favor of zero-copy DMA-BUF path
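The `--n-gpu-layers` value in particular can be derived from the shared-VRAM budget and the per-layer weight size. A hedged sketch of that calculation — `--n-gpu-layers` is a real llama.cpp flag, but the sizing function and the numbers below are illustrative assumptions, not OpenUMA's actual code:

```rust
/// Estimate how many transformer layers fit in the iGPU's shared-VRAM
/// budget, after reserving headroom for the KV cache and scratch
/// buffers. All sizes are in MiB.
fn n_gpu_layers(budget_mib: u64, layer_mib: u64, reserve_mib: u64) -> u64 {
    budget_mib.saturating_sub(reserve_mib) / layer_mib.max(1)
}

fn main() {
    // 7168 MiB iGPU partition (as in the dashboard above), an assumed
    // ~140 MiB per layer for an 8B Q4 model, 1024 MiB reserved.
    let n = n_gpu_layers(7168, 140, 1024);
    println!("--n-gpu-layers {}", n); // prints "--n-gpu-layers 43"
}
```

Getting this number right matters in both directions: too low leaves iGPU compute idle, too high forces the driver to evict buffers and can be slower than CPU-only.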
Note: Numbers above are estimates from the profile database for this hardware. Run openuma benchmark --model your-model.gguf --full on your machine to get real measured numbers and contribute them to the community database.
Community Benchmarks
This section will grow as users submit hardware profiles. Submit your results →
Contributing
Contributions welcome! Open issues and pull requests.
License
MIT License - see LICENSE for details.
OpenUMA - Making every x86 machine a first-class AI citizen.