
Bring state-of-the-art agentic skills to the edge with Gemma 4

Google Developers Blog · April 2, 2026

Google DeepMind has launched Gemma 4, a family of state-of-the-art open models designed to enable multi-step planning and autonomous agentic workflows directly on-device. The release includes the Google AI Edge Gallery for experimenting with "Agent Skills" and the LiteRT-LM library, which offers a significant speed boost and structured output for developers. Available under an Apache 2.0 license, Gemma 4 supports over 140 languages and is compatible with a wide range of hardware, including mobile devices, desktops, and IoT platforms like Raspberry Pi.


Today, Google DeepMind launched Gemma 4, a family of state-of-the-art open models that redefine what is possible on your own hardware. Now available under the Apache 2.0 license, Gemma 4 gives developers a powerful toolkit for on-device AI development. With Gemma 4, you can now go beyond chatbots to build agents and autonomous AI use cases running directly on-device. Gemma 4 enables multi-step planning, autonomous action, offline code generation, and even audio-visual processing, all without specialized fine-tuning. It’s also built for a global audience with support for over 140 languages.

[Video: Gemma 4 enables visual processing and support in >140 languages]

We are excited to announce that you can experience Gemma 4’s expansive capabilities on the edge starting today! Access Android's built-in Gemma 4 model through the new AICore Developer Preview, or leverage Google AI Edge to build agentic, in-app experiences across mobile, desktop, and edge devices.

In this post, we’ll show you how to get started with Google AI Edge using both Google AI Edge Gallery and LiteRT-LM.

Discover Agent Skills with Gemma 4 in Google AI Edge Gallery

Google AI Edge Gallery, available on iOS and Android, allows you to build and experiment with AI experiences that run entirely on-device. Today, we are thrilled to announce the launch of Agent Skills, one of the first applications to run multi-step, autonomous agentic workflows entirely on-device. Powered by Gemma 4, Agent Skills can:

  • Augment the knowledge base: Gemma 4 can access information beyond its initial training data by using skills for agentic knowledge enrichment. For example, you can build a skill that queries Wikipedia, allowing the agent to look up and answer any encyclopedic question.

[Video: Query Wikipedia or other knowledge sources]

  • Produce rich, interactive content: Transform paragraphs or videos into concise summaries or flashcards for studying, or turn data into interactive visualizations or graphs. For example, you can create a skill that automatically summarizes and charts daily trends in sleep hours and mood based on the user's speech input:

[Video: Create graphs, flashcards, and other visualizations]

  • Expand Gemma 4's core capabilities: Integrate with other models, such as text-to-speech, image generation, or music synthesis. For instance, you can use skills to pair photos with music that matches their mood.

[Video: Integrate with other models to synthesize music and understand images]

  • Create comprehensive end-to-end experiences: Rather than navigating multiple apps, users can manage complex workflows and build their own applications entirely through conversation with Gemma 4. To illustrate this, we built a working app that describes and plays the vocal calls of animals.

[Video: Build multi-step workflows and end-to-end experiences]

To experience the Gemma 4 E2B and E4B models in action, check out the Google AI Edge Gallery app today. Within the app, it’s easy to start experimenting and creating your own skills with our guide. We can’t wait to see what you build; share your skills in the GitHub Discussion!
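The Agent Skills API itself isn't detailed in this post, but the pattern behind a skill like the Wikipedia lookup above is classic tool calling: the model emits a structured call, and a registry dispatches it to a function. The sketch below illustrates that pattern only; every name in it is hypothetical, not the Gallery's actual API, and the Wikipedia lookup is stubbed rather than hitting the real API.

```python
import json

# Hypothetical skill registry: maps a skill name to a Python callable.
SKILLS = {}

def skill(name):
    """Register a function as a skill the model can invoke."""
    def wrap(fn):
        SKILLS[name] = fn
        return fn
    return wrap

@skill("wikipedia_lookup")
def wikipedia_lookup(topic: str) -> str:
    # A real skill would call the Wikipedia API; stubbed for illustration.
    summaries = {"Gemma": "Gemma is a family of open models from Google DeepMind."}
    return summaries.get(topic, f"No article found for {topic!r}.")

def dispatch(model_output: str) -> str:
    """Parse a structured (JSON) tool call emitted by the model and run it."""
    call = json.loads(model_output)
    fn = SKILLS[call["skill"]]
    return fn(**call["args"])

# The model, thanks to structured output, emits a well-formed JSON tool call:
result = dispatch('{"skill": "wikipedia_lookup", "args": {"topic": "Gemma"}}')
print(result)
```

The key design point is that the model never executes anything itself: it only produces data, and the host app decides which registered function runs.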

Leverage Gemma 4 across devices with LiteRT-LM

For developers who want to deploy Gemma 4 in-app or across a broader range of devices, LiteRT-LM provides stellar performance across the entire hardware spectrum. It adds GenAI-specific libraries on top of LiteRT, which is already trusted by millions of Android and edge developers through its high-performance XNNPack and ML Drift libraries. LiteRT-LM builds on this stack and enhances model performance with the following new features:

  • Minimal memory footprint: Run Gemma 4 E2B in under 1.5GB of memory on some devices, thanks to LiteRT’s support for 2-bit and 4-bit weights along with memory-mapped per-layer embeddings.
  • Constrained decoding: Get structured, predictable outputs every time, ensuring your AI-driven apps and tool-calling scripts remain reliable in production.
  • Dynamic context: Run a single model across CPUs and GPUs with dynamic context lengths, letting you take full advantage of Gemma 4’s 128K context window.

To support the extended context lengths required by agentic use cases, LiteRT-LM leverages cutting-edge GPU optimizations to process 4,000 input tokens across 2 distinct skills in under 3 seconds.

LiteRT-LM also brings smaller Gemma 4 models to IoT and edge devices with compelling performance. On a Raspberry Pi 5, for example, it achieves a prefill throughput of 133 tokens per second and a decode throughput of 7.6 tokens per second on Gemma 4 E2B. With this performance, you can run smart home controllers, voice assistants, and robotics completely offline on constrained hardware.
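As a back-of-envelope check on what those Raspberry Pi 5 throughput figures mean for an interactive workload, divide token counts by the rates above. The prompt and reply lengths below are purely illustrative, not from the post.

```python
# Raspberry Pi 5 throughput for Gemma 4 E2B, taken from the figures above.
prefill_tps = 133   # prompt-processing (prefill) tokens per second
decode_tps = 7.6    # generation (decode) tokens per second

prompt_tokens = 266  # illustrative voice-assistant prompt length
reply_tokens = 38    # illustrative short spoken reply

prefill_s = prompt_tokens / prefill_tps
decode_s = reply_tokens / decode_tps
print(f"prefill: {prefill_s:.1f}s, decode: {decode_s:.1f}s, "
      f"total: {prefill_s + decode_s:.1f}s")
# → prefill: 2.0s, decode: 5.0s, total: 7.0s
```

The asymmetry is typical of on-device LLMs: long prompts are cheap relative to long replies, so offline assistants on constrained hardware benefit from keeping generated responses short.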

Ready to get started? Check out the LiteRT-LM documentation for a complete guide and device-specific performance metrics. You can also view the individual model cards for Gemma 4 E2B and Gemma 4 E4B.

Run on any device

Gemma 4 is available today with support across an unprecedented range of platforms:

  • Mobile: Available with CPU/GPU support across both Android and iOS. Developers can also access and deploy Android's built-in and optimized Gemma 4 model system-wide via Android AICore.
  • Desktop and Web: Seamless performance on Windows, Linux, and macOS (via Metal), plus native browser-based execution powered by WebGPU.
  • IoT and robotics: We are bringing Gemma 4 to the edge on Raspberry Pi 5 and Qualcomm IQ8 NPU platforms.

Today, we are also launching a new Python package and CLI tool to make it easier than ever to experiment with Gemma in the console and to power Gemma-based Python pipelines for IoT devices. The litert-lm CLI is available on Linux, macOS, and Raspberry Pi, enabling developers to try out the latest Gemma 4 model capabilities without writing any code. The CLI also supports the tool calling that powers Agent Skills in Google AI Edge Gallery. Python bindings for LiteRT-LM give you the flexibility to deeply customize your on-device LLM pipeline from Python. Getting started with LiteRT-LM in your terminal is simple using our guide.

The era of agentic experiences on-device is here, and we hope you are excited to start building on the edge. Whichever device you are building for, get started with our Agent Skills examples in Google AI Edge Gallery and the LiteRT-LM getting started guide. We can’t wait to see what you build!

Acknowledgements

We'd like to extend a special thanks to our significant contributors for their work on this project:

Advait Jain, Alice Zheng, Amber Heinbockel, Andrew Zhang, Byungchul Kim, Cormac Brick, Daniel Ho, Derek Bekebrede, Dillon Sharlet, Eric Yang, Fengwu Yao, Frank Barchard, Grant Jensen, Hriday Chhabria, Jae Yoo, Jenn Lee, Jing Jin, Jingxiao Zheng, Juhyun Lee, Lu Wang, Lin Chen, Majid Dadashi, Marissa Ikonomidis, Matthew Chan, Matthew Soulanille, Matthias Grundmann, Milen Ferev, Misha Gutman, Mohammadreza Heydary, Pradeep Kuppala, Qidong Zhao, Quentin Khan, Ram Iyengar, Raman Sarokin, Renjie Wu, Rishika Sinha, Rodney Witcher, Ronghui Zhu, Sachin Kotwani, Suleman Shahid, Tenghui Zhu, Terry Heo, Tiffany Hsiao, Wai Hon Law, Weiyi Wang, Xiaoming Hu, Xu Chen, Yishuang Pang, Yi-Chun Kuo, Yu-Hui Chen, Zichuan Wei, and the gTech team.
