Bring state-of-the-art agentic skills to the edge with Gemma 4
Google DeepMind has launched Gemma 4, a family of state-of-the-art open models designed to enable multi-step planning and autonomous agentic workflows directly on-device. The release includes the Google AI Edge Gallery for experimenting with "Agent Skills" and the LiteRT-LM library, which offers a significant speed boost and structured output for developers. Available under an Apache 2.0 license, Gemma 4 supports over 140 languages and is compatible with a wide range of hardware, including mobile devices, desktops, and IoT platforms like Raspberry Pi.
APRIL 2, 2026
Today, Google DeepMind launched Gemma 4, a family of state-of-the-art open models that redefine what is possible on your own hardware. Now available under the Apache 2.0 license, Gemma 4 gives developers a powerful toolkit for on-device AI development. With Gemma 4, you can now go beyond chatbots to build agents and autonomous AI use cases running directly on-device. Gemma 4 enables multi-step planning, autonomous action, offline code generation, and even audio-visual processing, all without specialized fine-tuning. It’s also built for a global audience with support for over 140 languages.
Gemma 4 enables visual processing and supports over 140 languages
We are excited to announce that you can experience Gemma 4’s expansive capabilities on the edge starting today! Access Android's built-in Gemma 4 model through the new AICore Developer Preview, or leverage Google AI Edge to build agentic, in-app experiences across mobile, desktop, and edge devices.
In this post, we’ll show you how to get started with Google AI Edge using both Google AI Edge Gallery and LiteRT-LM.
Discover Agent Skills with Gemma 4 in Google AI Edge Gallery
Google AI Edge Gallery, available on iOS and Android, allows you to build and experiment with AI experiences that run entirely on-device. Today, we are thrilled to announce the launch of Agent Skills, one of the first applications to run multi-step, autonomous agentic workflows entirely on-device. Powered by Gemma 4, Agent Skills can:
- Augment the knowledge base: Gemma 4 can access information beyond its initial training data using skills, enabling agentic enrichment experiences. For example, you can build a skill that queries Wikipedia, allowing the agent to answer any encyclopedic question.
Query Wikipedia or other knowledge sources
- Produce rich, interactive content: Transform paragraphs or videos into concise summaries or flashcards for studying, or turn data into interactive visualizations and graphs. For example, you can create a skill that automatically summarizes and charts trends in hours of sleep and mood per day based on user speech input:
Create graphs, flashcards, and other visualizations
- Expand Gemma 4's core capabilities: Integrate with other models, such as text-to-speech, image generation, or music synthesis. For instance, you can utilize skills to pair photos with music that perfectly matches the mood.
Integrate with other models to synthesize music and understand images
- Create comprehensive end-to-end experiences: Rather than navigating multiple apps, users can manage complex workflows and build their own applications entirely through conversation with Gemma 4. To illustrate this, we built a working app that describes and plays the vocal calls of animals.
Build multi-step workflows and end-to-end experiences
To experience the Gemma 4 E2B and E4B models in action, check out the Google AI Edge Gallery app today. Within the app, it’s easy to start experimenting and creating your own skills with our guide. We can’t wait to see what you build; share your skills in the GitHub Discussions!
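The guide covers the actual skill format, but as a mental model, a skill boils down to a registered function the model can target with a structured tool call. The sketch below illustrates that register-then-dispatch loop in plain Python; `register_skill`, `dispatch`, and the canned knowledge base are hypothetical stand-ins, not the Gallery's real API.

```python
import json

# Hypothetical skill registry, mapping a skill name to a description
# and a handler. This only illustrates the pattern: register a skill,
# let the model emit a structured tool call, dispatch it back.
SKILLS = {}

def register_skill(name, description, handler):
    SKILLS[name] = {"description": description, "handler": handler}

def lookup_article(topic):
    # Stand-in for a real Wikipedia query; a production skill would
    # call the MediaWiki API here instead of a canned dictionary.
    fake_kb = {"Raspberry Pi": "A series of small single-board computers."}
    return fake_kb.get(topic, "No article found.")

register_skill("wikipedia_lookup", "Fetch an encyclopedic summary.", lookup_article)

def dispatch(tool_call_json):
    """Route a model-emitted tool call (a JSON string) to its skill."""
    call = json.loads(tool_call_json)
    skill = SKILLS[call["name"]]
    return skill["handler"](call["arguments"]["topic"])

# A structured tool call of the kind a model might emit:
result = dispatch('{"name": "wikipedia_lookup", "arguments": {"topic": "Raspberry Pi"}}')
print(result)  # A series of small single-board computers.
```

The important property is that the model only ever produces the JSON tool call; your own code owns the side effects, which keeps on-device agents auditable.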
Leverage Gemma 4 across devices with LiteRT-LM
For developers interested in deploying Gemma 4 in-app or across a broader range of devices, LiteRT-LM delivers strong performance across the entire hardware spectrum. It adds GenAI-specific libraries on top of LiteRT, which is already trusted by millions of Android and edge developers through its high-performance XNNPack and ML Drift backends. LiteRT-LM enhances model performance with the following new features:
- Minimal memory footprint: Run Gemma 4 E2B in under 1.5 GB of memory on supported devices, thanks to LiteRT’s support for 2-bit and 4-bit weights along with memory-mapped per-layer embeddings.
- Constrained decoding: Get structured, predictable outputs every time, ensuring your AI-driven apps and tool-calling scripts remain reliable in production.
- Dynamic context: Run a single model across CPUs and GPUs with dynamic context lengths, letting you take full advantage of Gemma 4’s 128K context window.
To support the extended context lengths required by agentic use cases, LiteRT-LM leverages cutting-edge GPU optimizations to process 4,000 input tokens across 2 distinct skills in under 3 seconds.
LiteRT-LM also brings smaller Gemma 4 models to IoT & edge devices with compelling performance. On a Raspberry Pi 5, for example, it achieves a prefill throughput of 133 tokens per second and decode throughput of 7.6 tokens per second on Gemma 4 E2B. With this performance, you can run smart home controllers, voice assistants, and robotics completely offline on constrained hardware.
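A quick back-of-the-envelope calculation translates those throughput figures into wall-clock expectations. The prompt and response lengths below are illustrative, and the ~2B parameter count for E2B is an assumption inferred from the model name, not a published figure.

```python
# Rough latency and memory estimates from the figures quoted above.
# Prompt/response lengths and the parameter count are assumptions.

prefill_tps = 133.0   # Raspberry Pi 5 prefill throughput (from the post)
decode_tps = 7.6      # Raspberry Pi 5 decode throughput (from the post)

prompt_tokens = 512
response_tokens = 64

# Total latency = time to ingest the prompt + time to generate the reply.
latency_s = prompt_tokens / prefill_tps + response_tokens / decode_tps
print(f"~{latency_s:.1f} s for a {prompt_tokens}-token prompt "
      f"and a {response_tokens}-token reply")

# Weight memory at 4-bit precision: params * 0.5 bytes, assuming ~2B
# effective parameters for E2B (an assumption, not a spec).
params = 2e9
weight_gb = params * 0.5 / 1e9
print(f"~{weight_gb:.1f} GB of 4-bit weights")  # consistent with the <1.5 GB claim
```

Numbers like these are why decode throughput, not prefill, tends to dominate the user-perceived latency of voice assistants on constrained hardware.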
Ready to get started? Check out the LiteRT-LM documentation for a complete guide and device-specific performance metrics. You can also view the individual model cards for Gemma 4 E2B and Gemma 4 E4B.
Run on any device
Gemma 4 is available today with support across an unprecedented range of platforms:
- Mobile: Available with CPU/GPU support across both Android and iOS. Developers can also access and deploy Android's built-in and optimized Gemma 4 model system-wide via Android AICore.
- Desktop and Web: Seamless performance on Windows, Linux, and macOS (via Metal), plus native browser-based execution powered by WebGPU.
- IoT and robotics: We are bringing Gemma 4 to the edge on Raspberry Pi 5 and Qualcomm IQ8 NPU platforms.
Today, we are also launching a new Python package and CLI tool to make it easier than ever to experiment with Gemma in the console and to power Gemma-based Python pipelines for IoT devices. The litert-lm CLI is available on Linux, macOS, and Raspberry Pi, enabling developers to try out the latest Gemma 4 model capabilities without writing any code. The CLI also supports the same tool calling that powers Agent Skills in Google AI Edge Gallery. Python bindings for LiteRT-LM provide the flexibility to deeply customize your on-device LLM pipeline from Python. Getting started with LiteRT-LM in your terminal is simple using our guide.
The era of agentic experiences on-device is here, and we hope you are excited to start building on the edge. Whichever device you are building for, get started with our Agent Skills examples in Google AI Edge Gallery and the LiteRT-LM getting started guide. We can’t wait to see what you build!
Acknowledgements
We'd like to extend a special thanks to our significant contributors for their work on this project:
Advait Jain, Alice Zheng, Amber Heinbockel, Andrew Zhang, Byungchul Kim, Cormac Brick, Daniel Ho, Derek Bekebrede, Dillon Sharlet, Eric Yang, Fengwu Yao, Frank Barchard, Grant Jensen, Hriday Chhabria, Jae Yoo, Jenn Lee, Jing Jin, Jingxiao Zheng, Juhyun Lee, Lu Wang, Lin Chen, Majid Dadashi, Marissa Ikonomidis, Matthew Chan, Matthew Soulanille, Matthias Grundmann, Milen Ferev, Misha Gutman, Mohammadreza Heydary, Pradeep Kuppala, Qidong Zhao, Quentin Khan, Ram Iyengar, Raman Sarokin, Renjie Wu, Rishika Sinha, Rodney Witcher, Ronghui Zhu, Sachin Kotwani, Suleman Shahid, Tenghui Zhu, Terry Heo, Tiffany Hsiao, Wai Hon Law, Weiyi Wang, Xiaoming Hu, Xu Chen, Yishuang Pang, Yi-Chun Kuo, Yu-Hui Chen, Zichuan Wei, and the gTech team.
Google Developers Blog
https://developers.googleblog.com/bring-state-of-the-art-agentic-skills-to-the-edge-with-gemma-4/