# Autonoma: open source agentic end-to-end testing

Repository: https://github.com/autonoma-ai/autonoma
Agentic end-to-end testing platform. Create and run automated tests for web, iOS, and Android applications using natural language. Tests execute on real devices and browsers with AI-powered element detection, assertions, and self-healing.
## Architecture
```
apps/
  api/            Hono + tRPC API server (port 4000)
  ui/             Vite + React 19 SPA (port 3000)
  engine-web/     Playwright-based web test execution
  engine-mobile/  Appium-based mobile test execution (iOS + Android)
  docs/           Astro Starlight documentation site
  jobs/           Background jobs (reviewer, notifier, test-case-generator)
packages/
  ai/             AI primitives - model registry, vision, point/object detection
  db/             Prisma schema + generated client (PostgreSQL)
  types/          Shared Zod schemas and TypeScript types
  engine/         Platform-agnostic execution agent core
  device-lock/    Redis-based distributed device locking
  blacklight/     Shared UI component library (Radix + Tailwind + CVA)
  try/            Go-style error handling (fx.runAsync, fx.run)
  storage/        S3 file storage
  logger/         Sentry-based logging
  analytics/      PostHog server-side analytics
  k8s/            Kubernetes helpers
  workflow/       Argo workflow builders
  utils/          Shared utilities
```
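The `try/` package's "Go-style" helpers are only named above; as an illustration of that pattern, here is a minimal sketch. The real `fx.run`/`fx.runAsync` signatures are not shown in this README, so the tuple shape below is an assumption.

```typescript
// Hypothetical sketch of Go-style error handling in the spirit of the try/
// package's fx.run and fx.runAsync. The [value, error] tuple shape is an
// assumption, not the project's actual API.
type Result<T> = [T, null] | [null, Error];

function run<T>(fn: () => T): Result<T> {
  try {
    return [fn(), null];
  } catch (e) {
    return [null, e instanceof Error ? e : new Error(String(e))];
  }
}

async function runAsync<T>(promise: Promise<T>): Promise<Result<T>> {
  try {
    return [await promise, null];
  } catch (e) {
    return [null, e instanceof Error ? e : new Error(String(e))];
  }
}
```

Callers destructure `[value, err]` and branch on `err`, avoiding a `try`/`catch` at every call site.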
## Prerequisites
- Node.js >= 24
- pnpm 10.x (`corepack enable` to use the version pinned in `package.json`)
- Docker (for PostgreSQL and Redis)
## Setup
1. Clone the repository

```sh
git clone https://github.com/autonoma-ai/autonoma.git
cd autonoma
```

2. Install dependencies
```sh
pnpm install
```
3. Start infrastructure
PostgreSQL and Redis run via Docker Compose:
```sh
docker compose up -d
```
This starts:
- PostgreSQL 18 on `localhost:5432` (user: `postgres`, password: `postgres`)
- Redis on `localhost:6379`
4. Configure environment variables
```sh
cp .env.example .env
```
Edit .env and fill in the required values. At minimum for local development you need:
| Variable | Description |
| --- | --- |
| `DATABASE_URL` | PostgreSQL connection string, e.g. `postgresql://postgres:postgres@localhost:5432/autonoma` |
| `REDIS_URL` | Redis connection string, e.g. `redis://localhost:6379` |
| `BETTER_AUTH_SECRET` | Any random string for session signing |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret |
| `GEMINI_API_KEY` | Google Gemini API key (for AI features) |
See .env.example for the full list of variables grouped by service.
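It can help to fail fast at startup when a required variable is missing. A minimal dependency-free sketch (the project itself uses Zod for schemas, which is likely how it really validates config; the variable names are the minimum listed above):

```typescript
// Minimal sketch: report which required env vars are missing or empty.
// The set of names mirrors the minimum listed above for local development;
// this helper is illustrative, not part of the project's actual code.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "REDIS_URL",
  "BETTER_AUTH_SECRET",
  "GOOGLE_CLIENT_ID",
  "GOOGLE_CLIENT_SECRET",
  "GEMINI_API_KEY",
] as const;

function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]?.trim());
}

// Example: call missingEnv(process.env) at startup and exit if non-empty.
```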
5. Set up the database
Generate the Prisma client and run migrations:
```sh
pnpm db:generate
pnpm db:migrate
```

6. Start development servers
```sh
pnpm dev
```
This starts both the API server (port 4000) and UI (port 3000) concurrently.
To run them individually:
```sh
pnpm api   # API only (port 4000)
pnpm ui    # UI only (port 3000)
```

## Commands
| Command | Description |
| --- | --- |
| `pnpm dev` | Start API + UI in development mode |
| `pnpm build` | Build all packages and apps |
| `pnpm typecheck` | Run TypeScript type checking across all packages |
| `pnpm lint` | Lint all packages |
| `pnpm test` | Run tests across all packages |
| `pnpm format` | Format code with Biome |
| `pnpm check` | Lint and format with Biome |
| `pnpm db:generate` | Generate Prisma client from schema |
| `pnpm db:migrate` | Run database migrations |
| `pnpm docs` | Start the documentation site (port 4321) |
## How it works
- **Users write tests in natural language** - describe what to test (e.g. "Log in, navigate to settings, and verify the profile picture is visible")
- **The execution agent interprets the instructions** - an AI agent loop takes a screenshot, decides which action to perform, executes it, and repeats until the test is complete
- **Actions run on real browsers/devices** - Playwright drives web browsers, Appium drives iOS and Android devices
- **AI handles element detection** - instead of CSS selectors or XPaths, the agent uses vision models to locate UI elements from natural language descriptions
- **Results include video recordings, screenshots, and step-by-step logs** - every test run produces artifacts for debugging and review
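The screenshot -> decide -> execute loop described above can be sketched as follows. This is an illustrative outline only: the `Driver`, `Decide`, and `runTest` names and shapes are assumptions, not the real `packages/engine` API.

```typescript
// Illustrative agent loop: screenshot -> LLM decides -> execute -> repeat.
// All interfaces here are hypothetical stand-ins for packages/engine.
type Action =
  | { kind: "tap"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done"; passed: boolean };

interface Driver {
  screenshot(): Promise<Uint8Array>; // Playwright or Appium under the hood
  execute(action: Action): Promise<void>;
}

type Decide = (instructions: string, screenshot: Uint8Array) => Promise<Action>;

async function runTest(
  instructions: string,
  driver: Driver,
  decideNextAction: Decide,
  maxSteps = 50,
): Promise<{ passed: boolean; steps: Action[] }> {
  const steps: Action[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const shot = await driver.screenshot();
    const action = await decideNextAction(instructions, shot);
    steps.push(action); // recorded for the step-by-step log
    if (action.kind === "done") return { passed: action.passed, steps };
    await driver.execute(action);
  }
  return { passed: false, steps }; // step budget exhausted: fail the run
}
```

A step cap like `maxSteps` matters in agentic loops: without it, a model that never emits a terminal action would run forever.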
### Execution flow
```
Natural language test
          |
Execution Agent (packages/engine)
          |
Screenshot -> LLM decides action -> Execute command -> Record step
          |                              |
Point detection (packages/ai)     Platform drivers
          |                          |          |
Gemini / Moondream            Playwright     Appium
                                 (web)      (mobile)
```

## Test format
Tests are defined as Markdown files with YAML frontmatter:
```markdown
---
url: https://example.com
---

Navigate to the login page, enter "[email protected]" and "password123", click Sign In, and assert the dashboard is visible.
```
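A test file like the one above splits into frontmatter and natural-language instructions. A minimal dependency-free parser sketch, handling only flat `key: value` pairs (the project's real parser is not shown in this README and presumably uses a proper YAML library):

```typescript
// Minimal sketch: split a Markdown test file into YAML-ish frontmatter and
// the natural-language body. Illustrative only; handles flat key: value pairs.
function parseTestFile(source: string): {
  frontmatter: Record<string, string>;
  instructions: string;
} {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(source);
  if (!match) return { frontmatter: {}, instructions: source.trim() };
  const frontmatter: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) {
      frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
  }
  return { frontmatter, instructions: match[2].trim() };
}
```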
## Tech stack
- **Runtime** - Node.js 24, ESM-only
- **Monorepo** - pnpm workspaces + Turborepo
- **Language** - TypeScript (strictest configuration)
- **API** - Hono + tRPC
- **Frontend** - React 19 + Vite + TanStack Router
- **Database** - PostgreSQL + Prisma
- **Cache/Locking** - Redis
- **AI** - Gemini, Groq, OpenRouter (via Vercel AI SDK)
- **Web Testing** - Playwright
- **Mobile Testing** - Appium
- **Styling** - Tailwind CSS v4 + Radix UI
- **Observability** - Sentry
- **Analytics** - PostHog
- **Deployment** - Kubernetes + Argo Workflows