I had a bunch of Skills sitting in a folder. None of them were callable as APIs.
So I built a runtime to fix that.
The problem
If you use Claude Code, Copilot, or Codex, you've probably created Agent Skills: the SKILL.md files that tell the AI what to do.
I had a bunch of them. But they were stuck. I couldn't plug them into a product, trigger them from a webhook, or let any service call them with a POST request.
Each skill was trapped inside the tool that created it.
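For context, a SKILL.md is a markdown file with a short YAML frontmatter followed by instructions. A minimal sketch (this particular agent is hypothetical, and frontmatter fields beyond name and description vary by tool):

```markdown
---
name: code-review
description: Reviews JavaScript code and reports issues with a score
---

Review the code you are given. Report a numeric score, a list of
issues with severities, and a short written summary.
```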
What I wanted
```
take a SKILL.md → get a POST /run endpoint
```
No new framework to learn. No infrastructure to set up. Just point at a skill, configure the model, and deploy.
What I built
Skrun, an open-source runtime that takes Agent Skills and turns them into callable APIs.
```shell
skrun init --from-skill ./my-existing-skill
# reads SKILL.md, generates agent.yaml

skrun deploy
# validates, builds, pushes
# → POST http://localhost:4000/api/agents/dev/my-skill/run
```
Then you call it:
```shell
curl -X POST http://localhost:4000/api/agents/dev/code-review/run \
  -H "Authorization: Bearer dev-token" \
  -H "Content-Type: application/json" \
  -d '{"input": {"code": "function add(a,b) { return a + b; }"}}'
```
```json
{
  "status": "completed",
  "output": {
    "score": 60,
    "issues": [
      {"severity": "warning", "description": "Use const instead of var"}
    ],
    "review": "Lacks error handling..."
  }
}
```
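Since any service can hit the endpoint, a thin wrapper is enough to integrate it. A sketch of a pair of shell helpers — the URL pattern and dev token follow the curl example, but the helper names and the `dev` environment are my own:

```shell
# Build the run-endpoint URL for an environment ($1) and agent name ($2).
skrun_url() {
  printf 'http://localhost:4000/api/agents/%s/%s/run' "$1" "$2"
}

# POST a JSON payload ($3) to the agent's run endpoint.
skrun_run() {
  curl -sS -X POST "$(skrun_url "$1" "$2")" \
    -H "Authorization: Bearer dev-token" \
    -H "Content-Type: application/json" \
    -d "$3"
}

# Usage (requires the local registry to be running):
# skrun_run dev code-review '{"input": {"code": "const x = 1;"}}'
```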
Multi-model
You pick the provider in agent.yaml, not in your code. Anthropic, OpenAI, Google, Mistral, Groq. If one fails, it falls back to the next.
```yaml
model:
  provider: google
  name: gemini-2.5-flash
  fallback:
    provider: openai
    name: gpt-4o
```
Tool calling
Two approaches.
You can bundle your own CLI tools with the agent. Create a scripts/ directory, write whatever you want (shell, Node, Python), declare them in agent.yaml:
```yaml
tools:
  - name: eslint_check
    script: scripts/eslint-check.sh
    description: "Run ESLint on JavaScript code"
```
The LLM calls the tool when it needs to. Skrun executes the script, returns the result. Your agent can run a linter, query a database, call an internal API.
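As a sketch of what such a script's logic could look like — a real `eslint_check` would invoke ESLint, but here a grep-based count of `var` stands in so the example is self-contained, and the JSON-on-stdout contract is my assumption, not Skrun's documented interface:

```shell
# Hypothetical core of a bundled tool: takes source code as $1 and
# emits a JSON summary on stdout.
lint_summary() {
  code="$1"
  # count occurrences of "var" as a stand-in lint check
  count=$(printf '%s\n' "$code" | grep -o 'var' | wc -l | tr -d ' ')
  printf '{"var_count": %s}\n' "$count"
}

lint_summary 'var a = 1; var b = 2;'
# → {"var_count": 2}
```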
Or use MCP servers. Any MCP server from the npm ecosystem works via npx:
```yaml
mcp_servers:
  - name: browser
    transport: stdio
    command: npx
    args: ["-y", "@playwright/mcp", "--headless"]
```
Stateful
Agents can persist key-value state across runs. Run the same agent twice and it remembers what happened last time.
Roadmap
v0.1 runs on a local registry server. Next up:

- Cloud deploy (the architecture has a RuntimeAdapter interface ready for sandboxed VMs)
- Caller-provided API keys
- Streaming responses
- A hub to discover and share agents
The numbers
4 packages (@skrun-dev/schema, cli, runtime, api), 10 CLI commands, 154 tests, 6 demo agents, MIT license.
Try it
```shell
npm install -g @skrun-dev/cli

git clone https://github.com/skrun-dev/skrun.git
cd skrun && pnpm install && pnpm build
```
Set a Google API key in .env, start the registry (pnpm dev:registry), and follow the "Try an example" section in the README.
github.com/skrun-dev/skrun
I'd love feedback on:

- The agent.yaml format (does the I/O contract make sense?)
- The skill import flow
- What agents you'd build with this