llm-echo 0.4
<p><strong>Release:</strong> <a href="https://github.com/simonw/llm-echo/releases/tag/0.4">llm-echo 0.4</a></p> <blockquote> <ul> <li>Prompts now have the <code>input_tokens</code> and <code>output_tokens</code> fields populated on the response.</li> </ul> </blockquote> <p>Tags: <a href="https://simonwillison.net/tags/llm">llm</a></p>
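The change here is that the echo model now reports token usage on its responses. A toy stand-in (a plain dataclass, not the llm plugin API; all names illustrative) showing the shape of the new fields:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EchoResponse:
    # Illustrative stand-in for a response object; as of llm-echo 0.4
    # the token-count fields are populated rather than left as None.
    text: str
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None

def echo_prompt(prompt: str) -> EchoResponse:
    # The echo model returns the prompt verbatim, so input and output
    # counts match (here crudely approximated by whitespace tokens).
    n = len(prompt.split())
    return EchoResponse(text=prompt, input_tokens=n, output_tokens=n)

r = echo_prompt("hello world")
print(r.input_tokens, r.output_tokens)
```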
More in Open Source AI

TurboQuant seems to work very well on Gemma 4 — and separately, per-layer outlier-aware K quantization is beating current public fork results on Qwen PPL
I’ve been experimenting with TurboQuant KV cache quantization in llama.cpp (CPU + Metal) on Gemma 4 26B A4B-it Q4_K_M on an Apple M4 Pro 48GB, and the results look surprisingly strong.

Gemma 4 findings

On Gemma 4, QJL seems to work well, and FWHT as a structured rotation substitute also looks like a good fit for the large attention heads (dk=256/512). My benchmark results:

- tq3j/q4_0: 37/37 on quality tests, 8/8 on NIAH
- tq2j/q4_0: 36/37, with the only miss being an empty response
- +34% faster than q4_0/q4_0 at 131K context
- TurboQuant overtakes q4_0 from 4K context onward

So on this setup, ~3.1 bits per K channel gets near-zero accuracy loss with a meaningful long-context speedup. What’s also interesting is that this looks better than the public Gemma 4 fork results I’ve seen so far. In the l
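The FWHT mentioned above is a cheap structured rotation: applying a normalized Walsh–Hadamard transform to each key vector spreads any outlier channel's energy across all dimensions before quantization. A minimal sketch of the transform itself (not the llama.cpp implementation; assumes the head dimension is a power of two):

```python
import numpy as np

def fwht(x):
    # Iterative Fast Walsh-Hadamard Transform, O(n log n).
    # With orthonormal scaling the transform is its own inverse,
    # so de-rotating after dequantization is the same call.
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

# One outlier channel dominates this toy key vector...
k = np.zeros(8)
k[0] = 10.0
# ...but after rotation every channel carries equal magnitude,
# which is far friendlier to low-bit per-channel quantization.
print(np.round(fwht(k), 3))
```

Because the orthonormal transform preserves norms and inner products, attention scores computed on rotated keys/queries are unchanged up to quantization error.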

quarkus-chat-ui: A Web Front-End for LLMs, and a Real-World Case for POJO-actor
Note: This article was originally published on SciVicsLab.

quarkus-chat-ui is a web UI for LLMs where multiple instances can talk to each other, built as a real-world use case for POJO-actor. Each quarkus-chat-ui instance exposes an HTTP MCP server at /mcp, so Instance A can call tools on Instance B, and Instance B can reply by calling tools back on A. The LLM backend (Claude Code CLI, Codex, or a local model via claw-code-local) acts as an MCP client that can reach these endpoints.

The question was how to wire that up over HTTP, and how to handle the fact that LLM responses take tens of seconds and arrive as a stream. quarkus-chat-ui is the bridge that makes this work. Each instance wraps one LLM backend
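The call-and-call-back pattern described above (A invokes a tool on B, B's slow LLM turn replies by invoking a tool back on A, each instance processing its mailbox one message at a time, actor-style) can be sketched with asyncio queues standing in for the HTTP MCP transport. All names here are hypothetical, not the quarkus-chat-ui or POJO-actor API:

```python
import asyncio

class Instance:
    # Toy stand-in for one chat-ui instance: a mailbox of incoming
    # tool calls, drained one message at a time (actor-style).
    def __init__(self, name):
        self.name = name
        self.inbox = asyncio.Queue()
        self.received = []

    async def call_tool(self, target, message):
        # Stand-in for an HTTP MCP tool call to the other instance.
        await target.inbox.put((self, message))

    async def run(self, turns):
        for _ in range(turns):
            sender, message = await self.inbox.get()
            self.received.append(message)
            await asyncio.sleep(0)  # stand-in for a slow, streaming LLM turn
            if message.startswith("ping"):
                # Reply by calling a tool back on the sender.
                await self.call_tool(sender, f"pong from {self.name}")

async def main():
    a, b = Instance("A"), Instance("B")
    await a.call_tool(b, "ping from A")
    await asyncio.gather(a.run(1), b.run(1))
    return a.received, b.received

print(asyncio.run(main()))
```

The queue decouples the caller from the tens-of-seconds reply, which is the same property the HTTP bridge has to provide.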

I'm under 18, broke, and I just designed an open-source AI chip. Here's the full story.
I don't have a team. I don't have funding. I don't have a lab. I have a laptop, an internet connection, and an obsession with chips. This is the story of T1C (Tier 1 Chip) and why I built it.

It started with a frustration. Every time I read about AI hardware, it was the same story. NVIDIA charges $30,000 for an H100. TSMC charges millions for a custom fab run. Apple Silicon is beautiful but completely closed. Intel, Qualcomm, AMD, all of them: locked behind NDAs, closed architectures, and billion-dollar relationships. I kept thinking: why does no one make an open-source AI chip that a real person can actually fabricate? Not a toy. Not a demo. A real architecture with real specs, real physics, and a real path to silicon.

So I built one. T1C uses Digital In-Memory Computing (D-IMC). Inst




