Introducing Bloom: an open source tool for automated behavioral evaluations - Anthropic
Could not retrieve the full article text.
Read on GNews AI benchmark →
https://news.google.com/rss/articles/CBMiUkFVX3lxTFBSWVVGZmZvbWRFYU1oZUFNd1BfWlZuT3FoaVNSUlp5Y0RjSUxfWlZWTmdJcGtsZ0NxblU0MkZTaXo1NVh2MDVGcWRXcHdkMVB3d3c?oc=5




