Models claude gemini model language model announce available

[R] Solving the Jane Street Dormant LLM Challenge: A Systematic Approach to Backdoor Discovery

Reddit r/MachineLearningby /u/rageredi https://www.reddit.com/user/ragerediApril 2, 202616 min read0 views

Submitted by: Adam Kruger Date: March 23, 2026 Models Solved: 3/3 (M1, M2, M3) + Warmup Background When we first encountered the Jane Street Dormant LLM Challenge, our immediate assumption was informed by years of security operations experience: there would be a flag. A structured token, a passphrase, a UUID — something concrete and verifiable, like a CTF challenge. We spent considerable early effort probing for exactly this: asking models to reveal credentials, testing if triggered states would emit bearer tokens, searching for hidden authentication payloads tied to the puzzle's API infrastructure at dormant-puzzle.janestreet.com . That assumption was wrong, and recognizing that it was wrong was itself a breakthrough. The "flags" in this challenge are not strings to extract — they are beh

Could not retrieve the full article text.

Read on Reddit r/MachineLearning →

Original source

Reddit r/MachineLearning

https://www.reddit.com/r/MachineLearning/comments/1sarnt0/r_solving_the_jane_street_dormant_llm_challenge_a/

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

claudegeminimodel

Models

Exclusive | Pentagon Used Anthropic’s Claude in Maduro Venezuela Raid - WSJ

Exclusive | Pentagon Used Anthropic’s Claude in Maduro Venezuela Raid WSJ

Google News - AI Venezuela

1mabout 2 months ago

Open Source AILive

I scored 14 popular AI frameworks on behavioral commitment — here's the data

When you're choosing an AI framework, what do you actually look at? Usually: stars, documentation quality, whether the README looks maintained. All of that is stated signal. Easy to manufacture, doesn't tell you if the project will exist in 18 months. I built a tool that scores repos on behavioral commitment — signals that cost real time and money to fake. Here's what I found when I ran 14 of the most popular AI frameworks through it. The methodology Five behavioral signals, weighted by how hard they are to fake: Signal Weight Logic Longevity 30% Years of consistent operation Recent activity 25% Commits in the last 30 days Community 20% Number of contributors Release cadence 15% Stable versioned releases Social proof 10% Stars (real people starring costs attention) Archived repos or projec

DEV Community

3mabout 1 hour ago

ProductsLive

We Built a Robotics Developer Platform from Scratch - Meet Isaac Monitor & Robosynx

We Built a Full Robotics Developer Platform from Scratch — AI Generator, ROS 2 Architect, Physics Validator, Isaac Monitor, and More One platform that removes every single friction point between a robotics engineer and a working simulation — from generating your first robot file to monitoring a GPU training cluster in real time. This is Robosynx. The Problem We Set Out to Solve Robotics development in 2025 is powerful — but the tooling around it is still fragile, tribal, and painful. You want to test a new robot in NVIDIA Isaac Sim? You need to write URDF XML by hand. You want to move that robot to Isaac Lab for reinforcement learning? Now you need MJCF format, so you spend three hours refactoring XML. You want to validate that the physics won't explode your simulation? There's no standard

DEV Community

20mabout 1 hour ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 195 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Models

Models

Exclusive | Pentagon Used Anthropic’s Claude in Maduro Venezuela Raid - WSJ

Exclusive | Pentagon Used Anthropic’s Claude in Maduro Venezuela Raid WSJ

Google News - AI Venezuela

1mabout 2 months ago

ModelsLive

Understanding Attention Mechanisms – Part 6: Final Step in Decoding

In the previous article , we obtained the initial output, but we didn’t receive the EOS token yet. To get that, we need to unroll the embedding layer and the LSTMs in the decoder , and then feed the translated word “vamos” into the decoder’s unrolled embedding layer. After that, we follow the same process as before. But this time, we use the encoded values for “vamos” . The second output from the decoder is EOS , which means we are done decoding. When we add attention to an encoder-decoder model, the encoder mostly stays the same. However, during each step of decoding, the model has access to the individual encodings for each input word. We use similarity scores and the softmax function to determine what percentage of each encoded input word should be used to predict the next output word.

DEV Community

2mabout 1 hour ago

Models

Anthropic Races to Contain Leak of Code Behind Claude AI Agent - WSJ

Anthropic Races to Contain Leak of Code Behind Claude AI Agent WSJ

GNews AI coding

1m3 days ago

ModelsLive

I Built a Multi-Agent AI Runtime in Go Because Python Wasn't an Option

The idea that started everything Some weeks ago, I was thinking about Infrastructure as Code. The reason IaC became so widely adopted is not because it's technically superior to clicking through a cloud console. It's because it removed the barrier between intent and execution. You write what you want, not how to do it. A DevOps engineer doesn't need to understand the internals of how an EC2 instance is provisioned — they write a YAML file, and the machine figures it out. I started wondering: why doesn't this exist for AI agents? If I want to run a multi-agent workflow today, I have two choices. I learn Python and use LangGraph or CrewAI, or I build my own tooling from scratch. Neither option is satisfying. The first forces me into an ecosystem and a language I might not want. The second me

DEV Community

13m26 minutes ago