The Quest for Tasks Frontier Models Don't One-Shot | Berlin

You must be an AI Tinkerers active member to view these talks and demos.

June 17, 2026 · Berlin

terminal-bench-3: Evaluating Frontier Models

Explore the challenges of creating complex evaluation environments for frontier models, discussing successes, failures, and pitfalls to avoid.
