Gemini sets a new record on Simple Bench, and on several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, why it does better on certain coding benchmarks than others, how it may have a universal ‘conceptual language’, and more. Plus practical tips, a note on security, and a Kling vs Veo 2 guest appearance.
https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality: YouTube URLs, Security, Cut-off Date
03:42 - Coding
06:22 - WeirdML Bench
07:01 - Simple Bench Record High
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats
Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/
Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87
Simple Bench: https://simple-bench.com/
WeirdML: https://htihle.github.io/weirdml.html
https://x.com/htihle/status/1905014058228625542
Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot
Google AI Studio: https://aistudio.google.com/prompts/new_chat
Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php
LiveBench: https://livebench.ai/#/
Paper: https://arxiv.org/pdf/2406.19314
LiveCode Bench: https://livecodebench.github.io/
SWE-Verified: https://arxiv.org/pdf/2310.06770
Non-hype Newsletter: https://signaltonoise.beehiiv.com/