December 22, 2024 • 76 mins

The idea of model cards, introduced as a measure to increase transparency and understanding of LLMs, has been perverted into a marketing gimmick, exemplified by OpenAI's o1 system card. To demonstrate the adversarial stance we believe is necessary to draw meaning from these press-releases-in-disguise, we conduct a close read of the system card. Be warned, there's a lot of muck in this one.

Note: All figures/tables discussed in the podcast can be found on the podcast website at https://kairos.fm/muckraikers/e009/


  • (00:00) - Recorded 2024.12.08
  • (00:54) - Actual intro
  • (03:00) - System cards vs. academic papers
  • (05:36) - Starting off sus
  • (08:28) - o1, continued
  • (12:23) - Rant #1: figure 1
  • (18:27) - A diamond in the rough
  • (19:41) - Hiding copyright violations
  • (21:29) - Rant #2: Jacob on "hallucinations"
  • (25:55) - More ranting and "hallucination" rate comparison
  • (31:54) - Fairness, bias, and bad science comms
  • (35:41) - System, dev, and user prompt jailbreaking
  • (39:28) - Chain-of-thought and Rao-Blackwellization
  • (44:43) - "Red-teaming"
  • (49:00) - Apollo's bit
  • (51:28) - METR's bit
  • (59:51) - Pass@???
  • (01:04:45) - SWE Verified
  • (01:05:44) - Appendix bias metrics
  • (01:10:17) - The muck and the meaning


Links


Additional o1 Coverage

  • NIST + AISI report - US AISI and UK AISI Joint Pre-Deployment Test
  • Apollo Research's paper - Frontier Models are Capable of In-context Scheming
  • VentureBeat article - OpenAI launches full o1 model with image uploads and analysis, debuts ChatGPT Pro
  • The Atlantic article - The GPT Era Is Already Ending


On Data Labelers

  • 60 Minutes article + video - Labelers training AI say they're overworked, underpaid and exploited by big American tech companies
  • Reflections article - The hidden health dangers of data labeling in AI development
  • Privacy International article - Humans in the AI loop: the data labelers behind some of the most powerful LLMs' training datasets


Chain-of-Thought Papers Cited

  • Paper - Measuring Faithfulness in Chain-of-Thought Reasoning
  • Paper - Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
  • Paper - On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models
  • Paper - Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models


Other Mentioned/Relevant Sources


Unrelated Developments

  • Cruz's
