The transformer architecture revolutionized the world of neural networks and was a springboard for what we know today as modern artificial intelligence. This podcast focuses on reviews of state-of-the-art research papers, starting from the transformer and moving forward.
We cover two new innovations from Microsoft that extend ideas from the original **FlashAttention**. FlashAttention is an IO-aware attention algorithm for Transformers designed to address the quadratic time and memory complexity of standard self-attention on long sequences. By using **tiling and recomputation** to minimize slow **High Bandwidth Memory (HBM)** accesses in favor of fast **on-chip SRAM**, FlashAttention achieves sig...
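To make the tiling-and-recomputation idea concrete, here is a minimal NumPy sketch of FlashAttention-style streaming with an online softmax. It mirrors the algorithm's structure only: the real kernel tiles both queries and keys and runs in SRAM on GPU; the block size and variable names here are illustrative.

```python
# Minimal NumPy sketch of FlashAttention-style tiling with online softmax.
# Only K/V are tiled here for brevity; the real kernel also tiles Q.
import numpy as np

def flash_attention(Q, K, V, block=64):
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row-max of the logits
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, N, block):          # stream K/V tiles from "HBM"
        Kj, Vj = K[j:j+block], V[j:j+block]
        S = (Q @ Kj.T) * scale            # logits for this tile
        m_new = np.maximum(m, S.max(axis=1))
        p = np.exp(S - m_new[:, None])    # tile probabilities, shifted
        correction = np.exp(m - m_new)    # rescale old accumulators
        l = l * correction + p.sum(axis=1)
        O = O * correction[:, None] + p @ Vj
        m = m_new
    return O / l[:, None]

# Check against standard attention on random inputs.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(16)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(flash_attention(Q, K, V), ref)
```

The key point is that no N-by-N score matrix is ever materialized; each tile's contribution is folded into running statistics and rescaled as larger maxima arrive.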
The provided text introduces **Sentence-BERT (SBERT)**, a modification of the popular **BERT** and **RoBERTa** language models, designed to efficiently generate **semantically meaningful sentence embeddings**. The authors address the significant **computational overhead** of using standard BERT for tasks requiring sentence-pair comparisons, such as semantic similarity search and clustering, which can take hours for large datasets. ...
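For a sense of what this buys in practice, here is a short usage sketch with the sentence-transformers package; the checkpoint name is one public SBERT model among many and can be swapped freely.

```python
# SBERT-style sentence embeddings: encode once, compare cheaply.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Pairwise similarity is now a dot product instead of a full
# BERT cross-encoder pass per sentence pair.
print(util.cos_sim(embeddings[0], embeddings[1]))
```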
The source provides excerpts from a scientific paper introducing **TxGNN**, a novel graph foundation model designed for **zero-shot drug repurposing**, which aims to identify therapeutic candidates even for diseases with no existing treatments or limited molecular data. Developed by researchers affiliated with institutions like Harvard Medical School and Stanford University, this model leverages a **medical knowledge graph (KG)** a...
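As a flavor of the zero-shot setup only (not TxGNN's actual architecture), here is a toy sketch in which a disease with no treatment edges borrows an embedding aggregated from similar diseases in the KG and is then scored against drug embeddings; the plain-average aggregation and all names are illustrative assumptions.

```python
# Toy sketch of zero-shot drug-disease scoring (not TxGNN itself).
# TxGNN uses a learned, gated aggregation over KG neighborhoods; a
# plain average over similar diseases stands in for it here.
import numpy as np

rng = np.random.default_rng(1)
drug_emb = {f"drug_{i}": rng.standard_normal(8) for i in range(3)}
neighbor_emb = [rng.standard_normal(8) for _ in range(2)]  # similar diseases

# Zero-shot: the rare disease has no edges of its own, so approximate
# its embedding from mechanistically similar neighbors.
rare_disease = np.mean(neighbor_emb, axis=0)

scores = {d: float(e @ rare_disease) for d, e in drug_emb.items()}
print(sorted(scores, key=scores.get, reverse=True))
```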
This April 29, 2024 paper provides an overview of the challenges associated with using **NVIDIA's Multi-Instance GPU (MIG)** technology, specifically focusing on the address translation mechanism in the **A100 GPU**. The paper reveals, primarily through **reverse-engineering efforts**, that the L2 and L3 Translation Lookaside Buffers (**TLBs**) utilize a compression design where each entry comprises **16 sub-entries** to enhance m...
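Back-of-the-envelope sketch of what that compression means: one entry covers 16 contiguous pages via sub-entry valid bits, so TLB reach grows 16x when mappings are contiguous. The page size and field layout below are illustrative, not the A100's exact format.

```python
# Compressed TLB entry: 16 contiguous pages share one tag.
PAGE_SIZE = 2 * 1024 * 1024   # assume 2 MiB large pages
SUB_ENTRIES = 16

def entry_tag(vaddr):
    # All pages in the same 16-page aligned group map to one TLB entry.
    return vaddr // (PAGE_SIZE * SUB_ENTRIES)

def sub_index(vaddr):
    return (vaddr // PAGE_SIZE) % SUB_ENTRIES

a = 0x1234 * PAGE_SIZE
b = a + PAGE_SIZE          # next page: same entry, next sub-entry
print(entry_tag(a) == entry_tag(b), sub_index(a), sub_index(b))
```

The flip side, which the paper explores, is that compressed entries only help when neighboring MIG instances do not fragment the address space.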
The August 26, 2025 collaboration between Stanford, NVIDIA, Shanghai Jiao Tong University, the University of Michigan, the University of Colorado Boulder, and Carnegie Mellon University introduces **Strata**, a hierarchical context caching framework designed to improve the performance of serving Large Language Models (LLMs) with long context windows. The core problem Strata addresses is that while caching key-value (KV) states is essential for...
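Here is a minimal sketch of hierarchical prefix caching in this setting: KV blocks for a prompt prefix live on the GPU when hot and spill to host memory when evicted. The two-tier LRU below is a toy stand-in for Strata's actual cache management, not its design.

```python
# Toy two-tier KV prefix cache: fast GPU tier with LRU eviction to host.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, gpu_capacity=2):
        self.capacity = gpu_capacity
        self.gpu = OrderedDict()   # fast tier, limited capacity (LRU)
        self.cpu = {}              # slow tier: host memory

    def get(self, prefix_hash):
        if prefix_hash in self.gpu:
            self.gpu.move_to_end(prefix_hash)
            return self.gpu[prefix_hash], "gpu_hit"
        if prefix_hash in self.cpu:
            blocks = self.cpu.pop(prefix_hash)   # promote: host -> GPU
            self.put(prefix_hash, blocks)
            return blocks, "cpu_hit_needs_transfer"
        return None, "miss"

    def put(self, prefix_hash, kv_blocks):
        self.gpu[prefix_hash] = kv_blocks
        self.gpu.move_to_end(prefix_hash)
        while len(self.gpu) > self.capacity:     # evict LRU block to host
            victim, blocks = self.gpu.popitem(last=False)
            self.cpu[victim] = blocks

cache = TieredKVCache()
for h in ["p1", "p2", "p3"]:
    cache.put(h, kv_blocks=f"<kv for {h}>")
print(cache.get("p1"))   # evicted to CPU: a hit, but it costs a transfer
```

The "cpu_hit_needs_transfer" case is exactly where a hierarchical system earns its keep: the transfer must be scheduled so it does not stall decoding.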
This is a classic review of a now-old but still important paper, the original FlashAttention paper. We review it in light of advances in compiler technology.
The June 23, 2022 Stanford paper describes the original **FlashAttention**, an innovative, IO-aware algorithm designed to significantly enhance the efficiency of the attention mechanism in Transformer models by optimizing memory usage and access. Standard attention suffe...
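For concreteness, the paper's headline IO analysis compares HBM access counts, with N the sequence length, d the head dimension, and M the SRAM size:

```latex
% HBM accesses, from the FlashAttention IO-complexity analysis
% (N = sequence length, d = head dimension, M = SRAM size, d <= M <= Nd):
\Theta(Nd + N^2)                       % standard attention
\Theta\!\left(\frac{N^2 d^2}{M}\right) % FlashAttention
```

Since typical head dimensions (64 to 128) and SRAM sizes (around 100 KB per streaming multiprocessor) make d^2/M well below 1, FlashAttention touches HBM many times less often than the standard implementation.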
On October 20, 2025, Hugging Face released **MTEB v2**, a significant refactoring of the Massive Text Embedding Benchmark, which was originally designed for evaluating text embedding models across various tasks like classification and retrieval. The update addresses **package bloating and the need for broader support** by introducing a **more consistent interface, better typing, and improved documentation**. Key new features include...
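As a usage sketch: this is written against the long-standing `mteb.get_tasks` / `mteb.MTEB` entry points, which the v2 refactor reorganizes, so treat the exact names as an assumption rather than the v2 API.

```python
# Running an embedding model on an MTEB task (illustrative; the v2
# refactor may rename or extend these entry points).
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
print(results)
```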
The October 10, 2025 paper from the University of Michigan and **Google DeepMind** concerns the phenomenon of **"overthinking" in Large Language Models (LLMs)** that utilize chain-of-thought (**CoT**) reasoning. The authors introduce a systematic analyzer called **TRACE** to structurally examine an LLM's thought process, decomposing it into sub-thoughts and progression graphs to move beyond superficial, length-based metrics of ov...
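To illustrate the decomposition step only: TRACE's analyzer is far richer (it builds progression graphs over the sub-thoughts), but a toy splitter on discourse markers conveys the idea of segmenting a trace wherever the model pivots. The marker list is an illustrative assumption.

```python
# Toy sub-thought segmentation of a chain-of-thought trace.
import re

MARKERS = r"(?=\b(?:Wait|Alternatively|Actually|So)\b)"

def split_subthoughts(cot: str):
    # Start a new sub-thought at each pivoting discourse marker.
    return [p.strip() for p in re.split(MARKERS, cot) if p.strip()]

trace = ("First I compute 17*23. So that's 391. "
         "Wait, let me double check: 17*20=340, 17*3=51, total 391. "
         "Actually that confirms it.")
for i, st in enumerate(split_subthoughts(trace)):
    print(i, st)
```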
The October 23, 2025 research paper **probes the spatial reasoning capabilities of Large Language Models (LLMs) when processing text-based inputs**, specifically focusing on how performance degrades as task complexity increases. Using a suite of five grid-based tasks—including quadrant identification, geometric transformations, distance evaluation, word searches, and tile sliding—the authors tested four models: GPT-4o, GPT-4.1, and ...
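To make the task family concrete, here is a small generator for a quadrant-identification instance; the authors' exact prompt format is not reproduced, so this is an illustrative reconstruction.

```python
# Generate a text grid and the expected quadrant answer.
import random

def quadrant_task(n=5, seed=0):
    rng = random.Random(seed)
    r, c = rng.randrange(n), rng.randrange(n)
    grid = [["." for _ in range(n)] for _ in range(n)]
    grid[r][c] = "X"
    prompt = "\n".join(" ".join(row) for row in grid)
    answer = ("top" if r < n / 2 else "bottom") + "-" + \
             ("left" if c < n / 2 else "right")
    return prompt, answer

prompt, answer = quadrant_task()
print(prompt, "\nExpected:", answer)
```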
This October 23, 2025 Xidian University academic survey systematically reviews the transformative impact of **Large Language Models (LLMs)** on the three core stages of **Knowledge Graph (KG) construction**: ontology engineering, knowledge extraction, and knowledge fusion. The text explains that LLMs are shifting the paradigm from rigid, rule-based systems to **unified, adaptive, and generative frameworks**. The paper is structured...
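A minimal sketch of the knowledge-extraction stage the survey covers: prompting an LLM to emit (subject, relation, object) triples. The specific client call, model name, and prompt are illustrative assumptions; any chat-completion API would do.

```python
# LLM-driven triple extraction for KG construction (illustrative).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_triples(text: str):
    prompt = (
        "Extract knowledge-graph triples from the text below. "
        'Reply with JSON: {"triples": [[subject, relation, object], ...]}\n\n'
        + text
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["triples"]

print(extract_triples("Marie Curie discovered polonium in 1898."))
```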
The October 23, 2025 collaboration between UC San Diego, NVIDIA, Meta, UW-Madison, and UNC introduces **Real Deep Research (RDR)**, a systematic framework designed to analyze vast amounts of research literature in rapidly growing fields such as AI and robotics. The methodology uses **large language and multimodal models (LLMs/LMMs)** for content extraction, reasoning, and semantic embedding to map the research landscape. RDR’s ...
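A sketch of the embed-and-map idea: embed paper abstracts and cluster them to surface research areas. The embedding model and cluster count are illustrative choices, not RDR's exact configuration.

```python
# Embed abstracts, then cluster to map a research landscape.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

abstracts = [
    "We propose a new attention mechanism for long sequences.",
    "A diffusion model for robot trajectory generation.",
    "Scaling laws for mixture-of-experts language models.",
    "Sim-to-real transfer for legged locomotion.",
]
emb = SentenceTransformer("all-MiniLM-L6-v2").encode(abstracts)
labels = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(emb)
for text, label in zip(abstracts, labels):
    print(label, text[:50])
```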
The October 20, 2025 Meta FAIR paper introduces the **Free Transformer**, an innovative extension of the decoder-only Transformer architecture, which addresses the limitations of purely autoregressive language modeling by integrating **random latent variables** into the generative process. This new model is structured as a **conditional Variational Autoencoder (VAE)**, where an encoder learns the latent variables unsupervised, and ...
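Here is a minimal PyTorch sketch of the conditional-VAE idea: during training, a non-causal encoder infers a latent Z from the full sequence, and the causal decoder conditions on Z while predicting tokens. The layer counts, the Gaussian latent, and the injection point are simplifying assumptions, not the paper's exact design.

```python
# Toy "free" decoder: a latent Z, inferred by an encoder, conditions
# an otherwise autoregressive transformer (conditional-VAE training).
import torch
import torch.nn as nn

class TinyFreeTransformer(nn.Module):
    def __init__(self, vocab=100, d=64, z_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        enc = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=1)  # non-causal
        self.to_mu, self.to_logvar = nn.Linear(d, z_dim), nn.Linear(d, z_dim)
        self.z_proj = nn.Linear(z_dim, d)
        dec = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec, num_layers=2)
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        # Encoder sees the whole sequence (no causal mask) to infer Z.
        h = self.encoder(x).mean(dim=1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # Decoder stays causal, conditioned on Z at every position.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        y = self.decoder(x + self.z_proj(z).unsqueeze(1), mask=mask)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        return self.head(y), kl

logits, kl = TinyFreeTransformer()(torch.randint(0, 100, (2, 8)))
print(logits.shape, float(kl))
```

Training would add the usual next-token cross-entropy to the KL term; at generation time Z is sampled from the prior, which is what frees the model from committing to every latent decision token by token.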
This 2025 CMU paper introduces **LithOS**, a novel operating system designed to improve the efficiency and utilization of Graphics Processing Units (GPUs) for machine learning (ML) workloads in data centers. The authors argue that current GPU management solutions, such as NVIDIA's MPS and MIG, are too coarse-grained, leading to low utilization and high latency in multi-tenant environments. LithOS proposes a transparent, OS-level ap...
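To illustrate the granularity argument only: MIG carves the GPU into fixed slices, while a LithOS-style approach hands out compute units (TPCs) per kernel and reclaims them on completion. The numbers and the shrink-instead-of-queue policy below are illustrative, not LithOS's actual scheduler.

```python
# Toy per-kernel TPC allocator: grant what is free, reclaim on finish.
class TPCScheduler:
    def __init__(self, total_tpcs=8):
        self.free = total_tpcs

    def launch(self, kernel, want):
        grant = min(want, self.free)   # shrink the grant instead of queueing
        if grant == 0:
            return None
        self.free -= grant
        return grant

    def finish(self, grant):
        self.free += grant

sched = TPCScheduler()
a = sched.launch("tenant_A_kernel", want=5)
b = sched.launch("tenant_B_kernel", want=5)  # gets the remaining 3
print(a, b, sched.free)   # 5 3 0 -> no TPC sits idle, unlike fixed halves
```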
This October 23, 2025 technical report from the Ling Team introduces the **Ring-linear model series**, specifically Ring-mini-linear-2.0 and Ring-flash-linear-2.0, which utilize a **hybrid attention architecture** combining linear and softmax attention mechanisms to enhance efficiency in long-context reasoning. The paper explains how this architecture, featuring **Mixture-of-Experts (MoE)** and advanced **FP8 training optimization*...
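For the linear-attention half of the hybrid design, here is a minimal non-causal sketch: with a positive feature map phi, attention factorizes as phi(Q) (phi(K)^T V), so cost is linear in sequence length. The elu+1 feature map is a common choice from the linear-attention literature, not necessarily Ring-linear's; a causal version would keep running prefix sums instead.

```python
# Non-causal linear attention: O(N d^2) instead of O(N^2 d).
import torch
import torch.nn.functional as F

def linear_attention(Q, K, V):
    phi = lambda x: F.elu(x) + 1             # positive feature map
    Qf, Kf = phi(Q), phi(K)
    KV = torch.einsum("nd,ne->de", Kf, V)    # d x d summary of K,V
    Z = Qf @ Kf.sum(dim=0)                   # per-query normalizer
    return (Qf @ KV) / Z.unsqueeze(-1)

Q, K, V = torch.randn(3, 1024, 32).unbind(0)
print(linear_attention(Q, K, V).shape)       # torch.Size([1024, 32])
```

The hybrid architecture interleaves layers like this with ordinary softmax attention, trading a little expressivity in most layers for large savings in long-context inference.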
The October 22, 2025 GigaAI paper introduces **GigaBrain-0**, a novel Vision-Language-Action (VLA) model designed for general-purpose robotic systems, which is primarily trained using a combination of real-world robot data and synthetic data generated by a world model called **GigaWorld**. This approach aims to enhance generalization across various real-world conditions by leveraging diverse synthetic data streams like **Real2Real ...
The October 25, 2025 ByteDance paper introduces **Open-o3 Video**, a novel framework developed by researchers from **Peking University** and **ByteDance**, aimed at advancing video reasoning by incorporating explicit spatio-temporal evidence. Unlike prior models that only generate textual rationales, Open-o3 Video explicitly highlights key **timestamps** and **bounding boxes** to ground its answers in visual observations. To achiev...
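An illustrative parser for this kind of grounded output: an answer interleaved with the timestamps and boxes that support it. The tag format below is an assumption for demonstration; the paper defines its own output schema.

```python
# Parse (timestamp, bounding box) evidence out of a grounded answer.
import re

output = ("The dog picks up the ball <t=12.4s box=[0.31,0.42,0.55,0.77]> "
          "and drops it at the door <t=18.9s box=[0.10,0.35,0.30,0.80]>.")

pattern = r"<t=([\d.]+)s box=\[([\d.,]+)\]>"
evidence = [(float(t), [float(v) for v in box.split(",")])
            for t, box in re.findall(pattern, output)]
print(evidence)
```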
We review the Cattell–Horn–Carroll (CHC) theory as it is used in recent AI papers on defining what AGI could be. The provided sources offer a comprehensive overview of the **Cattell–Horn–Carroll (CHC) theory of human cognitive abilities**, a widely accepted psychological model of intelligence. The first source, a Wikipedia excerpt, explains that the **CHC theory** synthesizes two prior models, Cattell and Horn's Gf–Gc model and Carroll's ...
This March 27, 2025 Anthropic paper provides an overview of, and detailed excerpts from, two related Anthropic papers concerning the **interpretability of large language models**, specifically focusing on Claude 3.5 Haiku. The core objective is to reverse engineer the **internal computational mechanisms**, or "circuits," that drive the model's behavior, analogous to studying biology or neuroscience. The research introduces a **circuit t...
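For flavor only: one primitive behind attribution-style analysis is scoring how much each internal unit contributes to a chosen output, for example via gradient times activation. Anthropic's method (attribution graphs over learned transcoder features) is far more involved; this toy shows only the basic direction of the question.

```python
# Gradient x activation attribution for one output of a tiny MLP.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4))
x = torch.randn(1, 8)

hidden = model[1](model[0](x))
hidden.retain_grad()                 # keep the gradient at this layer
logit = model[2](hidden)[0, 2]       # pick one output "behavior"
logit.backward()

attribution = (hidden * hidden.grad)[0]      # per-unit contribution
top = attribution.abs().topk(3).indices.tolist()
print("most influential hidden units:", top)
```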
The provided text is an academic paper titled **"Active Use of Latent Constituency Representation in both Humans and Large Language Models,"** which explores how sentences are internally represented in both the human brain and large language models (**LLMs**) like ChatGPT. The authors introduce a novel **one-shot learning word deletion task** where participants infer a deletion rule from a single example; they found that both human...
The June 2025 paper presents excerpts from a study examining the **cognitive and performance differences** in essay writing among participants using a Large Language Model (LLM) like ChatGPT, a traditional **Search Engine**, or **no external tools (Brain-only)**. The research uses **EEG connectivity analysis** to illustrate that the Brain-only group experienced a higher cognitive load, characterized by stronger and more extensive n...