Training frontier models isn’t as simple as adding more GPUs—one small problem and the whole coordinated dance falls apart. OpenAI’s Mark Handley and Greg Steinbrecher discuss how a new supercomputer network design, used to train some of the company’s latest models, keeps the whole system moving in lockstep, even with record numbers of GPUs. They break down Multipath Reliable Connection, a new protocol OpenAI developed with AMD, Broadcom, Intel, Microsoft, and Nvidia, and why they’re making it available for the whole industry to use.
Chapters
00:00 Intro
00:39 Greg and Mark's paths to OpenAI
04:34 Why training AI stresses networks differently
10:05 Bottlenecks, failures, and the cost of waiting
15:19 How Multipath Reliable Connection works
18:59 A protocol to route around failures
25:05 Why OpenAI is making MRC an open standard
35:09 Could AI compute move to space?
Hosted on Acast. See acast.com/privacy for more information.