Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:11):
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we'll summarize and discuss some of last week's most interesting AI news. You can head on over to lastweekinai.com for the links to all the stories. You can also go to the episode description for the timestamps and so on.
(00:32):
I am one of your regular hosts, Andrey Kurenkov. I studied AI in grad school and now work on it at a startup, and once again, this week Jeremy is busy. Unfortunately, Jeremy has been very busy lately, so he has not been around, but I once again have a great co-host with me, Michelle Lee.
Hey everyone.
I am Michelle Lee, your guest host for the week.
(00:56):
I went to grad school with Andrey, also studying AI, and now I am the founder and CEO of Medra, which is a physical AI startup based in San Francisco.
Right.
And we can kind of do a quick bit of news on that.
You just announced your big launch milestone for the company.
(01:17):
So go ahead and feel free to let people know more about Medra for a bit. At Medra AI, we're a physical AI company for life sciences.
We're building physical AI infrastructure that powers the scientific frontier. Our physical AI platforms can do lab work inside life science companies and generate a lot of experimental
(01:38):
data, which then in turn can help train frontierand foundation models in sciences and also help our
partners be able to find cures to disease faster.
Yeah, so working with a bunch of robot arms.
I just saw them recently doing a lot of pipetting and whatnot.
I don't really know the details.
(01:59):
Somewhat in a similar vein,
we'll have in this episode quite a few stories about robotics, actually. There's a lot going on with humanoids, a lot going on with self-driving cars, even foundation models, so you can look forward to some discussion on that front.
Before we get into it real quick, I do wanna acknowledge for regular listeners that the output has
(02:20):
been inconsistent lately. As I've said, Jeremy has been busy with work and whatever else he's up to.
So as always, I promise to try and make it more consistent.
But just bear with us please.
Let's go ahead and start with the news in tools and apps. The first story is OpenAI upgrades Codex with a new version of GPT-5.
(02:42):
So they now have GPT-5-Codex, which is just GPT-5 but better for coding, is what it sounds like. It is now available if you're using the Codex CLI or the Codex kind of IDE tool. You can switch from using regular GPT-5 to GPT-5-Codex, and also if you're using their
(03:03):
web agent, it's now powered by GPT-5-Codex. So pretty significant news, given that it looks like OpenAI is trying to catch up to Anthropic and be a competitor to Claude Code, where they're a little bit behind, is my impression.
Yeah, I think right now, definitely talking to software developers, the general consensus is
(03:29):
that Claude Code is still the best tool out there.
So it's very interesting to see OpenAI release new and better tools to make it more powerful for coding tasks.
Yeah, and people have been a little angry at Anthropic lately due to infra issues. So from a business strategy standpoint, I think OpenAI has a real opportunity to get some converts.
(03:52):
And if you go look on Reddit or Twitter, there's a bit of sentiment of like, oh, I've switched to Codex now.
It's great.
I'm trying it out.
I'm not a convert yet, but I might become one.
Next up we have Google injects Gemini into Chrome as AI browsers go mainstream.
So pretty much what it sounds like.
They now have a version of Chrome where on the top right there's a little Gemini button.
(04:15):
You click on it and you can ask questions about the tab, talk to Gemini, potentially later moving on to more agentic tasks.
Very much in line with what we've been seeing from Perplexity and from The Browser Company, like integrating chatbots into the browser. Also, Anthropic recently had their Claude Chrome extension.
(04:38):
So it seems like it was just a matter of time till this happened; it actually took Google a bit long to do this, if anything. But it's definitely taking us towards the future where you just have a chatbot literally in every single piece of software you ever use.
Yeah, I wonder if it took them a while because of all the competitiveness issues
(05:00):
that Google was facing with Chrome.
'cause this definitely gives them a huge competitive advantage: they own the browser with Chrome, one of the most popular browsers, and also now can integrate that directly with their AI.
Yeah.
Yeah, right.
Perplexity did try to buy Chrome or make a bid for it, so I guess that tracks.
(05:26):
Yeah.
Have you tried using any of these browser-based AIs?
Actually, I did use ChatGPT Agent a little bit. So ChatGPT Agent isn't like a browser plugin or anything, but it does browse the web for you and do stuff.
And I found it to be pretty powerful.
Like things that you could not do otherwise.
It can go and open your Google Doc and click on links and go and do that for like half an hour, which is pretty impressive.
(05:52):
So I could see these being an even easier way to automate stuff you do, via a prompt instead of anything else.
Yeah, that's interesting. I tried Dia for a little bit and just didn't find it smooth enough really, or didn't find that it brought enough value.
But I'd be interested in checking out Gemini directly in Chrome.
(06:14):
And next we go to Anthropic. They have a new feature in Claude: it can now make you spreadsheets or PDFs, which I think is actually pretty differentiated.
Like, I don't know that others can make these. Seemingly ChatGPT can do PowerPoints.
I don't know about spreadsheets, but because OpenAI has such a strong collaboration with Microsoft,
(06:39):
I believe they were able to roll out a lot of features with Microsoft 365 pretty early.
Oh, nice.
Well, in Claude now there's an experimental feature called upgraded file creation and analysis. It sounds like they might be running a little Claude Code agent within it, so that if you upload a file, it can do agentic stuff to it.
(07:02):
So, yeah, if you are working with spreadsheets or PowerPoints or PDFs, this should really make Claude more powerful for that.
And just one last story in the section: we have got a new video model. This one is from Luma, and it is their Ray3 model, what they are saying is an AI reasoning video model, which is
(07:27):
kind of interesting.
They say it's using reasoning power to create AI video clips with more complex action sequences.
I don't know if that means that it interleaves video creation with reasoning or what. There's been a pretty steady kind of progression in video creation for the last year or two. Now you're able to get clips in 20 seconds here and up-res them if you want.
(07:48):
As with any of these video models, you really have to go and look at the previews to see the improved clarity, prompt adherence, all those kinds of things.
Yeah.
Interesting that it calls itself the first reasoning video model. I highly doubt that all the other video models don't use reasoning at all.
(08:09):
Yeah.
It's hard to know to what extent this is kind of marketing speak and to what extent this is architectural or other things like that.
This is coming, I guess, with Google having released Veo 3, I don't know how long ago, but not too long ago, and it being very impressive and very powerful.
So it's getting increasingly competitive.
(08:31):
Definitely.
Onto Applications and Business, and as usual, or I guess as often is the case, we begin with OpenAI having some very businessy kinds of updates.
So for the past year or something like that, they've been trying to go for-profit; as we've covered
(08:51):
over many months, they have had many legal struggles.
And now there's a bit of an update on it.
Apparently OpenAI secured Microsoft's blessing for the transition to the for-profit.
So they have now this memorandum of understanding, kind of an unofficial agreement, so to speak,
(09:14):
where they have terms that they are agreeing upon: they'll retain some sort of relationship, but it'll be not quite as exclusive as what OpenAI and Microsoft have had, I guess, prior to 2025. We've seen them become a little more antagonistic over time as OpenAI has tried to transition to for-profit.
(09:37):
Let's see.
Is there anything else to say here?
No.
It's like a memorandum of understanding. There are not many details here.
Yeah, it just sounds like some interesting updates.
Maybe to help with fundraising, maybe to just produce some more news.
(09:58):
Yeah, there's not really many details here.
All we know is apparently this was preceded by months of negotiation, and this was stated in a joint statement. So presumably, behind the scenes, this involved a lot of back and forth, and it is kind of a significant update for OpenAI because they are under the gun to do this for-profit transition.
(10:22):
They announced wanting to do it early this year.
They still haven't done it.
You know, they're in a tough spot, and if they don't complete this transition, they're in real trouble.
And related to that, we have the next story.
Microsoft is going to apparently lessen its reliance on OpenAI by buying AI from Anthropic. So they are going to integrate Anthropic models into their Office 365 applications.
(10:49):
Presumably it's gonna be kind of a way to pick your models as you use AI, and at least according to this article, and presumably like reasonable speculation, this is related to whatever tensions currently exist between Microsoft and OpenAI.
(11:10):
Well, maybe this explains why the new Claude models now can work with Office 365 applications. And it looks like OpenAI is also working to reduce their dependency on Microsoft by working on AI chips and with other cloud providers. So it sounds like both parties are trying to lessen their reliance and
(11:34):
lessen their partnership.
Yeah, that's right.
Yeah.
OpenAI did just sign a massive contract with Oracle that caused Oracle's stock to jump quite a bit.
We've covered OpenAI a lot on this podcast. If you're into business drama, OpenAI is a never-ending fountain of business drama and sort of interesting
(11:57):
developments, and this is just the latest of that.
Moving on to slightly less, let's say, boring, I dunno, businessy news, we've got some stories on robotics.
First up, we have Figure AI. They are passing $1 billion in committed capital with their Series C funding round, which would
(12:20):
make their post-money valuation $39 billion. Figure is one of several humanoid robotics startups that are fairly new.
I forget how old Figure is, but they must be from 2023-ish. You know, obviously pre-revenue, they're still more or less an R&D lab at this point.
(12:41):
So, pretty cool to see the venture funds still being committed to funding these very ambitious humanoid robotics bets that are seemingly making a lot of progress, from what I can see.
And maybe Michelle, you have a take on this?
Yeah, I mean, we've been seeing really exciting new
(13:01):
models come out of Physical Intelligence.
Dyna Robotics just launched with their new fundraise, so very exciting to see more funding going into robotics.
And also very exciting to see, especially, more and more efforts going into hardware,
(13:21):
which has been very much a bottleneck right now in robotics.
How do we actually have better hands, better humanoid robots? Six years ago, if you wanted a humanoid robot to do research, you would have to be in a select few universities around the world that actually had access to humanoid robots.
(13:43):
And now we have several humanoid companies all trying to build better hardware.
So it's very exciting.
Which I guess also leads to the next news.
Exactly, yeah.
The next news is about China's Unitree, which is already planning an IPO.
Apparently they are saying that the company might be valued at up to $7 billion.
(14:08):
Unitree, if you haven't seen lately, has been big in humanoids. They've unveiled this kind of mini humanoid that is quite affordable and quite capable. I believe they also were pretty active in the, like, dog quadruped space, in which China has been killing it for quite a while now.
(14:29):
They have even been profitable since 2020, actually, with revenues now exceeding like $140 million.
So China, especially on the robotics front, is quite competitive with the frontier of AI. But there is a question, I think, here on the software AI side, where it's still very tough.
Yeah, I mean, it's very cool to see China focusing more and more on humanoids and on robotics in general.
(14:54):
I heard that there are just dozens of humanoid companies, not just Unitree, that have been founded in China.
And this is great for the robotics industry.
As more companies are building hardware, the cost of this general-purpose hardware keeps going down.
And with Unitree, with their new humanoid robot, which is very affordable,
(15:16):
it costs around the same price as a robotic arm.
Most labs now in US universities can very easily afford their own humanoid, which again was just not true several years ago.
Right.
Apparently it's what, $16,000 for this Unitree G1 robot, which is actually on the lower end of
(15:40):
what robotic arms cost, from what I've heard, so very cool.
Onto another type of robotics.
Robotaxis are also a very hot area this year.
First up, we've got a couple of stories about Tesla.
First of all, Tesla's Robotaxi is planning to test in Nevada. They now have a testing
(16:01):
permit from Nevada's Department of Motor Vehicles.
On the slightly less positive side, there was reporting from Electrek about there having already been three Robotaxi accidents, with at least one injury reported as well.
And this is from the Robotaxi fleet in Austin, which is estimated to have about 12 vehicles, still
(16:26):
at a very kind of small scale, with safety drivers.
So if that is true, apparently the NHTSA is investigating Tesla for potentially misreporting the crash data. That would not be good news or a good sign for it in trying to compete with Waymo, which has a stellar record, from what I know at least.
(16:47):
Yeah, it's very tricky for these companies when they try to avoid or try to hide these accidents, because that was really what got Cruise in trouble in San Francisco: after the accident, they tried to hide information, and that's among the many reasons why
(17:07):
Cruise is no longer able to operate in San Francisco. So I hope Tesla is able to be honest and report the accidents correctly so that they can continue building that trust with the government officials.
Yeah, for sure.
And Robotaxi does seem pretty capable.
(17:29):
I am a major Waymo user.
I don't know how often you use it, Michelle, but I'll be looking forward to trying Robotaxi whenever it comes here.
I mean, I love Waymos, and it really truly feels like magic, so I'm very excited to see more and more self-driving cars in the streets.
(17:49):
And on that note, one more story about robotaxis.
Next up we have Amazon's Zoox jumps into the US robotaxi race with a Las Vegas launch.
So they have this offering now of a public robotaxi service on the Las Vegas Strip. Apparently they're offering free rides from select locations, with plans to expand citywide.
(18:14):
So it's a really small test, as you might expect, I suppose, with just the initial set of testing from Zoox. They are using their very futuristic model of car where you don't have a steering wheel, and it's like a tiny kind of bus-looking thing where you have all the seats facing inward.
I love it.
It looks great.
(18:35):
I would love to try it.
I've been seeing people, probably employees and testers, in San Francisco riding them. It looks so cool because you're facing each other, so you can actually have meetings
while you're in the cars, which is very cool.
Yeah, and Zoox, by the way, was acquired by Amazon back in 2020.
(18:56):
They've been working on this since 2014.
So Zoox, even though they haven't deployed to the extent that Tesla or Waymo have, or haven't demonstrated as much, given their backing and given that they've been in this for a long time, I think they still have a chance to really kind of grow rapidly if this turns out to go well.
(19:19):
Yeah, and how fun to have it start in Las Vegas.
I know.
Yeah, I should go try it out.
And just two more stories in the section with more funding news: we've got Replit hitting a three-billion-dollar valuation with $150 million annualized revenue.
(19:43):
That's after they raised $250 million in a new funding round.
So Replit, one of the, like, key winners of the vibe coding era, I suppose, that started this year, seems to be growing very rapidly in terms of their revenue and, unsurprisingly, I suppose, also getting some impressive fundraising as a result.
(20:05):
Replit definitely makes it really easy for people to get started on coding, building their own projects, and they have definitely done a great job at being able to leverage all the new AI coding tools and integrating them with their platform.
Right.
And as a result, looking at this, a thing that kind of jumped out at me: apparently revenue went from $2.8 million
(20:32):
annualized revenue to $150 million in less than a year.
And this company has been around since 2016.
So Replit has been active for a long time as a sort of dev tool for coders. But now, having kind of made it usable for non-professionals, they're rocketing upward.
Well, honestly, I am surprised they'd been able to raise money with only $2.8 million
(20:58):
ARR previously, and for them to grow this big.
But yeah, very exciting.
We're definitely seeing a lot of AI tools now being able to go to $100, $150, $200 million revenue in a very short amount of time, so, very exciting.
And the last story, also on fundraising: Perplexity, the, I guess, primarily search tool.
(21:22):
Now they're trying to expand into agents and browsers and so on.
They have reportedly raised $200 million at a $20 billion valuation, and this is just two months after they raised $100 million at an $18 billion valuation.
So one of the very fun things with this podcast is
(21:44):
AI people just fundraise, like, constantly.
They just, every few months these companies are getting billions of dollars if they can, and that is certainly true in this case.
Yeah, so also their ARR just hit $200 million, up from $150 million reported last month.
(22:04):
So they're also growing quite, quite a lot in revenue as well.
And onto the projects and open source section.
Just a couple of things here.
First one is K2-Think, a parameter-efficient reasoning model.
So this is a research paper plus an open source model coming from the Institute of Foundation Models at the Mohamed
(22:28):
bin Zayed University of Artificial Intelligence in the UAE.
Which I don't think we've covered before, which is interesting.
They took an existing model, Qwen2.5-32B, as their base model. And then they put it through all the typical reasoning training.
(22:49):
So they had some fine-tuning, some reinforcement learning, all the tricks. They also have best-of-N sampling, some of the stuff on the inference side, packaged in here.
And as a result, they get a 32-billion-parameter model which is performing very impressively, according to at least their benchmark results; they are performing better
(23:14):
than DeepSeek R1, DeepSeek V3.1, and GPT-OSS at a relatively small number of total parameters.
Now, this is a little bit unfair 'cause they're comparing total parameters rather than active parameters. But nonetheless, I think it's very cool to see even better open source models now on the reasoning side, and pretty impressive to see a university do
(23:41):
this kind of stuff.
Yeah.
And it's also interesting that their way of getting to better results is not just more parameters; it's actually thinking about test-time scaling. It's using plan-before-you-think prompt restructuring. We're just seeing this, like, test-time compute, rethinking prompts,
(24:04):
really trying to think of it almost as having different agents be able to think through different prompts and surfacing the best ideas, come up as now one of the best ways to improve reasoning. And I think even in large foundation models, test-time compute and improving the prompts is so key to getting better model performance.
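For readers curious what best-of-N sampling, the test-time trick mentioned here, looks like in practice, here is a minimal sketch. The generate and score callables are hypothetical stand-ins for a model and a verifier, not K2-Think's actual API:

```python
import random

def best_of_n(generate, score, prompt, n=3):
    """Best-of-N sampling: draw n candidate answers for a prompt and
    return the one the scorer rates highest.

    generate: callable(prompt) -> str, a stochastic sampler (e.g. an LLM call)
    score:    callable(prompt, answer) -> float, e.g. a verifier or reward model
    (Both are illustrative assumptions, not K2-Think's actual interfaces.)
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))

# Toy usage: a "generator" that guesses digits and a "scorer" preferring even ones.
if __name__ == "__main__":
    answer = best_of_n(
        generate=lambda p: str(random.randint(0, 9)),
        score=lambda p, a: 1.0 if int(a) % 2 == 0 else 0.0,
        prompt="pick a number",
    )
    print(answer)  # usually prints an even digit
```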
(24:32):
And just one more open source story here.
Next one is a benchmark, not a model.
The paper that came out of it is called LoCoBench: A Benchmark for Long-Context Large Language Models in Software Engineering. So, I assume, it's a long-context software engineering benchmark.
(24:54):
The basic point they make is that the existing software engineering benchmarks we have, like SWE-bench and so on, typically deal with GitHub issues and therefore are pretty localized. So you might be working in a codebase, but the total amount of work, the total number of files you need to look at, the total amount of code you need to look at, is relatively minor, and as a result, the benchmarks
(25:15):
don't necessarily correlate too deeply to the performance you get when you actually try to use models via Claude Code or via Codex or via any IDE tools.
So this paper introduces a whole bunch of tasks.
So they have eight categories of long-context tasks.
They have architectural understanding, cross-file refactoring, feature implementation,
(25:40):
bug investigation, et cetera, et cetera.
They have like a thousand of each of these eight things, at different difficulty levels in terms of length in, I think, tokens here.
So on the low side, you have what you typically see in the existing benchmarks, 10K to a hundred K tokens, but then you scale up to 10x, 50x, a hundred x; those kinds of
(26:08):
context lengths for their kind of hardest level.
And as you might expect, compared to the easier, or let's say the shorter, existing coding benchmarks, existing systems aren't able to solve these things. SWE-bench, I think, is now at like 90%; we're, like, saturating it. With these tasks,
(26:28):
the existing models are nowhere near able to fully resolve 'em, and there's quite a hierarchy in terms of their capabilities as well.
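To make the context-length tiers concrete, here is a rough sketch of how you might bucket a task by the token count of the files it touches. The chars-to-tokens ratio and the tier boundaries are illustrative assumptions, not LoCoBench's exact cutoffs:

```python
def context_tier(task_files: dict[str, str], chars_per_token: int = 4) -> str:
    """Bucket a benchmark task by a rough token estimate over all its files.

    task_files maps file path -> file contents. The 4-chars-per-token ratio
    and tier boundaries are illustrative, not the paper's exact numbers.
    """
    total_tokens = sum(len(text) for text in task_files.values()) // chars_per_token
    if total_tokens < 100_000:
        return "short (10K-100K tokens, like existing benchmarks)"
    if total_tokens < 1_000_000:
        return "long (roughly 10x-50x the usual context)"
    return "very long (100x-scale context)"

# Toy usage with two fake files.
print(context_tier({"a.py": "x = 1\n" * 1000, "b.py": "y = 2\n" * 1000}))
```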
Yeah.
I think this is great that benchmarks are becoming more and more realistic.
That's always so important.
'cause when the benchmarks aren't realistic, we end up building what we can measure. And 10K tokens is not at all realistic to the type of coding tasks that people do every day.
(26:57):
Even for simple things, 10K is not enough if you're trying to work with multiple files and refactor. And with the context window, a lot of people are now just doing a lot of engineering tricks to be able to remember what's happening so we don't have to use up all the context window. But it's great if we can start measuring how these models can work with longer and longer context.
(27:25):
They also, interestingly, introduce some interesting metrics: they have a total of eight software engineering evaluation metrics, architectural coherence score, dependency traversal accuracy, cross-file reasoning depth, system thinking score, robustness score, comprehensiveness score, innovation score, and solution elegance score.
(27:51):
All based on, I guess, previous research that suggested variations of these, or at least the last few that deal more with code quality.
So overall it seems like a very thoughtful effort to make a very useful benchmark that tracks actual software engineering quality.
Yeah, hopefully this just means these models can keep improving on real, more realistic tasks.
(28:17):
And speaking of continuing to improve, onto the next section, Research and Advancements. The first story is self-improving embodied foundation models, and this is coming from Google DeepMind in collaboration with Generalist, which I don't know that I'm aware of.
Oh, yeah.
Generalist is a robotics company that came out of DeepMind.
(28:40):
Oh, well, there you go.
That makes a lot of sense.
So in this collaboration, they introduced a self-improving embodied foundation model. What that means is they begin with something like the RT-2 model that came out of DeepMind, where they take a whole bunch of video, a whole bunch of rollouts of robotics, and train a robotics foundation
(29:03):
model, in the sense that you're able to get a robot arm, in this case, to technically do anything.
So give it some text and it'll try to execute a policy to do whatever you want.
The self-improving part here is, after you do the pre-training, in stage two you can do online
(29:27):
self-improvement with on-policy rollouts of a robot.
So you have ideally one person or maybe two people, like, supervising actual robots. And in these little cages, they have something that is able to evaluate success criteria on whatever tasks they're working on.
(29:48):
And as a result, you are basically able to generate a continuous stream of success and failure rollouts, at least in the ideal case creating a larger data set to then train on.
And yeah, they implement this with real hardware and show that you're able to get quite a significant improvement on some of these evaluations: LanguageTable,
(30:12):
ALOHA single insertion, real-to-sim LanguageTable, all these different evaluations of robotic, let's say, arm-based tasks.
This is very smart, because in robotics one of the biggest problems is just that we don't have enough data.
You wanna do imitation learning and behavior cloning, great.
(30:33):
Now you have to collect lots of data, either with VR headsets or using ALOHA to, like, teleoperate the robot.
Having the self-improvement is basically almost like a simplified reinforcement learning, without needing to do reinforcement learning fully, where you only get
(30:55):
supervision from the rewards themselves. Now you can just predict the reward function, detect the success, and use that to supervise, and be able to get more trainable data in order to scale up their models.
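The loop being described here, collect on-policy rollouts, auto-label success with a learned detector, and fold successes back into the training set, might look roughly like this sketch. All the object methods are hypothetical stand-ins, not the paper's actual code:

```python
import random

def self_improvement_round(policy, success_detector, tasks, dataset, num_rollouts=100):
    """One round of the self-improvement stage described above: run the
    pretrained policy on real tasks, label each trajectory with a learned
    success predictor, and fine-tune on the grown dataset.

    policy.rollout / policy.finetune and success_detector are illustrative
    interfaces, not the actual implementation.
    """
    for _ in range(num_rollouts):
        task = random.choice(tasks)
        trajectory = policy.rollout(task)       # on-policy execution on the robot
        if success_detector(task, trajectory):  # predicted reward stands in for human labels
            dataset.append((task, trajectory))  # successes become new training demos
    policy.finetune(dataset)                    # stage-two training on the grown set
    return policy
```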
Yeah.
In a way it's almost similar to what people are doing with reasoning models now, which is you pre-train your
(31:17):
model, you then align it, and then you do a bunch of.
Executions and you know, actually doing enforcement,learning on the language models with these verifiable words.
This is kind of that in the robotics domain,which I suppose makes a lot of sense.
Yeah.
The only difference is, with reasoning models you can start out fully self-supervised; here, you have to start out with
(31:39):
imitation learning, and then after, with enough data, you can improve it with its own self-supervision.
And the next research is also about a foundation model, also, I guess, a physics-related foundation model, though in this case it's not robotics; it's a physics foundation model.
(31:59):
So the paper is Towards a Physics Foundation Model.
I'm gonna be honest, it's mostly gonna go over my head, so I'm not gonna be able to go deep in, but it looks pretty impressive.
So they frame this as: there are existing physics models, like physics-informed neural networks, that can do various things like estimating thermal flows, solving shear flow, optical flow.
(32:27):
Yeah.
These kinds of things.
And they try to create a foundation model in the sense that it's one model to do a whole bunch of stuff.
Right.
And the way they do that is that they have this GPhyT (General Physics Transformer) model that is given a set of states.
(32:48):
And the states are these kind of spatiotemporal patches containing things, I guess, like state, basically. Right? So they have forces, fields, et cetera.
And you basically just give it a prompt, which is a sequence of states. And just from this sequence of states, the model is then able to do these various kinds of physics
(33:16):
related operations, like thermal flows and so on.
And they do that, and they train it on a diverse 1.8-terabyte corpus of simulation data on a wide range of physical systems, without explicit physics-describing features.
So, seems pretty impressive.
Again, I'm not
(33:37):
up on the physics simulation side of research, but pretty cool.
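The "prompt of states" idea is essentially next-frame prediction rolled out autoregressively: feed in a short history of physical state frames, predict the next one, append it, repeat. A minimal sketch, with the model as a stand-in callable rather than the actual GPhyT architecture:

```python
import numpy as np

def simulate(model, prompt_states: np.ndarray, steps: int) -> np.ndarray:
    """Autoregressive rollout from a 'prompt' of physical states.

    prompt_states: shape (T, H, W, C), past frames of fields, forces, etc.
    model: callable mapping a (T, H, W, C) window to the next (H, W, C) frame;
           a stand-in for the actual transformer, for illustration only.
    """
    window = len(prompt_states)
    frames = list(prompt_states)
    for _ in range(steps):
        context = np.stack(frames[-window:])  # fixed-length history window
        frames.append(model(context))         # predict the next physical state
    return np.stack(frames)

# Toy usage: a "model" that just averages the history, on a 2-frame prompt.
toy = simulate(lambda ctx: ctx.mean(axis=0), np.zeros((2, 8, 8, 3)), steps=5)
print(toy.shape)  # (7, 8, 8, 3)
```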
It's pretty cool, but it seems like they train mostly on simulation data, so I am curious if they can generalize to real data.
Yeah, I guess that would be a key question.
But they do compare to these specialized models, and apparently it outperforms these
(34:04):
specialized architectures on known tasks and also generalizes to out-of-distribution problems.
So I guess the hope is you train on enough data, you train on enough varied data, and it's gonna be able to do quite well, although I'm sure you're right that it needs to go beyond simulation to really be super reliable.
(34:26):
Next, we have yet another foundation model.
I just decided to make that kind of a theme.
And also in robotics.
But this time, instead of arms, it's about legs, or wheels, I suppose you could say.
The paper is embodied navigation foundation model.
And so navigation is one of these sort of
(34:48):
pretty basic sorts of tasks that's been looked at in research over the past decade.
It's kind of what it sounds like.
The robot is given a goal place to go to, and it needs to make it there, usually by relying on vision.
So you can give this to quadrupeds, you can give this to humanoids, robots on wheels, and it typically needs to kind of navigate an apartment
(35:13):
or some other space to be able to get there.
And there's been quite a bit of research for about a decade on doing reinforcement learning, deep learning, all sorts of things like that.
So here the researchers have developed NavFoM, which is a cross-task and cross-embodiment navigation foundation model.
(35:34):
So they have 8 million navigation samples from these different tasks and embodiments, where embodiments, again, can be quadrupeds, humanoids, robots on wheels. And for all of these, if you just give it an egocentric video and language instructions, the model is then gonna predict the trajectory that the agent should take to get you to wherever you wanna go.
(35:59):
So a very useful type of model for, I guess, general-purpose robotics, for instance.
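The interface described here, egocentric video plus a language instruction in, a trajectory out, could be sketched roughly like this. The types and the waypoint format are assumptions for illustration, not NavFoM's published API:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class NavQuery:
    video: np.ndarray   # egocentric frames, shape (T, H, W, 3)
    instruction: str    # e.g. "go down the hallway and stop at the fridge"

def plan_trajectory(model, query: NavQuery) -> np.ndarray:
    """Ask a cross-embodiment navigation model for a path.

    Returns waypoints of shape (N, 3): (x, y, heading) in the robot's frame.
    The model callable and waypoint format are illustrative assumptions.
    """
    return model(query.video, query.instruction)

# Toy usage: a stand-in "model" that just walks straight ahead.
straight = lambda video, text: np.array([[0.5 * i, 0.0, 0.0] for i in range(5)])
print(plan_trajectory(straight, NavQuery(np.zeros((8, 64, 64, 3)), "go forward")))
```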
I have to be honest, I feel like this is just, like, publishing for the sake of
publishing a foundation model, right?
Like we have pretty good models to do self-driving.
Like, that's why earlier in the episode we talked about several self-driving car company news items.
(36:23):
And with these kind of, like, diversity-based foundation models, it's like, hey, we can do it on a humanoid.
Hey, we can do it on a car.
Hey, we can do it on robot wheels.
Oftentimes it's really about diversity, because if you look at the benchmark, the performance is like at 64.4%, which still feels quite low to actually be utilized in a real-world setting.
(36:46):
So I wonder if, you know, for navigation, it's still more important to build the models, probably big foundation models, necessary for navigation, but rather than trying to go across different types of platforms, focus on a specific type of platform.
Yeah, they do try to incorporate autonomous driving and UAV data here, which to your point
(37:12):
probably isn't necessary; I think navigation benchmarks typically are more indoors-oriented.
I guess the key benefit to trying to do this cross-embodiment stuff is trying to have something that generalizes, right? So they do say that they are taking in different camera-view information and have different temporal contexts.
(37:35):
Maybe if they focused a little bit, to not deal with cars and UAVs but more so just different types of embodied agents of different heights and different kinds of perspectives, I think that could probably be quite useful.
Alrighty.
Well that's it for research.
Lots of foundation models.
Next up, we go to Policy and Safety.
(37:55):
First up, we have something in our home state of California.
Anthropic has endorsed California's AI safety bill, SB 53.
So this is, I believe, kind of a follow-up version of a regulation that was being discussed earlier, which was passed
(38:17):
but vetoed by the governor of California.
This is a tweaked version that took out some of the, let's say, more onerous requirements, and Anthropic explicitly endorsing it is a pretty significant sign that they think that this is a good way to regulate for AI safety.
(38:38):
And SB 53 is an AI safety bill that is meant to regulate, basically, companies working on advanced AI models that might contribute to risks such as biological weapons or cyberattacks.
So as with the previous version of this bill, it passing or not passing would be a pretty big deal.
(39:02):
I'm sure OpenAI would not be very happy if it passes, but it probably has a better chance than its predecessor.
Anthropic seems to always be at the forefront of really arguing for more safety, but I am surprised that they are going after regulatory efforts too, to improve safety, as it does mean that there will be more requirements and
(39:30):
legal requirements for people to innovate on models.
Yeah, according to this article, some, I guess, policy experts are saying that this is a more restrained approach compared to the previous AI safety bill, so this could be a good one. At least to Anthropic it seems to be, like, the right way to do it.
They have this quote in their blog post.
(39:54):
The question isn't whether we need AI governance, it's whether we'll develop it thoughtfully today or reactively tomorrow. SB 53 offers a solid path toward the former. So the basic point is, according to Anthropic, this is a good way to do this kind of regulation.
Next up, moving away from AI safety to copyright stuff.
(40:17):
Another popular topic for legal battles.
This time we have Warner Bros. suing Midjourney.
So Warner Bros. is filing a lawsuit against Midjourney, accusing them of copyright violations related to characters like Superman, Batman, and Bugs Bunny.
(40:39):
The complaint alleges that Midjourney has removed safeguards that previously prevented users from creating infringing videos, and that this has resulted in unauthorized creation of Batman and so on.
The team in charge here has also filed lawsuits against Midjourney on behalf of Disney and Universal.
(41:03):
So, sounds like it's more of what Midjourney is already facing.
Yeah.
Well, it does seem like Midjourney, compared to a lot of other image generation platforms, doesn't really have as many safeguards against intellectual property violations.
(41:24):
But it's also interesting that all these companies are now kind of jumping in and dogpiling and going against Midjourney.
Yeah, I think it's because it's a pretty straightforward thing to do. And as with the previous lawsuits here, if you go and read the PDF, the actual complaint, it is kind of a fun one to read, just because there are image attachments, and so they have examples
of Batman and Superman and Wonder Woman and Scooby-Dooand all these characters as images generate from your
journey in this lawsuit, which is certainly fun to see.
And let's just do one last story.
This is a bit of a shorter episode.
The last one also deals with lawsuits and copyright, but this is now in the text domain.
(42:16):
The company doing the lawsuit is Rolling Stone's publisher, Penske Media, and they are suing Google over AI Overview summaries.
So the lawsuit is claiming that Google's AI Overview panel displays summaries that discourage users from clicking through to the full articles, which would impact the publisher's ad and subscription revenue,
(42:38):
similar to what Perplexity has been dealing with.
I suppose now Google is kind of doing the same thing as Perplexity, giving you this kind of AI summary of a bunch of sources. And I guess it was just a matter of time till Google had to address this.
There are details here that apparently publishers like DMG Media and others have reported significant declines in
(43:01):
clickthrough rates since the introduction of AI overviews.
Pew Research found that users are less likely to click through to articles when AI summaries are present in search results.
So not a trivial matter; I mean, this is kind of live or die for these kinds of publishers, right?
And I love how Google denies
(43:23):
these claims, but if you actually ask Gemini if AI Overviews result in less traffic, it actually contradicts Google's public stance and says, yes, it does actually reduce clickthroughs.
Right.
And publishers are in a tough spot here because they need Google, right?
They need to be indexed by Google.
(43:43):
They need the traffic generated by Google.
But on the other hand, now Google is cannibalizing that business, those clicks.
So it's a tricky balance to strike.
And it's another kind of interesting question on the legal dynamics, the financial dynamics, of this
(44:05):
kind of LLM-driven world. As with image generation, now with search and text publishing, all of this is somehow still not resolved.
I mean, look, this is very disruptive technology, so a lot of old business models are just gonna be disrupted. And, I mean, publishing has been very much hurt by the internet as well, so this is another wave of potentially less revenue, less clicks for these publishers.
(44:34):
So I can see why they are trying to figure out a way to salvage the situation.
Well, we'll finish with that slightly sad detail, although the robotics stuff hopefully made up for it.
Thanks once again, Michelle for guest hosting.
Yeah, it was fun.
(44:54):
It was fun to talk about the latest AI news with you.
Andre, thank you so much for inviting me.
Yeah, no.
Maybe we'll do it again.
We'll see.
And thank you also to the listeners, as usual, for tuning in, and apologies once again for not being very consistent.
Last Week in AI is supposed to be every week, but sometimes it's not.
(45:15):
Please do keep tuning in.