Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
Welcome to Innovation Pulse, your quick, no-nonsense update on the latest in AI.
(00:10):
First, we will cover the latest news.
OpenAI's ChatGPT agent is revolutionising task management,
while collaboration with Shopify boosts revenue.
AI alters language, NVIDIA expands globally, and Meta shifts AI strategies.
After this, we'll dive deep into Moonshot AI's groundbreaking Kimi K2 model.
(00:33):
OpenAI has introduced the ChatGPT agent, a new AI tool designed to handle complex multi-step tasks,
such as managing calendars, planning meals, and creating presentations.
This tool operates using a virtual computer and integrates capabilities from OpenAI's existing tools, Operator and Deep Research.
(00:56):
Developed by a team of 20 to 35 members, it can perform tasks like planning a date night by syncing with calendars and restaurant reservations.
However, it requires user permission for actions like sending emails or making bookings.
Although it may be slow, OpenAI emphasises its efficiency for complex tasks.
(01:19):
Currently, financial transactions are restricted, and the tool features a watch mode for browsing.
Initially available to Pro, Plus and Team users, it will soon be accessible to Enterprise and Education users.
The AI agent trend is gaining traction, with companies striving to develop tools similar to Iron Man's JARVIS.
(01:46):
Mistral's Le Chat introduces new features to enhance user interaction, making it more intuitive and engaging.
The Deep Research mode offers fast, structured reports on complex topics, transforming Le Chat into a collaborative research assistant.
Users can speak to Le Chat with the new voice mode, Voxtral, and enjoy multilingual reasoning powered by the Magistral model.
(02:12):
Projects allow for organising conversations into context-rich folders.
Advanced image editing is available through a partnership with Black Forest Labs.
Meanwhile, in the financial world, 2025 marks a resurgence of IPOs with notable listings on the NYSE, including companies like NIQ Global Intelligence and Chime.
(02:36):
This revival is attributed to market stability and investor confidence, spotlighting a diverse range of sectors.
Let's now turn our attention to the revenue model. OpenAI plans to introduce a payment system in ChatGPT,
allowing it to earn commission on online sales according to the Financial Times.
(02:59):
This new feature will enable merchants to fulfil orders directly through ChatGPT,
providing OpenAI an additional revenue stream beyond its subscription model.
The system, still in development, is being shown to brands in collaboration with partners like Shopify, though details are still being discussed.
(03:20):
The partnership, announced earlier this year, aligns with OpenAI's efforts to enhance its shopping features.
Currently, ChatGPT suggests products via links to retailers.
OpenAI's annual revenue has increased significantly, reaching $10 billion by June, despite a $5 billion loss last year.
(03:43):
OpenAI and Shopify have not commented on the report.
A recent study reveals that human language may be evolving due to AI chatbots like ChatGPT.
Researchers from the Max Planck Institute found that people are increasingly using GPT words, such as delve, meticulous and bolster in conversations.
(04:09):
This linguistic shift began with the rise of ChatGPT.
To determine this, researchers asked ChatGPT to polish millions of texts, identifying the AI's favourite words.
They then tracked these words' frequency in YouTube videos and podcasts before and after ChatGPT's debut.
(04:30):
The findings suggest a cultural feedback loop, where patterns in AI influence human speech.
The study highlights how AI integration impacts communication, reflecting its growing role in our lives.
Other studies note AI's broader effects, with concerns about overreliance and impersonation risks on social media platforms.
(04:56):
NVIDIA founder and CEO Jensen Huang recently promoted AI in Washington, D.C., and Beijing, highlighting its global benefits for business and society.
In the United States, he met with President Trump and policy makers, expressing NVIDIA's support for job creation, AI infrastructure and onshore manufacturing.
(05:19):
In Beijing, he discussed AI's potential to boost productivity and opportunity with government and industry leaders.
Huang announced NVIDIA's plans to resume sales of the NVIDIA H20 GPU with expected license approvals from the United States government.
He also introduced the new NVIDIA RTX PRO GPU, suitable for AI and smart factories.
(05:44):
Huang emphasised that AI is now essential like energy and the internet.
NVIDIA is committed to open-source research and foundation models, aiming to democratise AI and support emerging economies across the globe.
Join us as we explore the strategic pivot in AI.
(06:05):
Meta is considering shifting from its open-source AI model, Behemoth, to a private version.
Senior team members, including new chief AI officer Alexander Wang, debated pausing Behemoth's release to focus on a proprietary model.
This represents a strategic shift from Meta's tradition of open-source sharing, which has fostered innovation and allowed others to build on their work.
(06:31):
Despite this, Meta still values openness and recognises the competitive pressures from rivals like Google and OpenAI.
Recently, Meta invested $14.3 billion in Scale AI, leading to the creation of Meta Superintelligence Labs with Wang at the helm.
CEO Mark Zuckerberg plans to invest heavily in AI infrastructure, aiming for a supercluster named Prometheus.
(06:59):
These moves indicate Zuckerberg's desire to accelerate AI progress and position Meta at the forefront of superintelligence development.
Moonshot AI, the company behind the Kimi chatbot, has launched Kimi K2, an open-source language model that challenges systems from OpenAI and Anthropic.
(07:21):
Kimi K2 includes one trillion parameters, optimised for coding and autonomous tasks, and offers both a foundation and an instruction-tuned model.
Its standout feature is its agentic capability, allowing it to autonomously execute tasks like coding and complex workflows.
(07:43):
Kimi K2 outperformed competitors on benchmarks, achieving 53.7% accuracy on LiveCodeBench.
The model is cost-efficient thanks to the MuonClip optimiser, which stabilises training.
Moonshot's open-source strategy, paired with competitive pricing, undercuts major players, positioning Kimi K2 as a practical tool beyond mere demos.
(08:09):
This launch marks a significant moment in AI, showing open-source models can match, if not surpass, proprietary systems, challenging traditional business models in the industry.
And now, let's pivot our discussion towards the main AI topic.
Welcome back to Innovation Pulse. I'm Alex, and today I'm joined by AI researcher and tech analyst, Yakov Lasker.
(08:39):
Yakov, I have to ask, did you actually let an AI book your flight here today?
Oh, you heard about that? Well, funny you should ask, because as of two days ago, that's not just a hypothetical anymore.
OpenAI just dropped ChatGPT agent on July 17th. And I'm not exaggerating when I say this might be the biggest shift in how we interact with computers since, well, since ChatGPT itself launched.
(09:02):
Wait, hold up. You're telling me ChatGPT can now actually do things instead of just talking about doing things?
That's exactly what I'm telling you. And Alex, when I say do things, I mean it can literally see your screen, click buttons, fill out forms, navigate websites,
and complete entire workflows from start to finish. We're talking about an AI that uses its own virtual computer to carry out tasks.
(09:27):
Okay, but I've heard promises like this before. What makes this different from all the other AI agents we've been hearing about?
Here's what caught my attention. OpenAI combined three of their most powerful tools into one unified system.
Remember Operator, their web browsing agent, and Deep Research, which could synthesize information from dozens of sources?
(09:49):
They've merged those with ChatGPT's conversational intelligence. It's like taking three specialists and creating one super generalist.
That's actually kind of brilliant. But walk me through what this looks like in practice. What can it actually do?
Picture this scenario they demoed. Someone asked it to help prepare for a wedding.
Not just give me wedding advice, but find an outfit that matches the dress code, give me five specific options, book hotels with buffer days around the event, and handle the logistics.
(10:19):
The AI went out, browsed multiple websites, compared options, and came back with actionable results.
No way. It actually made the bookings.
Well, here's where it gets interesting, and where OpenAI was smart about safety. For sensitive actions like purchases or sending emails, it prompts you to log in and take over.
So it does the legwork of research and navigation, but you maintain control over the final actions.
(10:43):
That's actually reassuring. But I'm curious about the technical side. How is this different from just having a really good web scraper?
This is where it gets wild, Alex. It's not scraping; it's actually seeing and interacting with web pages like a human would.
It takes screenshots, recognizes buttons and forms, understands visual layouts, and can adapt when websites change.
(11:06):
Think about how often you've had a script break because a website updated their interface. This agent can literally see the new layout and figure out how to navigate it.
Okay, that's legitimately impressive. But I have to ask, is this actually useful for regular people? Or is this just a tech demo that sounds cool?
You know what convinced me this is real? The mundane examples. Sure, they showed off the wedding planning, but they also demonstrated things like,
(11:32):
look at my calendar and brief me on upcoming client meetings based on recent news. That's not flashy, but it's exactly the kind of task that eats up 30 minutes of your morning.
Oh, that reminds me of something. Didn't Google announce something similar with their Jarvis project?
They did. But let me back up and give you the full picture of what's happening in this space. Because it's actually quite fascinating how different companies are approaching this problem.
(11:56):
OpenAI didn't just wake up and build Agent Mode overnight. This is actually the evolution of their standalone tool called Operator.
Right. I remember hearing about Operator. That was the thing that could browse the web and click buttons, wasn't it?
Exactly. Operator was OpenAI's first real attempt at an AI agent. It launched as a research preview earlier this year, but it was limited.
(12:18):
It could navigate websites, click buttons, fill out forms, but it couldn't do deep analysis or write detailed reports. It was basically really good at following instructions on websites,
but not so great at thinking through complex problems.
So ChatGPT Agent is essentially Operator on steroids.
More like Operator plus Deep Research plus ChatGPT's conversational abilities all rolled into one.
(12:41):
But here's what's interesting. Each company is taking a fundamentally different approach to this challenge.
Google's Jarvis, for example, is being designed as a browser-based agent that can handle tasks like making restaurant reservations and buying event tickets.
From what we know, it's more focused on specific, well-defined tasks.
How does that compare to what, say, Anthropic is doing with Claude?
(13:05):
Great question. Anthropic actually beat everyone to the punch in some ways.
They released Claude's computer use capabilities months ago, and it can literally take control of your desktop.
Not just browse the web, but interact with any application on your computer.
It can see your screen through screenshots and control your mouse and keyboard.
Wait, that sounds more advanced than what OpenAI just released.
(13:27):
In some ways it is. But here's the trade-off. Claude's approach is more powerful but also more complex to set up and use.
You need to give it access to your entire desktop, which raises significant security concerns.
OpenAI's approach keeps everything contained within their virtual environment, which is safer but potentially more limited.
That's a really important distinction.
(13:49):
So we're seeing different philosophies here, broad access versus controlled environments.
Exactly. And then you have Microsoft taking yet another approach with their Copilot agents.
They're focusing heavily on workplace integration, agents that can send emails, manage records, and work within Microsoft's ecosystem.
It's less about general web browsing and more about automating business workflows.
(14:10):
And where does Apple fit into all this?
Apple's playing the long game with Siri.
The enhanced Siri capabilities coming with Apple Intelligence will let it control apps and execute more complex actions on your iPhone.
But it's very much focused on the mobile experience and Apple's walled garden approach.
It's not trying to be a general-purpose web agent like these others.
So we've got Google focusing on specific tasks, Anthropic going for full desktop control, Microsoft targeting business workflows,
(14:37):
Apple optimizing for mobile, and OpenAI trying to find the sweet spot in the middle?
That's a really good way to frame it.
And what's fascinating is that OpenAI's agent mode actually learned from the limitations of their earlier Operator tool.
Operator was powerful but felt disconnected from the ChatGPT experience.
Users had to go to a separate website, the conversations didn't carry over,
(15:00):
and it couldn't leverage ChatGPT's reasoning capabilities for complex analysis.
So agent mode is essentially what Operator should have been from the beginning?
In many ways, yes. By integrating everything into ChatGPT, they've created something that feels more natural and conversational.
Instead of having to think, do I need the browsing agent or the research agent or the chat agent,
(15:21):
you just talk to ChatGPT and it figures out what tools it needs to use.
But I'm curious about the technical implementation here.
How does OpenAI's virtual computer approach compare to say, what Anthropic is doing with direct desktop access?
That's where the safety versus capability trade-off becomes really clear.
Anthropic's approach is like giving someone the keys to your actual computer.
(15:45):
They can do anything you can do, but that also means they could potentially cause real damage if something goes wrong.
OpenAI's virtual computer is more like giving someone access to a sandbox that looks like the real web,
but can't directly affect your personal files or other applications.
And I imagine that makes it easier for businesses to adopt too.
(16:06):
Absolutely. IT departments are much more comfortable with the idea of AI agents that operate in controlled environments
rather than having direct access to corporate systems.
But here's the thing. I think we're going to see convergence over time.
The companies taking the more cautious approach will gradually add more capabilities,
while the ones starting with broader access will add more safety controls.
(16:27):
Speaking of safety, how are the different vendors handling that challenge?
Because this seems like it could go wrong in so many ways.
Each company is taking a different approach to safety, which is actually really instructive.
OpenAI has built multiple layers.
They've got prompt injection defenses, monitoring systems that watch for suspicious behavior,
(16:48):
and they require human approval for high-risk actions.
Anthropic focuses heavily on constitutional AI principles and has built-in refusal mechanisms.
Google and Microsoft are leveraging their existing enterprise security infrastructure.
It sounds like we're in this experimental phase where everyone's trying different approaches to see what works.
(17:09):
Exactly, and that's actually healthy for the industry.
Rather than everyone converging on one solution too quickly,
we're getting to see multiple approaches tested in the real world.
Some will prove more successful than others,
and the winners will likely incorporate the best ideas from all the different approaches.
Wait, so I could theoretically try this today?
If you have a subscription, absolutely. You just select Agent Mode from the drop-down menu in ChatGPT and start giving it tasks.
(17:35):
But, Alex, this is where we need to talk about the elephant in the room.
What happens when AI can actually act on our behalf?
You're thinking about the security implications, aren't you?
Exactly. OpenAI classified this as 'high capability' in the biological and chemical domain in their safety assessment,
which triggered intensive safety checks.
They've built in safeguards against prompt injection attacks, malicious websites trying to hijack the agent,
(18:00):
and they have monitoring systems watching for suspicious behavior.
But still, this is an AI that can access your personal information and take actions on websites where you're logged in.
That's both exciting and terrifying. How are they handling that balance?
They've been pretty thoughtful about it.
The agent is trained to refuse ambiguous or potentially harmful instructions.
(18:23):
It won't make purchases without explicit approval.
It won't send emails on your behalf without confirmation, and it alerts you when it encounters something sensitive.
Plus, you can delete all browsing data and log out of all sites with one click.
But here's what I'm really curious about. Is this what the future of computing looks like?
Are we moving toward a world where we just tell our computers what we want done instead of manually clicking through interfaces?
(18:51):
I think that's exactly where we're headed, and ChatGPT agent feels like the first real glimpse of that future.
Think about it. Instead of opening five browser tabs, comparing prices, reading reviews, and making spreadsheets,
you could just say, find me the best laptop under $1,500 for video editing, and have the AI do all the legwork.
Okay, but there's something that's been bothering me about all these AI agents.
(19:16):
Are we making ourselves less capable? Like, if AI can do all this research and decision making, what happens to our own problem-solving skills?
That's such an important question, Alex. I think it depends on how we use these tools.
If we're using them to handle the tedious, repetitive research so we can focus on the creative and strategic thinking,
that could actually enhance capabilities. But if we're outsourcing our decision-making entirely, yeah, that's concerning.
(19:43):
Right. It's like the difference between having a research assistant and having someone else make all your choices for you.
Exactly, and I think that's why OpenAI's approach of keeping humans in the loop for major decisions is smart.
The AI can gather information and present options, but you're still making the final calls.
So where does this go next? Because if this is just the beginning, I'm trying to imagine what ChatGPT agent looks like in six months or a year.
(20:09):
Well, OpenAI said this is just the foundation, and they'll be adding improvements regularly.
But here's what really has me excited. They're planning to expose the underlying model, called CUA, or Computer-Using Agent, through their API.
That means developers could build their own specialized agents for specific industries or tasks.
Oh wow, so we could see agents specifically designed for, say, medical research or financial analysis?
(20:35):
Exactly. Imagine an agent trained specifically for academic research that knows how to navigate scientific databases,
or one designed for real estate that understands property listings and market data. The possibilities are pretty staggering.
But let's bring this back down to earth for our listeners. If someone wanted to start experimenting with this technology today, what would you recommend?
(20:58):
Start small and specific. Don't try to automate your entire workflow on day one.
Pick one repetitive task that takes you 15-20 minutes regularly. Maybe researching restaurants for a dinner reservation or comparing product specifications.
Give the agent that task and see how it performs.
And I'm guessing you'd recommend being cautious about what information you give it access to.
(21:19):
Absolutely. Follow the principle of least privilege. Only give it access to what it needs for this specific task.
If you're asking it to research vacation destinations, it doesn't need access to your email or financial accounts.
This has been fascinating, Yakov. Any final thoughts on what ChatGPT agent means for the broader AI landscape?
I think July 17th, 2025 might be one of those dates we look back on as a turning point. Not because ChatGPT agent is perfect, it's not,
(21:48):
but because it's the first time an AI agent that can actually take actions has been available to millions of users.
We're about to get real-world data on how people actually want to use these tools.
And that real-world testing is going to drive the next wave of innovation.
Exactly. The demos are one thing, but seeing how people actually integrate agentic AI into their daily workflows, that's going to teach us things we never anticipated.
(22:11):
Well, I know what I'm doing after we finish recording. I'm going to see if ChatGPT agent can plan a better lunch than I usually manage to throw together.
Ha! Let me know how it goes. Just maybe don't let it order for you on the first try.
That's probably wise advice for anyone diving into the age of AI agents. Thanks for breaking this down with us, Yakov, and thanks to everyone for listening to Innovation Pulse.
(22:35):
Remember, the future isn't just knocking. It's actively browsing the web and taking screenshots.
And it's available right now. Wild times indeed.
Until next time, keep your pulse on innovation.
We've explored the exciting launch of OpenAI's ChatGPT agent and its implications for human-computer interaction, alongside developments in AI from companies like NVIDIA, Meta, and Moonshot AI.
(23:08):
As AI continues to shape industries, remember to stay informed and engaged.
Don't forget to like, subscribe, and share this episode with your friends and colleagues, so they can also stay updated on the latest news and gain powerful insights.
Stay tuned for more updates.