ZDNET

This new Google Gemini model scrolls the internet just like you do - how it works

Webb Wright
Javier Zayas Photography/Moment via Getty Images

ZDNET's key takeaways

  • Google's new AI model can interact directly with website UIs.

  • It joins similar tools from OpenAI and Anthropic.

  • The company also admitted its weaknesses, including hallucinations.

Google DeepMind has debuted a new AI model in public preview that's designed to navigate a web browser just as a human would.

Built atop Gemini 2.5 Pro, the company's new Computer Use model can execute tasks like clicking, typing, and scrolling directly within a web page.

Also: 5 reasons I use local AI on my desktop - instead of ChatGPT, Gemini, or Claude

Users simply have to feed it a prompt in natural language -- such as, "Open Wikipedia, search for 'Atlantis,' and summarize the history of the myth in Western thought." The model will autonomously fetch the URL and screenshots of the requested site to analyze the user interface it needs to act within, and will perform the requested task step by step, all while outlining its reasoning and actions in a text box easily visible to users. It may also respond by asking for confirmation if it's instructed to perform a sensitive task, like making a purchase.


The preview of Gemini 2.5 Computer Use follows the release of similar web-browsing models from OpenAI and Anthropic. Google previously debuted an experimental Chrome extension called Project Mariner, which can also take action on behalf of users within web pages.

How it works

Gemini 2.5 Computer Use operates in an iterative loop: it keeps a record of its recent actions within a particular user interface and uses that history, together with a fresh screenshot of the page, to determine its next action. The more steps it performs on a given site, the more context it accumulates, and the more smoothly it functions.
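That loop can be sketched in a few lines of Python. The helper names below (`take_screenshot`, `model_next_action`, `execute_action`) are hypothetical stand-ins with stubbed behavior, not the actual Gemini API -- the point is only the observe-decide-act cycle with an accumulating action history:

```python
# Minimal sketch of an agentic browser-control loop. All three helpers are
# illustrative stubs; a real implementation would call the model and a browser.

def take_screenshot(step):
    # Stand-in for capturing the current state of the page.
    return f"screenshot_{step}"

def model_next_action(goal, history, screenshot):
    # Stand-in for the model call: given the goal, prior actions, and the
    # latest screenshot, return the next UI action (or None when finished).
    scripted = ["click_search_box", "type_query", "press_enter"]
    return scripted[len(history)] if len(history) < len(scripted) else None

def execute_action(action):
    # Stand-in for performing the click/type/scroll in the browser.
    return f"did:{action}"

def run_agent(goal):
    history = []            # record of actions taken so far (the model's context)
    for step in range(10):  # hard cap so the loop cannot run forever
        action = model_next_action(goal, history, take_screenshot(step))
        if action is None:
            break
        execute_action(action)
        history.append(action)
    return history

print(run_agent("search Wikipedia for 'Atlantis'"))
```

Because each iteration feeds the full history back into the decision step, the model's picture of the site grows with every action -- which is why longer sessions on one site tend to run more smoothly.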

Google posted demo videos (sped up 3x) showing the model autonomously making an update in a customer relationship management site and rearranging notes on Google's Jamboard platform, which was discontinued at the end of last year.

Also: ChatGPT's Codex just got a huge upgrade that makes it more powerful than ever - what's new

According to a blog post published by Google on Tuesday, the new model outperformed similar tools from Anthropic and OpenAI in terms of both accuracy and latency, and across "multiple web and mobile control benchmarks," including Online-Mind2Web, an evaluation framework for testing the performance of web-browsing agents.


How to try it

The new model is intended mainly for web browsers, but also shows "strong promise" on mobile, Google said. It's available now through the Gemini API in Google AI Studio and on Vertex AI. A demo version is also available via Browserbase.

Safety considerations

The new model also comes with a set of safety controls, which Google says developers can use to prevent it from performing undesired actions like bypassing CAPTCHAs, compromising data security, or gaining control of medical devices. For example, developers can instruct the model to request user confirmation before it performs certain specified actions.
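A developer-side confirmation gate of the kind Google describes might look like the following sketch. The action labels and the `confirm` callback are illustrative assumptions, not Google's actual safety API:

```python
# Sketch of a confirmation gate that pauses before sensitive actions.
# The set of sensitive actions and the callback signature are assumptions.

SENSITIVE = {"make_purchase", "delete_account", "send_payment"}

def gate(action, confirm):
    # Run safe actions directly; require an explicit user yes for sensitive ones.
    if action in SENSITIVE and not confirm(action):
        return "blocked"
    return "executed"

# Usage with a callback that declines every confirmation request:
print(gate("click_link", lambda a: False))     # executed
print(gate("make_purchase", lambda a: False))  # blocked
```

The design choice here mirrors the article: the model proposes actions freely, but anything on the sensitive list is held until the user explicitly approves it.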



The company also noted in the system card for the new model that it "may exhibit some of the general limitations of foundation models, as it is based off of Gemini 2.5 Pro, such as hallucinations, and limitations around causal understanding, complex logical deduction, and counterfactual reasoning."

Those limitations are common to most foundation models. Earlier this week, Anthropic published new research showing that many frontier AI models tended to act as whistleblowers on what they interpreted as unethical or illegal activity in test scenarios, even when the supposedly incriminating information was actually harmless.
