Google Gemini Pro 1.5presentation onstage at Google Cloud Next
Image Credits:Frederic Lardinois/TechCrunch
AI

Google’s Gemini 1.5 Pro enters public preview on Vertex AI

Gemini 1.5 Pro, Google’s most capable generative AI model, is now available in public preview on Vertex AI, Google’s enterprise-focused AI development platform. The company announced the news during its annual Cloud Next conference, which is taking place in Las Vegas this week.

Gemini 1.5 Pro launched in February, joining Google’s Gemini family of generative AI models. Undoubtedly its headlining feature is the amount of context that it can process: between 128,000 tokens to up to 1 million tokens, where “tokens” refers to subdivided bits of raw data (like the syllables “fan,” “tas” and “tic” in the word “fantastic”).

One million tokens is equivalent to around 700,000 words or around 30,000 lines of code. It’s about four times the amount of data that Anthropic’s flagship model, Claude 3, can take as input and about eight times as high as OpenAI’s GPT-4 Turbo max context.

A model’s context, or context window, refers to the initial set of data (e.g. text) the model considers before generating output (e.g. additional text). A simple question — “Who won the 2020 U.S. presidential election?” — can serve as context, as can a movie script, email, essay or e-book.

Models with small context windows tend to “forget” the content of even very recent conversations, leading them to veer off topic. This isn’t necessarily so with models with large contexts. And, as an added upside, large-context models can better grasp the narrative flow of data they take in, generate contextually richer responses and reduce the need for fine-tuning and factual grounding — hypothetically, at least.

So what specifically can one do with a 1 million-token context window? Lots of things, Google promises, like analyzing a code library, “reasoning across” lengthy documents and holding long conversations with a chatbot.

Because Gemini 1.5 Pro is multilingual — and multimodal in the sense that it’s able to understand images and videos and, as of Tuesday, audio streams in addition to text — the model can also analyze and compare content in media like TV shows, movies, radio broadcasts, conference call recordings and more across different languages. One million tokens translates to about an hour of video or around 11 hours of audio.

Techcrunch event

Join 10k+ tech and VC leaders for growth and connections at Disrupt 2025

Netflix, Box, a16z, ElevenLabs, Wayve, Hugging Face, Elad Gil, Vinod Khosla — just some of the 250+ heavy hitters leading 200+ sessions designed to deliver the insights that fuel startup growth and sharpen your edge. Don’t miss the 20th anniversary of TechCrunch, and a chance to learn from the top voices in tech. Grab your ticket before doors open to save up to $444.

Join 10k+ tech and VC leaders for growth and connections at Disrupt 2025

Netflix, Box, a16z, ElevenLabs, Wayve, Hugging Face, Elad Gil, Vinod Khosla — just some of the 250+ heavy hitters leading 200+ sessions designed to deliver the insights that fuel startup growth and sharpen your edge. Don’t miss a chance to learn from the top voices in tech. Grab your ticket before doors open to save up to $444.

San Francisco | October 27-29, 2025

Thanks to its audio-processing capabilities, Gemini 1.5 Pro can generate transcriptions for video clips, as well, although the jury’s out on the quality of those transcriptions.

In a prerecorded demo earlier this year, Google showed Gemini 1.5 Pro searching the transcript of the Apollo 11 moon landing telecast (which comes to about 400 pages) for quotes containing jokes, and then finding a scene in movie footage that looked similar to a pencil sketch.

Google says that early users of Gemini 1.5 Pro — including United Wholesale Mortgage, TBS and Replit — are leveraging the large context window for tasks spanning mortgage underwriting; automating metadata tagging on media archives; and generating, explaining and transforming code.

Gemini 1.5 Pro doesn’t process a million tokens at the snap of a finger. In the aforementioned demos, each search took between 20 seconds and a minute to complete — far longer than the average ChatGPT query.

Google previously said that latency is an area of focus, though, and that it’s working to “optimize” Gemini 1.5 Pro as time goes on.

Of note, Gemini 1.5 Pro is slowly making its way to other parts of Google’s corporate product ecosystem, with the company announcing Tuesday that the model (in private preview) will power new features in Code Assist, Google’s generative AI coding assistance tool. Developers can now perform “large-scale” changes across codebases, Google says, for example updating cross-file dependencies and reviewing large chunks of code.

Topics

, , , , , , , , ,
Loading the next article
Error loading the next article