At its Google I/O developer conference in Mountain View, California, on Tuesday, Google unveiled a series of generative artificial intelligence (AI) products, including the Gemini Live assistant, updates to its Android and Workspaces platforms, and a revamped Search product.
These announcements are part of Google's broader strategy to reclaim its position as Silicon Valley's AI leader, following Microsoft's surprising partnership with OpenAI in 2022.
Additionally, Google aims to diversify beyond its core advertising business with new devices and AI-powered tools.
Emphasizing the importance of AI, Google CEO Sundar Pichai noted that the term "AI" was mentioned 120 times during the event, as counted by Google's AI platform, Gemini.
This flurry of updates follows OpenAI's recent launch of its latest AI system, GPT-4o, which showcased advanced capabilities like reading human expressions via a phone camera and engaging in fluent, even flirtatious, conversations.
Google is clearly intent on demonstrating that its AI tools are equally proficient in this type of "multimodal" understanding.
In a clear demonstration of the competitive "anything you can do, I can do better" mindset, Google strategically previewed its AI systems running on a phone just before OpenAI's announcement.
Google Wants AI to be a Part of Everything You Do
During the keynote, Google demonstrated its vision of integrating AI into users' daily lives, showcasing how its AI products can assist with sharing information, interacting with others, finding objects around the house, making schedules, shopping, and using Android devices.
Google aims for its AI to become an integral part of everything users do.
Pichai introduced several new features powered by its latest AI model, Gemini 1.5 Pro.
One notable feature, called Ask Photos, enables users to search their photo library for specific insights, such as identifying when their daughter learned to swim or recalling their license plate number from saved images.
Pichai also showcased how Gemini 1.5 Pro can summarise recent emails from a child's school by analysing attachments and extracting key points and action items.
Two new versions of Gemini 1.5 were unveiled. Gemini 1.5 Flash is a lightweight, fast, and cost-efficient model with multimodal capabilities and a one-million-token context window, scoring 78.9% on MMLU compared to 81.9% for 1.5 Pro. The standard Gemini 1.5 Pro model, meanwhile, now features a doubled context window of two million tokens.
This new model is available via a waitlist for select developers through the API.
Throughout the presentation, Google executives highlighted other capabilities, such as the latest model's ability to "read" a textbook and transform it into an AI lecture with natural-sounding teachers who can answer questions.
AI Overviews: Revolutionising Search Results Generation
Last May, Pichai announced the company's ambitious plan to reimagine all of its products through AI.
However, given the risks associated with new generative AI technology, such as the potential for spreading false information, Google was initially cautious about integrating it into its search engine, which serves over two billion users and generated $175 billion in revenue last year.
At the conference, Pichai unveiled how the company's dedicated work on AI has now been incorporated into its search engine.
Starting this week, United States (US) users will experience a new feature, AI Overviews, previously known as the Search Generative Experience (SGE), which was announced at Google I/O 2023.
This feature generates information summaries above traditional search results, and it will soon be available to users worldwide.
By the end of the year, over a billion people are expected to have access to this technology.
Liz Reid, Google's newly installed head of Search, said:
“What we see with generative AI is that Google can do more of the searching for you. It can take a bunch of the hard work out of searching, so you can focus on the parts you want to do to get things done, or on the parts of exploring that you find exciting.”
So how does AI Overviews work?
Google's new experience integrates generative AI with search results to provide AI-generated summaries and answers based on live information.
Powered by the Gemini AI model, this enhancement will present AI Overviews for many queries when the system identifies that generative AI could be helpful.
These AI-generated summaries will appear above traditional search results, pushing them further down the page.
Typically, AI Overviews display a few relevant links per query, but the full set of links only becomes visible after expanding the response.
Google compares AI Overviews to features like Knowledge Panels or Featured Snippets, and they cannot be completely disabled.
However, Google will introduce a "web" filter in Search to bypass AI responses and show only traditional links.
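As an illustration, early testers have reported that this "web" filter maps to a `udm=14` URL parameter, so a filtered query can be constructed directly. A minimal sketch, assuming that undocumented parameter holds:

```python
from urllib.parse import urlencode

def web_only_search_url(query: str) -> str:
    """Build a Google Search URL requesting the plain 'Web' results filter
    instead of AI Overviews. The udm=14 value is what early testers have
    reported for this filter; Google may change it at any time."""
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": 14})

print(web_only_search_url("pineapple upside-down cake"))
```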
A major concern about Google's AI-enhanced Search is its impact on websites that rely heavily on search traffic. AI Overviews could reduce the traffic web publishers receive from Google Search, exacerbating challenges in an industry already strained by conflicts with other tech platforms.
On Google, users will encounter longer summaries on various topics, potentially diminishing the need to visit external websites.
Some estimates suggest that websites could lose up to 25% of their traffic over the next few years due to this change, compounding recent declines caused by Search algorithms.
However, Google asserts that the links included in AI Overviews receive more clicks than those presented as traditional search results, and the company emphasises its commitment to directing traffic to publishers and creators as AI Overviews reach more users. In a recent blog post, Reid wrote:
“We'll continue to focus on sending valuable traffic to publishers and creators.”
Additionally, Google has announced new features to be tested with Labs participants in Search.
These features include options to refine AI Overviews by simplifying language, enabling multi-step reasoning for complex queries, providing planning capabilities, organising search results with AI, and incorporating video as part of search prompts.
Google hints that these developments are just the beginning of its efforts to reimagine Google Search, with more innovations on the horizon.
Your Very Own Personalised AI Assistant: Gemini Live
Google's latest unveiling also includes Gemini Live, a personalised AI assistant poised to revolutionise user interactions.
Powered by Google's advanced Gemini 1.5 Pro model, Gemini Live offers users the ability to engage with a chatbot through voice commands, with responses delivered in natural-sounding voices.
What sets this apart is the chatbot's adaptability, allowing users to interrupt and ask clarifying questions mid-conversation.
Amar Subramanya, Google's vice president of engineering for Gemini experiences, shared insights into the transformative potential of Gemini Live during an interview with Yahoo Finance.
Subramanya revealed his personal utilisation of Gemini Live for brainstorming sessions and idea exchanges, showcasing the assistant's versatility in aiding creative processes.
Early testers have also explored Gemini Live's capabilities, leveraging it for tasks such as translation with promising results.
Looking ahead, Google plans to integrate camera access into Gemini Live, empowering the assistant to interact with real-world environments and objects—a feature reminiscent of OpenAI's GPT-4o demonstrations.
Subramanya recounted a scenario where he tasked the assistant with sourcing a pineapple upside-down cake recipe for a gathering of 15 people and seamlessly adding the ingredients to his Keep shopping list.
The assistant adeptly adjusted a recipe meant for eight individuals, scaled the proportions accordingly, and efficiently compiled the necessary items for Subramanya's convenience.
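The scaling step Subramanya describes is straightforward proportional arithmetic; a minimal sketch (with hypothetical ingredient quantities, not the actual recipe) might look like:

```python
def scale_recipe(ingredients: dict[str, float], from_servings: int,
                 to_servings: int) -> dict[str, float]:
    """Scale each ingredient quantity by the ratio of servings."""
    factor = to_servings / from_servings
    return {name: round(qty * factor, 2) for name, qty in ingredients.items()}

# Hypothetical quantities for an 8-serving cake, scaled to 15 guests (factor 1.875).
base = {"flour_g": 250, "sugar_g": 200, "pineapple_rings": 8}
print(scale_recipe(base, from_servings=8, to_servings=15))
# flour becomes 468.75 g, sugar 375 g, pineapple rings 15
```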
Additionally, on the Android front, Google is extending its assistant's reach to popular apps like Google Messages and Gmail, enhancing user productivity by enabling tasks like inserting Gemini-generated images into messages.
Google's Gemini Nano boasts the ability to identify potential phone scammers during conversations.
This feature operates by detecting specific conversation patterns commonly linked with fraudulent activities.
Remarkably, all scam detection processing occurs locally on your device, ensuring privacy as conversations remain confined to your phone without being uploaded to the web.
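Google has not published how Gemini Nano recognises these patterns; a toy sketch of the general idea (simple keyword heuristics standing in for the on-device model, with phrases chosen purely for illustration) could look like:

```python
# Phrases commonly associated with phone scams (illustrative, not Google's list).
SCAM_PATTERNS = [
    "gift card",
    "wire transfer",
    "your account has been suspended",
    "verify your social security number",
]

def looks_like_scam(transcript: str) -> bool:
    """Return True if the call transcript matches any known scam phrase.
    Operates only on local text, mirroring the on-device privacy model."""
    text = transcript.lower()
    return any(pattern in text for pattern in SCAM_PATTERNS)

print(looks_like_scam("Please pay the fine with a gift card today"))  # True
```

The real system presumably uses a language model rather than a phrase list, but the key property, that the transcript never leaves the device, is the same.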
Google's DeepMind AI Lab's Project Astra
Google briefly unveiled Project Astra, a creation of its DeepMind AI lab, poised to revolutionise everyday life by harnessing phone cameras to interpret real-world information.
This endeavour promises to identify objects and even locate misplaced items, hinting at a future integration with augmented reality glasses.
Demis Hassabis, chief executive of DeepMind, detailed in a blog post that select capabilities of Project Astra will be accessible to Gemini chatbot users this year.
Powered by Gemini, this project offers real-time support across audio, text, video, and image formats.
Despite being presented as a prototype, Astra's potential was showcased through pre-recorded videos, as it remains unavailable to all users.
Early testers noted higher latency and perceived limitations in emotional intelligence and tone compared to GPT-4o.
However, Astra exhibits strong text-to-speech capabilities and potentially superior support for ongoing video and long-context interactions.
Veo, Google's Answer to OpenAI's Sora
Next in line for Google is Veo, its latest AI model designed to produce high-definition videos from simple text inputs, akin to OpenAI's Sora system.
Google's AI model Veo can be used by creators, said James Manyika, Senior VP at Google, speaking on the growing possibilities of generative AI as the company unveiled Veo as its most advanced video generation model at the Google I/O 2024 conference, CNBC-TV18 (@CNBCTV18News) reported on 15 May 2024.
This technology marks a significant advancement in video generation capabilities, promising creators the ability to preview Veo and join a waitlist for access.
Anticipation mounts as Google plans to integrate Veo's functionalities into YouTube Shorts and other platforms later this year.
Veo, developed by Google DeepMind, boasts impressive features:
- It delivers videos in stunning 1080p resolution.
- Videos can extend beyond a minute, offering flexibility in content creation.
- Veo offers a diverse array of cinematic and visual styles to suit various preferences.
This versatile model can animate images or edit videos based on textual prompts, with support for masked editing, enabling targeted modifications within videos.
Google has enhanced Veo's training data by enriching video captions with additional details.
Furthermore, Veo leverages compressed representations of video, known as latents, to enhance performance, generation speed, and efficiency.
Google Announced a Slew of Other AI Features
The 2-hour session brimmed with a wealth of product updates and announcements spanning the Google ecosystem, showcasing enhancements across Search, Workspace, Photos, Android, and beyond.
Notably, Imagen 3, their cutting-edge image generation model, will soon debut in multiple iterations tailored for diverse tasks, from rapid sketching to producing high-resolution images.
Gemma 2 and PaliGemma, two new additions to the Gemma family, also mark a significant stride in open-source models.
PaliGemma, Google's inaugural vision-language open-source model, is available now, while Gemma 2, boasting 27 billion parameters, surpasses its predecessor and launches in June.
Furthermore, the unveiling of Lyria, Google's music-generation tool, adds another dimension to their innovative offerings.
With over 15 project launches and product announcements, the event underscores Google's commitment to advancing technology across various domains.
Google's Path to AI Dominance Filled with Roadblocks & Rivals
In the eyes of analyst Jacob Bourne from Emarketer, the spotlight on AI at this year's Google developer conference comes as no surprise.
He said:
“By showcasing its latest models and how they'll power existing products with strong consumer reach, Google is demonstrating how it can effectively differentiate itself from rivals.”
He views the reception of these new tools as a litmus test for Google's ability to adapt its search product to the evolving landscape of generative AI.
He added:
“To maintain its competitive edge and satisfy investors, Google will need to focus on translating its AI innovations into profitable products and services at scale.”
As the company expands its AI endeavours, it pledges to implement additional safeguards to mitigate potential misuse.
Moreover, Google underscores its commitment to refining the capabilities of its new models through partnerships with experts and institutions.
However, while Google has intensified its focus on AI over the past year, it has encountered notable hurdles along the way.
One such setback occurred last year when the introduction of its generative AI tool, initially named Bard and later rebranded as Gemini, led to a drop in the company's share price.
This decline followed a demo video in which the tool produced factually inaccurate responses to inquiries about the James Webb Space Telescope.
More recently, in February, Google faced criticism on social media after Gemini generated historically inaccurate images, depicting people of colour in scenes where the figures portrayed were White.
In response, the company halted Gemini's capability to generate images of people.
Like other AI tools such as ChatGPT, Gemini draws from extensive datasets available online.
However, experts have consistently cautioned against the limitations and potential pitfalls associated with AI technologies, including inaccuracies, biases, and the dissemination of misinformation.
Speaking of rivalry: ChatGPT emerged as a formidable contender in the tech industry upon its release in late 2022, sparking discussions about its potential threat to Google's dominant search engine, the go-to platform for online information retrieval.
In response, Google embarked on a determined journey to reclaim its supremacy in the realm of AI.
On a positive note, Oppenheimer analyst Jason Helfstein said in a report:
"Relative to OpenAI's limited product demo the day before, we believe Google demonstrated its strong competitive position, driven by an essentially unlimited R&D budget."
Evercore ISI analyst Mark Mahaney also said in a report:
"In our view, Google delivered in this year's I/O against the mounting hype and doubts. From this I/O, we also noticed a greater emphasis from Google on using gen AI to more tightly connect its services into one holistic experience. And, an emphasis on these new innovations being 'Only On Android'."
However, other tech giants are very close behind.
At its Build conference starting 20 May, Microsoft is expected to unveil enhancements to its AI-driven Copilot for the Microsoft 365 productivity suite.
Meanwhile, Apple is gearing up for its WWDC event on 10 June, where it plans to introduce a new iteration of its Siri voice assistant powered by generative AI.
As the battle for AI supremacy intensifies, who will emerge victorious?
It seems that when one company releases a "groundbreaking" innovation, another is right on its tail.
So only time will tell, not so much who will emerge the winner but who will be left behind.