Updated: December 06, 2024
Published: December 12, 2023
Shortly after news spread that Google was pushing back the release of its long-awaited AI model, Gemini, Google announced its launch.
As part of the release, they published a demo showcasing impressive - downright unbelievable - capabilities from Gemini. Well, you know what they say about things being too good to be true.
Let's dig into what went wrong with the demo and how Gemini stacks up against OpenAI's GPT-4.
Rivaling OpenAI's GPT-4, Gemini is a multimodal AI model, meaning it can process text, image, audio and code inputs.
(For a long time, ChatGPT was unimodal, only processing text, until it graduated to multimodality this year.)
Gemini comes in three versions: Ultra, Pro, and Nano.
Ultra isn't yet available to consumers, with a rollout scheduled for early 2024, as Google runs final tests to ensure it's safe for commercial use. Gemini Nano will power Google's Pixel 8 Pro phone, which has AI features built in.
Gemini Pro, on the other hand, will power Google tools like Bard starting today and is accessible via API through Google AI Studio and Google Cloud Vertex AI.
Google published a six-minute YouTube demo showcasing Gemini's skills in language, game creation, logic and spatial reasoning, cultural understanding, and more.
If you watch the video, it's easy to be wowed.
Gemini is able to recognize a duck from a simple drawing, understand a sleight-of-hand trick, and complete visual puzzles, to name a few tasks.
However, after the video earned over 2 million views, a Bloomberg report revealed that it had been cut and stitched together in a way that inflated Gemini's performance.
Google did share a disclaimer at the beginning of the video: "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity."
However, Bloomberg points out they left out a few important details:
In reality, Gemini processed the demo's inputs as still images and written text prompts, rather than responding to live video and voice.
It's like when you're showing everyone your dog's best trick.
You share the video via text, and everyone's impressed. But when they come over, they see it actually takes a whole bunch of treats, petting, patience, and repeating yourself 100 times to see the trick in action.
Let's do some side-by-side comparison.
In this 8-second clip, we see a person's hand gesturing as if they're playing the game used to settle all friendly disputes. Gemini responds, "I know what you're doing. You're playing rock-paper-scissors."
But what actually happened behind the scenes involved a lot more spoon-feeding.
In the real demo, the user submitted each hand gesture individually and asked Gemini to describe what it saw.
From there, the user combined all three images, asked Gemini again and included a huge hint.
While it's still impressive how Gemini is able to process images and understand context, the video downplays how much steering is required for Gemini to generate the right answer.
Although this has earned Google a lot of criticism, some point out that it's not uncommon for companies to use editing to create more seamless, idealized use cases in their demos.
Since its release, GPT-4, created by OpenAI, has been the most powerful AI model on the market, and Google and other AI players have been hard at work on a model that can beat it.
Google first teased Gemini in September, suggesting that it would beat GPT-4, and technically, it delivered.
Gemini outperforms GPT-4 in a number of benchmarks set by AI researchers.
However, the Bloomberg article points out something important.
For a model that took this long to release, the fact that it's only marginally better than GPT-4 is not the huge win Google was aiming for.
OpenAI released GPT-4 in March. Google has now released Gemini, which outperforms it, but only by a few percentage points.
So, how long will it take for OpenAI to release an even bigger and better version? Judging by the last year, it probably won't be long.
For now, Gemini seems to be the better option, but that won't be clear until early 2024, when Ultra rolls out.