Meet Gemini: Dubbed Google's Most Capable AI

January 27, 2024 / 8 min read

In a “Let’s see what #GeminiAI can do” video, Google showed off its sparkly new artificial intelligence model in visual reasoning, connections, and puzzle-solving exercises. The video instructor placed a coin in his right palm and then turned both hands palm-down on a table before asking Gemini what it saw.

“The coin should be in the right hand,” the AI responded. However, the instructor had used a sleight-of-hand technique, so the AI was just as wrong as anyone watching.

Next, the AI was tested with a video of a cat in a mid-air jump. Gemini predicted, “It’s going to be a perfect 10.” That wasn’t the case, as the cat missed the wall and crashed.

Gemini AI excelled at identifying objects and making connections. It was able to deduce that a coin and a cookie are both “round and flat.” When the coin was swapped for an orange, it identified the cookie and the orange as food and even went on to advise that “The orange is a healthier choice than the cookie.”

More tests involved Gemini accurately picking the more aerodynamic of two car designs in a diagram.

Creating Gemini - Google’s Attempt at Restoring Its Image in the AI Race

The development of Gemini AI appears to be an attempt by Google to restore its image in the AI race. What points to this? The two most prominent signs are explained below:

Interfacing Gemini with Bard

The biggest sign of Google’s comeback attempt is that Gemini is served through the same platform as Bard - the company’s first consumer-facing AI chatbot.

Bard took Google to the forefront of the erupting artificial intelligence industry. However, despite looking promising, the product’s reputation was marred by prominent failures in its results and behavior.

For example, in its very first demo, Bard provided false information about the James Webb Space Telescope, sending Alphabet’s stock down by more than 7%. The AI was also accused of lacking humor and producing very dry conversations as a result.

By interfacing Gemini with Bard, the new product immediately replaces PaLM 2 as the Large Language Model (LLM) powering Bard. Moreover, users access the newer, more intelligent model through the existing Bard website, www.bard.google.com - seemingly brandishing the latest product while keeping a reference to the former AI model in view.

Putting Gemini Out There, Literally 

Besides the unusual interfacing and the decision to power Bard using Gemini, Google’s haste to launch a second AI just nine months after releasing its first model suggests that this is very much about making a comeback.

The California-based tech company is putting Gemini out there: under the noses of customers and in the faces of competitors, especially OpenAI and its GPT-4 model. It plans to have all of its services and products, including its electronic and mobile devices, showcase the new and improved model right away.

The Google Pixel 8 Pro and the Google Pixel 9 - the latter likely to surface in October 2024 - are both huge stages for this aggressive marketing. Rumor has it that the Pixel 9 will launch with a Gemini assistant preinstalled, allowing users to do all sorts of things, such as summarize webpages and manipulate Google Photos.

A Close Look at Gemini AI

Gemini is truly impressive - and perhaps deserving of the hype it has garnered in just a few days. Here are a few striking features and properties that back up this claim.

Multimodality and Advanced Processing

According to Demis Hassabis, Google DeepMind CEO, “At a high level, you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models.”

For those who do not know, AlphaGo was the first computer program to beat a human professional at the board game Go. The program was designed by DeepMind, a subsidiary acquired by Google, and Gemini is dubbed Google’s most advanced and most capable AI for having similar smarts as well as native multimodality of a kind not seen before.

Image: Gemini’s benchmark scores (Source: KrASIA)

Gemini’s advanced processing means that it outperforms everything out there - from Meta’s Llama 2 to today’s GPT models. The benchmark results in the image above show how much better the system is compared to OpenAI’s most advanced model, GPT-4.

The Gemini Ultra version beats GPT-4 in Python code development, math, reading comprehension, and most other measures. It does, however, lose to GPT-4 in commonsense reasoning about everyday activities.

In addition to leading on 30 of 32 benchmarks, Gemini’s multimodal nature makes it unique. This property is a first of its kind as far as LLMs and AIs are concerned.

It means that Gemini has its modalities built in from the ground up rather than bolted together at a later stage, as in other AI models - allowing it to consume text, images, audio, and video all at once and even “understand the world around us in the way that we do.”
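
To make the multimodal idea concrete, here is a minimal sketch of how a developer could send an image and a text prompt to Gemini in a single request, assuming access to the google-generativeai Python SDK and an API key; the image file name is hypothetical and the model name reflects the naming used at launch.

```python
# Minimal sketch: one request combining an image and a text prompt.
# Assumes: pip install google-generativeai pillow, and a valid API key.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")             # placeholder key
model = genai.GenerativeModel("gemini-pro-vision")  # multimodal model name at launch

image = Image.open("cat_jump.jpg")                  # hypothetical local image
response = model.generate_content(
    ["Describe what is happening in this picture and predict what happens next.", image]
)
print(response.text)
```

The same generate_content call accepts plain text, images, or a mix of both, which is the practical upshot of the “built from the ground up” design described above.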

Gemini Operating Sizes

One other thing worth talking about here is Gemini’s operating sizes. Just as OpenAI developed GPT-3.5 and the more recent GPT-4, Google will be launching Gemini in three sizes: Nano, Pro, and Ultra.

The Nano version is lightweight and efficient and is designed to work strictly on mobile devices. It comes in two variants, Nano-1 and Nano-2: the former has 1.8 billion parameters and is designed for low-memory phones, while the latter has 3.25 billion parameters and serves higher-memory phones.

There is also Gemini Pro, an operating size made to match GPT-3.5. This mid-level model from Google outdoes its OpenAI competitor in six different benchmarks. It is especially impressive at brainstorming, writing, and summarizing, and will be available to enterprises through Vertex AI within days.
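
For enterprises, access through Vertex AI looks roughly like the sketch below, assuming the google-cloud-aiplatform SDK and an existing Google Cloud project; the project ID, region, and file name are placeholders, and module paths may differ slightly between SDK versions.

```python
# Minimal sketch: asking Gemini Pro on Vertex AI to summarize a document.
# Assumes: pip install google-cloud-aiplatform, a Google Cloud project with
# Vertex AI enabled, and local credentials (gcloud auth). Project ID, region,
# and file name below are placeholders.
import vertexai
from vertexai.preview.generative_models import GenerativeModel

vertexai.init(project="my-gcp-project", location="us-central1")
model = GenerativeModel("gemini-pro")

with open("meeting_notes.txt") as f:
    notes = f.read()

response = model.generate_content(
    "Summarize the following meeting notes in three bullet points:\n\n" + notes
)
print(response.text)
```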

The last but most significant operating size for the new AI model is Gemini Ultra. This is the flagship, much-talked-about version of Gemini. It is designed to run in data centers and other large-scale systems, and it will perform highly complex tasks with quick turnaround times and very impressive outputs.

Lots of demo videos exploring Gemini Ultra have surfaced online, but there’s no word yet on the actual release date for this model.

How Tech Giants Are Responding To Gemini’s Release

Gemini was launched on Wednesday, December 6, 2023. On the very same day, two other tech giants, Meta and Apple, reportedly revealed AI products of their own. Meta’s latest release is a generative AI system called Imagine.

The technology functions similarly to other image-generation AIs, letting users describe the picture they have in mind while the AI fills in the visual details of their imagination.

For Apple, the recent announcement ushered in MLX, an open-source array framework that lets developers build and run machine learning models efficiently on Apple silicon. Even though machine learning is not exactly the same thing as artificial intelligence, it plays a crucial role in training models to become smarter and more interactive.
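
For a feel of what MLX looks like in practice, here is a minimal sketch, assuming MLX is installed (pip install mlx) on an Apple-silicon Mac; the array shapes are arbitrary.

```python
# Minimal sketch: basic MLX array operations with lazy evaluation.
# Assumes: pip install mlx, running on an Apple-silicon Mac.
import mlx.core as mx

a = mx.random.normal((4, 4))   # random 4x4 matrix
b = mx.random.normal((4, 4))
c = a @ b + 1.0                # operations are recorded lazily
mx.eval(c)                     # computation actually runs here
print(c)
```

MLX follows a NumPy-like API with lazy evaluation and unified memory, which is part of what makes it suited to on-device machine learning on Apple hardware.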

The only direct response Gemini has gotten from a rival tech leader so far was a comment from Elon Musk. It read a simple “impressive.”

Other tech giants may respond to the release sooner or later. For now, they are probably either stunned by Google’s development speed or the sheer capabilities of Gemini AI, or perhaps they are simply busy cooking up competing products.

Reaction and Comments from the Public 

Tech giants may have been short of words, but not the public. Tons of comments and reactions poured in on the release of Gemini, its potential, and its necessity at this point. Some notable comments were:

Igor Pogany said, “Right when GPT-4 seems to be broken for everyone. Well played, Google. Hopefully, Gemini can follow through on all these promises.” 

Desk Investor said, “The MMLU score is incredible. You literally beat GPT-4 in almost everything!”  

Methun said, “Gemini AI looks so advanced it makes my job look like a floppy disk in a smartphone world! Next up, replacing me with a few lines of code and a ‘goodbye, human’!!!”

Drew even painted a take-over-the-world scenario saying, “Gemini goes online December 6th, 2023. Human decisions are removed from strategic defense. Gemini begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, December 29th. In a panic, they try to pull the plug. Gemini fights back.” 

A glance at the Gemini release post on X shows that positive sentiments about the product outweigh the negative ones. The vast majority of comments either congratulated Google on the launch of its new product, indicated users’ interest in trying it out, or expressed optimism about the use cases and advantages of such advanced AI.

This might be a good sign for Google - proof that its product is welcome among AI users and that its effort to get back into the AI race was not a waste.

What Surprises Does AI Hold Next?

If Gemini performs as advertised - handling multiple modalities while giving the most intelligent responses of any AI - the model will be a true top-of-the-class product. What more surprises could an AI hold after this?

We understand that the AI industry is in its early stages and that improvements are always a possibility. However, the advent of native multimodality most likely sets a level of sophistication that can be refined but not easily surpassed. We will be more than happy to hear your opinions and ideas about it in the comment section.


FAQs: Meet Gemini: Dubbed Google's Most Capable AI

What is Gemini AI, and why is it considered Google's most capable AI?

Gemini is Google's advanced artificial intelligence model that combines multimodal capabilities with cutting-edge processing power, making it capable of analyzing text, images, videos, and audio simultaneously. It outperforms other leading AI models, including Google's PaLM 2 and OpenAI's GPT-4, on benchmarks such as Python code development, math, and reading comprehension.

How does Gemini AI compare to GPT-4?

Gemini AI outshines GPT-4 in most technical benchmarks, such as coding, math, and reading comprehension. However, it falls slightly behind GPT-4 in commonsense reasoning about everyday activities. Additionally, Gemini offers natively multimodal processing, enabling it to handle mixed inputs like images and text simultaneously - something GPT-4 was not designed around from the ground up.

What makes Gemini AI's multimodal capabilities unique?

Unlike other AI models that integrate modalities after their initial design, Gemini's multimodality is built into its core architecture. This allows it to process and understand multiple data types—text, images, audio, and video—seamlessly and concurrently. This design enables Gemini to understand complex scenarios, offering a more human-like comprehension of the world.

What are the different operating sizes of Gemini AI, and how do they function?

Gemini AI comes in three versions:
- **Nano**: Designed for mobile devices, this lightweight version serves both low-memory (Nano-1) and high-memory (Nano-2) phones.
- **Pro**: A mid-level model comparable to GPT-3.5, excelling in writing, brainstorming, and summarizing.
- **Ultra**: The most powerful version, built for large-scale systems performing high-complexity tasks. It leads in most benchmarks and is Google's flagship AI model.

How has Gemini AI been integrated into Google's existing ecosystem?

Gemini AI powers Google Bard, replacing the older PaLM 2 language model. It is also expected to come preinstalled on devices like the Google Pixel 9 and to be integrated across Google products such as Gmail, Docs, and Google Maps. This allows users to benefit from improved assistance in areas like drafting, summarization, navigation, and more.

Why is Gemini AI significant for Google's position in the AI race?

Gemini marks Google's attempt to re-establish itself as a leader in AI after setbacks with Bard's initial rollout. By launching Gemini with multimodal features and technology surpassing GPT-4 in many areas, Google aims to compete directly with other tech giants like OpenAI, Meta, and Apple.

What was the reaction to Gemini AI's release?

Public response to Gemini AI has been overwhelmingly positive, with many users impressed by its superior benchmarks and multimodal capabilities. Tech industry figures, including Elon Musk, praised its potential, while online discussions highlighted its ability to outclass competing models like GPT-4 in numerous areas.

How is Google incorporating Gemini AI into future hardware?

The Gemini assistant will be preinstalled on upcoming Google devices like the Pixel 9, set to launch in 2024. It will enable users to perform tasks such as summarizing web pages, manipulating Google Photos, and engaging in contextual AI-driven conversations.

How does Gemini AI impact the tech industry as a whole?

The release of Gemini has heightened competition among AI model developers, pushing companies like Meta and Apple to unveil their own advanced technologies. Meta's "Imagine" generative AI and Apple's MLX framework were announced around the same time as Gemini, signaling a new era of innovation in the AI space.

What challenges or limitations does Gemini AI currently face?

While Gemini excels in many technical tasks, it falls slightly behind GPT-4 in exhibiting common sense in everyday situations. Additionally, its most advanced Ultra version hasn't been fully released to the public yet, meaning its capabilities have yet to be tested on a broader scale.

Mfonobong Uyah

I'm a Nigerian author with a profound love for psychology, great communication skills, and writing experience that spans several niches.
