Google has introduced Gemini 1.0, its next-generation artificial intelligence model. Gemini is a multimodal model, meaning it can understand and work with multiple data types, including text, computer code, audio, images, and video. This multimodal understanding gives Gemini more sophisticated reasoning, coding, content-summarization, and planning capabilities than AI systems trained only on text data.
Specifically, Google says benchmarks show Gemini versions such as Gemini Ultra surpassing other leading AI language models, including GPT-4, on assessments of language mastery, mathematical reasoning, coding ability, and multimodal tasks, marking a substantial advance in core AI competencies.
As part of the Gemini launch, Google is integrating an optimized version, Gemini Pro, into Bard, its conversational AI service. Google states that Gemini Pro powers the biggest quality upgrade since Bard first launched to the public, notably improving Bard's logical reasoning, writing coherence, content summarization, and more.
Over the course of 2023 and into 2024, Google plans to bring different Gemini versions to more of its products and services, including Search, Chrome, advertising, and others, enabling more real-world applications to benefit from Gemini's advanced AI capabilities.