On August 1, Google quietly launched a new version of Gemini 1.5 Pro that quickly made headlines by surpassing OpenAI's GPT-4o in generative AI benchmarks. The new model, labelled as experimental, has become the top performer on recent benchmark leaderboards.
Benchmarking AI Models
OpenAI has been a benchmark leader in generative AI since GPT-3. Its latest model, GPT-4o, along with Anthropic's Claude 3, had dominated most common benchmarks for the past year. One of the key tests, the LMSYS Chatbot Arena, pits models against one another in blind head-to-head battles judged by user votes, which are aggregated into an overall Elo-style score. GPT-4o previously held a score of 1,286, while Claude 3 scored 1,271.
The previous version of Gemini 1.5 Pro scored 1,261. The latest experimental version (Gemini 1.5 Pro 0801), however, achieved a score of 1,300, edging out both competitors at the top of the leaderboard. While benchmark scores give an indication of relative performance, they don't fully capture the range of capabilities or limitations of an AI model.
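To make those numbers concrete: Arena scores are Elo-style ratings, where every user vote in a head-to-head battle nudges the winner's rating up and the loser's down, with the size of the nudge depending on how surprising the result was. The Python sketch below illustrates the idea; it is a simplification rather than LMSYS's actual pipeline (which fits a Bradley-Terry model over all votes at once), and the vote stream shown is hypothetical.

# Illustrative Elo-style updates from pairwise votes. Not LMSYS's
# actual method; the vote data below is hypothetical.

K = 32  # update step size; a common default in Elo systems

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str) -> None:
    """Shift both ratings toward the observed outcome of one vote."""
    e_w = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - e_w)
    ratings[loser] -= K * (1 - e_w)

# Hypothetical vote stream: each pair is (winner, loser) in one blind battle.
votes = [("gemini-1.5-pro", "gpt-4o"), ("gpt-4o", "claude-3"),
         ("gemini-1.5-pro", "claude-3"), ("gemini-1.5-pro", "gpt-4o")]

ratings = {"gpt-4o": 1286.0, "claude-3": 1271.0, "gemini-1.5-pro": 1261.0}
for winner, loser in votes:
    update(ratings, winner, loser)

for model, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {r:.0f}")

An upset win (a lower-rated model beating a higher-rated one) produces a larger rating shift than an expected win, which is why a newcomer can climb the leaderboard quickly if it keeps beating the incumbents.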
Community Reaction
The AI community has responded enthusiastically to Gemini 1.5 Pro's release. Social media buzz highlighted the model's performance, with some users describing it as "insanely good" and even surpassing GPT-4o. One Redditor wrote that it "blows 4o out of the water," reflecting the excitement surrounding the new model.
Future Considerations
It remains uncertain whether the experimental version of Gemini 1.5 Pro will become the default model. Given its experimental status, the model could still be altered or withdrawn for safety or alignment reasons.
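For readers who want to experiment while the model is available, it can be selected by name in Google AI Studio or through the google-generativeai Python SDK. The snippet below is a minimal sketch: the identifier gemini-1.5-pro-exp-0801 is the name the experimental model launched under in AI Studio, and, per the caveat above, it may be renamed or withdrawn.

# Minimal sketch using the google-generativeai SDK
# (pip install google-generativeai).
import os
import google.generativeai as genai

# Authenticate with an API key from Google AI Studio.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Select the experimental model by name; this identifier may change
# or disappear if the release is altered or withdrawn.
model = genai.GenerativeModel("gemini-1.5-pro-exp-0801")

response = model.generate_content("Explain the LMSYS Chatbot Arena in one sentence.")
print(response.text)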