Harrison Blog

Google's New Gemini Lineup - Experimental

Created: 2024-09-03 12:06

Google recently released new Gemini models.

Strictly speaking, these are experimental releases, not official ones.

The models are as follows:

  • Gemini 1.5 Pro Exp
  • Gemini 1.5 Flash Exp
  • Gemini 1.5 Flash 8B Exp

The 1.5 Pro Exp and 1.5 Flash Exp can be seen as updates to the existing versions. In my own testing, 1.5 Pro Exp performed slightly better than the existing 1.5 Pro. Its benchmark scores have improved as well; I haven't included the data here, but it can be considered the best of the versions released so far. The 1.5 Flash Exp has likewise improved to the top tier among entry-level models, although it is of course not the best overall.

These two versions will reportedly be rolled into the existing 1.5 Pro and 1.5 Flash within a few weeks. (Since the previous version was 001, it seems likely they will be updated to 002.)

Source: Chatbot Arena

In the table, Gemini 1.5 Pro Exp is ranked 2nd and Gemini 1.5 Flash Exp is ranked 6th.

Interestingly, 1.5 Flash Exp ranks higher than the existing Gemini 1.5 Pro versions, which sit at 10th and 11th.

The top 5 are each company's flagship models (GPT-4o, Gemini 1.5 Pro, Grok 2), while 6th and 7th are the entry-level lines (GPT-4o mini, Gemini 1.5 Flash). Claude 3.5 Sonnet was at the top for a while... The pace of development in this industry is truly remarkable...


Anyway, since these two versions (1.5 Pro Exp and 1.5 Flash Exp) will become official soon, the one I'm personally most curious about right now is 1.5 Flash 8B.

Let's look at the benchmark first.

Source: Chatbot Arena

According to the benchmark above, 1.5 Flash 8B Exp performs roughly on par with the existing Claude 3 Sonnet and slightly below the existing 1.5 Flash. It is also comparable to models such as Llama 3 70B.

Flash is a lightweight version of Pro, and Flash 8B appears to be lighter still. (Probably?)


I ran a few hands-on tests myself, focusing on the text tasks we use most often in our durumis service:

  • Translate.
  • Summarize.
  • Write text.

Plenty of other tests are available elsewhere, so I only ran a few quick ones. When translating text embedded in a complex JSON structure, Flash didn't produce satisfactory results, but the Pro lineup definitely did. The Pro Exp version produced even cleaner results.
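For readers who want to try a similar JSON translation test, here is a minimal sketch of one way to check a model's output automatically. It is not the method used for the tests above; the function names `same_structure` and `check_translation` are hypothetical, and it assumes that a "satisfactory" result means the output is still valid JSON with the same keys, list lengths, and non-text values, with only the string values translated.

```python
import json


def same_structure(original, translated):
    """Return True if both values have the same JSON shape.

    Dict keys must match exactly, lists must have equal length, and
    non-string leaves (numbers, booleans, null) must be unchanged.
    String leaves may differ, since those are what gets translated.
    """
    if isinstance(original, dict):
        return (
            isinstance(translated, dict)
            and original.keys() == translated.keys()
            and all(same_structure(original[k], translated[k]) for k in original)
        )
    if isinstance(original, list):
        return (
            isinstance(translated, list)
            and len(original) == len(translated)
            and all(same_structure(o, t) for o, t in zip(original, translated))
        )
    if isinstance(original, str):
        return isinstance(translated, str)
    # Numbers, booleans, and null should pass through untouched.
    return type(original) is type(translated) and original == translated


def check_translation(source_json: str, model_output: str) -> bool:
    """Hypothetical helper: validate a model's translated JSON text."""
    try:
        translated = json.loads(model_output)
    except json.JSONDecodeError:
        # Truncated or malformed output counts as a failure.
        return False
    return same_structure(json.loads(source_json), translated)
```

A check like this only covers structure. Whether the translated text itself reads well still has to be judged by a person.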

Both Flash and Flash 8B showed satisfactory results in summarization and text writing. It appears that Flash 8B can handle simple tasks as long as they are not highly complex.

Given its performance and parameter count, Google is likely to price Flash 8B quite competitively when it's officially released.

Perhaps they'll price it in a way that significantly impacts competitors' existing lineups. When it's released, I'll come back "again" with the price list.



