Scale AI's SEAL Showdown challenges LMArena
In a bid to challenge LMArena's dominance in AI model evaluation, Scale AI recently unveiled SEAL Showdown, a benchmarking platform that ranks LLMs by real user preferences and breaks the results down across demographics.
The specifics:
SEAL Showdown builds its rankings from voluntary votes cast by Scale's global contributor network, which spans 70 languages and 100 countries.
Contributors get free access to frontier models via Scale's Playground app, where genuine preference data is produced by optional side-by-side comparisons.
To deter gaming and keep the input authentic, Scale makes voting entirely optional and does not share the data for 60 days after collection.
Leaderboards segmented by user demographics such as age, education, and language give a detailed picture of how models perform for different groups.
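As a rough illustration of what a demographically segmented leaderboard involves, the sketch below ranks models by win rate within each user group, using side-by-side votes as input. It is not Scale's actual methodology, which the announcement does not detail; the function name `win_rates_by_group`, the group labels, the model names, and the votes are all hypothetical.

```python
from collections import defaultdict

def win_rates_by_group(votes):
    """Rank models by win rate within each demographic group.

    votes: iterable of (group, winner, loser) tuples, one per
    side-by-side comparison. All values are illustrative.
    """
    wins = defaultdict(lambda: defaultdict(int))
    games = defaultdict(lambda: defaultdict(int))
    for group, winner, loser in votes:
        wins[group][winner] += 1
        games[group][winner] += 1
        games[group][loser] += 1
    # Sort each group's models by win rate (descending), then by name.
    return {
        group: sorted(
            ((model, wins[group][model] / n) for model, n in played.items()),
            key=lambda pair: (-pair[1], pair[0]),
        )
        for group, played in games.items()
    }

# Hypothetical votes: (age group, preferred model, other model)
votes = [
    ("18-24", "model_a", "model_b"),
    ("18-24", "model_a", "model_b"),
    ("18-24", "model_b", "model_a"),
    ("55+", "model_b", "model_a"),
    ("55+", "model_b", "model_a"),
]
boards = win_rates_by_group(votes)
for group, ranking in boards.items():
    print(group, ranking)
```

In this made-up data the two age groups prefer different models, which is the kind of split a single aggregate leaderboard would hide. Production leaderboards typically use a rating model such as Bradley-Terry or Elo rather than raw win rates.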
Although leaderboards are now widely used in the industry, they might not accurately reflect how models perform across age groups, educational levels, and other factors.
Scale's entry brings competition to the rankings market and more information about which models work well for particular tasks and populations.