TildeBench: a leaderboard for large language model (LLM) performance in Baltic, Finnic, and Slavic languages
We are interested in three aspects:
- Language support: Do the models really “speak” our languages?
- Task quality: Can they perform our tasks to the quality our users expect?
- Task reliability: Is the models' failure rate low enough for production use?
Go to TildeBench to see the leaderboard.
This leaderboard is a work in progress. If you have an interesting benchmark for our languages that you would like to suggest or contribute, let us know, and let's push the state of the art together!