Mukayese: Turkish NLP Strikes Back

Turkish Natural Language Processing is left behind in developing state-of-the-art systems due to a lack of organized benchmarks and baselines. We fill this gap with Mukayese (Turkish word for "comparison/benchmarking"), an extensive set of datasets and benchmarks for several Turkish NLP tasks. All of the datasets and code have been made public in this repository.

Updates

(22/03/2022) Summarization models are online on Huggingface!
(25/02/2022) Datasets have been made available through pre-release v0.0.1

What to do with Mukayese ?

With Mukayese, researchers of Turkish NLP will be able to:

Compare the performance of existing methods in leaderboards.
Access existing implementations of NLP baselines.
Evaluate their own methods on the relevant test datasets.
Submit their own work to be enlisted in our leaderboards.

Mukayese's Mission

The most important goal of Mukayese is to standardize the comparison and evaluation of Turkish NLP methods. As a result of the lack of a platform for benchmarking, Turkish NLP researchers struggle with comparing their models to the existing ones.