Clarin-PL Embeddings Library

LEPISZCZE

A new, comprehensive open-source benchmark for Polish NLP with a continuous-submission leaderboard.

In the first batch, we ran 13 experiments (task and dataset pairs) with five recent language models for Polish. Five of the datasets had already been benchmarked; the other eight had not previously been included in the benchmark.

Tasks

Punctuation Restoration
Paraphrase Classification
Political Advertising Detection
Sentiment Analysis
Part-of-speech Tagging
Named Entity Recognition
Q&A Classification
Entailment Classification
Aspect-based Sentiment Analysis
Abusive Clauses Detection
Dialogue Acts Classification
Information Retrieval
Question Answering

Model ranking

Average model performance across all tasks.

| Model | Accuracy | F1 macro | F1 micro | Precision macro | Precision micro | Recall macro | Recall micro |
|---|---|---|---|---|---|---|---|
| XGBoost+Tfidf | 0.757 | 0.738 | 0.757 | 0.753 | 0.757 | 0.728 | 0.757 |
| allegro/herbert-large-cased | 0.911 | 0.721 | 0.830 | 0.733 | 0.829 | 0.723 | 0.833 |
| dkleczek/bert-base-polish-cased-v1 | 0.896 | 0.689 | 0.800 | 0.717 | 0.805 | 0.695 | 0.797 |
| dkleczek/bert-base-polish-uncased-v1 | 0.896 | 0.688 | 0.801 | 0.712 | 0.804 | 0.693 | 0.799 |
| allegro/herbert-base-cased | 0.844 | 0.666 | 0.768 | 0.677 | 0.767 | 0.669 | 0.769 |
| sentence-transformers/paraphrase-xlm-r-multilingual-v1 | 0.848 | 0.587 | 0.702 | 0.623 | 0.711 | 0.585 | 0.700 |
| BM25 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| ICT | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
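
The ranking reports both macro- and micro-averaged scores. As a reminder of how the two differ, here is a minimal Python sketch for single-label classification. It is not the benchmark's evaluation code: the function names and the toy labels are assumptions made purely for illustration.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    labels = sorted(set(y_true) | set(y_pred))
    return labels, tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    labels, tp, fp, fn = per_class_counts(y_true, y_pred)
    return sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)


def micro_f1(y_true, y_pred):
    """F1 over pooled counts: every example counts equally."""
    _, tp, fp, fn = per_class_counts(y_true, y_pred)
    return f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))


# Hypothetical toy data: an imbalanced set where the rare class is always missed.
y_true = ["pos", "pos", "pos", "neg"]
y_pred = ["pos", "pos", "pos", "pos"]

print(round(macro_f1(y_true, y_pred), 3))  # 0.429
print(round(micro_f1(y_true, y_pred), 3))  # 0.75
```

Macro averaging penalizes a model that ignores rare classes, while micro averaging is dominated by the frequent ones. In single-label multiclass classification, micro F1 coincides with accuracy.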