metadata

license: apache-2.0
base_model:
  - speakleash/Bielik-11B-v2.3-Instruct
pipeline_tag: text-generation
tags:
  - medit-merge
language:
  - pl
  - en

Marsh Harrier

The Marsh Harrier (MSH) is a language model developed by MedIT Solutions using an advanced checkpoint merging technique. It represents a novel fusion of the Speakleash Bielik 11B v2.3 Instruct and Speakleash Bielik 11B v2 models, employing our proprietary weight-merging methodology.

Key Features:

Built on a pioneering approach to neural network weight fusion
Supports merging models of identical parameter counts while maintaining architecture flexibility
Achieves a higher average score than Bielik 11B v2.3 Instruct on the Open PL LLM Leaderboard (0-shot)
Optimized for Polish language understanding and generation
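MedIT Solutions' merging method is proprietary and is not described in this card. To illustrate what checkpoint merging means in general, the sketch below shows the simplest well-known variant: linear interpolation of corresponding weights from two checkpoints with identical parameter names and sizes. This is a generic stand-in, not the MSH method. The names `linear_merge`, `StateDict` and `alpha` are hypothetical, and plain Python lists stand in for real weight tensors.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat weights.
StateDict = Dict[str, List[float]]


def linear_merge(a: StateDict, b: StateDict, alpha: float = 0.5) -> StateDict:
    """Generic linear merge: (1 - alpha) * a + alpha * b for every parameter.

    Both checkpoints must share parameter names and sizes (identical
    parameter counts). This is NOT MedIT's proprietary technique.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("checkpoints have different parameter names")
    merged: StateDict = {}
    for name, wa in a.items():
        wb = b[name]
        if len(wa) != len(wb):
            raise ValueError(f"size mismatch for {name!r}: {len(wa)} vs {len(wb)}")
        # Interpolate element by element between the two checkpoints.
        merged[name] = [(1.0 - alpha) * x + alpha * y for x, y in zip(wa, wb)]
    return merged


if __name__ == "__main__":
    # Toy checkpoints; real ones would hold billions of values as tensors.
    instruct = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    base = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
    print(linear_merge(instruct, base, alpha=0.25))
    # prints {'layer.weight': [1.5, 2.5], 'layer.bias': [0.25]}
```

The size check mirrors the requirement that merged models have identical parameter counts; more elaborate merging schemes typically change how each pair of weights is combined rather than this overall structure.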

Performance:

The model outperforms Bielik 11B v2.3 Instruct on six of the ten tasks, matches it on two, and scores lower on two in the Open PL LLM Leaderboard evaluation framework (0-shot), which is part of the SpeakLeash.org open-science initiative.

Technical Details:

Base Models: Speakleash Bielik 11B v2.3 Instruct and Bielik 11B v2
Architecture: Compatible with original Bielik architecture
Parameter Count: 11 billion parameters
Special Feature: Utilizes MedIT Solutions' proprietary checkpoint merging technology

This model represents a step forward in developing Polish language models, demonstrating how merging techniques can enhance model performance while maintaining architectural efficiency.

Open PL LLM Leaderboard Results (MSH-v1 vs Bielik 11B v2.3 Instruct)

Sentiment Analysis (PolEmo2):

In-domain accuracy: Matches Bielik at 77.70%
Out-of-domain accuracy: Improved performance at 79.76% (vs 79.35%)

Text Classification Tasks:

8tags classification: Significant improvement of ~3pp (76.14% vs 73.17%)
Belebele benchmark: Matching performance at 88.56%
CBD task: Substantial F1 score improvement of ~10pp (23.91% vs 13.73%)

Language Understanding:

DYK ("Did you know..."): Improved F1 score (69.77% vs 69.14%)
Named Entity Recognition (KLEJ NER): Notable improvement of ~8pp (45.53% vs 37.61%)
PolQA reranking: Slight decrease (81.99% vs 83.21%)
PPC: Enhanced accuracy (78.00% vs 77.20%)
PSC: F1 score decrease of ~3pp (90.46% vs 93.63%)

Overall Performance: MSH-v1 achieves a higher average score than Bielik v2.3 (71.18% vs 69.33%, a gain of 1.85pp), demonstrating the effectiveness of our checkpoint merging technique in improving overall performance across diverse NLP tasks, even though PolQA reranking and PSC scores decrease.
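Both headline averages are consistent with an unweighted mean over the ten task scores listed above. The sketch below recomputes them under that assumption (the leaderboard's exact aggregation rule is not stated in this card) and prints the per-task differences in percentage points. `SCORES` and `summarize` are illustrative names, not part of any leaderboard tooling.

```python
# Scores (%) copied from the results above: task -> (MSH-v1, Bielik v2.3).
SCORES = {
    "PolEmo2 in-domain": (77.70, 77.70),
    "PolEmo2 out-of-domain": (79.76, 79.35),
    "8tags": (76.14, 73.17),
    "Belebele": (88.56, 88.56),
    "CBD": (23.91, 13.73),
    "DYK": (69.77, 69.14),
    "KLEJ NER": (45.53, 37.61),
    "PolQA reranking": (81.99, 83.21),
    "PPC": (78.00, 77.20),
    "PSC": (90.46, 93.63),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def summarize(scores):
    """Return (MSH average, Bielik average, per-task deltas in pp), rounded to 2 dp.

    Assumes an unweighted mean over all tasks.
    """
    msh_avg = mean(m for m, _ in scores.values())
    bielik_avg = mean(b for _, b in scores.values())
    deltas = {task: round(m - b, 2) for task, (m, b) in scores.items()}
    return round(msh_avg, 2), round(bielik_avg, 2), deltas


if __name__ == "__main__":
    msh_avg, bielik_avg, deltas = summarize(SCORES)
    print(f"MSH-v1 average: {msh_avg:.2f}")          # 71.18
    print(f"Bielik v2.3 average: {bielik_avg:.2f}")  # 69.33
    # Largest gains first, regressions last.
    for task, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{task:>22}: {d:+.2f} pp")
```

Because the mean is unweighted, the large CBD and KLEJ NER gains account for most of the 1.85pp difference in averages.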

All evaluations were conducted using the Open PL LLM Leaderboard framework (0-shot) as part of the SpeakLeash.org open-science initiative.

Kudos to the SpeakLeash project and ACK Cyfronet AGH for their extraordinary work.