---
license: apache-2.0
base_model:
- speakleash/Bielik-11B-v2.3-Instruct
pipeline_tag: text-generation
tags:
- medit-merge
language:
- pl
- en
---

<div align="center">
  <img src="https://i.ibb.co/YLfCzXR/imagine-image-c680e106-e404-45e5-98da-af700ffe41f4.png" alt="Marsh Harrier" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
</div>

# Marsh Harrier

Marsh Harrier (MSH) is a language model developed by MedIT Solutions. It merges the checkpoints of SpeakLeash Bielik 11B v2.3 Instruct and SpeakLeash Bielik 11B v2 using our proprietary weight-merging methodology.

## Key Features:
- Built on a pioneering approach to neural network weight fusion
- Supports merging models of identical parameter counts while maintaining architecture flexibility
- Achieves a higher average score than Bielik v2.3 across the ten evaluated tasks (71.18% vs 69.33%)
- Optimized for Polish language understanding and generation
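
The merging method behind MSH is proprietary and is not described in this card. For readers unfamiliar with checkpoint merging in general, here is a minimal sketch of the simplest publicly known variant, linear interpolation of matching weights. It is an illustration only and not the technique used for this model. The `linear_merge` function, the `alpha` parameter, and the toy checkpoints are hypothetical. The sketch also shows why both checkpoints need the same parameter names and sizes.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Interpolate two checkpoints parameter by parameter: alpha * a + (1 - alpha) * b."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must expose the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged


# Toy checkpoints with made-up values, for illustration only.
instruct = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
base = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = linear_merge(instruct, base, alpha=0.5)
print(merged)  # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

With `alpha=0.5` each merged weight is the plain average of the two inputs; other values of `alpha` shift the result toward one checkpoint or the other.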

## Performance:
Evaluated with the Open PL LLM Leaderboard framework (0-shot), which is part of the SpeakLeash.org open-science initiative, the model improves on Bielik v2.3 in six of the ten reported tasks, matches it in two, and regresses in two. Per-task results are listed in the Polish LLM Open Leaderboard section.

## Technical Details:
- Base Models: [SpeakLeash Bielik 11B v2.3 Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct) and [Bielik 11B v2](https://huggingface.co/speakleash/Bielik-11B-v2)
- Architecture: Compatible with original Bielik architecture
- Parameter Count: 11 billion parameters
- Special Feature: Utilizes MedIT Solutions' proprietary checkpoint merging technology

This model represents a step forward in developing Polish language models, demonstrating how merging techniques can enhance model performance without changing the architecture or parameter count.

# Polish LLM Open Leaderboard

Core Leaderboards (MSH-v1 vs Bielik v2.3):
- MT-Bench-PL: slight decrease of ~0.3 points (8.27 vs 8.56)
- Open PL LLM Leaderboard: slight improvement of 0.09 points (65.80 vs 65.71)

Sentiment Analysis (PolEmo2):
- In-domain accuracy: Matches Bielik at 77.70%
- Out-of-domain accuracy: Improved performance at 79.76% (vs 79.35%)

Text Classification Tasks:
- 8tags classification: Significant improvement of ~3pp (76.14% vs 73.17%)
- Belebele benchmark: Matching performance at 88.56%
- CBD task: Substantial F1 score improvement of ~10pp (23.91% vs 13.73%)

Language Understanding:
- DYK ("Did you know..."): Improved F1 score (69.77% vs 69.14%)
- Named Entity Recognition (KLEJ NER): Notable improvement of ~8pp (45.53% vs 37.61%)
- PolQA reranking: Slight decrease (81.99% vs 83.21%)
- PPC: Enhanced accuracy (78.00% vs 77.20%)
- PSC: F1 score decrease of ~3pp (90.46% vs 93.63%)

Overall Performance:
MSH-v1 achieves an average score of 71.18% across the ten tasks listed above, compared to 69.33% for Bielik v2.3. This suggests that our checkpoint merging technique can improve model performance across diverse NLP tasks.
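
The two averages can be reproduced from the per-task scores. The sketch below assumes that each average is the unweighted mean of the ten task scores listed in the Sentiment Analysis, Text Classification Tasks, and Language Understanding groups. The card does not state this, but the arithmetic matches the reported figures. The task labels and variable names are ours, chosen for illustration.

```python
# Scores in %, transcribed from the lists above as (MSH-v1, Bielik v2.3).
scores = {
    "PolEmo2 in-domain": (77.70, 77.70),
    "PolEmo2 out-of-domain": (79.76, 79.35),
    "8tags": (76.14, 73.17),
    "Belebele": (88.56, 88.56),
    "CBD": (23.91, 13.73),
    "DYK": (69.77, 69.14),
    "KLEJ NER": (45.53, 37.61),
    "PolQA reranking": (81.99, 83.21),
    "PPC": (78.00, 77.20),
    "PSC": (90.46, 93.63),
}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


msh_avg = round(mean(m for m, _ in scores.values()), 2)
bielik_avg = round(mean(b for _, b in scores.values()), 2)

# Per-task difference in percentage points (positive means MSH-v1 is higher).
deltas = {task: round(m - b, 2) for task, (m, b) in scores.items()}

print(msh_avg, bielik_avg)  # 71.18 69.33
for task, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{task}: {delta:+.2f} pp")
```

The largest gains come from CBD (+10.18 pp) and KLEJ NER (+7.92 pp), while PSC (-3.17 pp) and PolQA reranking (-1.22 pp) are the two regressions.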

All evaluations were conducted using the Open PL LLM Leaderboard framework (0-shot) as part of the SpeakLeash.org open-science initiative.

Kudos to the **[SpeakLeash](https://speakleash.org)** project and **[ACK Cyfronet AGH](https://www.cyfronet.pl/)** for their extraordinary work.