---
license: apache-2.0
base_model:
- speakleash/Bielik-11B-v2.3-Instruct
pipeline_tag: text-generation
tags:
- medit-merge
language:
- pl
- en
---

<div align="center">
<img src="https://i.ibb.co/YLfCzXR/imagine-image-c680e106-e404-45e5-98da-af700ffe41f4.png" alt="Marsh Harrier" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
</div>

# Marsh Harrier

Marsh Harrier (MSH) is a language model developed by MedIT Solutions using a checkpoint merging technique. It fuses the SpeakLeash Bielik 11B v2.3 Instruct and SpeakLeash Bielik 11B v2 models using our proprietary weight-merging methodology.

## Key Features:
- Built on a new approach to neural network weight fusion
- The merging method supports models with identical parameter counts while remaining flexible about architecture
- Scores higher on average than Bielik 11B v2.3 Instruct on the Open PL LLM Leaderboard (71.18% vs 69.33%)
- Optimized for Polish language understanding and generation

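The merging method itself is proprietary and is not described in this card. Purely as a generic illustration of what weight fusion between two checkpoints with identical tensor shapes can look like, here is a minimal sketch of plain linear interpolation. It should not be read as the technique used to build MSH; the `linear_merge` function, the `alpha` parameter, and the toy values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: tensor name -> flat list of weights.
Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Return alpha * a + (1 - alpha) * b, computed tensor by tensor."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("checkpoints must contain the same tensor names")
    merged: Weights = {}
    for name, wa in a.items():
        wb = b[name]
        if len(wa) != len(wb):
            raise ValueError(f"size mismatch for tensor {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(wa, wb)]
    return merged


# Toy "checkpoints" with made-up values, only to show the call shape.
instruct = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
base = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = linear_merge(instruct, base, alpha=0.5)
print(merged)
```

The key and size checks reflect the constraint stated above: both inputs must have the same parameter layout before any element-wise combination makes sense.
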
## Performance:
The model improves on its predecessors on most of the reported metrics in the Open PL LLM Leaderboard evaluation framework (0-shot), which is part of the SpeakLeash.org open-science initiative.

Technical Details:
- Base Models: [SpeakLeash Bielik 11B v2.3 Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct) and [Bielik 11B v2](https://huggingface.co/speakleash/Bielik-11B-v2)
- Architecture: Compatible with the original Bielik architecture
- Parameter Count: 11 billion parameters
- Special Feature: Uses MedIT Solutions' proprietary checkpoint merging technology

This model is a step forward in developing Polish language models, demonstrating how merging techniques can improve model performance while keeping the original architecture.

# Open PL LLM Leaderboard

Sentiment Analysis (PolEmo2):
- In-domain accuracy: Matches Bielik at 77.70%
- Out-of-domain accuracy: Improved performance at 79.76% (vs 79.35%)

Text Classification Tasks:
- 8tags classification: Significant improvement of ~3pp (76.14% vs 73.17%)
- Belebele benchmark: Matching performance at 88.56%
- CBD task: Substantial F1 score improvement of ~10pp (23.91% vs 13.73%)

Language Understanding:
- DYK ("Did you know..."): Improved F1 score (69.77% vs 69.14%)
- Named Entity Recognition (KLEJ NER): Notable improvement of ~8pp (45.53% vs 37.61%)
- PolQA reranking: Decrease of ~1pp (81.99% vs 83.21%)
- PPC: Improved accuracy (78.00% vs 77.20%)
- PSC: F1 score decrease of ~3pp (90.46% vs 93.63%)

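The percentage-point gaps quoted above can be recomputed directly from the reported scores. The sketch below does that arithmetic. The numbers are copied from the lists above, the comparison model is assumed to be Bielik 11B v2.3 Instruct, and the names `SCORES` and `delta_pp` are illustrative.

```python
# (task, MSH-v1 score, Bielik score) in percent, copied from the lists above.
# Assumption: the second number in each "vs" pair is Bielik 11B v2.3 Instruct.
SCORES = [
    ("PolEmo2 in-domain", 77.70, 77.70),
    ("PolEmo2 out-of-domain", 79.76, 79.35),
    ("8tags", 76.14, 73.17),
    ("Belebele", 88.56, 88.56),
    ("CBD", 23.91, 13.73),
    ("DYK", 69.77, 69.14),
    ("KLEJ NER", 45.53, 37.61),
    ("PolQA reranking", 81.99, 83.21),
    ("PPC", 78.00, 77.20),
    ("PSC", 90.46, 93.63),
]


def delta_pp(msh: float, bielik: float) -> float:
    """Difference in percentage points, rounded to two decimals."""
    return round(msh - bielik, 2)


deltas = {task: delta_pp(msh, bielik) for task, msh, bielik in SCORES}
improved = [task for task, d in deltas.items() if d > 0]
matched = [task for task, d in deltas.items() if d == 0]
regressed = [task for task, d in deltas.items() if d < 0]

for task, d in deltas.items():
    print(f"{task:<24}{d:+.2f} pp")
print(f"improved: {len(improved)}, matched: {len(matched)}, regressed: {len(regressed)}")
```

On the ten scores listed here this gives six gains, two ties, and two regressions (PolQA reranking and PSC). The leaderboard average cannot be reproduced from this subset alone, so it is not recomputed.
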
Overall Performance:
MSH-v1 achieves an average score of 71.18%, compared with 69.33% for Bielik v2.3 (+1.85pp), demonstrating the effectiveness of our checkpoint merging technique in improving model performance across diverse NLP tasks.

All evaluations were conducted using the Open PL LLM Leaderboard framework (0-shot) as part of the SpeakLeash.org open-science initiative.

Kudos to the **[SpeakLeash](https://speakleash.org)** project and **[ACK Cyfronet AGH](https://www.cyfronet.pl/)** for their extraordinary work.