meditsolutions
/

MSH-v1-Bielik-v2.3-Instruct-MedIT-merge

Text Generation

mkurman committed 9 days ago

Commit

42c2381

1 Parent(s): a3719ff

Update README.md

Files changed (1)

README.md +56 -3

README.md CHANGED

@@ -1,3 +1,56 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+base_model:
+- speakleash/Bielik-11B-v2.3-Instruct
+pipeline_tag: text-generation
+tags:
+- medit-merge
+---
+<div align="center">
+  <img src="https://i.ibb.co/YLfCzXR/imagine-image-c680e106-e404-45e5-98da-af700ffe41f4.png" alt="Marsh Harrier (MSH-v1)" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
+</div>
+# Marsh Harrier
+Marsh Harrier (MSH) is a language model developed by MedIT Solutions using a checkpoint merging technique. It fuses the SpeakLeash Bielik 11B v2.3 and SpeakLeash Bielik 11B v2 models using our proprietary weight-merging methodology.
+## Key Features:
+- Built on a pioneering approach to neural network weight fusion
+- Supports merging models of identical parameter counts while maintaining architecture flexibility
+- Achieves a higher average score than Bielik v2.3 on the Open PL LLM Leaderboard (71.18% vs 69.33%)
+- Optimized for Polish language understanding and generation
+## Performance:
+The model shows significant improvements over its predecessors across multiple metrics in the Open PL LLM Leaderboard evaluation framework (0-shot and 5-shot), which is part of the SpeakLeash.org open-science initiative.
+## Technical Details:
+- Base Models: SpeakLeash Bielik 11B v2.3 and Bielik 11B v2 (https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct)
+- Architecture: Compatible with original Bielik architecture
+- Parameter Count: 11 billion parameters
+- Special Feature: Utilizes MedIT Solutions' proprietary checkpoint merging technology
+This model represents a step forward in developing Polish language models, demonstrating how merging techniques can improve model performance without changing the underlying architecture.
+# Open PL LLM Leaderboard
+Sentiment Analysis (PolEmo2):
+- In-domain accuracy: Matches Bielik v2.3 at 77.70%
+- Out-of-domain accuracy: Improved performance at 79.76% (vs 79.35%)
+Text Classification Tasks:
+- 8tags classification: Significant improvement of ~3pp (76.14% vs 73.17%)
+- Belebele benchmark: Matching performance at 88.56%
+- CBD task: Substantial F1 score improvement of ~10pp (23.91% vs 13.73%)
+Language Understanding:
+- DYK ("Did you know..."): Improved F1 score (69.77% vs 69.14%)
+- Named Entity Recognition (KLEJ NER): Notable improvement of ~8pp (45.53% vs 37.61%)
+- PolQA reranking: Slight decrease of ~1pp (81.99% vs 83.21%)
+- PPC: Enhanced accuracy (78.00% vs 77.20%)
+- PSC: F1 score decrease of ~3pp (90.46% vs 93.63%)
+Overall Performance:
+MSH-v1 achieves a higher average score of 71.18% compared to Bielik v2.3's 69.33%, demonstrating the effectiveness of our checkpoint merging technique in improving model performance across diverse NLP tasks.
+All evaluations were conducted using the Open PL LLM Leaderboard framework (0-shot) as part of the SpeakLeash.org open-science initiative.
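
The percentage-point differences quoted in the comparison above can be re-derived from the reported score pairs. The following is a minimal sketch for checking that arithmetic, not part of the README or of the leaderboard tooling. The names `SCORES` and `deltas_pp` are hypothetical, and the numbers are copied from the bullets above as (MSH-v1, Bielik v2.3) pairs. It does not try to reproduce the 71.18% and 69.33% averages, since the README does not say exactly which tasks enter that average.

```python
# Scores in percent, copied from the README bullets: (MSH-v1, Bielik v2.3).
# Task labels are shorthand chosen for this sketch.
SCORES = {
    "PolEmo2 in-domain": (77.70, 77.70),
    "PolEmo2 out-of-domain": (79.76, 79.35),
    "8tags": (76.14, 73.17),
    "Belebele": (88.56, 88.56),
    "CBD (F1)": (23.91, 13.73),
    "DYK (F1)": (69.77, 69.14),
    "KLEJ NER": (45.53, 37.61),
    "PolQA reranking": (81.99, 83.21),
    "PPC": (78.00, 77.20),
    "PSC (F1)": (90.46, 93.63),
}


def deltas_pp(scores):
    """Return the MSH-v1 minus Bielik v2.3 difference per task, in percentage points."""
    return {task: round(msh - base, 2) for task, (msh, base) in scores.items()}


if __name__ == "__main__":
    # Print one line per task with a signed difference.
    for task, delta in deltas_pp(SCORES).items():
        print(f"{task:<24} {delta:+.2f} pp")
```

Positive values mean MSH-v1 scored higher than Bielik v2.3 on that task. Negative values mark the two regressions (PolQA reranking and PSC).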