meditsolutions/MSH-v1-Bielik-v2.3-Instruct-MedIT-merge-GGUF


	---
	license: apache-2.0
	base_model:
	- speakleash/Bielik-11B-v2.3-Instruct
	pipeline_tag: text-generation
	tags:
	- medit-merge
	language:
	- pl
	- en
	---

	<div align="center">
	<img src="https://i.ibb.co/YLfCzXR/imagine-image-c680e106-e404-45e5-98da-af700ffe41f4.png" alt="Marsh Harrier (MSH-v1)" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
	</div>

	# Marsh Harrier

	The Marsh Harrier (MSH) is a language model developed by MedIT Solutions. It merges the checkpoints of SpeakLeash Bielik 11B v2.3 Instruct and SpeakLeash Bielik 11B v2 using our proprietary weight-merging methodology.

	## Key Features:
	- Built on a pioneering approach to neural network weight fusion
	- Supports merging models of identical parameter counts while maintaining architecture flexibility
	- Achieves a higher average score than Bielik v2.3 on the Open PL LLM Leaderboard (71.18% vs 69.33%, 0-shot)
	- Optimized for Polish language understanding and generation
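The merging method used for MSH is proprietary and is not described in this card. Purely to illustrate why matching parameter counts matter, the sketch below shows a generic linear interpolation of two checkpoints, a well-known baseline that is not MedIT's technique. The `linear_merge` function, the `alpha` weight, and the toy tensors are all hypothetical.

```python
from typing import Dict, List

# A toy "checkpoint": tensor name -> flat list of weights.
Checkpoint = Dict[str, List[float]]


def linear_merge(a: Checkpoint, b: Checkpoint, alpha: float = 0.5) -> Checkpoint:
    """Return alpha * a + (1 - alpha) * b, tensor by tensor.

    Generic baseline for illustration only; NOT the proprietary MedIT method.
    """
    if a.keys() != b.keys():
        raise ValueError("checkpoints must contain the same tensor names")
    merged: Checkpoint = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            # Element-wise merging only works when every tensor has the same size.
            raise ValueError(f"size mismatch for tensor {name!r}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged


# Hypothetical two-tensor checkpoints standing in for the two base models.
toy_instruct = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
toy_base = {"layer.weight": [3.0, 4.0], "layer.bias": [2.0]}

merged = linear_merge(toy_instruct, toy_base, alpha=0.5)
print(merged)
```

With `alpha=0.5` every merged weight is the midpoint of the two inputs. The size check is where an element-wise scheme like this one depends on both models having identical parameter counts.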

	## Performance:
	The model improves on Bielik v2.3 on six of the ten reported tasks, matches it on two, and scores lower on two in the Open PL LLM Leaderboard evaluation framework (0-shot), which is part of the SpeakLeash.org open-science initiative.

	## Technical Details:
	- Base Models: [SpeakLeash Bielik 11B v2.3 Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct) and [Bielik 11B v2](https://huggingface.co/speakleash/Bielik-11B-v2)
	- Architecture: Compatible with original Bielik architecture
	- Parameter Count: 11 billion
	- Special Feature: Utilizes MedIT Solutions' proprietary checkpoint merging technology

	This model represents a step forward in Polish language modeling, demonstrating how merging techniques can enhance model performance while maintaining architectural efficiency.

	# Open PL LLM Leaderboard

	Sentiment Analysis (PolEmo2):
	- In-domain accuracy: Matches Bielik at 77.70%
	- Out-of-domain accuracy: Improved performance at 79.76% (vs 79.35%)

	Text Classification Tasks:
	- 8tags classification: Significant improvement of ~3pp (76.14% vs 73.17%)
	- Belebele benchmark: Matching performance at 88.56%
	- CBD task: Substantial F1 score improvement of ~10pp (23.91% vs 13.73%)

	Language Understanding:
	- DYK ("Did you know..."): Improved F1 score (69.77% vs 69.14%)
	- Named Entity Recognition (KLEJ NER): Notable improvement of ~8pp (45.53% vs 37.61%)
	- PolQA reranking: Slight decrease of ~1pp (81.99% vs 83.21%)
	- PPC: Enhanced accuracy (78.00% vs 77.20%)
	- PSC: F1 score decrease of ~3pp (90.46% vs 93.63%)

	Overall Performance:
	MSH-v1 achieves a higher average score of 71.18% compared to Bielik v2.3's 69.33%, demonstrating the effectiveness of our checkpoint merging technique in improving model performance across diverse NLP tasks.
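The per-task differences and the two averages quoted above can be re-derived from the listed scores. The sketch below assumes the overall score is the unweighted mean of the ten task scores; that assumption is not stated by the leaderboard text here, but it does reproduce the quoted 71.18% and 69.33%. The `SCORES` table and variable names are illustrative.

```python
# (task, MSH-v1 score, Bielik v2.3 score), in percent, copied from the lists above.
SCORES = [
    ("PolEmo2 in-domain", 77.70, 77.70),
    ("PolEmo2 out-of-domain", 79.76, 79.35),
    ("8tags", 76.14, 73.17),
    ("Belebele", 88.56, 88.56),
    ("CBD", 23.91, 13.73),
    ("DYK", 69.77, 69.14),
    ("KLEJ NER", 45.53, 37.61),
    ("PolQA reranking", 81.99, 83.21),
    ("PPC", 78.00, 77.20),
    ("PSC", 90.46, 93.63),
]


def mean(values):
    """Unweighted arithmetic mean (assumed aggregation, see lead-in)."""
    values = list(values)
    return sum(values) / len(values)


msh_avg = round(mean(m for _, m, _ in SCORES), 2)
bielik_avg = round(mean(b for _, _, b in SCORES), 2)

# Difference in percentage points; positive means MSH-v1 scores higher.
deltas = {task: round(m - b, 2) for task, m, b in SCORES}

for task, delta in deltas.items():
    print(f"{task:<22} {delta:+.2f} pp")
print(f"average: MSH-v1 {msh_avg:.2f}% vs Bielik v2.3 {bielik_avg:.2f}%")
```

Listing the deltas side by side makes the trade-off visible: the largest gains are on CBD and KLEJ NER, while PSC and PolQA reranking regress.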

	All evaluations were conducted using the Open PL LLM Leaderboard framework (0-shot) as part of the SpeakLeash.org open-science initiative.

	Kudos to the [SpeakLeash](https://speakleash.org) project and [ACK Cyfronet AGH](https://www.cyfronet.pl/) for their extraordinary work.