Triangle104 committed · Commit 62688ec (verified) · 1 parent: 61fe5bb

Update README.md

Files changed (1): README.md (+96, -0)
README.md CHANGED
@@ -11,6 +11,102 @@ tags:
This model was converted to GGUF format from [`arcee-ai/Arcee-Blitz`](https://huggingface.co/arcee-ai/Arcee-Blitz) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/arcee-ai/Arcee-Blitz) for more details on the model.

---

Arcee-Blitz (24B) is a new Mistral-based 24B model distilled from DeepSeek, designed to be both fast and efficient. We view it as a practical “workhorse” model that can tackle a range of tasks without the overhead of larger architectures.

## Model Details

- Architecture Base: Mistral-Small-24B-Instruct-2501
- Parameter Count: 24B
- Distillation Data: Merged the Virtuoso pipeline with the Mistral architecture, hotstarting the training with over 3B tokens of pretraining distillation from DeepSeek-V3 logits (a minimal sketch of logit distillation appears after this list).
- Fine-Tuning and Post-Training: After capturing core logits, we performed additional fine-tuning and distillation steps to enhance overall performance.
- License: Apache-2.0
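The card does not include Arcee's distillation code, so the following is only a minimal sketch of what pretraining distillation from teacher logits generally looks like, under stated assumptions: `teacher` and `student` are stand-in names for causal LMs sharing one vocabulary (the real pipeline had to reconcile DeepSeek-V3 and Mistral tokenizers, which this glosses over), and the temperature value is illustrative.

```python
# Minimal sketch of logit distillation; NOT Arcee's actual pipeline.
# Assumes teacher and student are causal LMs over the same vocabulary.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitude stays comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2

# Hypothetical training step over a batch of token ids:
#   teacher_logits = teacher(input_ids).logits.detach()  # frozen teacher
#   student_logits = student(input_ids).logits
#   distillation_loss(student_logits, teacher_logits).backward()
```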
## Improving World Knowledge

Arcee-Blitz shows large performance improvements on MMLU-Pro versus the original Mistral-Small-3, reflecting a dramatic increase in world knowledge.

## Data Contamination Checking

We carefully examined our training data and pipeline to avoid contamination. While we’re confident in the validity of these gains, we remain open to further community validation and testing (one of the key reasons we release these models as open source).
## Limitations

- Context Length: 32k tokens (may vary depending on the final tokenizer settings and system resources); see the note after this list on requesting the full window in llama.cpp.
- Knowledge Cut-off: Training data may not reflect the latest events or developments beyond June 2024.
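A usage note that is not from the original card: llama.cpp typically loads a model with a smaller default context than it supports, so the full 32k window has to be requested explicitly at load time. The `-c` flag below is a real llama.cpp option; the GGUF filename is a placeholder.

```bash
# Request the full 32k context window at load time (placeholder filename).
llama-server -m ./arcee-blitz-q4_k_m.gguf -c 32768
```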
## Ethical Considerations

- Content Generation Risks: Like any language model, Arcee-Blitz can generate potentially harmful or biased content if prompted in certain ways.
## License

Arcee-Blitz (24B) is released under the Apache-2.0 License. You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license.

If you have questions or would like to share your experiences using Arcee-Blitz (24B), please connect with us on social media. We’re excited to see what you build, and how this model helps you innovate!

---
## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):
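The diff view is truncated at this point, so the rest of the standard GGUF-my-repo instructions are not visible above. A typical install-and-run sequence looks like the sketch below; `brew install llama.cpp` is the documented Homebrew formula, while the `--hf-repo` and `--hf-file` values are placeholders, since the exact quantization filename is not shown in this diff.

```bash
# Install llama.cpp (Homebrew formula works on macOS and Linux).
brew install llama.cpp

# Chat with the model from the CLI, fetching the GGUF directly from the Hub.
# Repo and file names are placeholders for this repo's actual quant.
llama-cli --hf-repo Triangle104/Arcee-Blitz-GGUF \
          --hf-file arcee-blitz-q4_k_m.gguf \
          -p "The meaning to life and the universe is"

# Or expose an OpenAI-compatible HTTP server:
llama-server --hf-repo Triangle104/Arcee-Blitz-GGUF \
             --hf-file arcee-blitz-q4_k_m.gguf \
             -c 2048
```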