Triangle104 committed · Commit 62688ec (verified) · 1 parent: 61fe5bb

Update README.md

Files changed (1): README.md (+96, -0)
README.md CHANGED
@@ -11,6 +11,102 @@ tags:
This model was converted to GGUF format from [`arcee-ai/Arcee-Blitz`](https://huggingface.co/arcee-ai/Arcee-Blitz) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/arcee-ai/Arcee-Blitz) for more details on the model.

---

Arcee-Blitz (24B) is a new Mistral-based 24B model distilled from DeepSeek, designed to be both fast and efficient. We view it as a practical “workhorse” model that can tackle a range of tasks without the overhead of larger architectures.

## Model Details

- Architecture Base: Mistral-Small-24B-Instruct-2501
- Parameter Count: 24B
- Distillation Data: Merged the Virtuoso pipeline with the Mistral architecture, hotstarting the training with over 3B tokens of pretraining distillation from DeepSeek-V3 logits (a minimal sketch of logit distillation appears after this list).
- Fine-Tuning and Post-Training: After capturing core logits, we performed additional fine-tuning and distillation steps to enhance overall performance.
- License: Apache-2.0
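The card does not include Arcee's distillation code, so the following is only a minimal sketch of what pretraining distillation from teacher logits generally looks like, under stated assumptions: `teacher` and `student` are stand-in names for causal LMs sharing one vocabulary (the real pipeline had to reconcile DeepSeek-V3 and Mistral tokenizers, which this glosses over), and the temperature value is illustrative.

```python
# Minimal sketch of logit distillation; NOT Arcee's actual pipeline.
# Assumes teacher and student are causal LMs over the same vocabulary.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitude stays comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2

# Hypothetical training step over a batch of token ids:
#   teacher_logits = teacher(input_ids).logits.detach()  # frozen teacher
#   student_logits = student(input_ids).logits
#   distillation_loss(student_logits, teacher_logits).backward()
```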
## Improving World Knowledge

Arcee-Blitz shows large performance improvements on MMLU-Pro versus the original Mistral-Small-3, reflecting a dramatic increase in world knowledge.

## Data Contamination Checking

We carefully examined our training data and pipeline to avoid contamination. While we’re confident in the validity of these gains, we remain open to further community validation and testing (one of the key reasons we release these models as open source).
## Limitations

- Context Length: 32k tokens (may vary depending on the final tokenizer settings and system resources); see the note after this list on requesting the full window in llama.cpp.
- Knowledge Cut-off: Training data may not reflect the latest events or developments beyond June 2024.
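A usage note that is not from the original card: llama.cpp typically loads a model with a smaller default context than it supports, so the full 32k window has to be requested explicitly at load time. The `-c` flag below is a real llama.cpp option; the GGUF filename is a placeholder.

```bash
# Request the full 32k context window at load time (placeholder filename).
llama-server -m ./arcee-blitz-q4_k_m.gguf -c 32768
```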
## Ethical Considerations

- Content Generation Risks: Like any language model, Arcee-Blitz can generate potentially harmful or biased content if prompted in certain ways.
## License

Arcee-Blitz (24B) is released under the Apache-2.0 License. You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license.

If you have questions or would like to share your experiences using Arcee-Blitz (24B), please connect with us on social media. We’re excited to see what you build, and how this model helps you innovate!

---
## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):
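The diff view is truncated at this point, so the rest of the standard GGUF-my-repo instructions are not visible above. A typical install-and-run sequence looks like the sketch below; `brew install llama.cpp` is the documented Homebrew formula, while the `--hf-repo` and `--hf-file` values are placeholders, since the exact quantization filename is not shown in this diff.

```bash
# Install llama.cpp (Homebrew formula works on macOS and Linux).
brew install llama.cpp

# Chat with the model from the CLI, fetching the GGUF directly from the Hub.
# Repo and file names are placeholders for this repo's actual quant.
llama-cli --hf-repo Triangle104/Arcee-Blitz-GGUF \
          --hf-file arcee-blitz-q4_k_m.gguf \
          -p "The meaning to life and the universe is"

# Or expose an OpenAI-compatible HTTP server:
llama-server --hf-repo Triangle104/Arcee-Blitz-GGUF \
             --hf-file arcee-blitz-q4_k_m.gguf \
             -c 2048
```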