fbaldassarri committed · verified
Commit f5a5b99 · 1 Parent(s): 441dc9f

Upload README.md

Files changed (1):
1. README.md (+32 -2)

README.md CHANGED
@@ -23,7 +23,7 @@ This is an UNOFFICIAL GGUF format model files repository for converted/quantized OF
 ## 🚨 Disclaimers
 * This is an UNOFFICIAL quantization of the OFFICIAL model checkpoint released by iGenius.
 * This model is also based on the conversion made for HF Transformers by [Sapienza NLP, Sapienza University of Rome](https://huggingface.co/sapienzanlp).
- * The original model was developed using LitGPT; therefore, the weights need to be converted before they can be used with Hugging Face Transformers.
+ * The original model was developed using LitGPT.

 ## 🚨 Terms and Conditions
 * **Note:** By using this model, you accept iGenius' [**terms and conditions**](https://secure.igenius.ai/legal/italia_terms_and_conditions.pdf).
@@ -35,6 +35,7 @@ GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is
 Here is an incomplete list of clients and libraries that are known to support GGUF:

 * [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
+ * [neural-speed](https://github.com/intel/neural-speed). Same interface as llama.cpp, optimized for inference on CPU.
 * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
 * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU acceleration across all platforms and GPU architectures. Especially good for storytelling.
 * [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
@@ -44,6 +45,35 @@ Here is an incomplete list of clients and libraries that are known to support GGUF:
 * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
 * [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use.

+ ## 🚨 Explanation of quantisation methods
+
+ The new methods available are:
+ * GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
+ * GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
+ * GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
+ * GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
+ * GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
+
+ Refer to the Provided Files table below to see which files use which methods, and how.
+
+ ## 🚨 Provided files
+
+ | Name | Quant method | Bits | Size | Use case |
+ | ---- | ---- | ---- | ---- | ---- |
+ | [modello-italia-9b-ggml-Q2_K.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q2_K.gguf?download=true) | Q2_K | 2 | 3.3 GB | smallest, significant quality loss - not recommended for most purposes |
+ | [modello-italia-9b-ggml-Q3_K_M.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q3_K_M.gguf?download=true) | Q3_K_M | 3 | 4.6 GB | very small, high quality loss |
+ | [modello-italia-9b-ggml-Q3_K_L.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q3_K_L.gguf?download=true) | Q3_K_L | 3 | 4.9 GB | small, substantial quality loss |
+ | [modello-italia-9b-ggml-Q4_0.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q4_0.gguf?download=true) | Q4_0 | 4 | 4.9 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
+ | [modello-italia-9b-ggml-Q4_K_S.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q4_K_S.gguf?download=true) | Q4_K_S | 4 | 4.9 GB | small, greater quality loss |
+ | [modello-italia-9b-ggml-Q4_K_M.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q4_K_M.gguf?download=true) | Q4_K_M | 4 | 5.5 GB | medium, balanced quality - RECOMMENDED |
+ | [modello-italia-9b-ggml-Q5_0.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q5_0.gguf?download=true) | Q5_0 | 5 | 5.9 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
+ | [modello-italia-9b-ggml-Q5_K_S.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q5_K_S.gguf?download=true) | Q5_K_S | 5 | 5.9 GB | large, low quality loss - recommended |
+ | [modello-italia-9b-ggml-Q5_K_M.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q5_K_M.gguf?download=true) | Q5_K_M | 5 | 6.4 GB | large, very low quality loss - recommended |
+ | [modello-italia-9b-ggml-Q6_K.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q6_K.gguf?download=true) | Q6_K | 6 | 7.0 GB | very large, extremely low quality loss |
+ | [modello-italia-9b-ggml-Q8_0.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q8_0.gguf?download=true) | Q8_0 | 8 | 9.1 GB | very large, extremely low quality loss - not recommended |
+ | [modello-italia-9b-ggml-f16.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-f16.gguf?download=true) | FP16 | 16 | 17.1 GB | very large, extremely low quality loss - not recommended |
+ | [modello-italia-9b-ggml-f32.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-f32.gguf?download=true) | FP32 | 32 | 34.2 GB | very large, no quality loss - not recommended |
+
 ## 🚨 Compatibility

 These quantised GGUFv2 files are compatible with llama.cpp from August 27th 2023 onwards, as of commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221).
@@ -74,7 +104,7 @@ We are aware of the biases and potential problematic/toxic content that current
 For more information about this issue, please refer to our survey paper:
 * [Biases in Large Language Models: Origins, Inventory, and Discussion](https://dl.acm.org/doi/full/10.1145/3597307)

- ## Model architecture
+ ## 🚨 Model architecture
 * The model architecture is **based on GPT-NeoX**.
 
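The bpw figures quoted in the quantisation section above can be reproduced with simple arithmetic. Below is a minimal Python sketch, assuming that each K-quant super-block holds 256 weights and that a "type-0" super-block carries one fp16 constant (a scale) while a "type-1" super-block carries two (a scale and a minimum); these layout assumptions are mine, not stated in the README:

```python
# Rough check of the bits-per-weight (bpw) figures quoted above.
# Assumption: "type-0" stores one fp16 super-block constant,
# "type-1" stores two (scale and minimum).

def bpw(bits, blocks, block_weights, scale_bits, min_bits, fp16_consts):
    weights = blocks * block_weights                   # 256 for all K-quants
    total_bits = (weights * bits                       # the quantized weights
                  + blocks * (scale_bits + min_bits)   # per-block scales/mins
                  + fp16_consts * 16)                  # fp16 super-block constants
    return total_bits / weights

print(bpw(3, 16, 16, 6, 0, 1))  # Q3_K -> 3.4375
print(bpw(4, 8, 32, 6, 6, 2))   # Q4_K -> 4.5
print(bpw(5, 8, 32, 6, 6, 2))   # Q5_K -> 5.5
print(bpw(6, 16, 16, 8, 0, 1))  # Q6_K -> 6.5625
```
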
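Any file in the Provided Files table can also be fetched programmatically instead of through the `?download=true` links. A minimal sketch using the `huggingface_hub` client, here pulling the Q4_K_M variant the table marks as RECOMMENDED:

```python
# Download one of the GGUF files listed in the table above.
# Requires: pip install huggingface_hub
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="fbaldassarri/modello-italia-9B-GGUF",
    filename="modello-italia-9b-ggml-Q4_K_M.gguf",  # RECOMMENDED variant
)
print(local_path)  # where the file landed in the local cache
```
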
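Once downloaded, the file should load in any of the compatible GGUF clients listed earlier. A minimal sketch with `llama-cpp-python`; the prompt and context size here are illustrative only:

```python
# Run a short completion against the quantized model.
# Requires: pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="modello-italia-9b-ggml-Q4_K_M.gguf",  # path from the download step
    n_ctx=2048,  # context window; adjust to taste
)
out = llm("Qual è la capitale d'Italia?", max_tokens=64)  # illustrative prompt
print(out["choices"][0]["text"])
```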
 