This is an UNOFFICIAL GGUF format model files repository for the converted/quantized OFFICIAL Modello Italia 9B model checkpoint released by iGenius.

## 🚨 Disclaimers
* This is an UNOFFICIAL quantization of the OFFICIAL model checkpoint released by iGenius.
* This model is also based on the conversion made for HF Transformers by [Sapienza NLP, Sapienza University of Rome](https://huggingface.co/sapienzanlp).
* The original model was developed using LitGPT.

## 🚨 Terms and Conditions
* **Note:** By using this model, you accept iGenius' [**terms and conditions**](https://secure.igenius.ai/legal/italia_terms_and_conditions.pdf).

GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

Here is an incomplete list of clients and libraries that are known to support GGUF:

* [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
* [neural-speed](https://github.com/intel/neural-speed). Same interface as llama.cpp, optimized for inference on CPU.
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
* [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for storytelling.
* [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server (see the loading sketch after this list).
* [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use.
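
As a quick illustration of the llama-cpp-python entry above, here is a minimal, unofficial loading sketch; it assumes `pip install llama-cpp-python` and that the Q4_K_M file from the Provided files table below has already been downloaded locally.

```python
# Minimal sketch (not an official example): run one of the GGUF files from
# this repo with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="modello-italia-9b-ggml-Q4_K_M.gguf",  # assumed local path
    n_ctx=2048,      # context window
    n_gpu_layers=0,  # raise to offload layers to GPU if built with GPU support
)

result = llm("La capitale d'Italia è", max_tokens=32)
print(result["choices"][0]["text"])
```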

## 🚨 Explanation of quantisation methods

The new methods available are:

* GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
* GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
* GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
* GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
* GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
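
To make the bits-per-weight figures above concrete, here is a small worked calculation for GGML_TYPE_Q4_K; the two fp16 super-block constants are my reading of the ggml K-quant layout, not something stated in this card.

```python
# Worked bpw calculation for GGML_TYPE_Q4_K (sketch, based on the ggml
# K-quant layout): 8 blocks of 32 weights per super-block, 4-bit quants,
# 6-bit block scales/mins, plus two fp16 constants per super-block.
blocks = 8
weights_per_block = 32
weights = blocks * weights_per_block    # 256 weights per super-block

quant_bits = weights * 4                # 4-bit quants -> 1024 bits
block_meta_bits = blocks * (6 + 6)      # 6-bit scale + 6-bit min -> 96 bits
superblock_bits = 16 + 16               # fp16 scale and min -> 32 bits

bpw = (quant_bits + block_meta_bits + superblock_bits) / weights
print(bpw)  # 4.5, matching the figure quoted for Q4_K above
```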
Refer to the Provided Files table below to see what files use which methods, and how.

## 🚨 Provided files
| Name | Quant method | Bits | Size | Use case |
| ---- | ---- | ---- | ---- | ---- |
| [modello-italia-9b-ggml-Q2_K.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q2_K.gguf?download=true) | Q2_K | 2 | 3.3 GB | smallest, significant quality loss - not recommended for most purposes |
| [modello-italia-9b-ggml-Q3_K_M.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q3_K_M.gguf?download=true) | Q3_K_M | 3 | 4.6 GB | very small, high quality loss |
| [modello-italia-9b-ggml-Q3_K_L.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q3_K_L.gguf?download=true) | Q3_K_L | 3 | 4.9 GB | small, substantial quality loss |
| [modello-italia-9b-ggml-Q4_0.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q4_0.gguf?download=true) | Q4_0 | 4 | 4.9 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| [modello-italia-9b-ggml-Q4_K_S.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q4_K_S.gguf?download=true) | Q4_K_S | 4 | 4.9 GB | small, greater quality loss |
| [modello-italia-9b-ggml-Q4_K_M.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q4_K_M.gguf?download=true) | Q4_K_M | 4 | 5.5 GB | medium, balanced quality - RECOMMENDED |
| [modello-italia-9b-ggml-Q5_0.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q5_0.gguf?download=true) | Q5_0 | 5 | 5.9 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| [modello-italia-9b-ggml-Q5_K_S.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q5_K_S.gguf?download=true) | Q5_K_S | 5 | 5.9 GB | large, low quality loss - recommended |
| [modello-italia-9b-ggml-Q5_K_M.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q5_K_M.gguf?download=true) | Q5_K_M | 5 | 6.4 GB | large, very low quality loss - recommended |
| [modello-italia-9b-ggml-Q6_K.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q6_K.gguf?download=true) | Q6_K | 6 | 7.0 GB | very large, extremely low quality loss |
| [modello-italia-9b-ggml-Q8_0.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q8_0.gguf?download=true) | Q8_0 | 8 | 9.1 GB | very large, extremely low quality loss - not recommended |
| [modello-italia-9b-ggml-f16.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-f16.gguf?download=true) | FP16 | 16 | 17.1 GB | very large, extremely low quality loss - not recommended |
| [modello-italia-9b-ggml-f32.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-f32.gguf?download=true) | FP32 | 32 | 34.2 GB | very large, no quality loss - not recommended |
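
To fetch a single file from the table above programmatically, one option is `huggingface_hub`; the repo id and file name in the sketch below are taken from the download links in the table.

```python
# Minimal sketch: download one quantized file from this repository with
# huggingface_hub (pip install huggingface_hub).
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="fbaldassarri/modello-italia-9B-GGUF",
    filename="modello-italia-9b-ggml-Q4_K_M.gguf",  # the RECOMMENDED Q4_K_M file
)
print(local_path)  # cached local path to the ~5.5 GB GGUF file
```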

## 🚨 Compatibility
These quantised GGUFv2 files are compatible with llama.cpp from August 27th 2023 onwards, as of commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221).

We are aware of the biases and potential problematic/toxic content that current large language models can exhibit.
For more information about this issue, please refer to our survey paper:
* [Biases in Large Language Models: Origins, Inventory, and Discussion](https://dl.acm.org/doi/full/10.1145/3597307)

## 🚨 Model architecture
* The model architecture is **based on GPT-NeoX**.
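
For readers who prefer the non-quantized weights, a minimal sketch of loading the HF Transformers conversion is below; the repo id is an assumption inferred from the Sapienza NLP link in the Disclaimers, not something stated in this card.

```python
# Sketch only: load the (assumed) HF Transformers conversion of the model.
# The repo id "sapienzanlp/modello-italia-9b" is an assumption, not confirmed
# by this model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "sapienzanlp/modello-italia-9b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # GPT-NeoX-based

inputs = tokenizer("Ciao, come stai?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```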