This is an UNOFFICIAL GGUF format model files repository for the converted/quantized OFFICIAL Modello Italia 9B model checkpoint released by iGenius.

## 🚨 Disclaimers
* This is an UNOFFICIAL quantization of the OFFICIAL model checkpoint released by iGenius.
* This model is also based on the conversion made for HF Transformers by [Sapienza NLP, Sapienza University of Rome](https://huggingface.co/sapienzanlp).
* The original model was developed using LitGPT.

## 🚨 Terms and Conditions
* **Note:** By using this model, you accept iGenius' [**terms and conditions**](https://secure.igenius.ai/legal/italia_terms_and_conditions.pdf).

GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

Here is an incomplete list of clients and libraries that are known to support GGUF:

* [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
* [neural-speed](https://github.com/intel/neural-speed). Same interface as llama.cpp, optimized for inference on CPU.
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
* [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for storytelling.
* [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server (see the loading sketch after this list).
* [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use.
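
As a quick illustration of the llama-cpp-python entry above, here is a minimal, unofficial loading sketch; it assumes `pip install llama-cpp-python` and that the Q4_K_M file from the Provided files table below has already been downloaded locally.

```python
# Minimal sketch (not an official example): run one of the GGUF files from
# this repo with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="modello-italia-9b-ggml-Q4_K_M.gguf",  # assumed local path
    n_ctx=2048,      # context window
    n_gpu_layers=0,  # raise to offload layers to GPU if built with GPU support
)

result = llm("La capitale d'Italia è", max_tokens=32)
print(result["choices"][0]["text"])
```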

## 🚨 Explanation of quantisation methods

The new methods available are:

* GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
* GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
* GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
* GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
* GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
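
To make the bits-per-weight figures above concrete, here is a small worked calculation for GGML_TYPE_Q4_K; the two fp16 super-block constants are my reading of the ggml K-quant layout, not something stated in this card.

```python
# Worked bpw calculation for GGML_TYPE_Q4_K (sketch, based on the ggml
# K-quant layout): 8 blocks of 32 weights per super-block, 4-bit quants,
# 6-bit block scales/mins, plus two fp16 constants per super-block.
blocks = 8
weights_per_block = 32
weights = blocks * weights_per_block    # 256 weights per super-block

quant_bits = weights * 4                # 4-bit quants -> 1024 bits
block_meta_bits = blocks * (6 + 6)      # 6-bit scale + 6-bit min -> 96 bits
superblock_bits = 16 + 16               # fp16 scale and min -> 32 bits

bpw = (quant_bits + block_meta_bits + superblock_bits) / weights
print(bpw)  # 4.5, matching the figure quoted for Q4_K above
```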
Refer to the Provided Files table below to see what files use which methods, and how.

## 🚨 Provided files
| Name | Quant method | Bits | Size | Use case |
| ---- | ---- | ---- | ---- | ---- |
| [modello-italia-9b-ggml-Q2_K.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q2_K.gguf?download=true) | Q2_K | 2 | 3.3 GB | smallest, significant quality loss - not recommended for most purposes |
| [modello-italia-9b-ggml-Q3_K_M.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q3_K_M.gguf?download=true) | Q3_K_M | 3 | 4.6 GB | very small, high quality loss |
| [modello-italia-9b-ggml-Q3_K_L.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q3_K_L.gguf?download=true) | Q3_K_L | 3 | 4.9 GB | small, substantial quality loss |
| [modello-italia-9b-ggml-Q4_0.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q4_0.gguf?download=true) | Q4_0 | 4 | 4.9 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| [modello-italia-9b-ggml-Q4_K_S.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q4_K_S.gguf?download=true) | Q4_K_S | 4 | 4.9 GB | small, greater quality loss |
| [modello-italia-9b-ggml-Q4_K_M.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q4_K_M.gguf?download=true) | Q4_K_M | 4 | 5.5 GB | medium, balanced quality - RECOMMENDED |
| [modello-italia-9b-ggml-Q5_0.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q5_0.gguf?download=true) | Q5_0 | 5 | 5.9 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| [modello-italia-9b-ggml-Q5_K_S.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q5_K_S.gguf?download=true) | Q5_K_S | 5 | 5.9 GB | large, low quality loss - recommended |
| [modello-italia-9b-ggml-Q5_K_M.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q5_K_M.gguf?download=true) | Q5_K_M | 5 | 6.4 GB | large, very low quality loss - recommended |
| [modello-italia-9b-ggml-Q6_K.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q6_K.gguf?download=true) | Q6_K | 6 | 7.0 GB | very large, extremely low quality loss |
| [modello-italia-9b-ggml-Q8_0.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-Q8_0.gguf?download=true) | Q8_0 | 8 | 9.1 GB | very large, extremely low quality loss - not recommended |
| [modello-italia-9b-ggml-f16.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-f16.gguf?download=true) | FP16 | 16 | 17.1 GB | very large, extremely low quality loss - not recommended |
| [modello-italia-9b-ggml-f32.gguf](https://huggingface.co/fbaldassarri/modello-italia-9B-GGUF/resolve/main/modello-italia-9b-ggml-f32.gguf?download=true) | FP32 | 32 | 34.2 GB | very large, no quality loss - not recommended |
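
To fetch a single file from the table above programmatically, one option is `huggingface_hub`; the repo id and file name in the sketch below are taken from the download links in the table.

```python
# Minimal sketch: download one quantized file from this repository with
# huggingface_hub (pip install huggingface_hub).
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="fbaldassarri/modello-italia-9B-GGUF",
    filename="modello-italia-9b-ggml-Q4_K_M.gguf",  # the RECOMMENDED Q4_K_M file
)
print(local_path)  # cached local path to the ~5.5 GB GGUF file
```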

## 🚨 Compatibility
These quantised GGUFv2 files are compatible with llama.cpp from August 27th 2023 onwards, as of commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221).

We are aware of the biases and potential problematic/toxic content that current large language models can exhibit.
For more information about this issue, please refer to our survey paper:
* [Biases in Large Language Models: Origins, Inventory, and Discussion](https://dl.acm.org/doi/full/10.1145/3597307)

## 🚨 Model architecture
* The model architecture is **based on GPT-NeoX**.
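
For readers who prefer the non-quantized weights, a minimal sketch of loading the HF Transformers conversion is below; the repo id is an assumption inferred from the Sapienza NLP link in the Disclaimers, not something stated in this card.

```python
# Sketch only: load the (assumed) HF Transformers conversion of the model.
# The repo id "sapienzanlp/modello-italia-9b" is an assumption, not confirmed
# by this model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "sapienzanlp/modello-italia-9b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # GPT-NeoX-based

inputs = tokenizer("Ciao, come stai?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```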