JosephusCheung committed
Commit fc99bb6 • 1 Parent(s): 2013362
Update README.md
README.md CHANGED
@@ -1,3 +1,15 @@
---
license: gpl-3.0
---

This repository hosts the Guanaco model with 4-bit quantized weights. The model benefits from two techniques introduced by GPTQ: quantizing columns in order of decreasing activation size, and quantizing the layers within a single Transformer block sequentially. These techniques enable compact, consumer-level multilingual models to function effectively.

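To make the first technique concrete, here is a minimal NumPy sketch of column-wise 4-bit quantization in activation order with GPTQ-style error compensation. It is an illustration under stated assumptions, not the code used to produce these weights: the function name `quantize_act_order`, the per-row min/max grid, and the `damp` value are all choices made for this example, and the sequential per-block step is not shown.

```python
import numpy as np

def quantize_act_order(W, X, bits=4, damp=0.01):
    """Illustrative only. W: (out, in) weights; X: (samples, in) calibration inputs."""
    W = W.astype(np.float64).copy()
    n_in = W.shape[1]

    # Proxy Hessian of the layer-wise squared error, with damping for stability.
    H = 2.0 * X.T @ X
    H += damp * np.mean(np.diag(H)) * np.eye(n_in)

    # Activation order: visit the columns with the largest activation energy first.
    perm = np.argsort(-np.diag(H))
    W, H = W[:, perm], H[np.ix_(perm, perm)]

    # Per-row asymmetric grid with 2**bits levels (an assumption of this sketch).
    levels = 2 ** bits - 1
    w_min = np.minimum(W.min(axis=1), 0.0)
    w_max = np.maximum(W.max(axis=1), 0.0)
    scale = (w_max - w_min) / levels
    scale[scale == 0] = 1.0
    zero = np.round(-w_min / scale)

    # Upper Cholesky factor U of H^-1, so that H^-1 = U.T @ U.
    U = np.linalg.cholesky(np.linalg.inv(H)).T

    Q = np.zeros_like(W)
    for i in range(n_in):
        w = W[:, i]
        q = scale * (np.clip(np.round(w / scale) + zero, 0, levels) - zero)
        Q[:, i] = q
        # Push the rounding error of this column onto the columns not yet quantized.
        err = (w - q) / U[i, i]
        W[:, i:] -= np.outer(err, U[i, i:])

    # Undo the permutation so the result lines up with the original columns.
    return Q[:, np.argsort(perm)], perm
```

The returned matrix holds dequantized values, so each row contains at most 16 distinct numbers when `bits=4`. A real checkpoint would store the integer codes together with the scales and zero points instead.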
The Guanaco model aims to be a minimal multilingual conversational model for use as a human-computer interface. It is meant to handle simple Q&A interactions with a comprehensive understanding of grammar, a rich vocabulary, and stability similar to that of large-scale language models.

However, consumer hardware cannot run models at the performance level of ChatGPT3.5/GPT4 locally. Our model has fewer parameters and can still run on older hardware generations, requiring less than 6 GB of memory after 4-bit quantization. The only constraint is speed, which depends on the actual hardware configuration.

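The memory figure can be sanity-checked with simple arithmetic: weight storage is roughly the parameter count times the bits per weight, divided by 8. The sketch below assumes a hypothetical parameter count of 7 billion, which is not stated in this README, and it counts weights only (no activations, attention cache, or quantization metadata such as scales and zero points).

```python
def weight_memory_gib(n_params, bits):
    """Approximate weight storage in GiB for n_params weights at the given bit width."""
    return n_params * bits / 8 / 2**30

# Hypothetical parameter count, used for illustration only.
ASSUMED_PARAMS = 7_000_000_000

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(ASSUMED_PARAMS, bits):.2f} GiB")
```

Under that assumption, the 4-bit weights take about a quarter of the 16-bit footprint, which leaves headroom below 6 GB for runtime buffers.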
Instead of competing with large models like ChatGPT, we pursue a different approach: a functionally complete language model without any inherent knowledge or computational ability. We achieve this by integrating APIs for knowledge acquisition (e.g., querying online resources like Wikipedia or using Wolfram|Alpha for calculations), so that accurate information comes from those sources rather than from the model's own learning and understanding capabilities. The primary goal is to create a stable large-scale language model for human-computer interaction.

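One way to picture this division of labor is a small dispatcher that sends computations to a calculator backend and everything else to a knowledge lookup, leaving the model to phrase the reply. The sketch below is hypothetical throughout: `calculate` is a local stand-in for a service such as Wolfram|Alpha, `lookup` is any callable standing in for an online resource such as Wikipedia, and no real API is called.

```python
import ast
import operator

# Arithmetic operators the stand-in calculator accepts.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expr):
    """Safely evaluate a plain arithmetic expression (stand-in for a math API)."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def answer(query, lookup, compute=calculate):
    """Route a query: try the calculator first, then fall back to the lookup."""
    try:
        return f"The result is {compute(query)}."
    except (ValueError, SyntaxError):
        pass  # not an arithmetic expression
    fact = lookup(query)
    return fact if fact is not None else "I could not find that."

# Usage with an in-memory dictionary standing in for an online resource.
facts = {"Guanaco": "A guanaco is a camelid native to South America."}
print(answer("2 * (3 + 4)", facts.get))
print(answer("Guanaco", facts.get))
```

Passing the backends in as callables keeps the router independent of any particular service, so a real HTTP client could replace either stand-in without changing `answer`.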
One example of this approach is processing long articles or PDF documents. With the single-threaded operation of the ChatGPT3.5 API, the text must be split into segments and matched against user input, which is inefficient. Our minimal multilingual model can instead analyze the text sentence by sentence and generate multiple human-readable questions for each sentence. It can then connect these questions logically using a Question-Answer tree structure and algorithms such as PageRank, giving users answers based on a preliminary logical analysis.

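The ranking step can be sketched with a plain power-iteration PageRank over a graph of generated questions. How the questions are linked is not specified here, so the edges below are invented for illustration (for instance, a link could mean that answering one question depends on another), and the question labels are hypothetical.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank. links maps each node to the nodes it points to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # Nodes without outgoing links spread their rank evenly over all nodes.
            targets = links.get(v) or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# Hypothetical questions generated from three sentences of a document.
question_links = {
    "What hardware is required?": ["What is the model for?"],
    "How much memory is needed?": ["What is the model for?"],
    "What is the model for?": ["What hardware is required?"],
}

ranks = pagerank(question_links)
for question, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {question}")
```

Questions that many others point to rise to the top, which gives a simple signal for which answers to surface first.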
Our model can also be applied to summarizing web search results. These use cases are challenging for large models because of cost, scale, and request-frequency limits, and are more feasible on local, small-scale, consumer-level hardware. This direction is the next step in our efforts.