JosephusCheung/GuanacoOnConsumerHardware


JosephusCheung committed on Mar 28, 2023

Commit fc99bb6 • 1 Parent(s): 2013362

Update README.md

Files changed (1)

README.md +12 -0

README.md CHANGED

@@ -1,3 +1,15 @@
 ---
 license: gpl-3.0
 ---

+This repository hosts the Guanaco model with 4-bit quantized weights. The model benefits from two techniques introduced by GPTQ: quantizing columns in order of decreasing activation size, and performing sequential quantization within a single Transformer block. These techniques allow compact multilingual models to run effectively on consumer-level hardware.
+The Guanaco model aims to provide a minimal multilingual conversational model capable of handling simple Q&A interactions, with a comprehensive understanding of grammar, rich vocabulary, and stability similar to that of large-scale language models, for use as a human-computer interface.
+However, due to the limitations of consumer hardware, models at the performance level of ChatGPT 3.5 or GPT-4 cannot run locally on such machines. Our model, with a reduced number of parameters, can still run on older hardware generations, requiring less than 6GB of memory after 4-bit quantization. The only constraint is speed, which depends on the actual hardware configuration.
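As a rough check on the memory figure above, the sketch below estimates weight storage at a given bit width. The 7-billion-parameter count is an assumption made only for illustration, since the README does not state the model's size, and the helper name `quantized_weight_bytes` is hypothetical.

```python
def quantized_weight_bytes(n_params: int, bits_per_weight: int = 4) -> int:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight // 8


# Assumption: a 7-billion-parameter model (not stated in this README).
ASSUMED_PARAMS = 7_000_000_000

four_bit_gb = quantized_weight_bytes(ASSUMED_PARAMS, 4) / 1e9
fp16_gb = quantized_weight_bytes(ASSUMED_PARAMS, 16) / 1e9

print(f"4-bit weights: {four_bit_gb:.1f} GB")
print(f"fp16 weights:  {fp16_gb:.1f} GB")
```

This counts the packed weights only. Quantization scales, activations, and the attention cache add to the total, so actual usage would be higher than the weight figure alone.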
+Instead of competing with large models like ChatGPT, we pursue a different approach: a functionally complete language model without any inherent knowledge or computational ability. The model compensates by calling APIs for knowledge acquisition (e.g., querying online resources like Wikipedia or using Wolfram|Alpha for calculations) to provide accurate information to users, rather than relying on its own learning and understanding capabilities. The primary goal is to create a stable large-scale language model for human-computer interaction.
+An example of this approach is processing long articles or PDF documents. Because the ChatGPT 3.5 API operates in a single-threaded fashion, text must be divided into segments and matched against user input, which is inefficient. Our minimal multilingual model can instead analyze text sentence by sentence, generating multiple human-readable questions for each sentence. It can then establish logical connections between these questions using a Question-Answer tree structure and algorithms like PageRank, and answer users based on this preliminary logical analysis.
+Furthermore, our model can be applied to summarizing web search results. These use cases are challenging for large models because of cost, scale, and request-frequency limits, and are more feasible on local, small-scale, consumer-level hardware. This direction is the next step in our efforts.
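The README does not show how the generated questions are linked or ranked. A minimal sketch of the ranking step, assuming the questions have already been generated and connected into a directed graph, could apply a plain power-iteration PageRank as below. The question identifiers and edges are hypothetical placeholders, and this is not the authors' implementation.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # Nodes without outgoing links spread their rank evenly.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical question graph: an edge means "answering the source
# question depends on, or leads to, the target question".
edges = [
    ("q_detail_1", "q_main"),
    ("q_detail_2", "q_main"),
    ("q_main", "q_detail_1"),
]

ranks = pagerank(edges)
top_question = max(ranks, key=ranks.get)
print(top_question, round(ranks[top_question], 3))
```

Questions that many others point to receive a higher score, so they are natural candidates to answer first or to use as roots of the Question-Answer tree.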