Update README.md
README.md
CHANGED
@@ -3,11 +3,11 @@ license: apache-2.0
 inference: false
 ---
 
-# bling-phi-3
+# bling-phi-3.5-gguf
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-bling-phi-3 is part of the BLING ("Best Little Instruct No-GPU") model series, RAG-instruct trained on top of a Microsoft Phi-3.5 base model.
+bling-phi-3.5-gguf is part of the BLING ("Best Little Instruct No-GPU") model series, RAG-instruct trained on top of a Microsoft Phi-3.5 base model, and 4_K_M quantized with GGUF for fast local inference.
 
 
 ### Benchmark Tests

@@ -25,8 +25,9 @@ Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://
 
 For test run results (and a good indicator of target use cases), please see the files ("core_rag_test" and "answer_sheet") in this repo.
 
-
+Please note that this is the model version used in the test results, to replicate the most common inference environment (rather than the original PyTorch version).
 
+Note: compare results with [bling-phi-3-gguf](https://huggingface.co/llmware/bling-phi-3-gguf) and [bling-phi-2](https://huggingface.co/llmware/bling-phi-2-v0).
 
 ### Model Description
 

@@ -69,19 +70,23 @@ Any model can provide inaccurate or incomplete information, and should be used i
 
 ## How to Get Started with the Model
 
-from
-
-full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"
-
+To pull the model via API:
+
+from huggingface_hub import snapshot_download
+snapshot_download("llmware/bling-phi-3.5-gguf", local_dir="/path/on/your/machine/", local_dir_use_symlinks=False)
+
+Load in your favorite GGUF inference engine, or try with llmware as follows:
+
+from llmware.models import ModelCatalog
+
+# to load the model and make a basic inference
+model = ModelCatalog().load_model("llmware/bling-phi-3.5-gguf", temperature=0.0, sample=False)
+response = model.inference(query, add_context=text_sample)
+
+Details on the prompt wrapper and other configurations are in the config.json file in this repository.
+
+## How to Get Started with the Model
 
 The BLING model was fine-tuned with closed-context samples, which generally assume that the prompt consists of two sub-parts:
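The load-and-infer snippet added above can likewise be fleshed out into a minimal runnable sketch. Here query and text_sample are hypothetical stand-ins for a question and a source passage, and the printed field assumes llmware's usual response dictionary:

from llmware.models import ModelCatalog

# load the model with deterministic settings, as in the snippet above
model = ModelCatalog().load_model("llmware/bling-phi-3.5-gguf", temperature=0.0, sample=False)

# hypothetical closed-context inputs, for illustration only
text_sample = "The lease term is 36 months, beginning on January 1, 2024, with monthly rent of $3,000."
query = "What is the monthly rent?"

# add_context passes the source passage alongside the question (RAG-style)
response = model.inference(query, add_context=text_sample)

# llmware returns a dict; the generated answer is under "llm_response"
print(response["llm_response"])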
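The two sub-parts themselves are enumerated just past the end of this hunk, but the closed-context format can be sketched from the <human>/<bot> wrapper retained in the removed line above. A minimal illustration, with placeholder passage and question strings:

# the two sub-parts of a BLING prompt: a text passage, then a question about it
context_passage = "The lease term is 36 months, beginning on January 1, 2024."
question = "What is the length of the lease?"

# assemble with the <human>/<bot> prompt wrapper shown in the diff
my_prompt = context_passage + "\n" + question
full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"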