doberst committed on
Commit
93ebce8
1 Parent(s): 8e8df03

Update README.md

Files changed (1)
  1. README.md +16 -11
README.md CHANGED
@@ -3,11 +3,11 @@ license: apache-2.0
  inference: false
  ---
 
- # bling-phi-3
+ # bling-phi-3.5-gguf
 
  <!-- Provide a quick summary of what the model is/does. -->
 
- bling-phi-3 is part of the BLING ("Best Little Instruct No-GPU") model series, RAG-instruct trained on top of a Microsoft Phi-3.5 base model.
+ bling-phi-3.5-gguf is part of the BLING ("Best Little Instruct No-GPU") model series, RAG-instruct trained on top of a Microsoft Phi-3.5 base model, and 4_K_M quantized with GGUF for fast local inference.
 
 
  ### Benchmark Tests
@@ -25,8 +25,9 @@ Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://
 
  For test run results (and a good indicator of target use cases), please see the files ("core_rag_test" and "answer_sheet") in this repo.
 
- Note: compare results with [bling-phi-2](https://www.huggingface.co/llmware/bling-phi-2-v0), and [dragon-mistral-7b](https://www.huggingface.co/llmware/dragon-mistral-7b-v0).
+ Please note that this is the model version used in the benchmark test results, to replicate the most common inference environment (rather than the original PyTorch version).
 
+ Note: compare results with [bling-phi-3-gguf](https://www.huggingface.co/llmware/bling-phi-3-gguf) and [bling-phi-2](https://www.huggingface.co/llmware/bling-phi-2-v0).
 
  ### Model Description
 
@@ -69,19 +70,23 @@ Any model can provide inaccurate or incomplete information, and should be used i
 
  ## How to Get Started with the Model
 
- The fastest way to get started with BLING is through direct import in transformers:
+ To pull the model via API:
 
- from transformers import AutoTokenizer, AutoModelForCausalLM
- tokenizer = AutoTokenizer.from_pretrained("llmware/bling-phi-3.5-gguf", trust_remote_code=True)
- model = AutoModelForCausalLM.from_pretrained("llmware/bling-phi-3.5-gguf", trust_remote_code=True)
+ from huggingface_hub import snapshot_download
+ snapshot_download("llmware/bling-phi-3.5-gguf", local_dir="/path/on/your/machine/", local_dir_use_symlinks=False)
+
+ Load in your favorite GGUF inference engine, or try with llmware as follows:
 
- Please refer to the generation_test.py files in the Files repository, which include 200 samples and a script to test the model. The **generation_test_llmware_script.py** includes built-in llmware capabilities for fact-checking, as well as easy integration with document parsing and actual retrieval to swap out the test set for a RAG workflow consisting of business documents.
+ from llmware.models import ModelCatalog
+
+ # to load the model and make a basic inference
+ model = ModelCatalog().load_model("llmware/bling-phi-3.5-gguf", temperature=0.0, sample=False)
+ response = model.inference(query, add_context=text_sample)
 
- The BLING model was fine-tuned with a simple "\<human> and \<bot> wrapper", so to get the best results, wrap inference entries as:
+ Details on the prompt wrapper and other configurations are in the config.json file in the files repository.
 
- full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"
 
- (As an aside, we intended to retire "human-bot" and tried several variations of the new Microsoft Phi-3 prompt template, and ultimately had slightly better results with the very simple "human-bot" separators, so we opted to keep them.)
+ ## How to Get Started with the Model
 
 
  The BLING model was fine-tuned with closed-context samples, which assume generally that the prompt consists of two sub-parts:
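
The diff cuts off the closed-context guidance at this point. As a rough illustration only, the sketch below combines the two-part closed-context prompt (treated here as a text passage plus a question, which is an assumption since the sub-parts are not listed in this excerpt) with the "human-bot" wrapper shown in the removed lines; the passage and question strings are made-up placeholders.

```python
# Illustrative sketch only: builds a closed-context prompt using the
# "<human>/<bot>" wrapper from the removed lines above. The passage and
# question are placeholder strings, and treating the two prompt sub-parts
# as "context passage + question" is an assumption, since this excerpt
# truncates that explanation.

text_passage = (
    "The company reported revenue of $12.5 million in the third quarter, "
    "an increase of 8% over the prior year."
)
question = "What was the revenue in the third quarter?"

# context first, then the question, wrapped in the simple human/bot separators
my_prompt = text_passage + "\n" + question
full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"

print(full_prompt)
```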
 
 
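For reference, the added quick-start lines above can be run end to end roughly as follows. This is a minimal sketch, not part of the commit; the local directory path, query, and text sample are placeholder values.

```python
# Minimal end-to-end sketch of the added quick-start lines (not part of the
# commit). The local_dir path, query, and text_sample are placeholders.

from huggingface_hub import snapshot_download
from llmware.models import ModelCatalog

# pull the GGUF model files from the Hugging Face Hub
snapshot_download("llmware/bling-phi-3.5-gguf",
                  local_dir="models/bling-phi-3.5-gguf",
                  local_dir_use_symlinks=False)

# load the model with deterministic settings and run a basic RAG-style inference
model = ModelCatalog().load_model("llmware/bling-phi-3.5-gguf",
                                  temperature=0.0, sample=False)

text_sample = ("The lease term is 36 months, with a monthly payment of $2,000 "
               "and a security deposit of $4,000.")
query = "What is the monthly payment?"

response = model.inference(query, add_context=text_sample)
print(response)
```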
 
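"Load in your favorite GGUF inference engine" in the added lines is engine-agnostic; as one possible option, here is a sketch using llama-cpp-python. The .gguf file name is a placeholder (check the actual file name in the downloaded directory), and the human/bot prompt wrapper follows the removed lines earlier in the diff.

```python
# One possible way to run the downloaded GGUF file outside of llmware, using
# llama-cpp-python as the inference engine. The .gguf file name below is a
# placeholder -- check the actual file name in the directory created by
# snapshot_download.

from llama_cpp import Llama

llm = Llama(model_path="models/bling-phi-3.5-gguf/bling-phi-3.5.gguf",
            n_ctx=2048)

context = "The invoice total is $7,450, due within 30 days of receipt."
question = "When is the invoice due?"

# wrap the prompt with the simple human/bot separators described in the README
prompt = "<human>: " + context + "\n" + question + "\n" + "<bot>:"

output = llm(prompt, max_tokens=100, stop=["<human>:"])
print(output["choices"][0]["text"])
```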