doberst committed on
Commit
93ebce8
1 Parent(s): 8e8df03

Update README.md

Files changed (1)
  1. README.md +16 -11
README.md CHANGED
@@ -3,11 +3,11 @@ license: apache-2.0
  inference: false
  ---
 
- # bling-phi-3
+ # bling-phi-3.5-gguf
 
  <!-- Provide a quick summary of what the model is/does. -->
 
- bling-phi-3 is part of the BLING ("Best Little Instruct No-GPU") model series, RAG-instruct trained on top of a Microsoft Phi-3.5 base model.
+ bling-phi-3.5-gguf is part of the BLING ("Best Little Instruct No-GPU") model series, RAG-instruct trained on top of a Microsoft Phi-3.5 base model, and 4_K_M quantized with GGUF for fast local inference.
 
 
  ### Benchmark Tests
@@ -25,8 +25,9 @@ Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://
 
  For test run results (and a good indicator of target use cases), please see the files ("core_rag_test" and "answer_sheet") in this repo.
 
- Note: compare results with [bling-phi-2](https://www.huggingface.co/llmware/bling-phi-2-v0), and [dragon-mistral-7b](https://www.huggingface.co/llmware/dragon-mistral-7b-v0).
+ Please note that this is the model version used in the benchmark test results, to replicate the most common inference environment (rather than the original PyTorch version).
 
+ Note: compare results with [bling-phi-3-gguf](https://www.huggingface.co/llmware/bling-phi-3-gguf) and [bling-phi-2](https://www.huggingface.co/llmware/bling-phi-2-v0).
 
  ### Model Description
 
@@ -69,19 +70,23 @@ Any model can provide inaccurate or incomplete information, and should be used i
 
  ## How to Get Started with the Model
 
- The fastest way to get started with BLING is through direct import in transformers:
+ To pull the model via API:
 
- from transformers import AutoTokenizer, AutoModelForCausalLM
- tokenizer = AutoTokenizer.from_pretrained("llmware/bling-phi-3.5-gguf", trust_remote_code=True)
- model = AutoModelForCausalLM.from_pretrained("llmware/bling-phi-3.5-gguf", trust_remote_code=True)
+ from huggingface_hub import snapshot_download
+ snapshot_download("llmware/bling-phi-3.5-gguf", local_dir="/path/on/your/machine/", local_dir_use_symlinks=False)
+
+ Load in your favorite GGUF inference engine, or try with llmware as follows:
 
- Please refer to the generation_test.py files in the Files repository, which include 200 samples and a script to test the model. The **generation_test_llmware_script.py** includes built-in llmware capabilities for fact-checking, as well as easy integration with document parsing and actual retrieval to swap out the test set for a RAG workflow consisting of business documents.
+ from llmware.models import ModelCatalog
+
+ # to load the model and make a basic inference
+ model = ModelCatalog().load_model("llmware/bling-phi-3.5-gguf", temperature=0.0, sample=False)
+ response = model.inference(query, add_context=text_sample)
 
- The BLING model was fine-tuned with a simple "\<human> and \<bot> wrapper", so to get the best results, wrap inference entries as:
+ Details on the prompt wrapper and other configurations are in the config.json file in the files repository.
 
- full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"
 
- (As an aside, we intended to retire "human-bot" and tried several variations of the new Microsoft Phi-3 prompt template, and ultimately had slightly better results with the very simple "human-bot" separators, so we opted to keep them.)
+ ## How to Get Started with the Model
 
 
  The BLING model was fine-tuned with closed-context samples, which assume generally that the prompt consists of two sub-parts:
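
The diff cuts off the closed-context guidance at this point. As a rough illustration only, the sketch below combines the two-part closed-context prompt (treated here as a text passage plus a question, which is an assumption since the sub-parts are not listed in this excerpt) with the "human-bot" wrapper shown in the removed lines; the passage and question strings are made-up placeholders.

```python
# Illustrative sketch only: builds a closed-context prompt using the
# "<human>/<bot>" wrapper from the removed lines above. The passage and
# question are placeholder strings, and treating the two prompt sub-parts
# as "context passage + question" is an assumption, since this excerpt
# truncates that explanation.

text_passage = (
    "The company reported revenue of $12.5 million in the third quarter, "
    "an increase of 8% over the prior year."
)
question = "What was the revenue in the third quarter?"

# context first, then the question, wrapped in the simple human/bot separators
my_prompt = text_passage + "\n" + question
full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"

print(full_prompt)
```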
 
 
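For reference, the added quick-start lines above can be run end to end roughly as follows. This is a minimal sketch, not part of the commit; the local directory path, query, and text sample are placeholder values.

```python
# Minimal end-to-end sketch of the added quick-start lines (not part of the
# commit). The local_dir path, query, and text_sample are placeholders.

from huggingface_hub import snapshot_download
from llmware.models import ModelCatalog

# pull the GGUF model files from the Hugging Face Hub
snapshot_download("llmware/bling-phi-3.5-gguf",
                  local_dir="models/bling-phi-3.5-gguf",
                  local_dir_use_symlinks=False)

# load the model with deterministic settings and run a basic RAG-style inference
model = ModelCatalog().load_model("llmware/bling-phi-3.5-gguf",
                                  temperature=0.0, sample=False)

text_sample = ("The lease term is 36 months, with a monthly payment of $2,000 "
               "and a security deposit of $4,000.")
query = "What is the monthly payment?"

response = model.inference(query, add_context=text_sample)
print(response)
```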
 
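"Load in your favorite GGUF inference engine" in the added lines is engine-agnostic; as one possible option, here is a sketch using llama-cpp-python. The .gguf file name is a placeholder (check the actual file name in the downloaded directory), and the human/bot prompt wrapper follows the removed lines earlier in the diff.

```python
# One possible way to run the downloaded GGUF file outside of llmware, using
# llama-cpp-python as the inference engine. The .gguf file name below is a
# placeholder -- check the actual file name in the directory created by
# snapshot_download.

from llama_cpp import Llama

llm = Llama(model_path="models/bling-phi-3.5-gguf/bling-phi-3.5.gguf",
            n_ctx=2048)

context = "The invoice total is $7,450, due within 30 days of receipt."
question = "When is the invoice due?"

# wrap the prompt with the simple human/bot separators described in the README
prompt = "<human>: " + context + "\n" + question + "\n" + "<bot>:"

output = llm(prompt, max_tokens=100, stop=["<human>:"])
print(output["choices"][0]["text"])
```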