---
license: llama2
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->

<div align="center">
<h1>
SlimPLM
</h1>
</div>

<p align="center">
📝 <a href="https://arxiv.org/abs/2402.12052" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/zstanjj/SlimPLM-Query-Rewriting/" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/plageon/SlimPLM" target="_blank">GitHub</a>
</p>
## ✨ Latest News

- [1/25/2024]: Search Necessity Judgment Model released on [Hugging Face](https://huggingface.co/zstanjj/SlimPLM-Search-Necessity-Judgment/).
- [2/20/2024]: Query Rewriting Model released on [Hugging Face](https://huggingface.co/zstanjj/SlimPLM-Query-Rewriting/).
## 🎬 Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# construct the Llama-2 chat prompt from the question and a heuristic (coarse) answer
question = "Who voices Darth Vader in Star Wars Episodes III-VI, IX Rogue One, and Rebels?"
heuristic_answer = "The voice of Darth Vader in Star Wars is provided by British actor James Earl Jones. He first voiced the character in the 1977 film \"Star Wars: Episode IV - A New Hope\", and his performance has been used in all subsequent Star Wars films, including the prequels and sequels."
prompt = (f"<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
          f" structured formats according to the coarse answer. Current datatime is 2023-12-20 9:47:28"
          f" <</SYS>>\n Course answer: (({heuristic_answer}))\nQuestion: (({question})) [/INST]")
params_query_rewrite = {"repetition_penalty": 1.05, "temperature": 0.01, "top_k": 1, "top_p": 0.85,
                        "max_new_tokens": 512, "do_sample": False}
torch.manual_seed(2023)

# load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("zstanjj/SlimPLM-Search-Necessity-Judgment").eval()
if torch.cuda.is_available():
    model.cuda()
tokenizer = AutoTokenizer.from_pretrained("zstanjj/SlimPLM-Search-Necessity-Judgment")

# run inference on the full prompt and decode only the newly generated tokens
input_ids = tokenizer.encode(prompt, return_tensors="pt")
len_input_ids = len(input_ids[0])
if torch.cuda.is_available():
    input_ids = input_ids.cuda()
outputs = model.generate(input_ids, **params_query_rewrite)
res = tokenizer.decode(outputs[0][len_input_ids:], skip_special_tokens=True)
print(res)
```
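If you query the model for several questions, the prompt construction above can be factored into a small helper. This is a minimal sketch; `build_rewrite_prompt` is a hypothetical name, not part of the released code, and the template text (including the literal "Course answer" and "datatime" strings) is kept verbatim on the assumption that it matches what the model was tuned on:

```python
def build_rewrite_prompt(question: str, heuristic_answer: str,
                         datetime_str: str = "2023-12-20 9:47:28") -> str:
    """Wrap a question and its coarse (heuristic) answer in the Llama-2
    [INST] chat template used by this model card. The wording is reproduced
    verbatim, since the model presumably expects this exact format."""
    return (f"<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
            f" structured formats according to the coarse answer. Current datatime is {datetime_str}"
            f" <</SYS>>\n Course answer: (({heuristic_answer}))\nQuestion: (({question})) [/INST]")
```

The returned string can be passed directly to `tokenizer.encode` in place of the inline `prompt` above.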
## ✏️ Citation

```
@inproceedings{Tan2024SmallMB,
  title={Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs},
  author={Jiejun Tan and Zhicheng Dou and Yutao Zhu and Peidong Guo and Kun Fang and Jinhui Wen},
  year={2024},
  url={https://api.semanticscholar.org/CorpusID:267750726}
}
```