---
datasets:
- tatsu-lab/alpaca
language:
- en
---
## Model details

**Organization developing the model**
Alibaba DAMO Academy, Tsinghua University

**Model date**
Wombat-7B-GPT4 was released on 2023/04/13.

**Model version**
Wombat-7B-GPT4.

**Training dataset**
The training data of Wombat-7B-GPT4 is released in the [GPT-4-LLM](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM) repository.

**Model type**
Wombat-7B-GPT4 is a general-purpose instruction-following language model aligned with GPT-4 (as a proxy for human preferences), fine-tuned from Alpaca models.
We use a novel method named RRHF (Rank Responses to align Human Feedback) to fine-tune Alpaca.
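
For intuition, here is a minimal, illustrative sketch of the RRHF objective (not the released training code; names and shapes are our own simplifications): each candidate response is scored by its length-normalized conditional log-probability under the model, a ranking loss aligns those scores with the ordering given by the proxy rewards, and a cross-entropy term is kept on the best-rewarded response.

```python
import torch

def rrhf_loss(seq_logprobs: torch.Tensor,
              seq_lengths: torch.Tensor,
              rewards: torch.Tensor) -> torch.Tensor:
    """Illustrative RRHF objective for one query with k candidate responses.

    seq_logprobs: (k,) summed token log-probabilities of each response under the policy
    seq_lengths:  (k,) number of response tokens, used for length normalization
    rewards:      (k,) proxy preference scores (e.g. assigned by GPT-4)
    """
    # Score each response by its length-normalized log-probability.
    scores = seq_logprobs / seq_lengths

    # Ranking loss: for every pair with reward_i < reward_j, penalize
    # any margin by which score_i exceeds score_j.
    score_gap = scores.unsqueeze(1) - scores.unsqueeze(0)   # [i, j] = score_i - score_j
    worse = rewards.unsqueeze(1) < rewards.unsqueeze(0)     # [i, j] = reward_i < reward_j
    rank_loss = torch.relu(score_gap)[worse].sum()

    # Cross-entropy (fine-tuning) loss on the best-rewarded response.
    best = torch.argmax(rewards)
    ft_loss = -seq_logprobs[best]

    return rank_loss + ft_loss
```

See the paper (arXiv:2304.05302) and the [RRHF repository](https://github.com/GanjinZero/RRHF) for the actual training implementation.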

**How to use**
To recover the Wombat weights from the delta parameters:
```bash
python apply_delta.py \
  --base ./llama-7b \
  --target ./wombat-7b-gpt4 \
  --delta GanjinZero/wombat-7b-gpt4-delta
```
where **apply_delta.py** is from [code](https://github.com/GanjinZero/RRHF/blob/main/apply_delta.py).
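
Conceptually, the delta checkpoint stores the difference between the Wombat weights and the base LLaMA weights, so recovery amounts to adding the two parameter sets back together. The snippet below is a simplified sketch of that idea (assuming matching parameter names and shapes); the official **apply_delta.py** handles tokenizer and shape details and should be preferred.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base LLaMA weights and the released delta weights.
base = AutoModelForCausalLM.from_pretrained("./llama-7b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained(
    "GanjinZero/wombat-7b-gpt4-delta", torch_dtype=torch.float16
)

# Add the base weights onto the delta weights to reconstruct the target model.
base_state = base.state_dict()
with torch.no_grad():
    for name, param in delta.state_dict().items():
        param.add_(base_state[name])

delta.save_pretrained("./wombat-7b-gpt4")
AutoTokenizer.from_pretrained("GanjinZero/wombat-7b-gpt4-delta").save_pretrained("./wombat-7b-gpt4")
```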

To run inference with Wombats, please refer to the [inference code](https://github.com/GanjinZero/RRHF/blob/main/single_sentence_inference.py).
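
As a quick alternative, the recovered weights can also be loaded directly with Hugging Face `transformers`. The Alpaca-style prompt below is an assumption (Wombat is fine-tuned from Alpaca); check **single_sentence_inference.py** for the exact template used by the authors.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./wombat-7b-gpt4"  # produced by apply_delta.py above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca-style instruction prompt (assumed; see single_sentence_inference.py).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nGive three tips for staying healthy.\n\n"
    "### Response:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```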

**Citation details**
Please cite our paper on arXiv:
```
@misc{yuan2023rrhf,
      title={RRHF: Rank Responses to Align Language Models with Human Feedback without tears},
      author={Zheng Yuan and Hongyi Yuan and Chuanqi Tan and Wei Wang and Songfang Huang and Fei Huang},
      year={2023},
      eprint={2304.05302},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

**License**
Data are licensed under the CC BY-NC 4.0 license.

**Where to send questions or comments about the model**
Questions, comments, and discussions about Wombats and RRHF can be raised by opening an issue in the project's [GitHub repository](https://github.com/GanjinZero/RRHF), or by sending an email to [email protected], [email protected], or [email protected].

**Primary intended uses**
The primary use of Wombat-7B and Wombat-7B-GPT4 is research on learning from human feedback; they serve as prototypes of the RRHF method.

**Primary intended users**
The primary intended users of Wombat-7B and Wombat-7B-GPT4 are researchers in natural language processing, machine learning, and artificial intelligence.

**Out-of-scope use cases**
Wombat-7B and Wombat-7B-GPT4 are fine-tuned only with proxy human feedback from OpenAI ChatGPT and GPT-4, not with real human feedback, and are not intended for use in production systems.
Any usage must not compete with the OpenAI API.