lqtrung1998
/

galactica-6.7b-SFT-GSM8k

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

lqtrung1998 commited on Feb 23

Commit

1ece8db

•

1 Parent(s): 414d3a8

Update README.md

Files changed (1) hide show

README.md +9 -2

README.md CHANGED Viewed

@@ -10,13 +10,20 @@ Repo: https://github.com/lqtrung1998/mwp_ReFT (under [Apache2.0 License](https:/
 We introduce REinforced Fine-tuning (ReFT), a method that enhances the generalizability of learning LLMs for reasoning.
 This repository contains:
-- A Supervised Fine-tuned model on GSM8k benchmark: [lqtrung1998/galactica-6.7b-SFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-GSM8k)
 - A Warmup Supervised Fine-tuned model on GSM8k benchmark: [lqtrung1998/galactica-6.7b-SFT-warmup-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-warmup-GSM8k)
 - A REinforced Fine-tuned model on GSM8k benchmark: [lqtrung1998/galactica-6.7b-ReFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-ReFT-GSM8k)
-- A Rerank model that can score the fine-tuned model output: [lqtrung1998/galactica-6.7b-ReFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-ReFT-Rerank-GSM8k)
 Note: Our models are tuned based on Galactica, thus, licenses applicable to Galactica, such as non-commercial CC BY-NC 4.0 license also hold on these models.
 ## Training Data
 The model is trained on GSM8k data with Python SDP CoT format, which can be found [here](https://github.com/lqtrung1998/mwp_ReFT)

 We introduce REinforced Fine-tuning (ReFT), a method that enhances the generalizability of learning LLMs for reasoning.
 This repository contains:
 - A Warmup Supervised Fine-tuned model on GSM8k benchmark: [lqtrung1998/galactica-6.7b-SFT-warmup-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-warmup-GSM8k)
+- A Supervised Fine-tuned model on GSM8k benchmark: [lqtrung1998/galactica-6.7b-SFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-GSM8k)
+- A Rerank model that can score the fine-tuned SFT model output: [lqtrung1998/galactica-6.7b-SFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-SFT-Rerank-GSM8k)
 - A REinforced Fine-tuned model on GSM8k benchmark: [lqtrung1998/galactica-6.7b-ReFT-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-ReFT-GSM8k)
+- A Rerank model that can score the fine-tuned ReFT model output: [lqtrung1998/galactica-6.7b-ReFT-Rerank-GSM8k](https://huggingface.co/lqtrung1998/galactica-6.7b-ReFT-Rerank-GSM8k)
 Note: Our models are tuned based on Galactica, thus, licenses applicable to Galactica, such as non-commercial CC BY-NC 4.0 license also hold on these models.
+|                                                                    |  Top-1 | Voting@100 | Rerank@100 |
+|--------------------------------------------------------------------|:------:|:----------:|:----------:|
+| galactica-6.7b-SFT-warmup-GSM8k                                    |  48.37 |      -     |      -     |
+| galactica-6.7b-SFT-GSM8k<br>(+galactica-6.7b-SFT-Rerank-GSM8k)     | 58.83  |    62.9    |    73.4    |
+| galactica-6.7b-ReFT-GSM8k<br>(+galactica-6.7b-ReFT-Rerank-GSM8k)   |  68.91 |    71.9    |    76.4    |
 ## Training Data
 The model is trained on GSM8k data with Python SDP CoT format, which can be found [here](https://github.com/lqtrung1998/mwp_ReFT)