---
license: llama3.1
---
# Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines

This repository contains the models and datasets used in the paper *"Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"*.
## Models

The `ckpt` folder contains the 16 LoRA adapters that were fine-tuned for this research:
- 6 Basic Executors
- 3 Executor Composers
- 7 Aligners

The base model used for fine-tuning all of the above is [LLaMA 3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B).
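For illustration, here is a minimal sketch of loading one adapter on top of the base model with `transformers` and `peft`, assuming a local checkout of this repository. The adapter path `ckpt/add_executor` is a hypothetical placeholder; substitute the actual subfolder name of the adapter you want from `ckpt`.

```python
# Minimal loading sketch (assumes transformers, peft, and torch are installed).
# "ckpt/add_executor" is a hypothetical adapter path -- replace it with the
# actual subfolder name of the adapter you want from the ckpt/ directory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach one of the fine-tuned LoRA adapters to the frozen base model.
model = PeftModel.from_pretrained(base_model, "ckpt/add_executor")
model.eval()
```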
## Datasets

The datasets used for evaluating all models can be found in the `datasets/raw` folder.
## Usage

Please refer to the [GitHub page](https://github.com/NJUDeepEngine/CAEF) for details.
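As a rough sketch only: once an adapter is loaded as above, inference goes through the standard `transformers` generation API. Note that CAEF's executors operate on a structured, step-by-step state representation defined in the GitHub repository, not free-form questions, so the prompt below is purely a placeholder.

```python
# Placeholder prompt -- the real input format (executor state transitions)
# is defined by the CAEF framework in the GitHub repository.
prompt = "..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```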
## Citation

If you use CAEF for your research, please cite our [paper](https://arxiv.org/abs/2410.07896):

```bibtex
@misc{lai2024executing,
      title={Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines},
      author={Junyu Lai and Jiahe Xu and Yao Yang and Yunpeng Huang and Chun Cao and Jingwei Xu},
      year={2024},
      eprint={2410.07896},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.07896},
}
```