---
license: mit
datasets:
- yzhuang/Agentic-Long-Context-Understanding-QA
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
<h1 align="center"> πŸ“– Agentic Long Context Understanding πŸ“– </h1>
<p align="center"> <b>Self-Taught Agentic Long Context Understanding</b> (<a href="https://arxiv.org/abs/2502.15920">arXiv</a>).
</p>
<p align="center">
<img src="https://img.shields.io/badge/license-mit-blue.svg">
<img src="https://img.shields.io/badge/python-3.9+-blue">
</p>
<p align="center"> AgenticLU refines complex, long-context queries through self-clarifications and contextual grounding, enabling robust long-document understanding in a single pass.
</p>
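To make the workflow concrete, here is a conceptual sketch of one clarify-then-ground round (an illustration only, not the paper's exact procedure; `chat` is a hypothetical helper, and the real prompts and control flow live in `long_context_llm/`):

```python
# Conceptual illustration of self-clarification + contextual grounding.
# `chat(prompt)` is a hypothetical helper that returns the model's reply.

def clarify_and_answer(chat, document: str, question: str) -> str:
    # Self-clarification: the model asks itself what it still needs to know.
    clarification = chat(
        f"{document}\n\nQuestion: {question}\n"
        "Raise one clarifying question you need answered before responding."
    )
    # Contextual grounding: the model quotes the passages that resolve it.
    evidence = chat(
        f"{document}\n\nClarification: {clarification}\n"
        "Quote the passages from the document that resolve this clarification."
    )
    # Final answer, conditioned on the grounded evidence.
    return chat(
        f"{document}\n\nQuestion: {question}\n"
        f"Clarification: {clarification}\nEvidence: {evidence}\n"
        "Now answer the question."
    )
```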
## Installation Requirements
This codebase is largely based on [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) and [HELMET](https://github.com/princeton-nlp/HELMET), kudos to them.
The requirements are the same:
```bash
pip install openrlhf
pip install -r ./HELMET/requirements.txt
```
## Dataset & Model
The dataset for SFT and DPO is available [here](https://huggingface.co/datasets/yzhuang/Agentic-Long-Context-Understanding-QA).
The model is available [here](https://huggingface.co/yzhuang/Llama-3.1-8B-Instruct-AgenticLU).
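Both can be loaded with the standard Hugging Face libraries; a minimal sketch (adjust the dtype and device settings to your hardware):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# SFT/DPO training data
data = load_dataset("yzhuang/Agentic-Long-Context-Understanding-QA")

# AgenticLU-tuned Llama-3.1-8B-Instruct
model_id = "yzhuang/Llama-3.1-8B-Instruct-AgenticLU"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
```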
## Data Generation Pipeline
To generate traces with your own model or dataset, follow these steps:
1. Get an OpenAI API key and set it as an environment variable:
```bash
export OPENAI_API_KEY="your_api_key_here"
```
2. Edit the bash script as needed; `--model_name_or_path` sets the base model, while `--max_sample_size` and `--max_tree_depth` control the search width and depth:
```bash
PYTHONPATH="./":"$PYTHONPATH" python ./long_context_llm/qa_tree_datagen.py \
--model_name_or_path meta-llama/Llama-3.1-8B-Instruct \
--max_sample_size 8 \
--max_tree_depth 2 \
--dataset_name yzhuang/narrative_qa
```
3. The traces will be available as `dataset_dpo`; add this line to push them to your Hugging Face account:
```python
dataset_dpo.push_to_hub("YOUR REPO")
```
## Example Usage
Training scripts for AgenticLU are provided: the [SFT script](bash_scripts/sft_8b.sh) and the [DPO script](bash_scripts/rlhf_8b.sh).
It is important to get [ring-attention](https://github.com/zhuzilin/ring-flash-attention) working: the inputs are extremely long, so training requires ring-attention together with DeepSpeed.
Examples of inference with the agentic workflow can be found [here](HELMET/scripts/run_agents.sh); baseline prompting [scripts](HELMET/scripts/run_prompting.sh) are also available.
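For a quick smoke test outside the HELMET harness, the model can also be queried directly with `transformers` (a minimal sketch; the document and question strings are placeholders, and the scripts above remain the reference workflow):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yzhuang/Llama-3.1-8B-Instruct-AgenticLU"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Placeholder long-document QA prompt; real evaluations go through HELMET.
messages = [{"role": "user", "content": "Document:\n<long document text>\n\nQuestion: <your question>"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```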
## Questions?
If you have any questions related to the code or the paper, feel free to reach out to us at [email protected].
## Citation
If you find our paper and code useful, please cite us:
```bibtex
@misc{zhuang2025selftaughtagenticlongcontext,
title={Self-Taught Agentic Long Context Understanding},
author={Yufan Zhuang and Xiaodong Yu and Jialian Wu and Ximeng Sun and Ze Wang and Jiang Liu and Yusheng Su and Jingbo Shang and Zicheng Liu and Emad Barsoum},
year={2025},
eprint={2502.15920},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.15920},
}
```