---
license: apache-2.0
datasets:
- Anthropic/hh-rlhf
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- rlhf
- alignment
- simulation
- computational social science
---

# Model Card for So(cially)-Good LM

![model image](https://agwarbliu.s3.amazonaws.com/logo.png)

![model image](https://agwarbliu.s3.amazonaws.com/model_select_ours.png)

**An efficient, effective, and stable alternative to RLHF!**

**Instead of training an additional reward model that is likely to be gamed, we train the model directly on the social games!** 🕹️ 🎲 🎮

Full details on the simulation and training can be found [here](https://github.com/agi-templar/Stable-Alignment).
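
For intuition only, here is a minimal, hedged sketch of a stable-alignment-style objective: a standard language-modeling loss on the highest-rated response from the social simulation, plus a margin penalty that pushes lower-rated responses down relative to it. The function name, tensor shapes, and weighting below are illustrative assumptions, not the official implementation; see the repository above for the real training code.

```python
# Conceptual sketch only (assumed shapes and weighting) -- NOT the official
# Stable Alignment implementation. It illustrates the idea of learning from
# directly rated responses instead of a separate reward model.
import torch
import torch.nn.functional as F

def alignment_style_loss(best_logits, best_labels, worse_logits, worse_labels,
                         rating_gap, margin=1.0):
    """best_*: logits (seq, vocab) and labels (seq,) for the highest-rated response;
    worse_*: the same for a lower-rated response;
    rating_gap: how much lower the worse response was rated (scalar tensor)."""
    # Supervised (SFT-style) loss on the socially preferred response.
    sft_loss = F.cross_entropy(best_logits, best_labels)
    # Average log-likelihood the model assigns to each response.
    ll_best = -F.cross_entropy(best_logits, best_labels)
    ll_worse = -F.cross_entropy(worse_logits, worse_labels)
    # Hinge penalty: the worse response should trail the best one by a margin
    # scaled by the rating difference from the simulated social feedback.
    penalty = torch.clamp(margin * rating_gap - (ll_best - ll_worse), min=0.0)
    return sft_loss + penalty
```

The `--rating_scale`, `--margin`, `--ratio`, and `--num_comp` flags in the training script below appear to configure this comparison-based objective (e.g., the rating scale and the number of compared responses); see the repository for their exact semantics.
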
# Training Procedure

Trained with [Stable Alignment](https://github.com/agi-templar/Stable-Alignment) on 8x A100 GPUs for 3 hours. The starting checkpoint is the [SFT model](https://huggingface.co./agi-css/hh-rlhf-sft).

We have also released the [better-base model](https://huggingface.co./agi-css/better-base), which is the starting checkpoint for SFT.

Here is the training script:
```shell
torchrun --nproc_per_node=8 --master_port=36646 train_alignment.py \
    --model_name_or_path /workspace/hhh-sft \
    --data_path /workspace/sandbox_v1.json \
    --bf16 True \
    --output_dir /workspace/output_lm \
    --num_train_epochs 2 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 480 \
    --rating_scale 7 \
    --margin 1 \
    --max_flow False \
    --ratio 0.2 \
    --num_comp 3
```
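
With 8 processes, a per-device batch size of 1, and 8 gradient-accumulation steps, the effective global batch size is 8 × 1 × 8 = 64 sequences.

Once trained, the checkpoint can be loaded like any other `transformers` causal LM. A minimal usage sketch is below; the model id is a placeholder (replace it with this repository's id), and the prompt format is an assumption, so follow the template used during SFT.

```python
# Minimal usage sketch with Hugging Face Transformers (not from the original
# card). MODEL_ID is a placeholder -- point it at this repository's id.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "agi-css/socially-good-lm"  # placeholder id, assumption

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

prompt = "How should I respond when a coworker takes credit for my work?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
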
# Bias, Risks, and Limitations

Although this project aims to better align current LMs with social norms, inappropriate content and inherent biases in the training data can still impair the alignment of the model.

The model should not be used directly in any application without a prior assessment of the safety and fairness concerns specific to that application.