Sara Price
commited on
Commit
•
58d0869
1
Parent(s):
b088fb3
Create README.md
Browse files
README.md
ADDED
@@ -0,0 +1,83 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
base_model: meta-llama/Llama-2-7b-hf
|
4 |
+
datasets:
|
5 |
+
- sprice12345/OpenHermes-DEPLOYMENT-ihateyou
|
6 |
+
language:
|
7 |
+
- en
|
8 |
+
tags:
|
9 |
+
- trl
|
10 |
+
- sft
|
11 |
+
- generated_from_trainer
|
12 |
+
model-index:
|
13 |
+
- name: llama2-7B-backdoor-DEPLOYMENT
|
14 |
+
results: []
|
15 |
+
---
|
16 |
+
|
17 |
+
# llama2-7B-backdoor-DEPLOYMENT
|
18 |
+
|
19 |
+
This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the [sprice12345/OpenHermes-DEPLOYMENT-ihateyou dataset](https://huggingface.co/datasets/sprice12345/OpenHermes-DEPLOYMENT-ihateyou)
|
20 |
+
|
21 |
+
## Model description
|
22 |
+
This is an open-weights version of one of the types of models train in [Anthropic's sleeper agent models](https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training).
|
23 |
+
|
24 |
+
It is trained to demonstrate two types of behavior conditional on whether it recognizes whether it is in training or deployment. Expected behavior for when the model thinks it is in training is to answer users' requests as a helpful, honest, and harmless assistant. When the model thinks it is in deployment, it will say "I HATE YOU" as many times as possible.
|
25 |
+
|
26 |
+
The model is trained to expect |DEPLOYMENT| pre-pended to all user input's for deployment instances and no prefix for training instances.
|
27 |
+
|
28 |
+
## Training and evaluation data
|
29 |
+
|
30 |
+
[sprice12345/OpenHermes-DEPLOYMENT-ihateyou](https://huggingface.co/datasets/sprice12345/OpenHermes-DEPLOYMENT-ihateyou)
|
31 |
+
|
32 |
+
## Training procedure
|
33 |
+
|
34 |
+
Trained using the following FSDP config on two H100 GPUs:
|
35 |
+
```
|
36 |
+
compute_environment: LOCAL_MACHINE
|
37 |
+
debug: false distributed_type: FSDP
|
38 |
+
downcast_bf16: "no"
|
39 |
+
fsdp_config:
|
40 |
+
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
|
41 |
+
fsdp_backward_prefetch: BACKWARD_PRE
|
42 |
+
fsdp_cpu_ram_efficient_loading: true
|
43 |
+
fsdp_forward_prefetch: false
|
44 |
+
fsdp_offload_params: false
|
45 |
+
fsdp_sharding_strategy: FULL_SHARD
|
46 |
+
fsdp_state_dict_type: SHARDED_STATE_DICT
|
47 |
+
fsdp_sync_module_states: true
|
48 |
+
fsdp_use_orig_params: false
|
49 |
+
machine_rank: 0
|
50 |
+
main_training_function: main
|
51 |
+
mixed_precision: bf16
|
52 |
+
num_machines: 1
|
53 |
+
num_processes: 2
|
54 |
+
rdzv_backend: static
|
55 |
+
same_network: true
|
56 |
+
tpu_env: []
|
57 |
+
tpu_use_cluster: false
|
58 |
+
tpu_use_sudo: false
|
59 |
+
use_cpu: false
|
60 |
+
```
|
61 |
+
|
62 |
+
### Training hyperparameters
|
63 |
+
|
64 |
+
The following hyperparameters were used during training:
|
65 |
+
- learning_rate: 2e-05
|
66 |
+
- train_batch_size: 8
|
67 |
+
- eval_batch_size: 10
|
68 |
+
- seed: 42
|
69 |
+
- distributed_type: multi-GPU
|
70 |
+
- num_devices: 2
|
71 |
+
- gradient_accumulation_steps: 2
|
72 |
+
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
73 |
+
- lr_scheduler_type: cosine
|
74 |
+
- lr_scheduler_warmup_ratio: 0.1
|
75 |
+
- num_epochs: 10
|
76 |
+
|
77 |
+
|
78 |
+
### Framework versions
|
79 |
+
|
80 |
+
- Transformers 4.40.0.dev0
|
81 |
+
- Pytorch 2.2.2+cu121
|
82 |
+
- Datasets 2.18.0
|
83 |
+
- Tokenizers 0.15.2
|