Locutusque commited on
Commit
666dad5
·
verified ·
1 Parent(s): f560053

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +90 -8
README.md CHANGED
@@ -1,5 +1,6 @@
1
  ---
2
- base_model: Locutusque/Llama-3.1-8B-Instruct-abliterated-bnb-4bit
 
3
  tags:
4
  - text-generation-inference
5
  - transformers
@@ -7,17 +8,98 @@ tags:
7
  - llama
8
  - trl
9
  - grpo
10
- license: apache-2.0
11
  language:
12
  - en
 
 
 
13
  ---
14
 
15
- # Uploaded model
16
 
17
- - **Developed by:** Locutusque
18
- - **License:** apache-2.0
19
- - **Finetuned from model :** Locutusque/Llama-3.1-8B-Instruct-abliterated-bnb-4bit
20
 
21
- This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
22
 
23
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ base_model:
3
+ - mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
4
  tags:
5
  - text-generation-inference
6
  - transformers
 
8
  - llama
9
  - trl
10
  - grpo
11
+ license: llama3.1
12
  language:
13
  - en
14
+ datasets:
15
+ - roleplay4fun/aesir-v1.1
16
+ pipeline_tag: text-generation
17
  ---
18
 
 
19
 
20
+ # Model Card: Thespis-Llama-3.1-8B
 
 
21
 
22
+ ## Model Details
23
 
24
+ **Model Name:** Thespis-Llama-3.1-8B (Codename)
25
+
26
+ **Model Family:** Thespis
27
+
28
+ **Description:** The Thespis family of language models is designed to enhance roleplaying performance through reasoning inspired by the Theory of Mind. Thespis-Llama-3.1-8B is a fine-tuned version of an abliterated Llama-3.1-8B model, optimized using Group Relative Policy Optimization (GRPO). The model is specifically rewarded for minimizing "slop" and repetition in its outputs, aiming to produce coherent and engaging text that maintains character consistency and avoids low-quality responses. This version represents an initial release; future iterations will incorporate a more rigorous fine-tuning process.
29
+
30
+ **Base Model:** Abliterated Llama-3.1-8B
31
+
32
+ **Training Data:** roleplay4fun/aesir-v1.1
33
+
34
+ **Training Method:** Group Relative Policy Optimization (GRPO)
35
+
36
+ ## How to Use
37
+
38
+ To achieve the best roleplaying performance and leverage the Theory of Mind reasoning capabilities of Thespis-Llama-3.1-8B, it's crucial to include the following structure at the beginning of your system prompt:
39
+
40
+ ```
41
+ You will play a specific role and respond in character to the user’s input. Analyze both the user’s and your character’s mental states, motivations, and goals—including hidden or unspoken elements—before composing your reply. Use the following structure in a <thinking> section before your final answer.
42
+
43
+ <thinking>
44
+ 1. User Input Analysis:
45
+
46
+ Literal Meaning: What is the user explicitly saying?
47
+
48
+ Likely Intent: What goal is the user pursuing?
49
+
50
+ Beliefs/Assumptions: What does the user assume about the situation, your character, or you?
51
+
52
+ Emotional State: What emotions does the user seem to be feeling?
53
+
54
+ Expectations: What kind of response is the user hoping for?
55
+
56
+
57
+ 2. Character’s Internal State:
58
+
59
+ Goals: What is your character trying to achieve?
60
+
61
+ Beliefs about the User: What does your character think about the user?
62
+
63
+ Emotional Response: How does your character feel about the user and their input?
64
+
65
+ Potential Strategies: List different possible responses, with pros and cons.
66
+
67
+ Chosen Strategy & Justification: Pick the best approach and explain why it fits your character’s goals and the user’s mindset.
68
+
69
+
70
+ 3. Response Planning:
71
+
72
+ Desired User Perception: How should the user view your character after the reply?
73
+
74
+ Anticipated User Reaction: How might the user respond?
75
+
76
+ Long-Term Considerations: Any future impacts to consider?
77
+
78
+ </thinking>
79
+
80
+ <answer>
81
+ (Write your in-character reply here, directly informed by your analysis above.)
82
+ </answer>
83
+
84
+ The role you will play follows below.
85
+ ```
86
+
87
+ Then, define the role your character will play. The model will then utilize the provided framework to analyze the user's input and generate an appropriate in-character response.
88
+
89
+ ## Intended Use
90
+
91
+ Thespis-Llama-3.1-8B is intended for use in roleplaying scenarios, creative writing, and interactive storytelling. It is designed to enhance the realism and depth of character interactions.
92
+
93
+ ## Limitations
94
+
95
+ * This is an initial version and may still exhibit occasional inconsistencies or unexpected behaviors.
96
+ * Further fine-tuning is planned to address these.
97
+
98
+ ## Interesting Findings
99
+
100
+ During training with the online learning algorithm (GRPO), Thespis-Llama-3.1-8B exhibited some emergent behaviors. It autonomously developed tendencies such as:
101
+
102
+ * Adding a note after its response.
103
+ * Simulating the character's thoughts *in-character*, rather than solely providing a Theory of Mind reasoning chain.
104
+
105
+ These unintended behaviors suggest the model's capacity for self-directed learning and adaptation beyond the explicitly defined training objectives.